short tandem repeats

15
Am. J. Hum. Genet. 55:175-189, 1994 Evaluation of 13 Short Tandem Repeat Loci for Use in Personal Identification Applications Holly A. Hammond,' Li Jin,3 Y. Zhong,3 C. Thomas Caskey, '"2 and Ranajit Chakraborty3 'Department of Molecular and Human Genetics and 2Howard Hughes Medical Institute, Baylor College of Medicine, and 'Center for Demographic and Population Genetics, University of Texas Graduate School of Biomedical Sciences, Houston Summary Personal identification by using DNA typing methodolo- gies has been an issue in the popular and scientific press for several years. We present a PCR-based DNA-typing method using 13 unlinked short tandem repeat (STR) loci. Validation of the loci and methodology has been per- formed to meet standards set by the forensic community and the accrediting organization for parentage testing. Extensive statistical analysis has addressed the issues sur- rounding the presentation of "match" statistics. We have found STR loci to provide a rapid, sensitive, and reliable method of DNA typing for parentage testing, forensic identification, and medical diagnostics. Valid statistical analysis is generally simpler than similar analysis of RFLP-VNTR results and provides powerful statistical ev- idence of the low frequency of random multilocus geno- type matching. Introduction Identification of individuals is necessary in many social, medical, and forensic circumstances. The past 10 years of experience have demonstrated that DNA typing method- ologies using a battery of highly polymorphic RFLP- VNTR probes (Jeffreys et al. 1985; Nakamura et al. 1987) and Southern-blot technology (Southern 1975) are far more efficient than the earlier protein typing methods, for resolving personal identification cases. We have developed a PCR-based DNA typing method that uses common DNA polymorphisms to provide a rapid and reliable typing system for personal identification. Re- peat sequences of two, three, four, and five bases are valu- able polymorphic markers that can be used in identifica- tion and genetic linkage (Litt and Luty 1989; Tautz 1989; Weber and May 1989; Boylan et al. 1990; Weber 1990; Received September 21, 1993; accepted for publication March 11, 1994. Address correspondence and reprints: Holly A. Hammond, Depart- ment of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Room T809, Houston, TX 77030. 3 1994 by The American Society of Human Genetics. All rights reserved. 0002-9297/94/5501-0024$02.00 Chakraborty and Kidd 1991; Edwards et al. 1991a, 1992; Huang et al. 1991; Caskey and Hammond 1992; Roewer and Epplen 1992). Since they are smaller than VNTR re- peats, they can be informative when partially degraded DNA and/or limited DNA is available. These are fre- quently the limiting factors for VNTR analysis. We present the development of a battery of short tan- dem repeat (STR) loci for use as a highly discriminatory system of genetic markers in personal identification, spe- cifically for parentage testing, forensic identification, and medical applications. As validation of the utility of STR loci in these applications, we demonstrate conformation to Hardy-Weinberg expectations of genotype frequency data for samples obtained from four major population groups residing in Houston. We investigate the assump- tion of allelic independence across the loci, for validity in these populations, and present summary statistics indicat- ing their potential for personal identification, along with indirect estimates of mutation rates for each locus. Material and Methods Population Groups DNA was extracted (Patel et al. 1984) from blood sam- ples collected from unrelated individuals presenting to a Houston area blood bank. Blood donors were visually des- ignated as White, Black, or "other," by blood bank per- sonnel. Mexican-American and Asian samples were iden- tified from the "other" group by common ethnic first names and surnames. Ninety-seven to 386 chromosomes were examined for each of the four population groups. Genetic Loci The STR locus names, Genome Database (GDB) desig- nations, chromosomal locations, PCR primers, and allele product sizes observed are given in table 1, for the 13 loci available in our personal identification system. We have designated the STR loci by their GenBank locus name and lowest alphabetical representation of the repeat region, following the lead of Edwards et al. (1991a), and/or by their GDB locus assignment. Alleles are designated by the number of full tandem repeats of the STR sequence. The number of repeats was determined by sequencing of PCR products for at least two alleles at each locus (authors' un- published data). 175

description

hdrgsdgrs azdsfsz argvzgvz aerfbzrfbr sdg arga fdga vf

Transcript of short tandem repeats

Page 1: short tandem repeats

Am. J. Hum. Genet. 55:175-189, 1994

Evaluation of 13 Short Tandem Repeat Loci for Use in PersonalIdentification ApplicationsHolly A. Hammond,' Li Jin,3 Y. Zhong,3 C. Thomas Caskey, '"2 and Ranajit Chakraborty3

'Department of Molecular and Human Genetics and 2Howard Hughes Medical Institute, Baylor College of Medicine, and 'Center for Demographic andPopulation Genetics, University of Texas Graduate School of Biomedical Sciences, Houston

Summary

Personal identification by using DNA typing methodolo-gies has been an issue in the popular and scientific pressfor several years. We present a PCR-based DNA-typingmethod using 13 unlinked short tandem repeat (STR) loci.Validation of the loci and methodology has been per-formed to meet standards set by the forensic communityand the accrediting organization for parentage testing.Extensive statistical analysis has addressed the issues sur-rounding the presentation of "match" statistics. We havefound STR loci to provide a rapid, sensitive, and reliablemethod of DNA typing for parentage testing, forensicidentification, and medical diagnostics. Valid statisticalanalysis is generally simpler than similar analysis ofRFLP-VNTR results and provides powerful statistical ev-idence of the low frequency of random multilocus geno-type matching.

Introduction

Identification of individuals is necessary in many social,medical, and forensic circumstances. The past 10 years ofexperience have demonstrated that DNA typing method-ologies using a battery of highly polymorphic RFLP-VNTR probes (Jeffreys et al. 1985; Nakamura et al. 1987)and Southern-blot technology (Southern 1975) are farmore efficient than the earlier protein typing methods, forresolving personal identification cases.We have developed a PCR-based DNA typing method

that uses common DNA polymorphisms to provide a rapidand reliable typing system for personal identification. Re-peat sequences of two, three, four, and five bases are valu-able polymorphic markers that can be used in identifica-tion and genetic linkage (Litt and Luty 1989; Tautz 1989;Weber and May 1989; Boylan et al. 1990; Weber 1990;

Received September 21, 1993; accepted for publication March 11,1994.Address correspondence and reprints: Holly A. Hammond, Depart-

ment of Molecular and Human Genetics, Baylor College of Medicine,One Baylor Plaza, Room T809, Houston, TX 77030.3 1994 by The American Society of Human Genetics. All rights reserved.0002-9297/94/5501-0024$02.00

Chakraborty and Kidd 1991; Edwards et al. 1991a, 1992;Huang et al. 1991; Caskey and Hammond 1992; Roewerand Epplen 1992). Since they are smaller than VNTR re-peats, they can be informative when partially degradedDNA and/or limited DNA is available. These are fre-quently the limiting factors for VNTR analysis.We present the development of a battery of short tan-

dem repeat (STR) loci for use as a highly discriminatorysystem of genetic markers in personal identification, spe-cifically for parentage testing, forensic identification, andmedical applications. As validation of the utility of STRloci in these applications, we demonstrate conformationto Hardy-Weinberg expectations of genotype frequencydata for samples obtained from four major populationgroups residing in Houston. We investigate the assump-tion of allelic independence across the loci, for validity inthese populations, and present summary statistics indicat-ing their potential for personal identification, along withindirect estimates of mutation rates for each locus.

Material and Methods

Population GroupsDNA was extracted (Patel et al. 1984) from blood sam-

ples collected from unrelated individuals presenting to aHouston area blood bank. Blood donors were visually des-ignated as White, Black, or "other," by blood bank per-sonnel. Mexican-American and Asian samples were iden-tified from the "other" group by common ethnic firstnames and surnames. Ninety-seven to 386 chromosomeswere examined for each of the four population groups.

Genetic LociThe STR locus names, Genome Database (GDB) desig-

nations, chromosomal locations, PCR primers, and alleleproduct sizes observed are given in table 1, for the 13 lociavailable in our personal identification system. We havedesignated the STR loci by their GenBank locus name andlowest alphabetical representation of the repeat region,following the lead of Edwards et al. (1991a), and/or bytheir GDB locus assignment. Alleles are designated by thenumber of full tandem repeats of the STR sequence. Thenumber of repeats was determined by sequencing of PCRproducts for at least two alleles at each locus (authors' un-published data).

175

Page 2: short tandem repeats

Am. J. Hum. Genet. 55:175-189, 1994

Table I

STR Loci Studied

Genbank Locus[STRJn/GDB Gene Product LengthDesignation (chromosome location) PCR Primers (bp)

HUMHPRTB[AGATJn/HPRT ........... Hypoxanthine tA: atg cca cag ata ata cac atc cccl 259-299phosphoribosyltransferase B: ctc tcc aga ata gtt aga tgt agg J(Xq26)

HUMFABP[AAT]n/FABP2 ................ Intestinal fatty acid-binding fA: gta gta tca gtt tca tag ggt cac ci 199-220protein (4q28-q31) B: cag ttc gtt tcc att gtc tgt ccg J

HUMCD4[AAAAG]n/CD4 ................ Recognition/surface antigen fA: ttg gag tcg caa gct gaa cta gcgl 125-175(cd4) (12p12-pter) B: cca gga agt tga ggc tgc agt gaaJ

HUMCSF1 PO[AGAT]n/CSF1R ......... c-fms proto-oncogene for A: aac ctg agt ctg cca agg act agc 295-327CSF-1 receptor (5q33.3-q34) B: ttc cac aca cca ctg gcc atc ttc

HUMTHOl[AATG]n/TH ................... Tyrosine hydroxylase (l p1 5.5) A: gtg ggc tga aaa gct ccc gat tat 179-203iB: att caa agg gta tct ggg ctc tggJHUMPLA2Al[AATJn/PLA2A ........... Pancreatic phospholipase A-2 A: ggt tgt aag ctc cat gag gtt aga 118-139

(12q23-qter) B: ttg agc act tac tat gtg cca ggc tJHUM F13AO1[AAAGJn/F13A1 .......... Coagulation factor XIII fA: gag gtt gca ctc gag cct ttg caai 281-331

(6p24-p25) B: ttc ctg aat cat ccc aga gcc acaJ

HUMCYAR04[AAATJn/CYP19 ........ Aromatase cytochrome P-450 tA: ggt aag cag gta ctt agt tag cta ci 173-201(15q21.1) (B: gtt aca gtg agc caa ggt cgt gag J

HUMLIPOL[AAAT] /LPL ................ Lipoprotein lipase (8p22) fA: ctg acc aag gat agt ggg ata tagl 125-175B: ggt aac tga gcg aga ctg tgt ct J

HUMRENA4[ACAG] /REN ............. Renin (lq32) tA: aga gta cct tcc ctc ctc tac tca 255-275B: ctc tat gga gct ggt aga acc tgaJ

HUM FESFPS[AAAT]n/FES ................ c-fes/fps proto-oncogene fA: gct tgt taa ttc atg tag gga agg ci 222-250(15q25-qter) [B: gta gtc cca gct act tgg cta ctc J

No GenBank entry/D6S366 ............... Unknown (6q21-qter) fA: aga ggt tac agt gag ccg aga ttg 138-162[B: gaa gtc cta aca gaa tgg aag gtc cJ

HUMARA[AGC]n/AR ....................... Androgen receptor (Xcen-q13) BA: tcc aga atc tgt tcc aga gcg tgca 255-315[B: gct gtg aag gtt gct gtt cct catJ

Genotype DeterminationsAllele identification was achieved by PCR amplification

of 10-50 ng of genomic DNA by using Perkin Elmer Cetusthermocyclers (models TC1 and 480), Amplitaq enzyme,

and standard Cetus buffer conditions (10 mM Tris-HCI pH8.3, 50 mM KCI, 1.5 mM MgCl2, 0.01% gelatin) in 15-jlvolumes. The PCR products were radiolabeled by inclusionof 2 jCi [alpha-32P]dCTP (3,000 Ci/mmol) in the PCR.Primer concentrations for all single-locus reactions andmost arbitrary multiplexes were 1 jM for each primer. Stan-dard sets of three loci were routinely analyzed in balancedtriplex reactions. Concentrations for the balanced triplexeswere determined experimentally. The goal was to achieveapproximately equal radioactive signals on an autoradio-graph, for all nine loci typed in three triplex PCR reactions.The final primer concentrations for these triplexes are as

follows: multiplex 1-HUMHPRTB[AGAT]n, 2.66 jM;HUMFABP[AAT]n, 1.06 jM; and HUMCD4[AAAAG]n,0.67 jiM; multiplex 2-HUMCSFlPO[AGAT]n, 0.27 FM;HUMTH01[AATG]n, 0.27 jM; and HUMPLA2A1-

[AAT]n, 1.40 gM; and multiplex 3-HUMF13AO1-[AAAG], 0.27 jiM; HUMCYAR04[AAAT]n, 1.27 gM;and HUMLIPOL[AAAT]n, 1.47 FM. The PCR condi-tions for all multiplexes and single-locus PCR reactionsare as follows: 950C, 2 min, 1 cycle; 950C, 45 s, then620C, 30 s, then 720C, 30 s, 28 cycles; and 720C, 10 min,1 cycle.

Allele separation of the radiolabeled PCR products was

achieved by electrophoresis through a 4% (39:1) acryl-amide-bisacrylamide denaturing (8 M urea) sequencing gel.The PCR products were diluted 1:1 with standard se-

quencing loading buffer. Two microliters of the dilutedreactions were loaded onto a vertical PAGE gel system (IBISTS45 vertical gel apparatus) and electrophoresed at con-

stant wattage (80 W) for 2-21/2 h. After electrophoresis thegel plates were separated, and the gel was transferred to

filter paper. The gel was exposed to Kodak XAR imagingfilm (with or without vacuum drying of the gel) for 4 h to

overnight. Allele designations were made by comparison

176

Page 3: short tandem repeats

Hammond et al.: STRs for Personal Identification

to known alleles from two or more samples run in lanesadjacent to unknown samples.

Statistical MethodsGene count estimates of allele frequencies, unbiased es-

timates of heterozygosity, and their standard errors (SEs)were calculated as described by Edwards et al. (1992). Theexpected number of heterozygotes for a locus-populationcombination was compared with the observed number bymeans of a X2 statistic to evaluate conformation to Hardy-Weinberg predictions. The significance was determined by5,000 permutations of genotype shuffling to determine theproportion of times the resultant %2 exceeded the observedvalue. The expected number of distinct homozygous andheterozygous genotypes and their SEs were determinedby previously published methods (Chakraborty 1993a,1993b). Overall discordance of genotype frequencies fromtheir Hardy-Weinberg expectations can be detected bythese summary statistics. A likelihood-ratio (L) test crite-rion (G-statistic) was used to contrast observed and ex-

pected frequencies of all genotypes. This test criterion wascomputed by the methods of Sokal and Rohlf (1969) andWeir (1991). Its empirical significance was evaluated by a

permutation method detailed by Chakraborty et al. (1991)and Deka et al. (1991). The exact (E) test (Guo andThompson 1992) was also performed, to check the con-

formity of the observed genotype frequencies with theirrespective Hardy-Weinberg expectations.

Locus-specific fixation indices within individuals (i.e.,inbreeding coefficients [4]) for each population group andthe pooled population were computed by the method ofYasuda (1968) using a maximum-likelihood procedure.The SE of these estimates, as well as tests of significantdeparture from X = 0 (Hardy-Weinberg equilibrium[HWE]), were conducted based on the entire multiethnicgenotype data.

Pairwise independence of genotype frequencies for eachcombination of loci was tested by a procedure suggestedby Risch and Devlin (1992) and Morton et al. (1993). Theoccurrence of matches at both loci, a match at one of theloci, or no match for two specific loci (A and B), was eval-uated for all pairs of individuals, in combinations of twoloci, in the observed multilocus data. Each pair of loci gen-erated a 2 X 2 contingency table with observation nAB,nAs, nAB, and nAB, where nAB represents the number ofpairs of individuals whose genotypes for loci A and B are

the same (match); nAB, represents the number of pairs thatmatched at locus A but not at locus B; nAB represents thenumber of pairs that matched at locus B but not at locusA; and nAB represents the number of pairs that did not

match at either locus. Traditional 2 X 2 contingency X2analysis (Sokal and Rohlf 1969) indicates departure fromindependence. The significance of this x2 value was testedby genotype shuffling in the entire database. The empiricallevel of significance was judged by counting the proportion

of times the X2 values in the shuffled data (2,000 replica-tions) exceeded the observed %2. This test was performedfor each pair of loci, resulting in 78 tests for each popula-tion group and the pooled data.An approximate test of random association of alleles at

different loci was performed as described by Edwards etal. (1992). The summary statistic Sk (observed variance ofthe number of heterozygous loci) and its 95% confidenceintervals were determined for the samples from each pop-ulation group (Brown et al. 1980; Chakraborty 1984).

All of these tests presume that the individuals who ex-hibit only one allele product at a locus are truly homozy-gous. This may not necessarily be correct, since in PCR-based methods "null" alleles may not be detected, becauseof differential amplification or nonamplification caused bysequence polymorphisms within the primer sequence.While such possibilities have been empirically shown tooccur at the dinucleotide repeat loci (Callen et al. 1993;Koorey et al. 1993), there is no direct molecular evidencethat null alleles are indeed present at the STR loci studiedhere. The presence of null alleles will cause pseudodepen-dence of alleles within, as well as across, loci in genotypesof individuals. Using modifications of a theory originallyproposed by Gart and Nam (1984), we developed amethod for estimating the null-allele frequency (r) basedon the observed genotype (phenotype) frequency data(Chakraborty et al. 1994). Revised empirical levels of sig-nificance, incorporating the presence of null alleles, are cal-culated to examine whether either the significant devia-tions from HWE or the significant inbreeding coefficientscould be attributed to null alleles alone.The power of exclusion for each locus, as well as the

combined powers of exclusion, were calculated as de-scribed by Chakravarti and Li (1983), for multiple-alleleloci. In forensic science, a convenient measurement of theusefulness of a genetic marker is the individualization po-tential (PI), which may be expressed as the sum of thesquares of all genotype frequencies in a polymorphic sys-tem (Sensabaugh 1982). We present a comparison of the P1values of all loci in four population groups, calculated un-der HWE and in the presence of inbreeding within a pop-ulation. Indirect estimates of mutation rates for each ofthe 13 loci were obtained from the distribution of thenumber of segregating alleles at each locus in the popula-tions sampled, using the maximum-likelihood methodproposed by Chakraborty and Neel (1989).

Results

PCR AssayEach locus described contains an STR. The polymor-

phisms are due to the variable numbers of repeats of three,four, and five nucleotides sequences (see table 1). We havefound that three-, four-, and five-base repeats amplifymore authentically and provide more easily interpretable

177

Page 4: short tandem repeats

Am.J. Hum. Genet. 55:175-189, 1994

results than do dinucleotide repeat loci (Edwards et al.1991a; Huang et al. 1991).

All repeat regions, with the exceptions of those withinthe HUMHPRTB, D6S366, and HUMCD4 loci, wereidentified by a computer search of GenBank sequences.The search criteria identified regions containing five ormore repeats of each of the 44 unique three- and four-basesequences. The HUMHPRTB STR was identified duringthe sequencing of the hypoxanthine phosphoribosyltrans-ferase gene in this laboratory (Edwards et al. 1990).D6S366 was identified by screening of a lambda bacterio-phage library of human genomic DNA with oligonucleo-tides containing 10 repeats of the sequence AAT. Regionsadjacent to the repeat were sequenced by previously pub-lished methods (Edwards et al. 1991a) and by direct se-quencing of lambda bacteriophage clones (Panzer et al.1993). The STR in the HUMCD4 locus was identified dur-ing sequencing of the CD4 gene (Edwards et al. 199ib).Numerous loci identified as having five or more STR re-

peats were examined to find those that were highly poly-morphic and that provided robust and easily interpretedsignals (Edwards et al. 1991a). Some of these loci havebeen independently reported as polymorphic, while ourpopulation studies have been ongoing (Polymeropoulos etal. 1990, 1991a, 1991b, 1991c, 1991d, 1991e; Zuliani andHobbs 1990; Edwards et al. 1991b; Hearne and Todd1991; Ahn et al. 1992; Sleddens et al. 1992; Panzer et al.1993). We have previously published database informationand HWE determination for five of these loci (Edwards etal. 1992; Puers et al. 1993). A database of DNA samplesfrom -200 Caucasian,, 200 Black, 200 Mexican-American, and -80 Asian samples were typed for eachlocus. Amplifications and typing were performed in singleand multiplex reactions. All loci amplified with 1 gM con-centrations of the primers, at the conditions previouslygiven.

Multiplex reactions were established to simplify analy-sis. Since primers were designed with -24 bases and-50% GC content, all amplification reactions have similarmelting temperatures. Thus, all reactions can be per-formed under identical amplification conditions. Seven ofthe loci have been successfully amplified in one reaction.For utility, multiplexes were restricted to three loci. Theamplification product size determined the groups of lociand their alleles that could be coamplified and resolved bysequencing gel separation.Nine of the loci have been developed into three stan-

dard multiplex reactions of three loci each. The results ofallele separation for these multiplexes were balanced withregard to the intensity of allele bands on the autoradio-graphs. Approximately equal autoradiography results wereachieved by adjusting primer concentrations. Differentprimer concentrations are required for fluorescent or sil-ver-staining detection. We report our conditions for auto-radiography detection of deoxycytidine-5'-triphosphate,

tetratriethylammonium salt, (alpha-32P) incorporated intothe PCR products.

Figure 1 shows a representative sample of the results forthe three multiplex reactions and single-locus amplifica-tions of alleles for the loci HUMFESFPS, and D6S366. Re-sults for HUMARA and HUMRENA4 have been shownby Edwards et al. (1991a).

Validation and NomenclatureAll STR loci have been typed in DNA extracted from

various tissues, body fluids, and cultured cells, includingblood, cultured lymphoblasts, vaginal epithelial cells, se-men, and skin tissue. Results of samples from the sameindividual gave identical allele typing for all tissues tested.

Inheritance patterns of the loci were tested on samplesfrom families presenting to our diagnostic laboratory andfrom at least three three-generation CEPH families. Allloci show codominant inheritance consistent with the ex-pected pattern for the chromosome to which the locus hasbeen assigned.

Allele designation follows the lead of Edwards et al.(1991a) in using the number of reiterations of the repeatsequences as the allele number. At least two alleles in eachlocus have been sequenced. Designations of the allelenumbers assigned to the database and family studies weremade by comparison to at least two external cell line con-trols, RJK 1094 and RJK 1258, electrophoresed simulta-neously with samples of unknown allele designation. Partof the databases for the HUMF13AO1 and HUMTH01(Puers et al. 1993) loci were typed using an allele ladderprovided by Promega Corporation. Several loci haveshown variant alleles that do not correspond to a changein the number of reiterations of the core sequence. Wherethe sequence of these alleles is known, they have been des-ignated per the recommendation on nomenclature stan-dardization of STR loci, made by the DNA commissionmembers at the meeting of the International Society of Fo-rensic Haemogenetics, in Venice, Italy, in October 1993.In the two cases in which the sequence difference of thevariant allele has not been determined (HUMHPRTB [Ed-wards et al. 1992] and HUMCYAR04), the variant allelehas been designated as "NA," whereN is the allele numberof the closest, smaller allele.

Allele Frequency EstimatesA complete enumeration of all genotype frequencies for

such hypervariable loci requires substantial space, andsuch data are not easily comprehensible without appropri-ate summarization. As an illustration, table 2 shows thegenotype frequencies, for the Caucasian data, at theD6S366 locus. This locus exhibits a moderate heterozy-gosity (70%-80%) and has a large number of genotypes.This is typical of all of the loci. The gene count estimatesof allele frequencies for eight of the loci are given in Ap-

178

Page 5: short tandem repeats

Hammond et al.: STRs for Personal Identification

15-13 -=11 -=

97 -=

15-13 -

11-z9 -=

15-

13-

11 -

9-

7-

A

15 -

7

5 -=

4-2

11-97-5

D

12-

10-

C

15-13 -Z

11-=

11-

7-

5-

17-15-13-

B II-

13-11-

17-15-

13-11-I

E

Figure I Examples of autoradiographs showing separation ofPCR products from three three-locus multiplexes and two single-locusPCR reactions, on a denaturing sequencing gel. Representative alleles are

shown for (A) multiplex 1-HUMHPRTB[AGAT]n (top), HUMFAB-P[AAT]n (middle), and HUMCD4[AAAAG]J (bottom); (B) multiplex 2-HUMCSFlPO[AGATJn (top), HUMTH01[AATG]n (middle), and HUM-PLA2A1[AAT]n (bottom); (C) multiplex 3-HUMF13AO[AAAG]n (top),HUMCYAR04[AAAT]n (middle), and HUMLIPOL[AAATJn (bottom);(D) HUMFESFPS[AAATJn; and (E) D6S366. The first two lanes of eachset of reactions contain the controls RJK 1094 and RJK 1258. The alleledesignations are shown as a scale next to each set of alleles. The nonuni-form tick marks on the scales in panel B (middle) and panel C (top) desig-nate the variant alleles that are shown in the photograph.

Table 2

Genotype Frequencies of D6S366 Caucasians

ALLELE NUMBER

10 11 12 13 14 15 16 17 18

10 . . .. ... ... ... 1 1 2 ... 1 ...

11 1 2 3 5 2 4 1 ...

12 20 12 7 10 15 4 ...

13 8 10 9 13 3 114 1 6 6 1 ...

15 2 10 ... ...

16 6 3 ...

17. ... ...

18 ...

NOTE.-Total no. of individuals = 170.

pendix A. Similar data on the other five loci have been pub-lished previously (Edwards et al. 1992; Puers et al. 1993).

Tests ofHWEHWE depends on the existence of large randomly mat-

ing (panmictic) populations. These characteristics are notalways found in natural populations. Deviations from thepredictions of the square law, (Xpi )2 = 1, may be caused bya variety of factors including nonrandom mating, ascer-tainment bias, distortions in the contribution of gametesto offspring, and faulty data interpretation (Milkman andBeatty 1970; Bell et al. 1982). The calculations of genotypefrequency from allele frequencies require approximateagreement with the square law.

Three tests were performed to meet this requirement:the X2 test based on overall heterozygosity, the L test, andthe E test. These tests were designed to indicate deviationof the observed genotype frequencies from those expectedunder HWE. The observed and expected number of het-erozygotes, on which the X2 test is constructed, and theobserved and expected number of distinct genotypes (datanot shown) provide summary statistics that detect grossdeviation from equilibrium. The L test (Weir 1991) wasused to contrast the observed and expected frequencies ofall possible genotypes. Since the values of these test statis-tics (x2 or L) are, by themselves, not indicative of departurefrom HWE, and since the E-test procedure of Guo andThompson (1992) does not provide a comparable test sta-tistic, the empirical levels of significance for each test andlocus-population combination are contained in AppendixB. The levels of significance for X2 and L were obtained bythe permutation method as described by Chakraborty etal. (1991). Significant deviation from the predictions ofHardy-Weinberg for each test-locus-population combina-tion are indicated in Appendix B.

Deviations were detected in 8 of the 52 possible locus-population combinations, by comparison of the observed

179

Page 6: short tandem repeats

Am. J. Hum. Genet. 55:175-189, 1994

Table 3

Estimates of Fixation Indices within Individuals, and their SEs

POPULATION

Locus Caucasian Black Mexican-American Asian Pooled

HUMHPRTB .................. .116 ± .046a .062 ± .040 .096 ± .065 .092 ± .087 .056 ± .0212HUMFABP .................. -.003 ± .031 -.037 ± .027 .038 ± .031 .088 ± .047 .032 ± .015'HUMCD4 .................. -.005 ±.032 -.003 ±.024 .012 ±.032 .149 ±.043a .026 ±.0128HUMCSF1PO .................. -.001 ± .031 -.025 ±.026 -.038 ±.028 -.024 ±.053 -.010 ±.014HUMTH01 .................. .059 ± .030a -.047 ± .033 .051 ± .032 .063 ± .047 .058 ± .015HUMPLA2A1 .................. -.030 ± .030 -.021 ± .028 -.005 ± .028 .173 ± .051l .027 ± .015HUMF13AO1 .................. .033 ±.027 .016 ±.022 -.024 ±.028 .010 ±.051 .035 ±.011lHUMCYAR04 .................. .036 ± .027 .033 ± .029 -.006 ± .028 -.047 ± .047 .032 ± .014aHUMLIPOL .................. -.044 ±.036 .003 ±.031 -.018 ±.033 -.049 ±.051 -.002 ±.016HUMRENA4 .................. -.010 ±.043 .014 ±.033 -.005 ±.042 .039 ±.067 .017 ±.018HUMFESFPS .................. -.021 ± .033 .058 ± .032 .017 ± .032 .077 ±.050 .046 ± .0168D6S366 ........... ....... .015 ± .027 .018 ± .026 -.016 ± .026 .063 ± .050 .026 ± .014HUMARA .................. .012 ± .032 .067 ± .027 -.008 ±.040 .121 ± .050a .048 ± .015aWeighted average ........... .011 ± .032 .008 ± .028 -.001 ± .032 .060 ± .051 .029 ± .014

a Signifiant deviation from 4 = 0.

and expected number of heterozygotes (X2). The devi-ations occurred in the HUMHPRTB(AGATV-Caucasian,HUMHPRTB[AGATJ-Black, HUMTH01[AATGL-Asian,HUMPLA2A1[AAT]n-Asian, HUMF13A01[AAAGJ-Cau-casian, HUMCYAR04[AAATL-Black, HUMFESFPS[AA-AT]L-Asian, and HUMARA[AGCL-Black populations. TheL test showed only three deviations, at HUMTH01[AATGL-Black, HUMF13A01[AAAGJL-Black, and HUMCYA-R04[AAAT]n-Black, in 52 possible combinations. The L-test statistic was also barely insignificant (P = 5.2%, with5,000 replications of permutations) for the D6S366-Blackcombination. The E test revealed significant departurefrom Hardy-Weinberg proportions in 7 of the 52 combi-nations examined: HUMTH01[AATG]J-Black, HUM-TH01[AATG]n-Asian, HUMARA[AGC]n-Black, HUM-PLA2A1[AATjn-Asian, HUMFESFPS[AAAT]n-Asian,D6S366-Black, and HUMCYAR04[AAAT]n-Black. In ad-dition, comparisons of the observed and expected numberof distinct genotypes (data not shown) detected only onedeviation (HUMCD4[AAAAG]J-Asian). When each ofthese tests was performed on the pooled sample, signifi-cant deviations were noted by one or more of the tests, for10 of the 13 loci (last column of Appendix B).

It should be noted that the deviating locus-populationcombinations were not the same for each of the three tests.Deviation from the expectations of Hardy-Weinberg wasobserved in most of the pooled population where devia-tion would be expected. Only two of the locus-populationcombinations (HUMARA[AGC]n-Black and HUMCYA-R04[AAAT]n-Black) demonstrated consistent deviationfrom the expectations of Hardy-Weinberg proportions.These data indicate, that with the possible exception ofthe HUMARA[AGC]n-Black and HUMCYAR04-

[AAAT]n-Black combinations, the square law may be usedto calculate genotype frequencies from the allele fre-quencies.

These conclusions hold true when certain fractions ofthe homozygotes are assumed to be from nondetectablenull-allele heterozygosity. Gart and Nam's (1984) estimateof r, incorporated in allele shuffling across individuals, pro-duced no significant deviation from HWE, by any of thethree tests. The estimated r and the revised levels of sig-nificance, for originally significant results, are indicated inAppendix B.

4) and Population SubstructureEstimates of 4 and their SEs for each locus-population

combination are presented in table 3. Examination ofthese values allows evaluation of whether the significantdeparture from HWE (Appendix B) is due to populationsubstructure (Yasuda 1968). Footnotes to the table indi-cate significant departure of 4) from HWE. The last row ofthe table contains the weighted averages of values acrossloci, for each population. Of the 52 estimates in the foursamples, 6 are significantly different from zero (2 in Cauca-sians, 1 in Blacks, and 3 in Asians). There is no correlationbetween locus-population samples showing deviationfrom HWE in Appendix B and the data in table 3. Eight ofthe 13 loci show significant fixation indices in the pooledsample. Although the average degree of inbreeding (acrossloci) is not significant in any of the ethnic samples, theAsian sample shows the largest average fixation index (4= 5.96%). Average fixation indices in the Caucasian, Black,and Hispanic populations are at the level of <1%, consis-tent with the observation in VNTR loci (Morton et al.1993). The estimate of 4 (3%) in the pooled sample shows

180

Page 7: short tandem repeats

Hammond et al.: STRs for Personal Identification

FABP CSFl PO PLA2A1 CYAR04 RENA4 D6S366

HPRTB CD4 THOI F13AO1 LIPOL FESFPS ARA

HUMHPRTB

HUMFABP

HUMCD4

HUMCSFI PO

HUMTH01

HUMPLA2AI

HUMF13AO1

HUMCYAR04

HUMLIPOL

HUMRENA4

HUMFESFPS

D6S366

HUMARA

Caucasian

Black

Hispanic

Asian

Pooled -

Figure 2 Illustration of the significant dependence of alleles in pairs of loci. Hatches in the upper diagonal indicate the populations that showdependence of alleles for each pair of loci. Shading in the lower diagonal indicates the pooled populations that showed dependence of alleles betweenpairs of loci.

that the effect of substructuring on the genotype probabil-ities is not appreciable.

Since the presence of null alleles may also be responsiblefor nonsignificant estimates, we calculated the correla-tion coefficients of the locus-specific O's and r's withineach population p = .95, .92, .96, and .91, for Caucasians,Blacks, Mexican-Americans and Asians, respectively).These suggest that the presence of related individuals inthe sample, or hidden substructure within the populationssampled, may not be the only causes of the individual sig-nificant locus-population O's. We show that the effects ofthe observed 0 values on using the allele frequency datafor personal identification purposes is not appreciable.

Independence across LociSubstructuring within populations can also produce ga-

metic phase disequilibrium. The random association of al-leles between genetically unlinked loci is disturbed by pop-ulation substructure. Figure 2 shows the pairs of lociwhere genotype probabilities are not independent, as de-termined by the contingency-table X2 test (empirical) basedon two-locus match frequency observations. In the upper-

diagonal segment, we have 17 significant pairs of loci (6 inCaucasians, 4 in Blacks, 5 in Mexican-Americans, and 2 in

Asians), which is close to the expectation (15.6), based on5% level of significance, for 312 total tests. Thirteen of the78 pairwise tests showed significant evidence of depen-dence of pairwise genotype frequencies in the pooled sam-ples (lower-diagonal segment of fig. 2).

Table 4 presents the values of the s4 statistic for the locistudied. The values do not differ significantly from the ex-pected values. The failure of the statistic to detect amalga-mation in the pooled population indicates an absence oflinkage disequilibrium, or lack of power in the test, as dis-cussed by Edwards et al. (1992).We observed nearly the expected number of significant

pairwise X2 values, and no deviation from linkage equilib-rium based on the Sk statistic. These observations suggestthat even if substructure exists within any of the four sam-pled populations, its level is so low that it has virtually noeffect on the multilocus genotype probabilities.

Mutation RatesIndirect estimates of mutation rates for the 13 STR loci

are shown in table 5. The mutation rate estimates werebetween 2.36 X 10-5 and 1.86 X 10-4 for the STR loci.This compares favorably with previously reported rates

I

I

181

Page 8: short tandem repeats

Am.J. Hum. Genet. 55:175-189, 1994

Table 4

Test for Random Association of Alleles at 13 STR Loci In Four Populations

ELEVEN AUTOSOMAL Loci ALL 13 LOCIb

POPULATION nc Observed Expected 95% CId n Observed Expected 95% CId

Caucasian ................ 127 2.39 2.20 1.68-2.73 51 2.47 2.48 1.53-3.42Black ........ ........ 110 1.49 1.94 1.44-2.45 50 1.72 2.32 1.43-3.22Mexican-American ........... 104 2.22 2.21 1.63-2.78 28 2.62 2.44 1.19-3.70Asian................ 45 2.92 2.46 1.48-3.43 16 2.73 2.72 1.89-4.56

Pooled................ 386 2.36 2.22 1.91-2.52 145 2.42 2.52 1.95-3.08

aThe statistic s' (the observed variance of the no. of heterozygous loci per population) detects linkage disequilibrium of alleles from the distributionof heterozygous alleles in a sample (Sokal and Rohlf 1969).

b For female samples, the test was performed only in the X-linked loci.' No. of individuals studied.d 95% confidence intervals.

(Edwards et al. 1992) and is generally between the valuesestimated for protein markers and VNTR loci.

Power ofExclusionParentage studies report two statistical values: the

power of exclusion for the loci studied and the probabilityof paternity, based on the alleles shared between parentand child at each locus. The average power of exclusionwas calculated for each locus, for all four populationgroups. The powers of exclusion ranged from 19.01% to

80.72%. The calculated values for the Caucasian popula-tion are presented in table 5. Combined average powers ofexclusion for each population are depicted in figure 3. Theorder of typing is the listed order of loci in table 1.

Calculations for Forensic Evaluations

Forensic identifications usually require comparison ofthe typing results from a known sample to those of a sam-

ple of unknown origin. A statement can then be made as

to the possibility that the unknown sample could haveoriginated from the individual donating the known sam-

ple. This comparison is straightforward, and DNA typingof VNTR loci and our STR loci provide very high powerof discrimination between unrelated individuals. When an

unknown sample is declared to match in genotype for alltyped loci, the forensic scientist is then asked to place a

significance number on this "match." Typically the foren-sic scientist will report the frequency at which the com-

Table 5

Statistics for Forensic Identification and Parentage Studies

HeteLocus (%) Pevb ± SEC

HUMHPRTB[AGATL.. 77.2 56.0 4.84 X 10-s ± 9.86 X 10-6HUMFABP[AAT . .. 64.9 40.7 4.63 X 10-s ± 8.46 X 10-6HUMCD4[AAAAG]n 68.0 39.8 4.81 X 10-i ± 8.60 X 10-6HUMCSF1PO[AGATn. 74.2 50.3 4.88 X 10-5 ± 8.75 x 10-6HUMTHO1[AATGL.. 77.3 56.0 3.96 X 10-5 ± 7.66 X 10-6HUMPLA2A1[AAT]n 72.4 51.4 4.61 X 10-5 + 8.40 X 10-6HUMF13AO1[AAAGL.. 73.3 49.4 6.60 X 10-5 ± 1.06 X 10-5HUMCYAR04[AAATL. 72.7 48.9 4.64 X 10-5 + 8.46 X 10-6HUMLIPOL[AAATn.. 68.1 41.7 3.57 X 10-5± 7.21 x 10-6HUMRENA4[ACAGL. 36.4 19.0 2.36 X 10-5 ± 5.62 X 10-6HUMFESFPS[AAATL.. 70.2 44.3 4.30 X 10-5 ± 8.12 X 10-6D6S366 .82.5 64.8 6.13 X 10-5 ± 1.02 X 10-5HUMARA[AGC]n 88.6 76.3 1.86 X 10-4 ± 2.53 X 10-s

' Unbiased estimate of heterozygosity of the Caucasian population.b Power of exclusion for the Caucasian population.C Indirect estimate of mutation rate and SE for each STR locus (Chakraborty and Neel 1989).

182

Page 9: short tandem repeats

183Hammond et al.: STRs for Personal Identification

Table 6

Individualization Potential Values Calculated under HWE and in the Presence of Inbreeding (qS)

POPULATION

Mexican-Caucasian Black American Asian Pooled

HWE HWE 10 HWE 0 HWE 10 HWE

HUMHPRTB[AGAT] ..... .0853 .0816 .0865 .0836 .1252 .1221 .1580 .1615 .0917 .0893HUMFABP[AATJn.......... .1715 .1715 .0657 .0672 .2400 .2397 .2315 .2316 .1258 .1262HUMCD4[AAAAGk ........ .1693 .1699 .0532 .0533 .1700 .1688 .2842 .2731 .1103 .1091HUMCSFlPO[AGATJn...... .1129 .1130 .0817 .0831 .1249 .1286 .1112 .1131 .1045 .1052HUMTHOI[AATGJn........ .0866 .0843 .0950 .0978 .0958 .0931 .1201 .1190 .0759 .0732HUMPLA2Al[AATJn....... .1083 .1085 .0558 .0567 .0823 .0825 .0732 .0697 .0645 .0636HUMF13A01[AAAGJn...... .1182 .1162 .0647 .0643 .0765 .0780 .1774 .1770 .0709 .0693HUMCYAR04IAAATk ..... .1214 .1193 .1602 .1580 .1473 .1478 .1301 .1336 .1213 .1190HUMLIPOL[AAAT]n ...... .1608 .1652 .1011 .1009 .1923 .1931 .2757 .2724 .1477 .1478HUMRENA4[ACAG]n ..... .4341 .4332 .3431 .3445 .4108 .4104 .4057 .4096 .3901 .3919HUMFESFPS[AAATJN ...... .1448 .1468 .1013 .0986 .1445 .1433 .1429 .1408 .1230 .1206HUMARA[AGCJ ........... .0253 .0251 .0168 .0165 .0220 .0221 .0265 .0264 .0159 .0156

bined genotype will be observed in a population (i.e., 1 in breeding within populations. The estimated values of k100 people, 1 in 1 million people, etc.). This is done by from table 3 were used in the calculations. The differencecalculating the individual genotypes and multiplying geno- in the P1 values is <5%. The presence of inbreeding withintype frequencies across unlinked loci. Figure 4 shows the the population does not contribute significantly to the cal-cumulative genotype frequency of the most common ge- culations of either multilocus statistics or genotype fre-notype for each locus analyzed, in four population groups. quencies.Loci were accumulated in the order in which they arelisted in table 1. Seven STR loci provide a most common Dsusogenotype frequency of <1 in 10,000 people, for all four Dsusopopulation groups. We have established a DNA-typing methodology using

Table 6 compares the values of the individualization po- 13 STR loci and have performed a genotype survey of fourtential calculated under HWE and in the presence of in-

* a a je . S S .S'9

U*8

o Ua

U

80

U

1 2 3 4 6 6 7 8 0 10 11 12 13

Number of Loci Typed

It.n

0" " 9 0 1 1

Cuber. -of toel Type4

Figure 3 Graph of the cumulative power of exclusion for four Figure 4 Graph of the most common multilocus genotype fre-

populations, by number of loci tested. 0 = Caucasian power-of-exclu- quency for four populations, by number of loci tested. 0 = Caucasian

sion data;O0 = Black data; 0 Mexican-American data; andE0 = Asian multilocis-genotype data;@ Black data; 0 Mexican-American data;data. All cumulative powers of exclusion are >99% after typing of the andE0 = Asian data. A frequency of 1 in 1 million will be obtained for

first eight SIR loci. most individuals after typing the first nine SIR loci.

1.00 I

a

0ka .7

.6

.4

Page 10: short tandem repeats

Am. J. Hum. Genet. 55:175-189, 1994

population groups, using these loci. The validity of usingthe square law (HWE) to calculate expected genotype fre-quencies and of employing the product rule to combinegenotype frequencies across loci has been examined.The populations studied were selected on the basis of

both visual identification of population group and sur-names. While all individuals may not be accurately classi-fied, improper classifications are likely to be small in num-ber. This classification system also reflects the classificationof individuals as they might be described for a forensic orparentage-testing case. We have only classified the individ-uals into broad racial/ethnic population groups.We have presented information validating the STR typ-

ing method for both parentage testing and forensic use.Criteria suggested for choosing loci for DNA typing havebeen suggested (Parentage Testing Committee, AmericanAssociation of Blood Banks 1990; Technical WorkingGroup on DNA Analysis Methods, California Associationof Criminalists Ad Hoc Committee on DNA Quality As-surance 1991). The main questions for validating STR typ-ing for personal identification are as follows: Do the lociexhibit Mendelian inheritance consistent with the chro-mosome on which they lie? Is the mutation rate sufficientlylow (<0.2%; Parentage Testing Committee, American As-sociation of Blood Banks 1990) that new mutations will berare? Do the populations studied provide allele frequenciesconsistent with the expectations of Hardy-Weinberg, sothat they may be used to calculate genotype frequenciesand probabilities of paternity? Are the loci studied un-linked, so that the product rule can be used to accumulatestatistical information across loci? Are the powers of ex-clusion large enough to provide significant parentage-test-ing results after typing a reasonable number of loci? AreSTR DNA typing results the same for standard specimensof varying age? Does DNA isolated from various tissuesof the same individual yield the same genotyping result?(Parentage Testing Committee, American Association ofBlood Banks 1990; Technical Working Group on DNAAnalysis Methods, California Association of CriminalistsAd Hoc Committee on DNA Quality Assurance 1991).Data showing the compliance of STRs with these issueshave been presented.

Other suggested validation studies include environmen-tal effects, mixed specimens, and matrix studies. Thesestudies, although important to forensic casework, are be-yond the scope of this paper in establishing STR DNA typ-ing as a reliable and powerful personal identificationmethod.

All of the loci have exhibited either autosomal or X-linked codominant inheritance in at least three large three-generation families. Other, smaller family studies have notshown variance from this observation.The population studies and tests for HWE, although

showing some deviation for particular locus-population-test combinations, do not show a consistent pattern of

deviation. Only 18 deviations were seen in the 156 possiblelocus-population-test combinations presented in this pa-per. Previous presentation of data for five of the locishowed similar results (Edwards et al. 1992). The fewdifferences between the data presented in this paper forthe five STR loci previously examined (Edwards et al.1992) have three bases. In Edwards et al. (1992) the HUM-HPRTB[AGAT]n and HUMFABP[AAT]I, loci containeddata from the CEPH families. The data presented here donot include the CEPH samples. A new permutation analy-sis has been performed (each new analysis with randompermutations will produce slightly different probabilitiesfor each statistical test), and 5,000 permutations (as com-pared with 1,000 in Edwards et al. 1992) were performed.Since no consistency in the deviations was observed, wemust conclude that the loci presented are in HWE.We have further shown that, between loci, there is no

random association of the alleles. They are in linkage equi-librium. Pairwise comparison of multigenotype data doesnot show consistent significant dependence of alleles, be-tween any of the loci. These results are not surprising,since the majority of the loci reside on separate chromo-somes. Thus, the product rule for combining genotype fre-quencies across unlinked loci is valid for use with the lociwe have studied.

Additional study of the effect of inbreeding within pop-ulations suggests that substructuring is not present at de-tectable levels, since, on average, none of the populationsexhibits a significant 4. If substructuring exists in the pop-ulations, its effect is too small to contribute significantly tomultilocus genotype frequencies. This would indicate thatthe four major population groups are appropriate classifi-cations. Allele frequency data for each locus-populationwill provide reasonable estimates of multilocus genotypefrequencies by product rule.

In Texas, the family-law courts require any method ofpaternity testing to provide a power of exclusion of >99%(increased from 95%, on September 1, 1993). That is, a"match" or nonexcluding result in a paternity case, mustexclude >99% of the male population from being the fa-ther of the child in question. Figure 3 summarizes the cu-mulative power of exclusion data. After typing of the firsteight loci (as listed in table 1), all population groups hadreached a power of exclusion greater than the 99% limit.The nine loci that we have included in the three multiplexreactions are our standard battery of loci with which weanalyze paternity cases. The power of exclusion, in all fourpopulation groups, after typing these nine loci, is >99%.We conclude that the STR loci do provide a highly dis-criminatory system that is appropriate for use in parentagetesting.

In addition to the statistical analyses presented, all 13loci were examined with respect to the mode of allele gen-eration. With exclusion of the Asian data (insufficient sam-ple size), we found 2 of 39 possible deviations from con-

184

Page 11: short tandem repeats

Hammond et al.: STRs for Personal Identification

formity, with the stepwise mutation model. Thirteen ofthe 39 locus-population combinations showed deviationfrom conformity to the infinite-allele model (IAM) (Shriveret al. 1993). As previously observed for the five loci studiedby Edwards et al. (1992), STR loci (3-, 4-, and 5-nucleotiderepeats) more closely emulate the stepwise mutationmodel than the IAM. The indirect estimates of the muta-

tion rates presented here (see table 5), which assume theIAM, are overestimates even if the STR loci strictly followthe stepwise-mutation model. Since the largest of these es-

timates (1.86 X O-' for the HUMARA[AGC]n locus) is anorder of magnitude smaller than that for the D1S7 minisat-ellite locus (Jeffreys et al. 1988), we conclude that the esti-mates of mutation rates for the 13 loci included in thepresent study are sufficient to indicate that an isolated dis-cordance of genotypes, between a putative parent-off-spring relationship, is the result of a de novo mutation.

Calculations of fixation indices within individuals (i.e.,k) for each locus-population combination have shownthat the levels of departure from HWE cannot be ex-

plained sufficiently by the presence of substructure withinthe populations. Further, using 0 estimates does not sig-nificantly change calculations of the individualization po-

tential or "match probability." Estimates of genotype fre-quencies will not be significantly changed by including Xvalues in their calculation. The frequency of the homozy-gous genotype with the most common allele in all 52 locus-population combinations (HUMRENA4[ACAG]n, Cauca-sian allele 8, 78.3 + 2.2% [Edwards et al. 1992]) showsa <1% underestimate of the frequency when only HWE,rather than HWE with 0 values, is used. This differencewill have very little impact on a multilocus genotype havinga frequency <1.0 X 10-6 (1 in 1 million). Similar calcula-tions with the power-of-exclusion values and probabilities

of paternity show equally small differences in the multilo-cus values.The benefit of STR typing over existing VNTR loci that

have similar characteristics is the ease and speed of analy-sis. Nine to 13 (or more) STR loci can be typed in 2 d, fromprepared DNA. The use of PCR for amplification of theloci also decreases the amount of DNA that must be usedfor testing. Degraded DNA samples are also amenable tothe STR methods. Many samples that would fail to pro-vide VNTR typing results will produce STR results. Thissystem is also amenable to automation using fluorescentdetection techniques (Edwards et al. 1991a, 1992). Wehave successfully used this technology to resolve questionsinvolving laboratory sample switches, contamination ofcultured amniocytes or chorionic villi cells in prenatal di-agnosis, questioned parentage, bone marrow transplants,missing children, aircraft accidents, sexual assault, murder,other criminal identification issues, and identification ofwar casualties. In conclusion, we find that the trimeric, tet-rameric, and pentameric STR loci we have presented pro-vide an accurate, highly discriminating, sensitive, and rapidtechnique for DNA typing for personal identification is-sues in forensic science, parentage testing, and medical di-agnostics.

AcknowledgmentsWe thank Dr. Belinda Rossiter for reviewing the manuscript.

C.T.C. is an investigator with the Howard Hughes Medical Insti-tute. This work was prepared under grants 90-IJ-CX-0037 and92-UJ-CX-K042 (to C.T.C.) and 90-UJ-CX-K024 (to R.C.), fromthe National Institute of Justice, Office of Justice Programs, U.S.Department of Justice. Points of view or opinions in the docu-ment are those of the authors and do not necessarily representthe official position of the Department of Justice.

185

Page 12: short tandem repeats

Appendix ATable A I

Allele Frequencies for Eight STR Loci in Four Populations

ALLELE NUMBER, FREQUENCY ± STANDARD ERROR AT Locus

HUMCD4 HUMCSFIPO HUMPLA2A1 HUMF13AO1 HUMCYAR04 HUMLIPOL HUMFESFPS D6S366

Caucasians:7, 38.7±2.58, 30.6 ± 2.411, .3 ±.312, 27.7 ± 2.313, 2.1 ±.714, .5 ± 0.4

n 382Blacks:

7, 31.1 ± 12.48, 13.2+1.79, .8±.410, 14.5 ± 1.811, 3.7± 1.012, 12.4 + 1.713, 15.0± 1.814, 6.8 ± 1.315, 1.6 ±.616, 1.0 ±.5

n 380Mexican-Americans:

7, 39.6 ± 2.58, 17.9 ± 2.010, .8 ±.412, 37.0 ± 2.413, 2.8 ±.814, 1.8 ±.7

n 386Asians:

6, 2.6 ± 1.37, 56.4 ±4.08, 1.9+1.110, 1.3 +.911, .6 ±.612, 35.9± 3.813, .6 ±.614, .6 ±.6

n 156

8, .3 ±.39, 4.0±1.110, 26.4 ± 2.411, 26.7±2.412, 33.4±2.513, 7.0 ± 1.414, 2.0±.815, .6± .4

344

7, 5.2 ± 1.28, 4.6±1.19, 2.7 ±.810, 25.5 ± 2.311, 23.4±2.212, 29.9 ±2.413, 7.3 ± 1.414, .8±.515, .5 +.4

368

7, .3 ±.38, .3 ±.34, 2.1±.710, 29.1 + 2.411, 28.9+2.312, 31.6±2.413, 6.7 ± 1.314, 1.1 ±.5

374

9, 4.9 ± 1.810, 27.8+3.711, 24.3+3.612, 33.3 ± 3.913, 9.0±2.4149 .7 +.7

144

10, 1.9 +.711, 46.2±2.612, 13.0± 1.713, 1.9 ±.714, 13.2 ± 1.815, 14.6 ± 1.816, 9.1 ± 1.5

370

10, .5 ±.411, 10.6 ± 1.612, 13.2 ± 1.813, 12.8 ± 1.714, 24.2 ± 2.215, 22.1 ± 2.116, 15.7 ± 1.917, .8± .4

376

10, 1.1 ±.511, 36.2±2.512, 7.3+1.413, 9.2 + 1.514, 13.2 ± 1.815, 23.8 ± 2.216, 8.9± 1.517, .3 ±.3

370

11, 29.5 ±3.7129 3.2+ 1.413, 17.9±3.114, 21.8 ± 3.915, 12.8 ±2.716, 14.7±2.8

156

3.2,a 8.3± 1.54, 2.0±.85, 19.2 ± 2.16, 34.5 ± 2.57, 32.5 ±2.58, .6 ±.411, .3 ±.314, .9 _.515, 1.7±.7

348

3.2,9 11.1±1.74, 5.7 1.25, 34.0 ± 2.56, 14.6 1.97, 20.3 ±2.18, 7.7 1.49, 1.1±.610, .6 ±.411, 1.1±.612, .3 ±.313, 2.0±.714, 1.1±.615, .3 ±.3

350

3.2,a 25.4 ± 2.34, 10.1 ± 1.65, 16.1 ± 1.96, 19.9 ± 2.17, 26.2 ± 2.38, .8 ±.513, .8 ±.515, .5 ±.4

366

3.2,a 27.8 ± 3.44, 8.7± 2.55, 10.3 ± 2.76, 50.8 ± 4.47, .8 ±.812, .8 ±.816, .8 ± .8

126

5, 34.2± 2.46, 15.0± 1.87, 10.0 ± 1.58, .5 ±.49, 1.3 ±.610, 35.3 ±2.411, 2.6±.812, 1.0 +.5

380

5, 36.6 ± 2.66, 42.5 ± 2.66A,' 1.4±.67, 5.5 ± 1.28, .3 ±.39, .3 ±.310, 10.3 ± 1.611, 4.0± 1.0

348

5, 41.2±2.66, 28.3 ± 2.46A,' .3 ± .37, 4.7±1.18, .5 ±.49, 1.1±.510, 22.8 ± 2.211, 1.1±.5

364

5, 29.9 ± 3.76, 17.5±3.17, 2.6 ± 1.39, 1.9±1.110, 40.3 ± 4.011, 6.5±2.012, 1.3±.9

154

9, 3.7± 1.010, 43.6 ±2.611, 28.3±2.312, 22.0±2.113, 2.4±.8

378

7, .6±.49, 14.1±1.910, 35.9 ±2.611, 13.5±1.812, 27.6 ± 2.413, 7.8±1.414, .6±.4

348

9, 1.7+.710, 53.6 ±2.611, 20.1±2.112, 21.0±2.213, 3.1±.9149 .6+.4

358

9, 1.3±.910, 67.5 ±3.811, 13.0±2.7129 13.0+2.7139 4.5l1.714, .6+.6

154

8, 1.4±.610, 30.7± 2.411, 39.0±2.612, 22.0± 2.213, 6.3±1.314, .5±.4

364

7, .3+.38, 7.3±1.49, 4.6±1.210, 24.0± 2.411, 35.7 ±2.612, 23.2± 2.313, 4.8±1.2

328

8, .6±.49, .3+.310, 16.6±2.211, 43.4±2.812, 27.7 ± 2.513, 7.8±1.514, 1.6±.7

318

8, .7+.79, 15.+1.010, 9.7i2.611, 47.0±4.312, 21.6±3.613, 18.7±3.4

134

10, 1.5±.611, 5.6±1.212, 26.5 ± 2.413, 20.0 ±2.214, 11.2±1.715, 12.6± 1.816, 18.5±2.117, 3.8±1.018, .3±.3

340

9, 3.1±.910, 1.4±.611, 6.7±1.312, 13.2± 1.813, 25.6 ±2.314, 25.3 ±2.315s, 17.4±2.016, 6.5±1.317, .8±.S

356

9, .9±0.510, .9 ±.511, 4.6±1.212, 12.2± 1.813, 32.9 ± 2.614, 17.7±2.115, 9.1±1.616, 19.8±2.217, 1.5±.718, .3±.3

328

9, 2.9±1.411, 8.8±2.412, 33.1±4.013, 41.2±4.214, 11.0±2.715, 1.5±1.0

136

NOTE.-Locus designations are the GenBank locus name, without the repeat designations. The allele designation is the no. of units of the corerepeat. Allele frequency (%) ± standard error of the allele frequency is shown. n = No. of chromosomes sampled.

* Variant alleles that do not migrate with adjacent allele repeat reiterations. We have followed the recommendations on nomenclature standardizationof STR loci, made by the DNA commission members at the meeting of the International Society of Forensic Haemogenetics in Venice, Italy, in October1993, in naming these alleles, where the sequence of the variant is known (i.e., 3.2 is a fragment that migrates at the position of a fragment that is twobases longer than a fragment containing three repeats of the core sequence); unsequenced variants are designated with the adjacent smaller allele and aletter (i.e., 6A).

186

Page 13: short tandem repeats

Appendix BTable BI

Tests of Hardy-Weinberg Expectation

POPULATIONbLocus AND

TEST Caucasian Black Mexican-American Asian Pooled

HPRTB:X ........... .013c (.629) .050c (.482) .853 .808 .000dL ......... .651 .755 .253 .329 .097E.................. .357 .478 .170 .306 .038cr.......... .058 .031 .048 .046

FABP:.2 ......... .577 .289 .080 .637 .005c

L.................. .905 .334 .610 .713 .000dE ....... .835 .506 .310 .460 .000d

................... .000 .000 .019 .400CD4:

.2 .935 .839 .509 .203 .001dL .................. .225 .218 .567 .138 .000dE ......... .407 .101 .516 .112 .000dr ......... .000 .000 .006 .074

CSF1 PO:%2................ .789 .347 .108 .269 .105L ........ .833 .274 .803 .095 .078E .................. .819 .376 .890 .080 .114r................... .000 .000 .000 .000

TH01:2

......... .266 .164 .068 .039c (.393) .013cL .......... .085 .004 (.004) .457 .121 .002dE ......... .062 .004 (.004) .314 .047c (.467)b .001dr .... .030 .000 .025 .032

PLA2A1:2

......... .325 .486 .920 .000 (.486) .026cL.......... .500 .432 .701 .078 .128E.................. .629 .463 .819 .029' (.603)b .108r ......... .000 .000 .000 .086

F13AO1:2

......... .037c (.257) .396 .643 .878 .000dL.................. .599 049c (.306)b .461 .450 .021cE ......... .536 .060 .670 .287 .011cr ......... .016 .008 .000 .005

CYAR04:2

......... 1.000 .006 (.070 .670 .224 .003L .......... .425 .009d (.135)b .544 .971 .018cE ....... .335 .006d (.366)b .430 .979 .016c

r..018 .016 .000 .000LIPOL:

%2 .......... .312 .350 .844 .379 .589L .......... .467 .599 .999 .341 .374E .......... .605 .594 .998 .603 .369r ......... .000 .002 .000 .000

RENA4:%2 .........

.793 .226 .800 1.000 .396L ......... .960 .281 .519 .062 .617E .......... .963 .129 .391 .058 .211r .......... .000 .007 .000 .019

FESFPS:2

.......... .729 .373 .110 .007 162) .016cL ......... .970 .461 .351 .146 .338E ........ .991 .186 .442 .041c (.455)b .111r ....... .000 .029 .009 .039

187

Page 14: short tandem repeats

188 Am. J. Hum. Genet. 55:175-189, 1994

Table B I (continued)

POPULATIONbLocUS ANI)

TEST Caucasian Black Mexican-American Asian Pooled

D6S366:X2 ......... .099 .542 .678 .106 .003cL ......... .454 .052 .213 .163 .302E ......... .307 .036c (.411)b .281 .063 .151r ........ .008 .009 .000 .031

ARA:X2 ................ 577 .013c (.757)b .441 .084 .OO0dL ....... .248 .065 .426 .244 .013cE ....... .224 .015d (.383)b .266 .143 .004cr ....... .006 .033 .000 .060

* The empirical levels of significance (based on 5,000 replications of allele shuffling), in the three test procedures for detecting derivations of theobserved genotype frequencies from those expected under Hardy-Weinberg expectations, were calculated. The tests were conducted for genotypes offemales only for the X-linked loci. x2 analysis is of observed and expected no. of heterozygotes; probability values reflect 5,000 permutations. Proba-bilities computed from the L test (-2 In Lo/L1) were determined with 5,000 permutations.

b Nos. in parentheses are revised levels of significance, incorporating the estimated rs in shuffling of alleles across individuals, shown only when theresults were significant.

' Significant deviation from Hardy-Weinberg proportions, P < .05.d Significant deviation from Hardy-Weinberg proportions, P < .01.

ReferencesAhn YI, Kamboh MT, Ferrell RE (1992) Two new alleles in the

tetranucleotide repeat polymorphism at the lipoprotein lipase(LPL) locus. Hum Genet 90:184

Bell GI, Selby MJ, Rutter WJ (1982) The highly polymorphic re-gion near the insulin gene is composed of simple tandemly re-peating sequences. Nature 295:31-35

Boylan KB, Ayres TM, Popko B, Takahashi N, Hood LE, Prusi-ner SB (1990) Repetitive DNA (TGGA)n 5' to the human my-elin basic protein gene: a new form of oligonucleotide repeti-tive sequence showing length polymorphism. Genomics 6:16-22

Brown AHD, Feldman MW, Nevo E (1980) Multilocus structureof natural populations of Hordeum spontaneum. Genetics 96:523-536

Callen DF, Thompson AD, Shen Y, Phillips HA, Richards RI,Mulley JC, Sutherland GR (1993) Incidence and origin of"null" alleles in the (AC)n microsatellite markers. Am J HumGenet 52:922-927

Caskey CT, Hammond HA (1992) Forensic use of short tandemrepeats via PCR. In: Advances in forensic haemogenetics.Springer, Berlin, pp 18-25

Chakraborty R (1984) Detection of nonrandom association ofalleles from the distribution of the number of heterozygousloci in a sample. Genetics 108:719-731

(1993a) A class of population genetic questions formu-lated as the generalized occupancy problem. Genetics 134:953-958

(1993b) Generalized occupancy problem and its applica-

tion in population genetics. In: Sing CF, Hanis CL (eds) Genet-ics of cellular, individual, family, and population variability.Oxford University Press, New York, pp 179-192

Chakraborty R, Fornage M, Guegue R, Boerwinkle E (1991) Pop-ulation genetics of hypervariable loci: analysis of PCR basedVNTR polymorphism within a population. In: Burke T, DolfG, Jeffreys AJ, Wolff R (eds) DNA fingerprinting: approachesand applications. Birkhiuser, Basel, Boston, Berlin, pp 127-143

Chakraborty R, Kidd K (1991) The utility of DNA typing in fo-rensic work. Science 254:1735-1739

Chakraborty R, Neel JV (1989) Description and validation of amethod for simultaneous estimation of effective populationsize and mutation rate from human population data. Proc NatlAcad Sci USA 86:9407-9411

Chakraborty R, Zhong Y, Jin L, Budowle B (1994) Nondetect-ability of restriction fragments and independence of DNA-fragment sizes within and between loci in RFLP typing ofDNA. Am J Hum Genet 55 (in press)

Chakravarti A, Li CC (1983) The effect of linkage on paternitycalculations. In: Walker RH (ed) Inclusion probabilities in par-entage testing. American Association of Blood Banks, Arling-ton, VA, pp 411-420

Deka R, Chakraborty R., Ferrell RE (1991) A population geneticstudy of sixVNTR loci in three ethnically defined populations.Genomics 11:83-92

Edwards A, Civitello A, Hammond HA, Caskey CT (1991a)DNA typing and genetic mapping with trimeric and tetramerictandem repeats. Am J Hum Genet 49:746-756

Edwards A, Hammond HA, Jin L, Caskey CT, Chakraborty R(1992) Genetic variation at five trimeric and tetrameric tandem

Page 15: short tandem repeats

Hammond et al.: STRs for Personal Identification 189

repeat loci in four human population groups. Genomics 12:241-253

Edwards A, Voss H, Rice P, Civitello A, Stegemann J, SchwagerC, Zimmermann J, et al (1990) Automated DNA sequencingof the human HPRT locus. Genomics 6:593-608

Edwards MC, Clemens PR, Tristan M, Pizzuti A, Gibbs RA(1991b) Pentanucleotide repeat length polymorphism at thehuman CD4 locus. Nucleic Acids Res 19:4791

Gart Jj, Nam J (1984) A score test for the possible presence ofrecessive alleles in generalized ABO-like genetic systems. Bio-metrics 40:887-894

Guo S-W, Thompson EA (1992) Performing the exact test ofHardy-Weinberg proportion for multiple alleles. Biometrics48:361-372

Heame CM, Todd JA (1991) Tetranucleotide repeat polymor-phism at the HPRT locus. Nucleic Acids Res 19:5450

Huang TH-M, Hejtmancik JF, Edwards A, Pettigrew AL, Her-rera CA, Hammond HA, Caskey CT, et al (1991) Linkage ofthe gene for an X-linked mental retardation disorder to a hy-pervariable (AGAT)n repeat motif within the human hypoxan-thine phosphoribosyltransferase (HPRT) locus (Xq26). Am JHum Genet 49:1312-1319

Jeffreys AJ, Royle NJ, Wilson V, Wong Z (1988) Spontaneousmutation rates to new length alleles at tandem-repetitive hy-pervariable loci in human DNA. Nature 332:278-281

Jeffreys AJ, Wilson V, Thein SL (1985) Hypervariable 'minisatel-lite' regions in human DNA. Nature 314:67-73

Koorey DJ, Bishop GA, McCaughan GW (1993) Allele non-am-plification: a source of confusion in linkage studies employingmicrosatellite polymorphisms. Hum Mol Genet 2:289-291

Litt M, LutyJA (1989) A hypervariable microsatellite revealed byin vitro amplification of a dinucleotide repeat within thecardiac muscle actin gene. Am J Hum Genet 44:397-401

Milkman R, Beatty LD (1970) Large-scale electrophoresis studiesof allelic variation in Mytilus Edulis. Biol Bull 139:430

Morton NE, Collins A, Balazs 1 (1993) Kinship bioassay on hy-pervariable loci in Blacks and Caucasians. Proc Natl Acad SciUSA 90:1892-1896

Nakamura Y, Leppert M, O'Connell P, Wolff R, Holm T, CulverM, Martin C, et al (1987) Variable number of tandem repeat(VNTR) markers for human gene mapping. Science 235:1616-1622

Panzer SW, Hammond HA, Stephens L, Chai A, Caskey CT(1993) Trinucleotide repeat polymorphism at the D6S366 lo-cus. Hum Mol Genet 2:1511

Parentage Testing Committee, American Association of BloodBanks (1990) Standards for parentage testing laboratories.American Association of Blood Banks, Arlington, VA

Patel PI, Nussbaum RL, Gramson PE, Ledbetter DH, Caskey CT,Chinault AC (1984) Organization of the HPRT gene and re-lated sequences in the human genome. Somat Cell Mol Genet10:483-493

Polymeropoulos MH, Rath DS, Xiao H, Merril CR (1990) Tri-nucleotide repeat at the human pancreatic phospholipase A-2gene (PLA2). Nucleic Acids Res 18:7468

(1991a) Tetranucleotide repeat polymorphism at the hu-man c-fes/fps proto-oncogene (FES). Nucleic Acids Res 19:4018

(1991b) Tetranucleotide repeat polymorphism at the hu-man coagulation factor XIII A subunit gene (F13A1). NucleicAcids Res 19:4306

(1991c) Trinucleotide repeat polymorphism at the humanfatty acid binding protein gene (FABP2). Nucleic Acids Res 18:7198

Polymeropoulos MH, Xiao H, Rath DS, Merril CR (1991d) Tet-ranucleotide repeat polymorphism at the human tyrosine hy-droxylase gene (TH). Nucleic Acids Res 19:3753

(1991e) Tetranucleotide repeat polymorphism at the hu-man aromatase cytochrome P-450 gene (CYP19). NucleicAcids Res 19:195

Puers C, Hammond HA, Jin L, Caskey CT, Schumm JW (1993)Identification of repeat sequence heterogeneity at the poly-morphic short tandem repeat locus HUMTH01[AATG]n andreassignment of alleles in population analysis by using a locus-specific allelic ladder. Am J Hum Genet 53:953-958

Risch NJ, Devlin B (1992) On the probability of matching DNAfingerprints. Science 255:717-720

Roewer L, Epplen JT (1992) Rapid and sensitive typing of foren-sic stains by PCR amplifications of polymorphic simple repeatsequences in case work. Forensic Sci Int 53:163-171

Sensabaugh GF (1982) Biochemical markers of individuality. In:Saferstein R (ed) Forensic science handbook. Prentice-Hall,Englewood Cliffs, NJ, pp 338-415

Shriver MD, Jin L, Chakraborty R, Boerwinkle E (1993) VNTRallele frequency distributions under the stepwise mutationmodel: a computer simulation approach. Genetics 134:983-993

Sleddens HF, Oostra BA, Brinkmann AO, Trapman J (1992) Tri-nucleotide repeat polymorphism in the androgen receptorgene (AR). Nucleic Acids Res 20:1427

Sokal RR, Rohlf JF (1969) Biometry, 2d ed. WH Freeman, NewYork

Southern EM (1975) Detection of specific sequences amongDNA fragments separated by gel electrophoresis. J Mol Biol98:503-517

Tautz D (1989) Hypervariability of simple sequences as a generalsource for polymorphic DNA markers. Nucleic Acids Res 17:6463-6471

Technical Working Group on DNA Analysis Methods, Califor-nia Association of Criminalists Ad Hoc Committee on DNAQuality Assurance (1991) Guidelines for a quality assuranceprogram for DNA analysis. Crime Lab Dig 18:44-75

Weber JL (1990) Informativeness of human (dC-dA)n-(dG-dT)npolymorphisms. Genomics 7:524-530

WeberJL, May PE (1989) Abundant class of human DNA poly-morphisms which can be typed using the polymerase chain re-action. Am J Hum Genet 44:388-396

Weir BS (1991) Genetic data analysis. Sinauer, Sunderland, MAYasuda N (1968) Estimation of the inbreeding coefficient fromphenotype frequencies by a method of maximum likelihoodscoring. Biometrics 24:915-935

Zuliani G, Hobbs HH (1990) Tetranucleotide repeat polymor-phism in the LPL gene. Nucleic Acids Res 18:4958