Forensic Science International: Genetics 12 (2014) 136–143

Construction of a library of cloned short tandem repeat (STR) alleles as universal templates for allelic ladder preparation

Le Wang a,*, Xing-Chun Zhao a,*, Jian Ye a,*, Jin-Jie Liu b, Ting Chen b, Xue Bai a, Jian Zhang a, Yuan Ou a, Lan Hu a, Bo-Wei Jiang c, Feng Wang a

a Key Laboratory of Forensic Genetics, Institute of Forensic Science, Ministry of Public Security, Beijing 100038, PR China
b School of Forensic Medicine, Shanxi Medical University, Taiyuan 030001, PR China
c The First Research Institute of the Ministry of Public Security, Beijing 100048, PR China

ARTICLE INFO

Article history:
Received 23 February 2014
Received in revised form 7 June 2014
Accepted 11 June 2014

Keywords:
Forensic science
Short tandem repeat (STR)
Allelic ladder
Recombinant plasmid
Repeat structure

ABSTRACT

Short tandem repeat (STR) genotyping methods are widely used for human identity testing applications, including forensic DNA analysis. Samples of DNA containing the length-variant STR alleles are typically separated and genotyped by comparison to an allelic ladder. Here, we describe a newly devised library of cloned STR alleles. The library covers alleles X and Y for the sex-determining locus Amelogenin and 259 other alleles for 22 autosomal STR loci (TPOX, D3S1358, FGA, D5S818, CSF1PO, D7S820, D8S1179, TH01, vWA, D13S317, D16S539, D18S51, D21S11, D2S1338, D6S1043, D12S391, Penta E, D19S433, D11S4463, D17S974, D3S4529 and D12ATA63). New primers were designed for all these loci to construct recombinant plasmids so that the library retains the core repeat elements of each STR as well as 5′- and 3′-flanking sequences of ~500 base pairs. Since amplicons of commercial STR genotyping kits and systems developed in laboratories are usually distributed from 50 to <500 base pairs, this library could provide universal templates for allelic ladder preparation. We prepared three different sets of allelic ladders for the TH01 locus and an updated version of an allelic ladder for the DNATyper®19 multiplex system using these plasmids to confirm the suitability of the library as a good source for allelic ladder preparation. Importantly, the authenticity of each construct was confirmed by bidirectional nucleotide sequencing and we report the repeat structures of the 259 STR alleles. The sequencing results showed that all repeat structures we obtained for TPOX, CSF1PO, D7S820, TH01, D16S539, D18S51 and Penta E were the same as reported. However, we identified 102 unreported repeat structures from the other 15 STR loci, supplementing our current knowledge of repeat structures and leading to further understanding of these widely used loci.

© 2014 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

During the past two decades, short tandem repeat (STR) genotyping has emerged as the dominant technique for human identity determination and paternity testing [1–4]. Allelic ladders, which serve as standards in STR analysis, are necessary for adjusting for different sizing measurements obtained from different instruments and under different conditions used by various laboratories, which makes allelic ladders an indispensable component of commercial kits and newly developed STR analysis systems.

* Corresponding authors. Tel.: +86 10 66269184; fax: +86 10 63267051.

E-mail addresses: [email protected] (L. Wang), [email protected]

(X.-C. Zhao), [email protected] (J. Ye).

http://dx.doi.org/10.1016/j.fsigen.2014.06.005

1872-4973/© 2014 Elsevier Ireland Ltd. All rights reserved.

Allelic ladders are prepared on the basis of the polymerase chain reaction (PCR) for which the templates can be obtained by three optional strategies. The first strategy is to cut gels after polyacrylamide gel electrophoresis (PAGE) to collect separated fragments of target alleles [5]. The DNA template thus prepared is not suitable for long-term storage because it is inclined to degrade. Another disadvantage of this strategy is that the amount of DNA dissolved from gels is usually very limited. Once the entire DNA template is used, the demanding procedures of PAGE and gel cutting have to be repeated. Additionally, the concentration and purity of prepared DNA templates from different batches of experiments can vary significantly. The second strategy is based on plasmid construction. Each allele is amplified by PCR and inserted into plasmids that can be transformed easily into Escherichia coli [6]. Thus, large amounts of genetically engineered plasmids bearing STR fragments, which are ideal PCR templates for allelic ladder preparation, can be prepared conveniently from bacterial cultures. The amount of DNA obtained as PCR templates from one batch of plasmid preparation is more than sufficient for long-term allelic ladder preparation. Prepared DNA plasmids are usually of high quality and can be stored stably and conveniently. The third strategy is to use prepared allelic ladders as PCR templates. Obviously, this strategy would be invalid if we did not have prepared allelic ladders, which is an unavoidable situation when we develop allelic ladders for new STR analysis systems. Further, this strategy tends to introduce contamination and the intra-locus and inter-locus imbalances are easily magnified.

Table 1
Primer sequences used for molecular cloning.

| Locus | Primer | Length of flanking sequence (bp) |
|---|---|---|
| TPOX | F: CACATACCCAGCACACACCTG | 500 |
| | R: CAGGTGTGTGCTGGGTATGTG | 522 |
| D3S1358 | F: AAATGCTATCTGGCTGAACGT | 465 |
| | R: GCTGCTGATTTGAATGGGTC | 759 |
| FGA | F: CATTAGGGTTAGGAAACATTG | 500 |
| | R: GGCTGAGGGCTCAGAGGTGTG | 500 |
| D5S818 | F: TTCACTGTGACCTAAGACAA | 500 |
| | R: TGGGCCTCAGCTTGCTTTGC | 500 |
| CSF1PO | F: CCAACCCAGGCTTCGAGAAG | 500 |
| | R: CAGGATTAGAAGCCGTAACTGAA | 525 |
| D7S820 | F: TCAATCTGGGAATCGAGAAC | 500 |
| | R: AGGGTTCATGGACCCTAATG | 500 |
| D8S1179 | F: CGTGCCCAGCTAGCTAAT | 536 |
| | R: AGACCGTGTCTTGCTCTGT | 587 |
| TH01 | F: TACCTGGAAATGACACTGCTACAACT | 226 |
| | R: TCAGGGTCCATGCAAACC | 479 |
| vWA | F: CCTTTGTCCCAGTCCTGTCC | 500 |
| | R: GGAGACAGAGATTACATGGGTTAGG | 607 |
| D13S317 | F: CAGGAAATAGATGGGATC | 514 |
| | R: CTGCATATGAAGTTACAGTAAG | 558 |
| D16S539 | F: CATGTCAGCTTCATCCTCTTC | 500 |
| | R: GAGCACGACTCTCCTTACAAT | 564 |
| D18S51 | F: AGGCGTGGGGGAACAGGCATTA | 500 |
| | R: CTTAATTACATGTGTCAAGA | 500 |
| D21S11 | F: TGATTGTTTTAGTAGTTATAGG | 500 |
| | R: CAGATCTGTACAGCTGTCCT | 500 |
| Amelogenin | F: GATCACTGGAGAATAAATGTG | – |
| | R: ACACAGGCTTGAGGCCAA | – |
| D2S1338 | F: AGTAGTGGTGAAGGGGACAA | 500 |
| | R: ATGATTCCCAAGGGGCTGTC | 500 |
| D6S1043 | F: AGTGATCTCATGAAAAGTTC | 519 |
| | R: CCGCATTAATATGTTGACCTC | 500 |
| D12S391 | F: AGCCTGGCAACAGAGCCA | 501 |
| | R: CATAGCAGTATCACTAGACTTG | 577 |
| Penta E | F: GAAATGATTGAGAACTCTAC | 545 |
| | R: TTCTGTCGTCTCTCTCATTC | 500 |
| D19S433 | F: TGCAATGTATAGGTCGTTT | 450 |
| | R: CCATCTTGAAATTCTTAATA | 500 |
| D11S4463 | F: AGGATGATTTATCCATTAGGCATTG | 620 |
| | R: CTTGACTCCTGGAAGAACC | 816 |
| D17S974 | F: CAGCACTGGGGCTTTAGG | 613 |
| | R: AAACCCCAATCTTCTCCC | 550 |
| D3S4529 | F: AAGAAAACTGAAACTGCA | 482 |
| | R: TAAATGTGCTCAGTCCA | 395 |
| D12ATA63 | F: CTGCCTCCTGGGTTT | 519 |
| | R: AACTTCCCTGACCTCCTA | 482 |

New STR analysis systems are being reported each year. The STR loci used by different systems normally overlap, whereas the primer sets used for the same locus can be different. Allelic ladders for the same locus cannot be shared across different systems. In this study, we modified the second strategy and constructed a library of STR alleles. A total of 261 alleles of 22 autosomal STR loci and the sex-determining locus Amelogenin were inserted into plasmids. The inserted fragment included the repeat sequences as well as 5′- and 3′-flanking regions long enough to cover all the primer binding sites reported for each locus. Thus, the library can supply universal templates for allelic ladder preparation.

2. Materials and methods

2.1. Samples

Blood samples were collected from healthy Chinese individuals, mainly from six provinces of China (Beijing, Hebei, Shandong, Henan, Chongqing and Guangdong), and kept on filter paper. Written informed consent was given by blood donors and this work was approved by the Ethical Review Board of the Institute of Forensic Science, Ministry of Public Security of China.

2.2. Amplification for genotyping

Supplementary Table 1 gives the primer sequences used for STR typing for D11S4463, D17S974, D3S4529 and D12ATA63. PCRs were performed in a total volume of 10 µl containing 20 mM Tris–HCl pH 8.3, 50 mM KCl, 1.6 mM MgCl2, 0.8 mg/ml bovine serum albumin, 0.2% (v/v) Tween-20, 3.2% (v/v) glycerol, 0.02% (w/v) NaN3, 200 µM each dNTP, 0.32 µM each primer, 1 U of Taq DNA polymerase (Roche) and a 1.0 mm diameter circle of storage paper. For the other 19 loci, genotyping was done with the DNATyper®19 multiplex system (developed by our institute and available commercially in China; see Supplementary Table 2 for further information). Amplification was done with the GeneAmp® 9700 thermal cycler (Applied Biosystems, USA). Pre-PCR denaturation occurred at 95 °C for 11 min. This was followed by 28 cycles of denaturing at 94 °C for 30 s, annealing at 59 °C for 2 min and extension at 72 °C for 1 min, and then by a final extension step at 60 °C for 60 min.
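As a quick planning aid, the hold times of the cycling programme above can be added up. The short Python sketch below does this for the 28-cycle genotyping programme and for the 40-cycle variant used for cloning in Section 2.4. The function name `program_minutes` is illustrative only, and ramp times between temperatures are ignored, so real run times will be somewhat longer.

```python
def program_minutes(initial_min, cycles, step_seconds, final_min):
    """Sum the programmed hold times of a PCR run, in minutes.

    Ramp times between temperature steps are ignored (an assumption),
    so the result is a lower bound on the actual run time.
    """
    cycling_min = cycles * sum(step_seconds) / 60
    return initial_min + cycling_min + final_min


# Hold times taken from the text: 95 °C for 11 min, then cycles of
# 94 °C for 30 s, 59 °C for 2 min and 72 °C for 1 min, then 60 °C for 60 min.
STEPS_S = (30, 120, 60)

genotyping_min = program_minutes(11, 28, STEPS_S, 60)  # 28-cycle programme
cloning_min = program_minutes(11, 40, STEPS_S, 60)     # 40-cycle programme

print(f"genotyping: {genotyping_min:.0f} min, cloning: {cloning_min:.0f} min")
```

Under these assumptions the programmed holds come to about 169 min for genotyping and 211 min for cloning.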

2.3. Electrophoresis, detection and analysis

A 1 µl sample of PCR products was added to 10 µl of deionized formamide containing an internal size standard. Samples were denatured for 3 min at 95 °C followed by cooling on ice for 10 min. All samples were separated on the 3100 Genetic Analyzer (Applied Biosystems) using POP™-7 polymer (Applied Biosystems) and a 36 cm capillary (Applied Biosystems). All samples were injected for 10 s at 3 kV. The PCR products were separated at 15 kV at a run temperature of 60 °C. Initial fragment sizing and allele calling were done with GeneMapper® ID v3.2 (Applied Biosystems) with the peak amplitude threshold set at 50 relative fluorescence units (RFUs) for all colors.

2.4. Plasmid construction

Target fragments (including core repeats and flanking regions) were amplified directly from blood samples kept on filter paper using the primers given in Table 1. Homozygous blood samples were used as PCR templates with higher priority, whereas heterozygous samples could be used for some alleles with relatively low frequency. For heterozygous samples, the different alleles could be separated by picking and culturing single bacterial clones to yield plasmids and then selecting the allele of interest via genotyping experiments. PCR was done in a total volume of 50 µl containing 20 mM Tris–HCl pH 8.3, 50 mM KCl, 1.6 mM MgCl2, 0.8 mg/ml bovine serum albumin, 0.2% Tween-20, 3.2% glycerol, 0.02% NaN3, 200 µM each dNTP, 0.5 µM each primer, 5 U of Taq DNA polymerase (Roche) and two 1.2 mm diameter circles of storage paper. Thermal cycling parameters were similar to those described above, except the denaturing–annealing–extension cycle was repeated 40 times. Amplified DNA was subjected to agarose electrophoresis, recovered from gels, inserted into the pMD18-T vector (TaKaRa) and transformed into JM109 competent cells. Positive clones were confirmed by STR genotyping and bidirectional nucleotide sequencing.

Fig. 1. Diagram for molecular cloning of STR alleles. Core repeat elements are represented by small green boxes and the 5′- and 3′-flanking sequences are shown in light blue. In all, 259 STR fragments were inserted into the pMD18-T vector.

2.5. Preparation of allelic ladders

Purified plasmids were digested with SalI (TaKaRa) and EcoRI (TaKaRa) for 1 h and then incubated at 95 °C for 5 min to denature and permanently inactivate the restriction enzymes. For each locus, the digested DNA of the single alleles was mixed, diluted, amplified, re-analyzed and balanced to produce a single-locus ladder. The single-locus ladders were then mixed, balanced, purified and concentrated to give the final allelic ladder for a multiplex STR system.
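One way to arrive at a sensible starting mixture before the empirical balancing step is to convert each plasmid's measured concentration into copies per microlitre and pipette equal copy numbers. The Python sketch below is not taken from the paper: it assumes an average mass of 650 g/mol per base pair of double-stranded DNA, and the allele labels, plasmid lengths and concentrations in the example are hypothetical placeholders.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def copies_per_ul(conc_ng_per_ul, plasmid_length_bp):
    """Convert a dsDNA concentration (ng/µl) to plasmid copies per µl."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mole = plasmid_length_bp * BP_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


def volumes_for_equal_copies(preps, copies_each):
    """Volume (µl) of each prep that contains `copies_each` plasmid copies.

    `preps` maps an allele label to (concentration in ng/µl, length in bp).
    """
    return {
        allele: copies_each / copies_per_ul(conc, length)
        for allele, (conc, length) in preps.items()
    }


# Hypothetical preps: labels, concentrations and lengths are illustrative only.
preps = {
    "allele 6": (120.0, 3800),
    "allele 7": (95.0, 3804),
    "allele 9.3": (150.0, 3815),
}
for allele, vol in volumes_for_equal_copies(preps, 1e10).items():
    print(f"{allele}: {vol:.3f} µl")
```

Equal copy numbers are only a starting point; amplification efficiency differs between alleles, which is why the mixture is still re-analyzed and balanced as described above.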

3. Results

3.1. Molecular cloning of STR alleles

We collected 259 alleles for 22 autosomal STR loci and cloned them into the pMD18-T vector (Fig. 1). Similarly, we constructed recombinant plasmids for the X and Y alleles for the sex-determining locus Amelogenin. Primers used for cloning are given in Table 1. Instead of using any reported primer sequence or the primers we used for genotyping in this work, we redesigned primers for all 23 loci. For each STR locus, both forward and reverse primer binding sites were selected far away (~500 base pairs) from repeat sequences. Thus, the amplified fragments preserved longer flanking sequences on both sides of the core repeating elements (Fig. 1). Smaller PCR products are usually pursued in developing STR typing systems. For example, Mini-STR systems are good solutions for degraded DNA sample typing [6]. By contrast, amplification of longer fragments would suffer technical difficulties in several aspects of genotyping; e.g. amplification of larger fragments tends to be inhibited by PCR inhibitors. To our knowledge, almost all STR analysis systems, including commercial kits and those described in the literature, constrain the size of PCR products to <500 base pairs. The fragments we cloned should cover all primer binding sites for those STR analysis systems. In other words, the recombinant plasmids we constructed could be a library of universal templates for allelic ladder preparation.

The sex-determining locus Amelogenin is not an STR. The X chromosome version of the Amelogenin gene contains a 6 bp deletion compared to the Y chromosome version. We also cloned a 539 bp conserved fragment of the Y version and its counterpart of the X chromosome into the pMD18-T vector (Supplementary Fig. 1).

3.2. Preparation of allelic ladders

To confirm that the recombinant plasmids we constructed in this work were universal templates for allelic ladder preparation, we took TH01 as an example, and prepared three different sets of allelic ladders for this locus using the same recombinant plasmids as PCR template (Fig. 2). In detail, the three sets of TH01 primer sequences were adopted from the DNATyper®19 system, the PowerPlex 16 kit [7] and published work [8], respectively. For the convenience of comparison, all three sets of primers in Fig. 2 were labeled with the same fluorescent dye, carboxytetramethylrhodamine (TAMRA), but this allelic library is suitable for preparing ladders labeled with various fluorescent dyes (data not shown).

Fig. 2. Allelic ladders for the TH01 locus. Nine recombinant plasmids were used as PCR templates. Primer sequences were adopted from (A) the DNATyper®19 system, (B) the PowerPlex 16 kit [7] and (C) published work [8].

Fig. 3. Allelic ladder for the DNATyper®19 system prepared using the recombinant plasmids as template.
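The idea that one cloned insert can serve several primer sets can be illustrated with a toy in-silico PCR. In the Python sketch below, the template and both primer pairs are invented short sequences (real flanks are hundreds of base pairs long), and `amplicon_length` is a hypothetical helper that only looks for exact matches. It illustrates the principle and does not describe how the ladders in this work were designed.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, fwd_primer, rev_primer):
    """Length of the product a primer pair would give on `template`.

    Both primers are written 5'->3'. Returns None if either primer has
    no exact binding site, i.e. the template does not cover that system.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Toy insert: invented flanks around seven AATG repeats.
left_flank = "AACCGGTTACGTAGCTAGGCTA"
right_flank = "CTTGGACCATGCAGTTCCGA"
template = left_flank + "AATG" * 7 + right_flank

# Two hypothetical primer sets binding at different distances from the repeats.
outer = (left_flank[:8], reverse_complement(right_flank[-8:]))
inner = (left_flank[-8:], reverse_complement(right_flank[:8]))

for name, (fwd, rev) in {"outer": outer, "inner": inner}.items():
    print(name, amplicon_length(template, fwd, rev))
```

Both primer pairs find their binding sites on the same template and give products of different sizes (70 and 44 bases in this toy case), which is the property that lets a single plasmid supply ladders for different genotyping systems.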

In addition, we used 217 of the 261 plasmids and prepared an updated allelic ladder for the DNATyper®19 multiplex system (Fig. 3). Compared to the old version (Supplementary Fig. 2) prepared by the traditional gel-cutting strategy, the background of the updated allelic ladder is clear. Thanks to the convenience of plasmid concentration determination, balance among alleles (both intra-locus and inter-locus) can be achieved easily.

3.3. Repeat structure analysis of cloned STR alleles

We sequenced all 261 recombinant plasmids bidirectionally to study the repeat structures of each allele. The sequencing results showed that all repeat structures we obtained for TPOX, CSF1PO, D7S820, TH01, D16S539, D18S51 and Penta E were the same as reported; however, we identified 102 unreported repeat structures from 15 STR loci. Representative results of the repeat structures for the alleles cloned are summarized in Table 2. See Supplementary Table 3 for the entire summary of repeat structures.

Although alleles 12 and 20 of D3S1358 are present in commercial allelic ladders [9], their repeat structures have not been reported previously. We identified a new repeat structure for allele 19, which contains one less [TCTG] unit but one more [TCTA] unit compared to the previously reported repeat structure.

For the FGA locus, we identified a repeat structure for allele 13, which is a rare allele and is absent from commercial allelic ladders. New variations for alleles 29 and 30 are given in Table 2.

The core sequence for D5S818 is strictly [AGAT] repeats according to sequenced alleles [10]. We confirmed the repeat structure of allele 16 is [AGAT]16 as expected.

The core sequence for D8S1179 is composed of [TCTA] and [TCTG] repeats. We discovered new arrangements of [TCTA] and [TCTG] units for alleles 12, 15 and 17.

The repeat structure of allele 13 for the vWA locus we identified was different from all three reported structures [11,12], but it resembled the repeat structure of allele 14, which was first reported by Brinkmann et al. [12] and confirmed by our work. The only difference was that allele 13 possessed one less [TCTA] unit compared to allele 14. The repeat structure for allele 21 described here is new, which is the first case of a vWA allele containing as many as five consecutive [TCTG] repeats in its core sequence.

The repeat structure of the D13S317 locus was intriguing. According to Lins et al. [10], the D13S317 allele N contains N [TATC] repeats. However, our sequencing results led to a deeper understanding of this locus and showed that D13S317 allele N might contain N, N + 1 or N + 2 [TATC] repeats, depending on whether 2, 1 or 0 [AATC] units followed, which we named the [AATC] tail. Fig. 4 shows that the 5′- and 3′-flanking sequences are highly conserved and the core repeats at positions 1–14 are composed of a variable number of [TATC] repeats. However, the sequence elements at positions 15 and 16 could be [TATC] or [AATC]; as a result, some D13S317 alleles might contain one or two more [TATC] repeats than expected. Genotyping of all the alleles was done with the DNATyper®19 system and confirmed with the Identifiler™ kit.
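The relationship described above can be condensed into a single rule: in each of the three cases the [TATC] and [AATC] units add up to N + 2, so N equals the number of [TATC] units plus the number of [AATC] units minus two. The Python sketch below encodes that reading of the text and applies it to structures listed in Table 2. The function name is illustrative, and the rule is inferred from the alleles sequenced here, so it may not hold for structures outside this data set.

```python
import re

# [TATC] repeats followed by an optional tail of one or two [AATC] units.
CORE = re.compile(r"((?:TATC)+)((?:AATC){0,2})")


def d13s317_allele(core_sequence):
    """Infer the allele number from a D13S317 core sequence.

    Expects [TATC] repeats followed by zero, one or two [AATC] units,
    with the flanking sequence already removed.
    """
    match = CORE.fullmatch(core_sequence)
    if match is None:
        raise ValueError("not a [TATC]n[AATC]0-2 core sequence")
    n_tatc = len(match.group(1)) // 4
    n_aatc = len(match.group(2)) // 4
    return n_tatc + n_aatc - 2


# Structures found in this work (Table 2).
print(d13s317_allele("TATC" * 8 + "AATC" * 2))  # [TATC]8 AATC AATC
print(d13s317_allele("TATC" * 12))              # [TATC]12
print(d13s317_allele("TATC" * 12 + "AATC"))     # [TATC]12 AATC
```

The three calls return 8, 10 and 11, matching the allele designations given for these structures in Table 2.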

The repeat structure of D21S11 can be summarized as:

[TCTA]m[TCTG]n[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]p

where m, n and p are integers. Different combinations of m, n and p with the same sum are indistinguishable in DNA profiling and are considered as the same allele. Owing to this complexity, more than one repeat structure has been reported for each allele of D21S11.
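To make the point about indistinguishable structures concrete, the bracket notation can be expanded into a plain sequence and measured. The following Python sketch uses a small parser written for this illustration (the names `expand` and `bracketed_units` are not from the paper) and compares two allele 26 structures from Table 2: the one found in this work and the one reported earlier. It assumes, as the structures found in this work suggest, that the D21S11 allele number equals the sum of the bracketed repeat counts.

```python
import re

# Either a bracketed unit with a repeat count, or a run of literal bases.
TOKEN = re.compile(r"\[([ACGT]+)\](\d+)|([ACGT]+)")


def expand(structure):
    """Expand bracket notation such as '[TCTA]2TA' into 'TCTATCTATA'."""
    parts = []
    for unit, count, literal in TOKEN.findall(structure):
        parts.append(unit * int(count) if unit else literal)
    return "".join(parts)


def bracketed_units(structure):
    """Sum the repeat counts of all bracketed units."""
    return sum(int(count) for unit, count, _ in TOKEN.findall(structure) if unit)


CONSTANT = "[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA"
this_work = "[TCTA]6[TCTG]5" + CONSTANT + "[TCTA]7"  # m=6, n=5, p=7
reported = "[TCTA]4[TCTG]6" + CONSTANT + "[TCTA]8"   # m=4, n=6, p=8

for label, structure in (("this work", this_work), ("reported", reported)):
    print(label, bracketed_units(structure), len(expand(structure)))
print("same sequence:", expand(this_work) == expand(reported))
```

Both structures sum to 26 units and expand to the same length, so fragment sizing cannot separate them, yet the expanded sequences differ, which is why sequencing is needed to tell them apart.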

Table 2
Representative results of the repeat structures for the alleles cloned.

| Locus | Allele | Repeat structure in this work | Repeat structure reported earlier | Reference |
|---|---|---|---|---|
| D3S1358 | 12 | TCTA[TCTG]2[TCTA]9 | No report | |
| | 19 | TCTA[TCTG]2[TCTA]16 | TCTA[TCTG]3[TCTA]15 | [16] |
| | 20 | TCTA[TCTG]3[TCTA]16 | No report | |
| FGA | 13 | [TTTC]3TTTTTTCT[CTTT]5CTCC[TTCC]2 | No report | |
| | 29 | [TTTC]3TTTTTTCT[CTTT]21CTCC[TTCC]2 | [TTTC]3TTTTTTCT[CTTT]15CCTT[CTTT]5CTCC[TTCC]2 | [17] |
| | 30 | [TTTC]3TTTTTTCT[CTTT]1CTCT[CTTT]20CTCC[TTCC]2 | [TTTC]3TTTTTTCT[CTTT]16CCTT[CTTT]5CTCC[TTCC]2 | [17] |
| D5S818 | 16 | [AGAT]16 | No report | |
| D8S1179 | 12 | [TCTA]1[TCTG]1[TCTA]10 | [TCTA]12 | [18] |
| | 15 | [TCTA]2[TCTG]1[TCTA]12 | [TCTA]1[TCTG]1[TCTA]13 | [18] |
| | 17 | [TCTA]2[TCTG]1[TCTA]14 | [TCTA]2[TCTG]2[TCTA]13 | [18] |
| vWA | 13 | [TCTA]1[TCTG]1[TCTA]1[TCTG]4[TCTA]2TCCA[TCTA]3TCCATCCA | [TCTA]2[TCTG]4[TCTA]3TCCA[TCTA]3 | [11] |
| | | | [TCTA]1[TCTG]4[TCTA]8TCCATCTA | [12] |
| | | | [TCTA]1[TCTG]4[TCTA]10 | [12] |
| | 21 | [TCTA]1[TCTG]5[TCTA]15TCCATCTA | [TCTA]1[TCTG]4[TCTA]16TCCATCTA | [12] |
| D13S317 | 5 | [TATC]5AATC AATC | No report | |
| | 8 | [TATC]8AATC AATC | [TATC]8 | [10] |
| | 9 | [TATC]9AATC AATC | [TATC]9 | [10] |
| | 10 | [TATC]12 | [TATC]10 | [10] |
| | | | [TATC]10AATC | [10] |
| | 11 | [TATC]12AATC | [TATC]11 | [10] |
| | 12 | [TATC]13AATC | [TATC]12 | [10] |
| | 13 | [TATC]14AATC | [TATC]13 | [10] |
| | 14 | [TATC]15AATC | [TATC]14 | [10] |
| D21S11 | 26 | [TCTA]6[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]7 | [TCTA]4[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]8 | [19] |
| | 28 | [TCTA]5[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]10 | [TCTA]4[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]10 | [19] |
| | | | [TCTA]5[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]9 | [20] |
| | 30.3 | [TCTA]6[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]5TCA[TCTA]6 | No report | |
| | 31 | [TCTA]4[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]13 | [TCTA]5[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [11] |
| | | | [TCTA]6[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [19] |
| | | | [TCTA]6[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]11 | [20] |
| | | | [TCTA]7[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]11 | [21] |
| | 31.3 | [TCTA]7[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]5TCA[TCTA]6 | No report | |
| | 33 | [TCTA]9[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]11 | [TCTA]5[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]14 | [20] |
| | 34 | [TCTA]8[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]13 | [TCTA]10[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]11 | [22] |
| | | | [TCTA]5[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]15 | [20] |
| | 35 | [TCTA]9[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]13 | [TCTA]11[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]11 | [22] |
| | | | [TCTA]10[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [11] |
| | 36 | [TCTA]9[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]14 | [TCTA]10[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]13 | [22] |
| | | | [TCTA]10[TCTG]6[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [22] |
| | | | [TCTA]11[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [11] |
| | 37 | [TCTA]9[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]15 | [TCTA]9[TCTG]11[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]12 | [22] |
| | | | [TCTA]11[TCTG]5[TCTA]3TA[TCTA]3TCA[TCTA]2TCCATA[TCTA]13 | [11] |
| D6S1043 | 9 | [AGAT]9 | No report | |
| | 10 | [AGAT]10 | No report | |
| | 14 | [AGAT]14 | No report | |
| | 16 | [AGAT]10[ACAT]1[AGAT]5 | No report | |
| | 17 | [AGAT]11[ACAT]1[AGAT]5 | No report | |
| | 18 | [AGAT]12[ACAT]1[AGAT]5 | [AGAT]6[ACAT]1[AGAT]11 | [13] |
| | 18.2 | [AGAT]12AT[ACAT]1[AGAT]5 | No report | |
| | 19 | [AGAT]13[ACAT]1[AGAT]5 | No report | |
| | 20 | [AGAT]13GGAT[ACAT]1[AGAT]5 | [AGAT]6[ACAT]1[AGAT]1[ACAT]1[AGAT]11 | [13] |
| | 20.3 | [AGAT]12GAT[AGAT]2[ACAT]1[AGAT]5 | No report | |
| | 21 | [AGAT]15[ACAT]1[AGAT]5 | No report | |
| | 21.3 | [AGAT]13GAT[AGAT]2[ACAT]1[AGAT]5 | [AGAT]6[ACAT]1[AGAT]2AGT[AGAT]12 | [13] |
| | 23 | [AGAT]17[ACAT]1[AGAT]5 | No report | |
| D12S391 | 14 | [AGAT]7[AGAC]6[AGAT]1 | No report | |
| | 16 | [AGAT]8[AGAC]7[AGAT]1 | [AGAT]9[AGAC]6[AGAT]1 | [14] |
| | 19 | [AGAT]11[AGAC]8 | [AGAT]12[AGAC]6[AGAT]1 | [14] |
| | 19.3 | [AGAT]2GAT[AGAT]9[AGAC]7[AGAT]1 | [AGAT]5GAT[AGAT]7[AGAC]6[AGAT]1 | [23] |
| | 22 | [AGAT]13[AGAC]8[AGAT]1 | [AGAT]15[AGAC]6[AGAT]1 | [14] |
| | | | [AGAT]12[AGAC]10 | [14] |
| | 26 | [AGAT]16[AGAC]10 | [AGAT]17[AGAC]8[AGAT]1 | [14] |
| | | | [AGAT]17[AGAC]9 | [14] |
| | 27 | [AGAT]18[AGAC]8[AGAT]1 | No report | |
| D19S433 | 4 | [AAGG]1[AAAG]1[AAGG]1[TAGG]1[AAGG]2 | No report | |
| | 6.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]5 | No report | |
| | 8 | [AAGG]1[AAAG]1[AAGG]1[TAGG]1[AAGG]6 | No report | |
| | 11.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]10 | No report | |
| | 12.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]11 | No report | |
| | 13.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]12 | No report | |
| | 14.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]13 | No report | |
| | 15.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]14 | No report | |
| | 16.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]15 | No report | |
| | 17.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]16 | No report | |
| | 18.2 | [AAGG]1AA[AAGG]1[TAGG]1[AAGG]17 | No report | |
| D17S974 | 13 | [CTAT]13 | No report | |

Fig. 4. Sequence alignment of the D13S317 alleles. The [TATC] repeat units are highlighted with an alternating yellow and green background, whereas [AATC] tails have a magenta background. The highly conserved 5′- and 3′-flanking sequences have a blue background. Repeat units are indicated with red numbers.


Our sequencing results provide new repeat structures for alleles 26, 28, 31, 33, 34, 35, 36 and 37, extending our basic knowledge of this locus. We also provide sequence details for microvariant alleles 30.3 and 31.3, in which a TCA trinucleotide is inserted into the [TCTA]p unit at the end of the repeat core and separates this [TCTA]p unit into two non-neighboring blocks of [TCTA] repeats.

Repeat structures for some alleles of D6S1043 were reported by Phillips et al. [13]. Our sequencing results agree well with their work for alleles 11, 12, 13 and 15, but differ significantly for alleles 18, 20 and 21.3. We repeated and checked our DNA profiling and sequencing results, further confirming the data given in Table 2. This discordance might be owing to the collection of samples from different areas of the world. Further work, perhaps by a third research group, is required to fully address this issue.

For D12S391, repeat structures for alleles 15–26 have been reported [14]. Our sequencing results identified new repeat structures for alleles 16, 19, 19.3, 22 and 26, and revealed the repeat structures of the rare alleles 14 and 27, which exhibit patterns similar to other alleles.

Among all the loci analyzed in this study, D19S433 has the greatest number of x.2 microvariant alleles. We discovered that all these alleles contain an [AA] dinucleotide insertion after the first [AAGG] repeat. Surprisingly, we discovered a rare allele with the repeat structure [AAGG]1[AAAG]1[AAGG]1[TAGG]1[AAGG]2, which should be named allele 4 according to the DNA profiling result and the naming recommendations of the International Society for Forensic Genetics [15]. The shortest allele discovered before this study was allele 5.2. This result expanded our understanding of this locus significantly, and should be taken into consideration in DNA profiling practices.

For loci D2S1338, D3S4529, D12ATA63 and D17S974, the repeat structures are generally accepted to consist of [TGCC][TTCC](GTCC[TTCC]), [ATCT]ATTT[ATCT], [TAA][CAA] and [CTAT] repeats, respectively. However, detailed sequences for every allele except those among the GenBank sequences remain to be elucidated (see Table 2). Further, we identified allele 13 as a new member of the D17S974 allelic family.

4. Discussion

This work demonstrated a modified method of allelic ladder preparation dependent on a novel library of STR alleles. Although the library covers the X and Y alleles for Amelogenin and 259 alleles of 22 widely used autosomal loci, further expansion of the library to other STR loci is currently under way. The plasmids constructed here bear long flanking sequences on both sides of the core repeats, so the library is not tailored for a single primer set but is suitable for various primers in different multiplex systems. More than 100 new repeat structures from 15 STR loci were identified in this work, extending our knowledge of these loci substantially. Specifically, we discovered that the repeat structure of D13S317 alleles was more complicated than expected. One or two [AATC] units, named here the [AATC] tail, might be present or absent following the [TATC] repeats (Fig. 4). Some rare alleles were identified, including alleles 14 and 27 of D12S391, allele 4 of D19S433 and allele 13 of D17S974. These rare alleles would contribute to accurate genotyping. It would be highly informative and helpful for individual identification if arrestees from criminal cases were genotyped and found to bear such rare alleles.

Next generation sequencing (NGS) technologies are rapidly evolving and slowly being adopted by forensic laboratories. As such, it is likely that STR typing will be performed by NGS in the future [24]. The repeat structures reported in this work could be used as reference sequences to assist NGS-based STR data analysis, and our recombinant plasmids have the potential of being used as reference materials to verify concordance between capillary electrophoresis generated and NGS generated STR data.

Acknowledgements

The authors thank Dr. John M. Butler for critical discussion and reading of the manuscript and Dr. Wanli Bi for technical assistance. This study was supported, in part, by grants 2012JB001, 2013JBYY009 and 2014JB001 to the Institute of Forensic Science, Ministry of Public Security of China and, in part, by grant 2013GABJC035 to the Ministry of Public Security of China.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.fsigen.2014.06.005.

References

[1] T.A. Brettell, J.M. Butler, R. Saferstein, Forensic science, Anal. Chem. 77 (12) (2005) 3839–3860.

[2] T.A. Brettell, J.M. Butler, J.R. Almirall, Forensic science, Anal. Chem. 79 (12) (2007) 4365–4384.

[3] T.A. Brettell, J.M. Butler, J.R. Almirall, Forensic science, Anal. Chem. 81 (12) (2009) 4695–4711.

[4] T.A. Brettell, J.M. Butler, J.R. Almirall, Forensic science, Anal. Chem. 83 (12) (2011) 4539–4556.

[5] B. Glock, D.W. Schwartz, E.M. Schwartz-Jungl, W.R. Mayr, Sequence determination of an allele ladder for the STR polymorphism at the CD4 locus and application of the ladder in testing an Austrian Caucasian population sample, Forensic Sci. Int. 78 (2) (1996) 125–130.

[6] X. Bai, S. Li, B. Cong, X. Li, X. Guo, L. He, et al., Construction of two fluorescence-labeled non-combined DNA index system miniSTR multiplex systems to analyze degraded DNA samples in the Chinese Han Population, Electrophoresis 31 (17) (2010) 2944–2948.

[7] B.E. Krenke, A. Tereba, S.J. Anderson, E. Buel, S. Culhane, C.J. Finis, et al., Validation of a 16-locus fluorescent multiplex system, J. Forensic Sci. 47 (4) (2002) 773–785.

[8] M.H. Polymeropoulos, H. Xiao, D.S. Rath, C.R. Merril, Tetranucleotide repeat polymorphism at the human tyrosine hydroxylase gene (TH), Nucleic Acids Res. 19 (13) (1991) 3753.

[9] M.G. Ensenberger, C.R. Hill, R.S. McLaren, C.J. Sprecher, D.R. Storts, Developmental validation of the PowerPlex® 21 system, Forensic Sci. Int. Genet. 9 (2014) 169–178.

[10] A.M. Lins, K.A. Micka, C.J. Sprecher, J.A. Taylor, J.W. Bacher, D.R. Rabbach, et al., Development and population study of an eight-locus short tandem repeat (STR) multiplex system, J. Forensic Sci. 43 (6) (1998) 1168–1180.

[11] R.A. Griffiths, M.D. Barber, P.E. Johnson, S.M. Gillbard, M.D. Haywood, C.D. Smith, et al., New reference allelic ladders to improve allelic designation in a multiplex STR system, Int. J. Legal Med. 111 (5) (1998) 267–272.

[12] B. Brinkmann, A. Sajantila, H.W. Goedde, H. Matsumoto, K. Nishi, P. Wiegand, Population genetic comparisons among eight populations using allele frequency and sequence data from three microsatellite loci, Eur. J. Hum. Genet. 4 (3) (1996) 175–182.

[13] C. Phillips, S. Kind, L. Fernandez-Formoso, M. Gelabert-Besada, A. Carracedo, M.V. Lareu, Global population variability in Promega PowerPlex CS7, D6S1043, and Penta B STRs, Int. J. Legal Med. 127 (5) (2013) 901–906.

[14] M.V. Lareu, M.C. Pestoni, F. Barros, A. Salas, A. Carracedo, Sequence variation of a hypervariable short tandem repeat at the D12S391 locus, Gene 182 (1–2) (1996) 151–153.

[15] W. Bar, B. Brinkmann, B. Budowle, A. Carracedo, P. Gill, P. Lincoln, et al., DNA recommendations. Further report of the DNA commission of the ISFH regarding the use of short tandem repeat systems. International Society for Forensic Haemogenetics, Int. J. Legal Med. 110 (4) (1997) 175–176.

[16] E. Mornhinweg, C. Luckenbach, R. Fimmers, H. Ritter, D3S1358: sequence analysis and gene frequency in a German population, Forensic Sci. Int. 95 (2) (1998) 173–178.

[17] M.D. Barber, B.J. McKeown, B.H. Parkin, Structural variation in the alleles of a short tandem repeat system at the human alpha fibrinogen locus, Int. J. Legal Med. 108 (4) (1996) 180–185.

[18] M.D. Barber, B.H. Parkin, Sequence analysis and allelic designation of the two short tandem repeat loci D18S51 and D8S1179, Int. J. Legal Med. 109 (2) (1996) 62–65.

[19] A. Moller, E. Meyer, B. Brinkmann, Different types of structural variation in STRs: HumFES/FPS, HumVWA and HumD21S11, Int. J. Legal Med. 106 (6) (1994) 319–323.

[20] H.G. Zhou, K. Sato, Y. Nishimaki, L. Fang, H. Hasekura, The HumD21S11 system of short tandem repeat DNA polymorphisms in Japanese and Chinese, Forensic Sci. Int. 86 (1–2) (1997) 109–118.

[21] D.W.M. Schwartz, E.M. Dauber, B. Glock, W.R. Mayr, AMPFLP-typing of the D21S11 microsatellite polymorphism: allele frequencies and sequencing data in the Austrian population, Adv. Forensic Haemogenet. 6 (1996) 622–625.

[22] B. Brinkmann, E. Meyer, A. Junge, Complex mutational events at the HumD21S11 locus, Hum. Genet. 98 (1) (1996) 60–64.

[23] M.J. Farfan, P. Sanz, M.V. Lareu, A. Carracedo, Population data on the D1S1656 and D12S391 STR loci in Andalusia (south Spain) and the Maghreb (north Africa), Forensic Sci. Int. 104 (1) (1999) 33–36.

[24] M. Scheible, O. Loreille, R. Just, J. Irwin, Short tandem repeat typing on the 454 platform: strategies and considerations for targeted sequencing of common forensic markers, Forensic Sci. Int. Genet. (2014), http://dx.doi.org/10.1016/j.fsigen.2014.04.010.