
Appendix A. Analysis procedure

1. Import the data into STR-validator.

2. Create histograms of the number of peaks by fragment size (click the ’Plot’ button in the ’Distributions’ group located in the ’Result’ tab, then select the dataset and the ’Size’ column). This can reveal important artefacts.

3. Verify that the chosen analytical threshold (AT) is not set too low. The instrument noise varies during the lifetime of the capillary array, and extraction negative controls are usually noisier than PCR negative controls. The AT can therefore be misjudged if only a few samples are examined. Follow the steps outlined below to verify that a suitable AT has been used.

(a) If the dataset contains non-unique sample names (e.g. ’Neg’) that were imported from different files, the sample names can be made unique by combining the sample name with the file name: click the ’Columns’ button in the ’Tools’ tab and select the dataset. Select ’Sample.Name’ as column 1 and ’File.Name’ as column 2. The column for new values should be ’Sample.Name’, and the action should be ’&’ to concatenate the values.

(b) Click the ’Height’ button in the ’Tools’ tab. Select the dataset and make sure ’Replace NA with 0’ is checked, and ’Add result to dataset’ and ’Exclude values in ’Allele’ column’ are both unchecked. Click to calculate.

(c) Click the ’Edit’ button, which is available in all tabs. Select the dataset containing the calculated height values. Make sure the ’Limit number of rows to’ option is unchecked. Then right-click the ’Peaks’ table heading and select ’Sort by column (decreasing)’ and ’Sort by column (increasing)’ respectively. This sorts the data by the number of peaks. Note the top- and bottom-ranked samples, e.g. by copying the table into spreadsheet software.

(d) Zero peaks may be caused by a size standard error; if so, the samples should be removed from the dataset. Ideally, the sizing quality flag should be exported from GeneMapper, so that failed samples can be removed from the dataset using the ’Crop’ function.

(e) A high number of peaks may indicate noise peaks above the analytical threshold. Check the EPGs of the respective samples and verify whether noise has been labelled. If the top-ranked samples are all cleared of suspicion, continue with the next step. If noise was labelled, the AT should be adjusted, or alternatively the few unusually noisy control samples could be removed from the dataset.

4. Identify possible spikes using the function opened by clicking the ’Calculate’ button in the ’Drop-in tools’ group under the ’Result’ tab. The function searches for peaks of similar size across dyes. By default, possible spikes are flagged if the number of peaks of similar size is equal to or greater than the number of dyes minus one. This allows for one unlabelled spike peak (outside the defined markers). The identified spikes should ideally be verified manually by inspecting the EPG.

5. Remove the spikes using the ’Filter’ button in the ’Drop-in tools’ group under the ’Result’ tab. Select the dataset to remove spikes from in the first drop-down list, and the list of possible spikes in the second drop-down list. Removing spikes is especially important before classifying results based on the number of peaks, because spikes can make samples fall into the wrong bin. For example, if both a drop-in peak and a spike labelled across the dyes are present, the sample will not be classified as ’drop-in contamination’ if the rule is a maximum of two peaks.

6. Remove off-ladder alleles (labelled ’OL’) from the dataset using the ’Crop’ function in the ’Tools’ tab. The assumption is that most off-ladder alleles do not represent true drop-in alleles.

7. Identify likely artefacts using the function opened by clicking the ’Calculate’ button in the ’Drop-in tools’ group under the ’Result’ tab. The function calculates the proportion of each unique allele in each marker relative to the total number of observations.

(a) To characterize the size and peak height of a specific artefact, first open the result generated in step 7. This table contains the mean and range of the size and peak height.

(b) To enable plotting of the distributions, use the ’Crop’ function to extract the data for the artefact from the dataset generated in step 6. This is done in two steps: 1) with target column ’Marker’, discard values ’not equal to’ the marker of interest; 2) with target column ’Allele’, discard values ’not equal to’ the allele of interest. Save the resulting dataset.

(c) Click the ’Plot’ button located in the ’Distributions’ group on the ’Result’ tab. Select the dataset for the artefact of interest and the ’Height’ or ’Size’ column. The cumulative distribution function (CDF), probability density function (PDF), or a histogram (Histogram) can be plotted. A box plot can be overlaid, and the data can be log-transformed. Further options are found in expandable groups.

8. Remove the likely artefacts using the ’Filter’ button in the ’Drop-in tools’ group under the ’Result’ tab. Select the dataset to remove artefacts from in the first drop-down list, and the ranked list of allele observations in the second drop-down list. Removing artefacts is especially important before classifying results based on the number of peaks, because artefacts can make samples fall into the wrong bin. For example, if both drop-in peaks and artefacts are present, the sample may not be classified as ’drop-in contamination’ if the maximum number of drop-in peaks is exceeded.

9. Estimate the drop-in limit by following the two-step process outlined below:

(a) First, determine the number of peaks in each sample. The easiest way to do this is to use the function located in the ’Number of Peaks’ group on the ’Result’ tab. Click ’Calculate’ to open the function. Select the dataset generated in step 8 and the option to count peaks by sample. The group labels and definitions are not important in this step.

(b) Open the ’Plot contamination’ function in the ’Drop-in tools’ group to create a plot of the observed and expected number of peaks per profile, with the fitted Poisson distribution overlaid. The point at which the drop-in peaks no longer fit the Poisson distribution (independence model) differentiates drop-in contamination from gross contamination.

10. Group the samples into result categories based on the number of peaks, using the ’Calculate’ button in the ’Number of peaks’ group on the ’Result’ tab. Select the option to count peaks by sample. Use the default group labels ’No contamination’, ’Drop-in contamination’, and ’Gross contamination’. Define the bins as ’0,x’, where x is the drop-in limit from step 9. Review the data according to step 14; the data may contain obvious outliers, which can be artefacts.

11. Check the data for artefacts. Make a list of identified drop-in peaks using the ’Crop’ function: 1) Select the dataset from step 10, the target column ’Group’, and discard values not equal to "Drop-in contamination" (specify type character). Then select the target column ’Height’ and discard values that are NA. Save the dataset. 2) Open the dataset and sort the list by height in descending order. Manually check the detected drop-in peaks in the EPG to verify that they are not artefacts (e.g. spikes). Start with the tallest peak and work your way down the list, checking at least ten peaks. Then sort the list in ascending order and manually check the smallest peaks. The majority of detected drop-in peaks should appear as clearly defined single peaks. Any remaining artefacts will be in the drop-in zone and will most likely have a small effect on the peak height distribution. Ideally, all suspected drop-in peaks should be re-amplified to verify that they are not reproducible.

12. If artefacts were found in the previous step, they can be removed using the ’Crop’ function. If the artefact has a unique height, discard that specific height (replacing it with ’NA’ cannot be used, since this leaves the size value unchanged, so it would still be included in the size distribution). If the peak height is not unique, circumvent the problem by first editing the peak height manually to a unique value using the ’Edit’ function. After removing artefacts, repeat from step 10.

13. Calculate the proportion of negative controls with drop-in contamination using the ’Plot’ button in the ’Number of peaks’ group on the ’Result’ tab; select the dataset from step 10.

14. Plot the probability density function (PDF) of the peak height of the drop-in events using the ’Plot’ button in the ’Distributions’ group on the ’Result’ tab. Select the dataset from step 10, the group ’Drop-in contamination’, and the column ’Height’. Click the ’PDF’ button to plot.

15. Plot the probability density function of the fragment size of the drop-in events using the ’Plot’ button in the ’Distributions’ group on the ’Result’ tab. Select the dataset from step 10, the group ’Drop-in contamination’, and the column ’Size’. Click the ’PDF’ button to plot.

16. Calculate the probability of drop-in: 1) Determine the total number of observed drop-in peaks, n, e.g. from the list created in step 11. Amelogenin is usually excluded from the calculation (tip: sort the list by Marker). 2) Determine the total number of controls evaluated, N, e.g. from the plot created in step 13. Calculate p(C) = n/(N × L), where L is the number of loci evaluated (remember to adjust L if amelogenin was removed). If a higher AT is used for routine analysis than the one used to collect the drop-in data, use the ’Crop’ function to discard values from the list created in step 11 based on the ’Height’ column (NB! The process is more complicated if dye-specific ATs are used. Also, it is important to select numeric as the data type). Then calculate the probability of drop-in using the new n.
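The arithmetic in step 16 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not STR-validator's implementation: drop-in peaks are assumed to be plain (marker, height) tuples, amelogenin is assumed to be labelled "AMEL", and `drop_in_probability` is a hypothetical helper. Whether a peak exactly at the routine AT is kept is also an assumption.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record: (marker, peak height in RFU) for one drop-in peak.
Peak = Tuple[str, float]


def drop_in_probability(
    drop_in_peaks: Iterable[Peak],
    n_controls: int,
    n_loci: int,
    exclude_markers: Tuple[str, ...] = ("AMEL",),
    routine_at: Optional[float] = None,
) -> float:
    """Per-locus drop-in probability p(C) = n / (N * L).

    n: drop-in peaks left after excluding markers and, optionally, peaks
       below a higher routine AT; N: number of controls evaluated;
    L: number of loci, which the caller reduces if amelogenin is excluded.
    """
    n = sum(
        1
        for marker, height in drop_in_peaks
        if marker not in exclude_markers
        # Assumption: peaks at or above the routine AT are kept.
        and (routine_at is None or height >= routine_at)
    )
    return n / (n_controls * n_loci)


# Invented toy data: four drop-in peaks observed in 10 controls.
peaks = [("D3S1358", 80.0), ("AMEL", 70.0), ("vWA", 120.0), ("SE33", 60.0)]
p = drop_in_probability(peaks, n_controls=10, n_loci=16)  # 3 / (10 * 16)
p_high_at = drop_in_probability(peaks, n_controls=10, n_loci=16, routine_at=75)  # 2 / 160
print(p, p_high_at)
```

Here n_loci=16 reflects the 17 ESX 17 Fast markers listed in Figure B.15 minus amelogenin, which is the adjustment step 16 reminds you to make.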


Appendix B. Additional figures

[Figure: Histogram (1890 observations); x-axis: Fragment size (bp); y-axis: Count]

Figure B.9: Histogram of the number of peaks by fragment size in PCR and extraction negative controls (bin width = 1). Raw data at AT = 57 RFU, including off-ladder peaks, artefacts, contamination, and any spikes present.
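The histograms in step 2 and in Figures B.9, B.10 and B.12 are counts of peaks per 1 bp size bin. A minimal standard-library sketch, assuming peak sizes are available as a list of floats (`size_histogram` is a hypothetical name); a narrow cluster of peaks at one size, such as a recurring artefact, shows up as a single tall bin.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def size_histogram(sizes_bp: Iterable[float], bin_width: float = 1.0) -> Dict[float, int]:
    """Count peaks per fragment-size bin.

    Keys are lower bin edges in bp, so with bin_width=1.0 a peak at
    199.58 bp falls in the bin [199.0, 200.0).
    """
    counts = Counter(math.floor(size / bin_width) * bin_width for size in sizes_bp)
    return dict(sorted(counts.items()))


# Invented sizes: three peaks near 199 bp pile up in one bin.
hist = size_histogram([199.06, 199.58, 199.3, 100.2, 387.8])
print(hist)
```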


[Figure: Histogram (1832 observations); x-axis: Fragment size (bp); y-axis: Count]

Figure B.10: Histogram of the number of peaks by fragment size in PCR and extraction negative controls (bin width = 1). Data at AT = 57 RFU. Possible spikes have been removed.
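The spike rule in step 4 can be sketched as follows: within each sample, peaks are grouped when their sizes lie within a tolerance window, and a group is flagged when it spans at least the number of dyes minus one. The 1 bp tolerance, the dye codes and the `Peak` record are assumptions for illustration, not STR-validator's defaults or data model. With the four dye panels shown in the EPGs of Figures B.17 to B.20, the threshold becomes three dyes, which matches the spike criterion stated in Appendix C.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Peak:
    sample: str
    dye: str       # hypothetical dye codes, e.g. "B", "G", "Y", "R"
    marker: str
    size: float    # bp
    height: float  # RFU


def _size_clusters(ordered: List[Peak], tolerance_bp: float) -> Iterator[List[Peak]]:
    """Yield runs of size-sorted peaks within tolerance_bp of the run's first peak."""
    cluster: List[Peak] = []
    for peak in ordered:
        if cluster and peak.size - cluster[0].size > tolerance_bp:
            yield cluster
            cluster = []
        cluster.append(peak)
    if cluster:
        yield cluster


def flag_possible_spikes(peaks: List[Peak], n_dyes: int, tolerance_bp: float = 1.0) -> List[Peak]:
    """Return peaks in similar-size groups spanning >= n_dyes - 1 dyes in one sample."""
    by_sample: Dict[str, List[Peak]] = defaultdict(list)
    for peak in peaks:
        by_sample[peak.sample].append(peak)

    flagged: List[Peak] = []
    for sample_peaks in by_sample.values():
        ordered = sorted(sample_peaks, key=lambda p: p.size)
        for cluster in _size_clusters(ordered, tolerance_bp):
            if len({p.dye for p in cluster}) >= n_dyes - 1:
                flagged.extend(cluster)
    return flagged


# Invented data: a spike near 120.5 bp in three dyes of sample "Neg1".
peaks = [
    Peak("Neg1", "B", "D3S1358", 120.3, 300.0),
    Peak("Neg1", "G", "D10S1248", 120.5, 280.0),
    Peak("Neg1", "Y", "D22S1045", 120.9, 310.0),
    Peak("Neg1", "B", "TH01", 160.0, 70.0),
    Peak("Neg2", "G", "vWA", 120.4, 90.0),
]
spikes = flag_possible_spikes(peaks, n_dyes=4)
print([p.marker for p in spikes])
```

Anchoring each window at its first peak keeps the grouping simple; a spike whose peaks drift by more than the tolerance, or that has too few labelled peaks, would be missed, much like the undetected spikes shown in Appendix C.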


[Figure: Probability density function (304 observations); x-axis: Peak height (RFU); y-axis: Density]

Figure B.11: Peak height distribution of the likely artefact called as allele 6 in D8S1179. The mean peak height is 34.75 RFU, with a maximum height of 66 RFU. The size range is 199.06 to 199.58 bp.
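Step 7 ranks allele observations by how often each marker/allele combination occurs. A combination seen in many controls, such as allele 6 in D8S1179 in Figure B.11, is more likely a reproducible artefact than random drop-in. Below is a minimal sketch using hypothetical (marker, allele, size, height) tuples. It assumes the denominator is the total number of observations in the dataset, and the summary mirrors the mean and range reported in step 7(a).

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical observation record: (marker, allele, size in bp, height in RFU).
Obs = Tuple[str, str, float, float]


def rank_allele_observations(obs: List[Obs]) -> List[Tuple[str, str, int, float]]:
    """Return (marker, allele, count, proportion of all observations), most frequent first."""
    counts = Counter((marker, allele) for marker, allele, _, _ in obs)
    total = len(obs)
    ranked = [(marker, allele, n, n / total) for (marker, allele), n in counts.items()]
    ranked.sort(key=lambda row: (-row[2], row[0], row[1]))
    return ranked


def describe_artefact(obs: List[Obs], marker: str, allele: str) -> Dict[str, object]:
    """Mean and range of size and peak height for one marker/allele combination."""
    sizes = [s for m, a, s, _ in obs if m == marker and a == allele]
    heights = [h for m, a, _, h in obs if m == marker and a == allele]
    if not sizes:
        raise ValueError(f"no observations for {marker} allele {allele}")
    return {
        "n": len(sizes),
        "size_range": (min(sizes), max(sizes)),
        "height_mean": mean(heights),
        "height_range": (min(heights), max(heights)),
    }


# Invented observations.
obs = [
    ("D8S1179", "6", 199.1, 35.0),
    ("D8S1179", "6", 199.5, 30.0),
    ("D8S1179", "6", 199.3, 40.0),
    ("vWA", "17", 150.0, 80.0),
]
ranked = rank_allele_observations(obs)
print(ranked[0])
print(describe_artefact(obs, "D8S1179", "6"))
```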


[Figure: Histogram (464 observations); x-axis: Fragment size (bp); y-axis: Count]

Figure B.12: Histogram of the number of peaks by fragment size in PCR and extraction negative controls (bin width = 1). Data at AT = 57 RFU. Possible spikes, off-ladder peaks, and likely artefacts have been removed.
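Figure B.12 shows the data after the filtering in steps 5, 6 and 8. Those filters can be sketched as a single pass over the peak table. The `Row` record is an assumption for illustration, and so is the convention of passing spikes as (sample, marker, allele) keys and artefacts as (marker, allele) keys. STR-validator's ’Filter’ and ’Crop’ functions operate on its own data frames.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Row(NamedTuple):
    sample: str
    marker: str
    allele: str
    size: float    # bp
    height: float  # RFU


def remove_excluded(
    rows: Iterable[Row],
    spikes: Set[Tuple[str, str, str]],
    artefacts: Set[Tuple[str, str]],
) -> List[Row]:
    """Drop flagged spike peaks, off-ladder calls and listed artefact alleles."""
    kept: List[Row] = []
    for row in rows:
        if (row.sample, row.marker, row.allele) in spikes:  # step 5
            continue
        if row.allele == "OL":  # step 6
            continue
        if (row.marker, row.allele) in artefacts:  # step 8 (assumed: all samples)
            continue
        kept.append(row)
    return kept


# Invented rows.
rows = [
    Row("Neg1", "D3S1358", "15", 120.3, 300.0),
    Row("Neg1", "D22S1045", "OL", 90.0, 70.0),
    Row("Neg2", "D8S1179", "6", 199.3, 40.0),
    Row("Neg3", "vWA", "17", 150.0, 80.0),
]
clean = remove_excluded(rows, spikes={("Neg1", "D3S1358", "15")}, artefacts={("D8S1179", "6")})
print(clean)
```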


[Bar plot: Analysis of peaks from 1061 samples; Proportion by Group: No contamination 0.8162, Drop-in contamination 0.1565, Gross contamination 0.0273]

Figure B.13: Bar plot of result groups in the analysed negative controls. Data at AT = 57 RFU. Possible spikes, off-ladder peaks, and likely artefacts have been removed.
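The grouping behind Figure B.13 (step 10) and the proportions in step 13 can be sketched as follows. Assumptions: peaks are hypothetical (sample name, file name, height) tuples, a sample without peaks appears once with height None (mirroring ’Replace NA with 0’ in step 3(b)), and samples are keyed by sample name plus file name as in step 3(a). With the bins ’0,x’, a sample with 0 peaks is ’No contamination’, 1 to x peaks is ’Drop-in contamination’, and more than x is ’Gross contamination’. With x = 3, this matches the definition used in Appendix C.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

LABELS = ("No contamination", "Drop-in contamination", "Gross contamination")
SampleKey = Tuple[str, str]  # (sample name, file name)


def count_peaks(rows: Iterable[Tuple[str, str, Optional[float]]]) -> Dict[SampleKey, int]:
    """Number of peaks per sample; a height of None marks a sample without peaks."""
    counts: Dict[SampleKey, int] = {}
    for sample, file_name, height in rows:
        key = (sample, file_name)  # 'Neg' from two files stays two samples
        counts.setdefault(key, 0)
        if height is not None:
            counts[key] += 1
    return counts


def classify(n_peaks: int, drop_in_limit: int) -> str:
    """Bins '0,x': 0 -> no contamination, 1..x -> drop-in, > x -> gross."""
    if n_peaks == 0:
        return LABELS[0]
    if n_peaks <= drop_in_limit:
        return LABELS[1]
    return LABELS[2]


def group_proportions(counts: Dict[SampleKey, int], drop_in_limit: int) -> Dict[str, float]:
    """Proportion of samples in each result group."""
    groups = Counter(classify(n, drop_in_limit) for n in counts.values())
    return {label: groups[label] / len(counts) for label in LABELS}


# Invented peak table for four samples.
rows = [
    ("Neg", "run1.txt", None),
    ("Neg", "run2.txt", 80.0),
    ("Neg", "run2.txt", 65.0),
    ("Ext1", "run1.txt", 70.0),
    ("Ext1", "run1.txt", 71.0),
    ("Ext1", "run1.txt", 90.0),
    ("Ext1", "run1.txt", 60.0),
    ("Ext2", "run2.txt", None),
]
counts = count_peaks(rows)
print(group_proportions(counts, drop_in_limit=3))
```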


[Figure: Probability density function (210 observations); x-axis: Fragment size (bp); y-axis: Density]

Figure B.14: Fragment size distribution of drop-in events. The mode of the fragment sizes was 100.2 bp, the median 130.7 bp, the mean 165.1 bp, the third quartile 223.1 bp, and the maximum observed fragment size was 387.8 bp. Data at AT = 57 RFU, and drop-in defined as 1-3 peaks in a negative control.
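Summary statistics like those in the Figure B.14 caption can be computed with Python's standard `statistics` module. This sketch uses invented sizes. The quartile convention (here `method="inclusive"`, linear interpolation between order statistics) is an assumption and may differ from the one used for the figure. The reported mode is not reproduced, because it depends on how the density is estimated or the sizes are rounded.

```python
from statistics import mean, median, quantiles
from typing import Dict, List


def size_summary(sizes_bp: List[float]) -> Dict[str, float]:
    """Median, mean, third quartile and maximum of drop-in fragment sizes."""
    _, _, q3 = quantiles(sizes_bp, n=4, method="inclusive")
    return {"median": median(sizes_bp), "mean": mean(sizes_bp), "q3": q3, "max": max(sizes_bp)}


# Invented fragment sizes in bp.
summary = size_summary([100.2, 110.0, 130.7, 200.0, 387.8])
print(summary)
```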


[Figure: Marker size range, PowerPlex ESX 17 Fast System; x-axis: Size (bp); dye rows: AMEL, D3S1358, TH01, D21S11, D18S51 | D10S1248, D1S1656, D2S1338, D16S539 | D22S1045, vWA, D8S1179, FGA | D2S441, D12S391, D19S433, SE33; per-marker drop-in counts in square brackets range from 5 to 42]

Figure B.15: The marker size ranges of the ESX 17 Fast kit. The number of drop-in peaks observed in each marker is given in square brackets.

[Heatmap: Allele and locus dropout; x-axis: Sample name (S07.C01.exp01.p1.#, S08.C01.exp02.p1.#, S09.C01.exp02.p1.#, S10.C01.exp02.p1.#); y-axis: Marker (AMEL to SE33); fill: Dropout (none, allele, locus)]

Figure B.16: Allele and locus drop-out visualized as a heatmap. The sample name is constructed from the single molecule amplification id (S##), the number of sperm cells (C01), the experiment and PCR plate id (exp##.p#), and a final number, which is the capillary id. Interpretation of the heatmap: since sperm cells are haploid, only one allele is expected per marker. Consequently, the drop-out value ’none’ (green) denotes homozygous loci where the allele was detected, ’allele’ (pink) denotes heterozygous loci where one allele was detected, and ’locus’ (purple) denotes drop-out. Note that one sample had both alleles detected at the heterozygous locus TH01 (online supplement Figure B.20).
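The three drop-out categories in Figure B.16 can be expressed as a comparison between the donor's reference alleles at a marker and the alleles detected in one amplification. This is a minimal sketch of that logic under our own assumptions. The function name and the reference genotypes in the example are hypothetical, and this is not STR-validator's code.

```python
from typing import Set


def dropout_status(reference: Set[str], detected: Set[str]) -> str:
    """Classify one marker in one amplification against the reference genotype.

    'locus'  : none of the reference alleles was detected (drop-out);
    'allele' : some, but not all, reference alleles were detected;
    'none'   : every reference allele was detected.
    """
    found = reference & detected
    if not found:
        return "locus"
    if found == reference:
        return "none"
    return "allele"


# For a haploid cell, 'none' is expected at homozygous loci and 'allele' at
# heterozygous loci; 'none' at a heterozygous locus is the unexpected case.
print(dropout_status({"9.3"}, {"9.3"}))            # none
print(dropout_status({"6", "9.3"}, {"9.3"}))       # allele
print(dropout_status({"6", "9.3"}, {"6", "9.3"}))  # none
print(dropout_status({"12", "20"}, set()))         # locus
```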


[EPG printout: GeneMapper® ID-X 1.2, project Experiment_1_dropin; sample file S07.C01_01_2016-10-04-15-22-07.hid, sample S07.C01, panel PowerPlex_ESX_17_Fast_IDX_v1.2; dye panels: AMEL, D3S1358, TH01, D21S11, D18S51 | D10S1248, D1S1656, D2S1338, D16S539 | D22S1045, vWA, D8S1179, FGA | D2S441, D12S391, D19S433, SE33]

Figure B.17: Typical EPG from the single molecule amplification of sample S07. Note the artefact in D22 detected as ’OL’.


[EPG printout: GeneMapper® ID-X 1.2, project Experiment2Plate1_Dropin; sample file S08.C01_14_2016-10-06-17-15-11.hid, sample S08.C01, panel PowerPlex_ESX_17_Fast_IDX_v1.2; dye panels: AMEL, D3S1358, TH01, D21S11, D18S51 | D10S1248, D1S1656, D2S1338, D16S539 | D22S1045, vWA, D8S1179, FGA | D2S441, D12S391, D19S433, SE33; one peak annotated ’Spike’]

Figure B.18: Typical EPG from the single molecule amplification of sample S08. Note the artefacts in D22 detected as ’OL’ and ’6’.


[EPG printout: GeneMapper® ID-X 1.2, project Experiment2Plate1_Dropin; sample file S09.C01_19_2016-10-06-17-15-11.hid, sample S09.C01, panel PowerPlex_ESX_17_Fast_IDX_v1.2; dye panels: AMEL, D3S1358, TH01, D21S11, D18S51 | D10S1248, D1S1656, D2S1338, D16S539 | D22S1045, vWA, D8S1179, FGA | D2S441, D12S391, D19S433, SE33]

Figure B.19: Typical EPG from the single molecule amplification of sample S09. Note the artefacts in D22 detected as ’OL’ and ’6’.


[EPG printout: GeneMapper® ID-X 1.2, project Experiment2Plate1_Dropin; sample file S10.C01_03_2016-10-06-17-15-11.hid, sample S10.C01, panel PowerPlex_ESX_17_Fast_IDX_v1.2; dye panels: AMEL, D3S1358, TH01, D21S11, D18S51 | D10S1248, D1S1656, D2S1338, D16S539 | D22S1045, vWA, D8S1179, FGA | D2S441, D12S391, D19S433, SE33; one peak annotated ’Spike’]

Figure B.20: Typical EPG from the single molecule amplification of sample S10. Unexpectedly, both heterozygous alleles are detected at locus TH01. Note the artefact in D22 detected as ’6’.


Sample  Marker    Allele  Height.Min  Height.Max  Height.Mean  Height.n  Height.Sd
S07     AMEL      X       99          154         136.0        17        14.0
S07     TH01      9.3     85          140         117.3        17        15.0
S07     D21S11    28      73          116         100.2        17        12.0
S07     D18S51    17      77          126         105.4        17        13.6
S07     D1S1656   12      113         181         152.1        17        18.7
S07     D2S1338   20      61          79          68.1         15        5.9
S07     D22S1045  15      74          101         86.1         17        6.9
S07     vWA       17      67          97          80.9         17        7.6
S07     FGA       22      132         193         166.2        17        15.6
S07     D2S441    12      119         182         149.4        17        20.4
S07     D12S391   18      152         248         199.9        17        26.0
S07     D19S433   14      125         212         174.1        17        23.4
S07     SE33      29.2    67          113         92.3         17        13.1

Sample  Marker    Allele  Height.Min  Height.Max  Height.Mean  Height.n  Height.Sd
S08     AMEL      Y       118         380         174.9        18        55.6
S08     D3S1358   17      95          316         138.8        18        46.9
S08     TH01      9.3     62          233         96.6         18        36.5
S08     D21S11    28      93          341         142.6        18        55.2
S08     D18S51    17      61          203         88.0         17        36.0
S08     D10S1248  13      147         421         203.2        18        57.8
S08     D1S1656   14      110         363         161.8        18        53.5
S08     D22S1045  15      141         414         182.6        18        60.2
S08     vWA       14      64          201         88.6         18        32.0
S08     D8S1179   15      59          171         86.0         18        29.6
S08     D2S441    12      126         376         180.7        18        56.8
S08     D12S391   21      203         680         273.2        18        107.0
S08     SE33      29.2    80          305         115.6        18        50.1

Sample  Marker    Allele  Height.Min  Height.Max  Height.Mean  Height.n  Height.Sd
S09     AMEL      Y       88          203         162.6        18        27.1
S09     D3S1358   15      73          138         113.2        18        18.7
S09     TH01      9.3     110         247         163.3        18        34.2
S09     D21S11    28      89          214         123.2        17        27.7
S09     D18S51    17      57          80          64.7         9         8.6
S09     D10S1248  13      110         266         205.3        18        35.8
S09     D1S1656   12      106         288         181.8        18        38.0
S09     D16S539   12      64          64          64.0         1         NA
S09     D22S1045  15      57          77          66.6         10        5.7
S09     vWA       17      58          111         74.4         15        13.8
S09     D8S1179   15      59          82          68.7         12        7.7
S09     FGA       22      60          205         104.9        18        30.0
S09     D2S441    10      68          114         90.4         17        14.5
S09     D12S391   18      193         410         304.2        18        60.0
S09     D19S433   15      60          201         109.7        18        30.2
S09     SE33      29.2    57          141         75.1         16        19.1

Sample  Marker    Allele  Height.Min  Height.Max  Height.Mean  Height.n  Height.Sd
S10     AMEL      Y       153         355         238.9        19        46.1
S10     D3S1358   15      187         446         301.8        19        62.4
S10     TH01      6       67          74          70.3         3         3.5
S10     TH01      9.3     58          124         80.8         17        18.2
S10     D21S11    28      109         314         196.3        19        48.1
S10     D18S51    15      89          291         183.3        19        49.2
S10     D10S1248  13      142         359         237.2        19        53.1
S10     D1S1656   12      62          148         96.2         18        23.1
S10     D2S1338   20      119         356         218.6        19        60.0
S10     D16S539   9       68          152         93.4         17        22.0
S10     D22S1045  15      181         431         303.5        19        64.7
S10     vWA       14      170         458         303.9        19        74.1
S10     D8S1179   15      57          123         73.7         13        17.4
S10     FGA       22      97          322         198.9        19        49.1
S10     D2S441    12      231         530         367.5        19        81.0
S10     D12S391   21      271         781         501.9        19        133.9
S10     D19S433   15      162         476         307.7        19        81.9
S10     SE33      29.2    62          188         113.0        18        34.3

Figure B.21: Peak height summary statistics for the single molecule amplifications. The minimum, maximum, and mean peak heights are shown together with the number of observations and the standard deviation.
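The columns of Figure B.21 can be reproduced with a group-by over (sample, marker, allele). This sketch assumes heights are supplied as hypothetical (sample, marker, allele, height) tuples. The standard deviation is the sample standard deviation (n − 1 denominator), which is undefined for a single observation and is returned here as None. This is consistent with the NA shown for S09 D16S539 (n = 1).

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str, str]  # (sample, marker, allele)


def summarise_heights(
    rows: Iterable[Tuple[str, str, str, float]],
) -> Dict[Key, Dict[str, Optional[float]]]:
    """Min, max, mean, n and sample SD of peak height per sample/marker/allele."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for sample, marker, allele, height in rows:
        groups[(sample, marker, allele)].append(height)

    summary: Dict[Key, Dict[str, Optional[float]]] = {}
    for key, heights in groups.items():
        summary[key] = {
            "min": min(heights),
            "max": max(heights),
            "mean": round(mean(heights), 1),
            "n": len(heights),
            # stdev() needs at least two values; a single observation gives None (NA).
            "sd": round(stdev(heights), 1) if len(heights) > 1 else None,
        }
    return summary


# Invented heights in RFU.
rows = [
    ("S1", "TH01", "6", 60.0),
    ("S1", "TH01", "6", 64.0),
    ("S1", "D16S539", "12", 64.0),
]
stats = summarise_heights(rows)
print(stats[("S1", "TH01", "6")])
print(stats[("S1", "D16S539", "12")])
```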


[Figure: Peak height by marker (AMEL to SE33) for the groups Drop-in and Single molecule; top panel: Height on a linear scale; bottom panel: Height on a log10 scale]

Figure B.22: Peak heights by marker for the two experiments. Normal scale (top) and log10 scale (bottom). Note that two different sets of primer and reaction mix lot numbers were used for the drop-in data, and a third set for the single molecule data. This may indicate marker-specific differences in amplification efficiency between kit production lots.


Appendix C. Example EPGs

Below are some examples of easily distinguishable drop-in events, uncertain results, and some false positives. Classifying detected drop-in events as true or false becomes more difficult as peak height decreases. This illustrates the complexity of real-life data. Drop-in events are marked with red rectangles, and spikes are marked with blue rectangles.

The analysis conditions were as follows: data were analysed down to 30 RFU in GeneMapper ID-X, while an AT of 57 RFU was used in STR-validator. The artefact in D22S1045 labelled as allele 6 was removed. Spike detection required labelled peaks > 57 RFU vertically aligned in at least 3 dyes. All off-ladder peaks were removed. Drop-in was defined as 1-3 peaks, and 4 or more peaks were defined as gross contamination.
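Step 9 of Appendix A chooses the drop-in limit by comparing the observed number of peaks per control with a fitted Poisson distribution. The limit is where the observed counts stop following the independence model. Below is a minimal sketch of the expected counts. It assumes λ is estimated as the mean number of peaks per control over the data supplied, and the input counts are invented. STR-validator's ’Plot contamination’ may fit the distribution differently.

```python
import math
from collections import Counter
from typing import List, Tuple


def poisson_table(
    peaks_per_control: List[int], k_max: int
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Return (lambda, rows) where rows are (k, observed controls, expected controls)."""
    n = len(peaks_per_control)
    lam = sum(peaks_per_control) / n  # maximum-likelihood estimate of the Poisson mean
    observed = Counter(peaks_per_control)
    rows = []
    for k in range(k_max + 1):
        expected = n * math.exp(-lam) * lam**k / math.factorial(k)
        rows.append((k, observed.get(k, 0), expected))
    return lam, rows


# Invented counts: 10 controls with 0, 1 or 2 peaks.
lam, rows = poisson_table([0, 0, 0, 0, 0, 0, 1, 1, 1, 2], k_max=4)
for k, n_obs, n_exp in rows:
    print(f"{k} peaks: observed {n_obs}, expected {n_exp:.2f}")
```

Gross contamination inflates λ when it is included in the mean. Fitting only the controls up to a candidate limit is one alternative, and that choice is left open here.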

Unambiguous drop-in event

Figure C.23: Example of an unambiguous drop-in event with a nice background.


Three detected drop-in events

Figure C.24: Example of multiple likely drop-in events with a relatively nice background.


Undetected spike scored as two drop-in events.

Figure C.25: Example of two erroneously scored drop-in peaks. The spike was not detected by the algorithm because two of its peaks lay outside the marker regions (not present in the genotypes table).


Drop-in event or artefact?

Figure C.26: Example of a likely drop-in event in an ugly background, possibly low-level gross contamination.


Drop-in event, artefacts and/or weak gross contamination?

Figure C.27: Example of multiple detected drop-in events that can be a combination of low-level gross contamination and artefacts or noise.


Drop-in event or artefact?

Figure C.28: Example of a detected drop-in event that may be an artefact.


Drop-in peaks or artefacts? Undetected gross-contamination (too low + OL) Undetected spike (one peak too low)

Figure C.29: Example of an ugly background with possible low-level gross contamination in combination with an undetected spike.


Likely noise

Figure C.30: Example of a detected drop-in event that is more likely to be from background noise.


Likely artefacts detected as drop-in. Possible spike detected and removed.

Figure C.31: Example of an ugly background with a possible spike removed. Three remaining artefacts were detected as drop-in events.


Detected spike

Figure C.32: Example of a detected spike.


Detected spike

Figure C.33: Example of a detected spike.
