
Matrix file excerpt (marker, marker, pairwise score):
...................
GM01 GM07 0.36
GM01 GM08 0.40
GM01 GM09 0.48
GM01 GM10 0.52
GM01 GM11 0.60
GM01 GM12 0.68
GM02 GM01 0.04
GM02 GM02 0.00
GM02 GM03 0.08
GM02 GM04 0.16
GM02 GM05 0.20
GM02 GM06 0.24
...................

MadMapper And CheckMatrix: Python Scripts To Infer Orders Of Genetic Markers And For Visualization And Validation Of Genetic Maps And Haplotypes

Alexander Kozik and Richard Michelmore

The Genome Center, University of California, Davis, CA 95616

Contemporary molecular marker techniques can generate mapping data for thousands of molecular markers simultaneously. Constructing and validating high-density genetic maps is therefore a challenge that requires robust, high-throughput approaches. As part of the Compositae Genome Project, we developed a suite of Python scripts for quality control of genetic markers, grouping of markers, and inference of the linear order of markers in linkage groups. The scripts can be used in conjunction with other mapping programs or as a stand-alone package. The suite consists of three programs: MadMapper_RECBIT, MadMapper_XDELTA and CheckMatrix.

MadMapper_RECBIT analyzes raw marker scores for recombinant inbred lines (RILs). It generates pairwise distance scores for all markers, clusters markers based on these distances, identifies genetic bins, assigns new markers to known linkage groups, validates allele calls, and assigns a quality class to each marker based on several criteria and cutoff values.

MadMapper_XDELTA uses a new algorithm, Minimum Entropy Approach and Best-Fit Extension, to infer the linear order of markers. It analyzes two-dimensional matrices of all pairwise scores and finds the map with the minimal total sum of differences between adjacent cells (the map with the lowest entropy). Unlike some commonly used mapping programs, this approach scales to large numbers of markers.

CheckMatrix is a visualization tool for validating constructed genetic maps. It generates graphical genotypes and two-dimensional heat plots of pairwise scores. Visualizing regions of positive and negative linkage, as well as the allele fraction per marker, simplifies genetic map validation without applying statistical approaches. The scripts are freely available at http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
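One of MadMapper_RECBIT's steps is identifying genetic bins. A minimal sketch, assuming a bin is simply a set of markers whose allele calls are identical across all RILs (pairwise distance 0), so they cannot be ordered relative to each other. The function name `find_bins` and the toy data are hypothetical; this is not MadMapper's code, and it ignores missing calls.

```python
from collections import defaultdict


def find_bins(loci):
    """Group markers whose allele strings are identical across all RILs.

    loci: dict mapping marker name -> sequence of allele calls ("A"/"B").
    Returns a list of bins (lists of marker names), largest bin first.
    Assumes no missing data; real data would need a rule for "-" calls.
    """
    by_pattern = defaultdict(list)
    for marker, alleles in loci.items():
        by_pattern[tuple(alleles)].append(marker)
    # Largest bins first; ties broken by first marker name for stable output
    return sorted(by_pattern.values(), key=lambda b: (-len(b), b[0]))


if __name__ == "__main__":
    # Toy data for illustration only
    toy = {
        "M1": "AAAABBBB",
        "M2": "AAAABBBB",
        "M3": "AABBBBBB",
        "M4": "AAAABBBB",
        "M5": "AABBBBBB",
        "M6": "BBBBAAAA",
    }
    for b in find_bins(toy):
        print(b)
```

Keying on the whole allele tuple makes binning a single linear pass; a tolerance for a few mismatches would need pairwise distances instead.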

BRIEF DESCRIPTION OF RIL MAPPING PIPELINE:
1. Processing of raw marker scores and grouping: MadMapper_RECBIT generates multiple text files for further analysis.
2. Construction of the genetic map (ordering of markers) per linkage group: MadMapper_XDELTA (or any other mapping program).
3. Visualization and validation of genetic maps: CheckMatrix generates heat plots of recombination scores and graphical genotypes.

MadMapper and CheckMatrix are Python scripts and can be used on any computer platform: UNIX, Windows, Mac OS X. Grouping can be done on a set of ~2,000 markers; map construction runs in a reasonable time with up to ~500 markers.

[Figure: order of markers inferred by three different approaches (MadMapper_XDELTA, JoinMap, RECORD) plotted against the physical coordinates of the markers on the Arabidopsis genome]

Side-by-side comparison of the linear order of markers on the Arabidopsis genome inferred by three different approaches (mapping programs), and comparison with the physical order of markers (Col-0 genomic sequence): MadMapper_XDELTA (minimum entropy approach), JoinMap (maximum likelihood) and RECORD (minimum number of recombination events). [Diagonal dot-plot was created using GenoPix_2D_Plotter]

[Figure: 2-D diagonal CheckMatrix heat plot of all markers versus all markers, linkage groups I to V. Labeled features: main diagonal with linked markers; regions with quasi linkage; regions with negative linkage. The color gradient reflects linkage scores between markers.]
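To show how an ordered score matrix turns into a diagonal heat plot, here is a text-only sketch. It assumes scores are distances between 0 and 1 (small values mean tight linkage, values above 0.5 mean negative linkage). The symbol bands and the name `text_heat_plot` are hypothetical and do not reproduce CheckMatrix's color scheme.

```python
def text_heat_plot(order, scores):
    """Render a pairwise score matrix as rows of symbols, in map order.

    order:  marker names in map order.
    scores: dict mapping frozenset({m1, m2}) -> score in [0, 1];
            a marker paired with itself is treated as 0.
    """

    def symbol(score):
        # Hypothetical bands, not CheckMatrix's actual color gradient
        if score < 0.1:
            return "#"  # tight linkage
        if score < 0.3:
            return "+"  # weaker linkage
        if score <= 0.5:
            return "."  # little or no linkage
        return "-"      # negative linkage (score above 0.5)

    rows = []
    for a in order:
        cells = "".join(
            symbol(0.0 if a == b else scores[frozenset((a, b))]) for b in order
        )
        rows.append(f"{a:<6}{cells}")
    return rows


if __name__ == "__main__":
    toy = {
        frozenset(("M1", "M2")): 0.05,
        frozenset(("M1", "M3")): 0.20,
        frozenset(("M2", "M3")): 0.12,
    }
    for row in text_heat_plot(["M1", "M2", "M3"], toy):
        print(row)
```

With a correct marker order the "#" cells cluster along the main diagonal; a misplaced marker shows up as a row and column that break the band.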

[Figure: CheckMatrix graphical genotyping, linkage groups I to V: haplotypes per RIL (recombinant inbred line); red: Columbia, blue: Landsberg erecta]

LINEAR ORDER OF MARKERS INFERRED BY THREE DIFFERENT METHODS

REFERENCES AND DATA SOURCES:
1. Dean and Lister Arabidopsis Genetic Map and Raw Data: http://www.arabidopsis.info/new_ri_map.html
2. MadMapper: http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
3. JoinMap: http://www.kyazma.nl/index.php/mc.JoinMap
4. RECORD: http://www.dpw.wau.nl/pv/pub/recORD/index.htm
5. GenoPix_2D_Plotter: http://www.atgc.org/GenoPix_2D_Plotter/

CREDITS: This work was funded by NSF grant #0421630 to the Compositae Genome Consortium http://compgenomics.ucdavis.edu/

PAG-14 POSTERS WITH EXAMPLES OF MADMAPPER USAGE:

#P751 High-Density Haplotyping With Microarray-Based Single Feature Polymorphism Markers In Arabidopsis

#P761 Gene Expression Markers: Using Transcript Levels Obtained From Microarrays To Genotype A Segregating Population

[Figure panel: allele composition per marker]
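The allele composition panel shows the fraction of each allele per marker. A minimal sketch of that calculation, assuming calls are "A"/"B" with "-" for missing data. The 0.3 and 0.7 cutoffs and both function names are hypothetical, chosen only to show how skewed markers could be flagged.

```python
def allele_fractions(calls, missing="-"):
    """Return (fraction of A, fraction of B) among non-missing calls."""
    scored = [c for c in calls if c != missing]
    if not scored:
        return (0.0, 0.0)
    n = len(scored)
    return (scored.count("A") / n, scored.count("B") / n)


def flag_skewed_markers(loci, low=0.3, high=0.7):
    """List markers whose A fraction lies outside [low, high].

    The cutoffs are hypothetical; in a two-parent RIL population each
    allele is expected at roughly 0.5 per marker.
    """
    return [m for m, calls in loci.items()
            if not low <= allele_fractions(calls)[0] <= high]


if __name__ == "__main__":
    # GM01 row from the locus file on this poster: 16 A and 9 B calls
    gm01 = "A A A A A A A A A A A A A A A A B B B B B B B B B".split()
    print(allele_fractions(gm01))  # (0.64, 0.36)
```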

MINIMUM ENTROPY APPROACH TO INFER LINEAR ORDER OF MARKERS:

CheckMatrix 2D plots: random order (high ‘entropy’); partially wrong order; right order (low ‘entropy’).

Example of group analysis by MadMapper_RECBIT

[Figure: grouping at different cutoff stringencies; label: distinct linkage group #4]
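The poster does not spell out the clustering rule behind the grouping cutoff. A minimal sketch, assuming single-linkage grouping: two markers share a group whenever a chain of pairs with distance at or below the cutoff connects them. It uses union-find; the function name and toy distances are hypothetical. In this sketch, lowering the cutoff makes grouping more stringent and splits groups.

```python
def group_markers(markers, dist, cutoff):
    """Single-linkage grouping: link every pair with dist <= cutoff.

    markers: list of marker names.
    dist:    function (m1, m2) -> pairwise distance.
    Returns groups in order of first appearance; members keep input order.
    """
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the root, halving the path as we go
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(markers):
        for b in markers[i + 1:]:
            if dist(a, b) <= cutoff:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


if __name__ == "__main__":
    d = {("M1", "M2"): 0.04, ("M2", "M3"): 0.08, ("M1", "M3"): 0.12,
         ("M4", "M5"): 0.05}

    def dist(a, b):
        # Pairs not listed are treated as unlinked (0.5) in this toy example
        return d.get((a, b), d.get((b, a), 0.5))

    names = ["M1", "M2", "M3", "M4", "M5"]
    print(group_markers(names, dist, cutoff=0.1))   # two groups
    print(group_markers(names, dist, cutoff=0.05))  # M3 splits off
```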

MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map: the one with the minimal total sum of differences between adjacent cells (the map with the lowest ‘entropy’).
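The ‘entropy’ objective can be sketched as the sum of absolute differences between horizontally and vertically adjacent cells of the matrix whose rows and columns follow a candidate order. Two assumptions: absolute differences are used, and "Best-Fit Extension" is approximated by a greedy insertion that places each marker where the entropy is lowest. The poster names that step but does not describe it, so `best_fit_insertion` is a hypothetical stand-in, not MadMapper_XDELTA's algorithm. Toy markers sit on a line, so the correct order is known.

```python
def matrix_entropy(order, dist):
    """Sum of absolute differences between adjacent cells of the pairwise
    matrix arranged in `order` (lower means a smoother, better map).

    dist: function (m1, m2) -> pairwise score; dist(m, m) should be 0.
    """
    n = len(order)
    m = [[dist(a, b) for b in order] for a in order]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                total += abs(m[i][j] - m[i][j + 1])  # horizontal neighbour
            if i + 1 < n:
                total += abs(m[i][j] - m[i + 1][j])  # vertical neighbour
    return total


def best_fit_insertion(markers, dist):
    """Greedy ordering: insert each marker at the lowest-entropy position.

    Roughly O(n^4) overall; fine for a sketch, too slow for large maps.
    """
    order = []
    for marker in markers:
        candidates = [order[:k] + [marker] + order[k:]
                      for k in range(len(order) + 1)]
        order = min(candidates, key=lambda o: matrix_entropy(o, dist))
    return order


if __name__ == "__main__":
    # Toy data: markers placed on a line; distance = gap between positions
    pos = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

    def dist(m1, m2):
        return abs(pos[m1] - pos[m2])

    print(matrix_entropy(["a", "b", "c", "d", "e"], dist))  # 40.0 (right order)
    print(matrix_entropy(["b", "a", "c", "e", "d"], dist))  # 52.0 (partially wrong)
    print(best_fit_insertion(["c", "a", "e", "b", "d"], dist))
```

The right order gives the lowest total, matching the random / partially wrong / right order panels of the CheckMatrix 2D plots.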

[Figure: two-dimensional matrix of pairwise recombination scores, shown both as numerical data generated by MadMapper and as a visualization using CheckMatrix (CheckMatrix color scheme); adjacent cells (values) are marked]

VISUALIZATION OF ARABIDOPSIS GENETIC MAP (DEAN AND LISTER, http://www.arabidopsis.info/) USING CHECKMATRIX [MAP WAS RECONSTRUCTED USING MADMAPPER]

[Figure: CheckMatrix heat plot for linkage groups I to V; labeled regions: high density of markers, low density of markers]

[Figure panels: MadMapper, JoinMap, RECORD]

CHECKMATRIX USAGE:
Three input files are required:

Map file (linkage group, marker, map position):
LG GM01 0
LG GM02 1
LG GM03 2
LG GM04 3
LG GM05 4
LG GM06 5
LG GM07 6
LG GM08 7
LG GM09 8
LG GM10 9
LG GM11 10
LG GM12 11

Matrix file: pairwise scores for marker pairs, one pair per line (format as in the excerpt above: marker, marker, score).

Locus file (allele calls A or B for each marker in each of 25 RILs; lines starting with ";" carry a column ruler):
;    1                 10                  20        25
;    |                 |                   |         |
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A

The three files are the input to CheckMatrix.
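The scores in the matrix-file excerpt can be reproduced from the locus file. For example, GM01 and GM07 carry different alleles in 9 of the 25 RILs (9/25 = 0.36), and GM01 and GM02 differ in 1 of 25 (0.04). A minimal sketch, assuming the score is this mismatch fraction over RILs scored for both markers ("-" marks a missing call). The function names and the exact output layout are hypothetical, not MadMapper's or CheckMatrix's code.

```python
def parse_locus_file(lines):
    """Parse 'MARKER a1 a2 ...' rows; blank lines and ';' lines are skipped."""
    loci = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        marker, *alleles = line.split()
        loci[marker] = alleles
    return loci


def pair_distance(x, y, missing="-"):
    """Fraction of RILs, scored for both markers, in which the calls differ."""
    scored = [(a, b) for a, b in zip(x, y) if a != missing and b != missing]
    if not scored:
        return None
    return sum(a != b for a, b in scored) / len(scored)


def matrix_lines(loci):
    """Yield 'M1 M2 score' lines for every ordered marker pair."""
    for m1 in loci:
        for m2 in loci:
            d = pair_distance(loci[m1], loci[m2])
            if d is not None:
                yield f"{m1} {m2} {d:.2f}"


if __name__ == "__main__":
    # Three rows copied from the locus file on this poster
    locus_text = """\
;    1                 10                  20        25
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
"""
    for line in matrix_lines(parse_locus_file(locus_text.splitlines())):
        print(line)
```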

Upon program execution, three output files are generated:
1. HEAT PLOT: helps to validate the quality of the constructed genetic map and to identify markers in the wrong position.
2. GRAPHICAL GENOTYPING: visualization of haplotypes per recombinant line (suspicious double crossovers are highlighted).
3. CIRCULAR GRAPH: helps to validate the genetic map and to identify markers with spurious linkage.
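Graphical genotyping highlights suspicious double crossovers. A minimal sketch, assuming "suspicious" means a single marker whose call differs from its two flanking markers (in map order) while those flanking markers agree with each other in the same RIL. Such a pattern often points to a scoring error or a misplaced marker. The rule and the function name are hypothetical; CheckMatrix's actual criterion may differ.

```python
def suspicious_double_crossovers(order, loci, missing="-"):
    """Find single-marker double crossovers in graphical genotypes.

    order: marker names in map order.
    loci:  marker -> sequence of allele calls, one per RIL.
    Returns (ril_index, marker) pairs where the marker's call differs from
    two flanking markers that agree with each other in that RIL.
    """
    hits = []
    n_rils = len(loci[order[0]]) if order else 0
    for r in range(n_rils):
        for i in range(1, len(order) - 1):
            left = loci[order[i - 1]][r]
            mid = loci[order[i]][r]
            right = loci[order[i + 1]][r]
            if missing in (left, mid, right):
                continue  # skip windows with a missing call
            if left == right != mid:
                hits.append((r, order[i]))
    return hits


if __name__ == "__main__":
    # Toy data: in RIL 1, M2 is "B" while its neighbours M1 and M3 are "A"
    toy = {"M1": "AA", "M2": "AB", "M3": "AA", "M4": "BA"}
    print(suspicious_double_crossovers(["M1", "M2", "M3", "M4"], toy))
    # [(1, 'M2')]
```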