GRM 2011: ISMU pipeline for NGS data analysis and facilitating molecular breeding

Post on 05-Dec-2014


ISMU pipeline for NGS data analysis and facilitating molecular breeding

http://hpc.icrisat.cgiar.org/NGS/

Challenges

• Short read length of sequences

• Availability of many tools

• Platform dependency and command-line driven tools

• No direct way to predict SNPs between genotypes

• Quality scores vary depending on version and technology

ISMU version 1

• SNP discovery from NGS data

– Pipeline for mapping / assembling

– Calling SNPs between genotypes

– Visualisation

ISMU version 2

• Application of identified SNPs to breeding

Objectives of NGS Pipeline

• Benchmark available open-source short-read assembly and downstream analysis software.

• Assembly, polymorphism detection between genotypes, and visualization.

• Assay design (Illumina GoldenGate Assay), genotype calling, and visualization and analysis of SNP genotyping and haplotype data.

• Identify parental lines for use in MABC or MARS.

• Discovery of SNP markers for use in foreground and background selection in MABC or MARS.

• Documentation of the pipeline and the integrated software.

Control Flowchart

• ICRISAT crops (Yes / No)

• Input data & validation

• Upload reference & data

• Mapping (Maq, Novo)

• Mapped reads

• Assembly visualization

• Consensus calling

• Report SNPs

• Extract sequences with SNPs

• Design primers

• In silico validation by SNP2CAPS

• Database

• ADT score

• GoldenGate assay

• BeadStudio

• Flapjack
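The "Extract sequences with SNPs" step in the flowchart can be illustrated with a minimal sketch. It assumes a 1-based SNP position, a flank length of 60 bases, and the bracketed LEFT[ref/alt]RIGHT notation commonly used when submitting sequences for assay design. The function name and defaults are hypothetical and are not taken from the pipeline itself.

```python
def flanking_sequence(reference, pos, ref_allele, alt_allele, flank=60):
    """Return the SNP with up to `flank` bases on each side.

    `pos` is 1-based. The output format LEFT[ref/alt]RIGHT and the
    default flank length are assumptions made for this illustration.
    """
    index = pos - 1
    if not 0 <= index < len(reference):
        raise ValueError("SNP position lies outside the reference sequence")
    if reference[index].upper() != ref_allele.upper():
        raise ValueError("reference allele does not match the sequence")
    # Slices are clipped at the ends of the sequence.
    left = reference[max(0, index - flank):index]
    right = reference[index + 1:index + 1 + flank]
    return f"{left}[{ref_allele}/{alt_allele}]{right}"


print(flanking_sequence("ACGTACGTAC", 5, "A", "G", flank=3))  # CGT[A/G]CGT
```

SNPs close to the end of a transcript yield a shorter flank on that side, which may matter for primer or assay design.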

Standard Methodology

• Genotype 1, Genotype 2

• Programme: Maq, Novo

• Mapping (each genotype)

• Assembly

• SNP calling against reference

• Predicted SNPs against reference

• Base calling in 2nd genotype

• Compare allele between genotypes

• Check the inverse combination

• Remove duplicates

• SNPs between genotypes

• ADT scoring

• Reporting

Chrom1  Pos  RefAllele  Gtyp1  Gtyp2
5       303  A          G      ?
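The between-genotype comparison in the standard methodology can be sketched roughly as follows. This is an illustration only, assuming each genotype's calls are held in a dictionary keyed by (contig, position). The function name and data layout are hypothetical. Taking the union of sites covers both directions of the comparison (the "inverse combination"), and the set removes duplicate sites.

```python
def snps_between(calls_1, calls_2):
    """Return sites where two genotypes carry different called bases.

    calls_1, calls_2: dicts mapping (contig, pos) -> called base.
    A site missing from one genotype is treated as unknown ("?")
    and is skipped rather than reported as a SNP.
    """
    snps = []
    # Union of sites: SNPs predicted in either genotype are examined once.
    for site in sorted(set(calls_1) | set(calls_2)):
        base_1 = calls_1.get(site, "?")
        base_2 = calls_2.get(site, "?")
        if "?" in (base_1, base_2):
            continue  # base not called in one of the genotypes
        if base_1 != base_2:
            snps.append(site + (base_1, base_2))
    return snps


genotype_1 = {("contig5", 303): "G", ("contig5", 10): "A"}
genotype_2 = {("contig5", 303): "A", ("contig5", 10): "A", ("contig5", 50): "T"}
print(snps_between(genotype_1, genotype_2))  # [('contig5', 303, 'G', 'A')]
```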

Customized Methodology (Consensus Base Calling-cc)

• Genotype 1, Genotype 2

• Mapping (each genotype)

• ccMaq, ccNovo

• SNP calling

• Programme: in-house script

• ADT scoring

Genotype 1: fmaj = 38/40 = 0.95

Genotype 2: fmaj = 21/28 = 0.75

Consensus Base Calling Parameters (Default)

• Max number of mismatches <= 7

• Sum of mismatch scores <= 60

• Min mapping quality >= 0

• Read depth threshold >= 5

• Major base frequency threshold >= 0.75
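As a rough illustration of how the read depth and major base frequency thresholds interact, the sketch below calls a consensus base from the bases observed at one position. The two threshold defaults come from the slide. The function name is hypothetical, and the per-read filters (mismatches, mapping quality) are assumed to have been applied before this step. This is not the in-house script itself.

```python
from collections import Counter

MIN_DEPTH = 5          # read depth threshold (slide default)
MIN_MAJOR_FREQ = 0.75  # major base frequency threshold (slide default)


def consensus_base(bases, min_depth=MIN_DEPTH, min_major_freq=MIN_MAJOR_FREQ):
    """Return (base, f_maj); base is None when a threshold is not met."""
    depth = len(bases)
    if depth == 0:
        return None, 0.0
    major, count = Counter(bases).most_common(1)[0]
    f_maj = count / depth  # fraction of reads carrying the major base
    if depth < min_depth or f_maj < min_major_freq:
        return None, f_maj
    return major, f_maj


# Counts from the slide; the base letters are made up for the example.
print(consensus_base("G" * 38 + "A" * 2))  # ('G', 0.95)
print(consensus_base("A" * 21 + "G" * 7))  # ('A', 0.75)
```

With these defaults, a position covered by 28 reads of which 21 agree just passes, since the frequency threshold is inclusive.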

What if more than 2 genotypes?

Genotype1

Genotype2

Genotype3

Genotype4

     G1  G2  G3
G1   0   1   1
G2   0   0   1
G3   0   0   0

Combinations of genotypes = (n² – n)/2
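The pairwise count can be checked with a short sketch. It enumerates each unordered pair of genotypes once, matching the upper triangle of the matrix, and compares the number of pairs with (n² – n)/2. The function names are illustrative only.

```python
from itertools import combinations


def genotype_pairs(genotypes):
    """List every unordered pair of genotypes exactly once."""
    return list(combinations(genotypes, 2))


def pair_count(n):
    """Number of pairwise comparisons for n genotypes: (n^2 - n) / 2."""
    return (n * n - n) // 2


print(genotype_pairs(["G1", "G2", "G3"]))
# [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
print(pair_count(4))  # 6
```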

NGS pipeline input data

• Reads format

– fna and qual

– Standard/Sanger Fastq

– SCARF format

– Solexa fastq, Solexa export

– AB SOLiD read format

– FASTA

• Reference sequence

– Chickpea transcript assembly

– Pearl millet transcript assembly

– Pigeonpea transcript assembly

– Medicago genome

– Sorghum genome
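Because Sanger and Solexa FASTQ files encode quality scores with different ASCII offsets, input validation usually needs to decide which encoding a file uses. The sketch below shows one common heuristic, not the pipeline's own validation code: characters below ASCII 59 can only occur with the Sanger offset of 33, so their presence rules out the offset-64 encodings. The function name is hypothetical.

```python
def guess_quality_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Heuristic only: a file of uniformly high-quality Sanger reads
    contains no low characters and would be misreported as offset 64.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")
    # Solexa scores start at -5, so offset-64 files never go below ';' (59).
    return 33 if min(codes) < 59 else 64


print(guess_quality_offset(["IIII5555!!"]))  # 33
print(guess_quality_offset(["hhhhZZZZ"]))    # 64
```

In practice the guess is more reliable when it is made over many reads rather than a single quality line.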

NGS pipeline (Input 1)

http://hpc.icrisat.cgiar.org/NGS/

NGS pipeline (Input 2)

NGS pipeline (Help page)

NGS pipeline (Results)

NGS pipeline (Visualisation)

Pipeline Editions

Available in 2 editions:

1. Server Edition

2. Desktop Edition

Server Edition

• User-friendly web interface

– Installation on the following Linux platforms

• Fedora 13

• CentOS 5

• Clients can be any OS with a web browser

• Communication resources

• SMTP (Email)

• Session-specific job processing: avoids file overwriting
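One way to get session-specific job processing is to give every submitted job its own working directory, so that two users uploading files with the same name cannot overwrite each other. The sketch below is an assumption about how this could be done, not a description of the ISMU server code. The function name and the directory naming scheme are hypothetical.

```python
import uuid
from pathlib import Path


def create_session_dir(base_dir):
    """Create and return a fresh working directory for one job."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    # A random identifier keeps concurrent sessions apart.
    session_dir = base / f"session_{uuid.uuid4().hex}"
    session_dir.mkdir()  # raises FileExistsError rather than reusing a directory
    return session_dir


# Usage: uploads and results for each job are written only inside its own directory.
job_dir = create_session_dir("ngs_jobs_example")
print(job_dir.name.startswith("session_"))  # True
```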

Desktop Edition

• All functionalities of Server Edition on a Desktop

• Supported OS

• Fedora 13

• RHEL 5

• Single command installation

• Available as an installable CD

Future plans

• Consideration of new tools to integrate / update, e.g. BWA, Bowtie

• Implementation of the extension to the pipeline

• Evaluate cloud computing and high-performance computing cluster options

• Initiatives such as iPlant (discovery environment – genotype to phenotype)

Future Plans: ISMU v 2

• Identification of appropriate modules for MARS, GWS and GBS

• Integration of MARS and GWS modules

• Linking of ISMU pipeline with DMS of IBP

• Documentation & training for ISMU pipeline

Architecture

NGS Data Analysis pipeline at ICRISAT

• Internet

• Apache server hosting web pages

• SMTP server

• CGI

• Input data validation

• Reference sequences

• Maq

• Novo

• Velvet

• Perl programs

• SNP database

• Dynamic querying

• Assembly visualization

• Files downloading

Contributors

• Rajeev K. Varshney

• Abhishek Rathore

• Jayashree B

• Vivek Thakur

• R. Pradeep

• A. Bhanu Prakash

• Sarwar Azam

• G. Meenakshi

• David Marshall

• Iain Milne

• Jonathan Jones

• David Studholme

• Greg May

• Andrew Farmer

• Jimmy Woodward

• Dave Edwards