CLC bio presentation at 5th SFAF 6/3/2010
CLC bio: A Comprehensive Platform for NGS Data Analysis
Saul A. Kravitz, PhD
Director of Consulting Services
Before the Flood
2005: $5M Human genome – 19 sequencer years
Sample Prep, Sequencing, Analysis, Science
Nextgen Sequencing Revolution
Sample Prep, Sequencing, Analysis, Science
2010: $6k Human genome ~1 sequencer day
Help!!
Bioinformatics Challenges
•Data Analysis Tools for Biomedical Researchers
•GUI-driven
•HPC integration
•Unprecedented data volumes
•Rapid technology change, applications growth
•Multi-platform data integration
•No one-size-fits-all solutions
•Rapid customization and adaptation
CLC bio NGS Analysis Platform
CLC Genomics Workbench: Easy to use, wizard-driven desktop software
CLC Genomics Server: Enterprise solution
CLC Assembly Cell: High performance NGS algorithms
Developer SDK: Workbench and Server customization
Swiss Army Knife of NGS Analysis
Genomics, Transcriptomics, Epigenomics
RNA-Seq, miRNA, ChIP-Seq, Read Mapping, De Novo Assembly, SNP/DIP Detection
Visualization
File Format Conversion
Desktop Solutions, Enterprise Solutions
Traditional Bioinformatics
Intuitive GUI
SDK
Tools Integration
High Performance
Why not use free tools?
•Are tools free or “free”?
•Tools vs solutions
•True cost of ownership
•Ease of Use
•Tools integration
•Support
Small RNA Analysis (in Beta soon)
•Identify and filter/trim adapters
•Annotate using miRBase and other resources
- target species of interest
•Merge/group by mature, precursor/reference
•Fully integrated with expression analysis
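The adapter filter/trim step listed above can be illustrated with a minimal Python sketch. This is not CLC bio's algorithm: the function name `trim_adapter`, the `min_overlap` parameter, and the adapter string used in the usage note are assumptions made for illustration, and only exact matches are handled.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read (exact matching only).

    Hypothetical helper for illustration; production trimmers also
    tolerate mismatches and use base qualities.
    """
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only a prefix of the adapter hangs off the 3' end of the read.
    # Try the longest possible overlap first.
    longest = min(len(adapter), len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: return the read unchanged.
    return read
```

For example, `trim_adapter("ACGTACGTTGGAATTCTC", "TGGAATTCTCGGGTGCC")` strips the ten trailing adapter bases and leaves `ACGTACGT`.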
De Novo Assembler
• Human assembly of 38x Illumina paired-end
• CLC quality equivalent to ABySS
• CLC: 7 hrs, 1 node, 42 GB of RAM
• ABySS: 80 hrs, 21 nodes, 336 GB of RAM
• Metagenomics Assembly
• MetaHIT dataset MH0041: 40M 75 bp paired-end reads
• 3 hrs on desktop, 6 GB RAM
• Higher N50 and Total Contig Size than Reported
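For readers unfamiliar with the N50 metric cited above, here is a minimal Python sketch of the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly size. The function name `n50` is an assumption, and the numbers in the usage note are made up rather than taken from the assemblies on this slide.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half the total.
        if 2 * running >= total:
            return length
    return 0
```

With contig lengths `[2, 2, 2, 3, 3, 4, 8, 8]` (total 32), the two longest contigs already cover 16 bases, so the N50 is 8.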
Viral Sequencing at JCVI (See Nadia Fedorova’s Poster!)
• Amplify and Barcode using SISPA, 454 + Illumina Sequencing
• Depth of coverage sometimes >1000x
• De novo Assembly of Consensus for all Segments
• For each segment:
• Map reads from each technology independently using best full length reference from NCBI, call variations
• Update reference with variations confirmed by multiple technologies
• Map reads using updated reference and all reads
• Convert to consed, analyze, order Sanger closure reactions
Source: Jessica Hostetler, Nadia Fedorova, Tim Stockwell, Danny Katzel
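The "update reference with variations confirmed by multiple technologies" step can be sketched in Python as follows. This is an illustration under stated assumptions, not the JCVI pipeline: the function names, the per-technology dictionaries of 0-based position to alternate base, and the restriction to single-base substitutions are all hypothetical.

```python
from collections import Counter


def confirmed_variants(calls_by_tech, min_techs=2):
    """Keep substitutions reported identically by at least min_techs technologies.

    calls_by_tech maps a technology name to {position: alternate_base}
    (hypothetical input format; positions are 0-based).
    """
    counts = Counter()
    for calls in calls_by_tech.values():
        for pos, alt in calls.items():
            counts[(pos, alt)] += 1
    return {pos: alt for (pos, alt), n in counts.items() if n >= min_techs}


def update_reference(reference, variants):
    """Apply single-base substitutions to a reference sequence."""
    seq = list(reference)
    for pos, alt in variants.items():
        seq[pos] = alt
    return "".join(seq)
```

Given calls `{"454": {2: "T", 5: "A"}, "illumina": {2: "T", 6: "C"}}` against the reference `ACGTACGT`, only the substitution at position 2 is seen by both technologies, so the updated reference is `ACTTACGT`. Reads from all technologies would then be remapped to this updated sequence.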
Why CLC bio Tools?
• CLC handled hybrid sequencing technologies directly
• Very biased coverage confounded other assemblers that expect random arrival stats. CLC didn’t seem to suffer from biased coverage.
• Very accurate SNP calls in areas of deep coverage.
Tim Stockwell, Director of Viral Informatics, J. Craig Venter Institute
Targeted Resequencing QC
•Assessment of targeted sequencing technology
•Coverage Statistics for Targeted Regions
•Very short schedule, limited bioinformatics staff
•Plug-in development leveraging CLC tools to automate the process and meet the short deadline
•QC Report now available as plug-in
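The kind of coverage statistics a targeted-resequencing QC report contains can be sketched in Python. This is not the CLC plug-in: the function name `target_coverage`, the per-base depth list, the half-open `(start, end)` target intervals, and the default threshold of 20x are assumptions chosen for illustration, and each target is assumed to be non-empty.

```python
def target_coverage(depth, targets, min_depth=20):
    """Summarize read depth over each targeted region.

    depth:   list of per-base read depths along one reference sequence
    targets: {name: (start, end)} with 0-based, half-open, non-empty intervals
    """
    report = {}
    for name, (start, end) in targets.items():
        window = depth[start:end]
        # Count bases that reach the minimum acceptable depth.
        covered = sum(1 for d in window if d >= min_depth)
        report[name] = {
            "mean_depth": sum(window) / len(window),
            "fraction_at_min_depth": covered / len(window),
        }
    return report
```

For a depth profile `[0, 10, 30, 30, 50, 5, 0, 0]` and a single target `(1, 5)`, the mean depth is 30.0 and three of the four target bases reach 20x.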
Professional Services
•Developing customized solutions
•Integration with LIMS, workflows, DB
•Bioinformatics Algorithm Development
•Cloud and Grid Integration
•Data Analysis
Thank you for listening
Saul A. Kravitz, PhD
skravitz@clcbio.com (301) 355-0813
Questions