Umcp cs talk_11_3_16_v1

Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI. Making the Transition from Sharing Data to Sharing Knowledge: Genomic Variation in the Rising Era of Individual Genome Sequence

Transcript of Umcp cs talk_11_3_16_v1

Page 1: Umcp cs talk_11_3_16_v1

Ben Busby, Ph.D., Genomics Outreach Coordinator

Making the Transition from Sharing Data to Sharing Knowledge

Genomic Variation in the Rising Era of Individual Genome Sequence

Page 2: Umcp cs talk_11_3_16_v1

But first... Better PubMed Searches!

Page 3: Umcp cs talk_11_3_16_v1

For more information, go to: ncbi.nlm.nih.gov/learn

Page 4: Umcp cs talk_11_3_16_v1

Review of terminology and concepts: Next-Generation Sequencing

Graphic Credit: Spencer Martin, UBC

Page 5: Umcp cs talk_11_3_16_v1

Review of terminology and concepts: How Genomes are Mapped and Assembled

© Martine Zilversmit 2013

Page 6: Umcp cs talk_11_3_16_v1

http://1.usa.gov/1J1xmYs

NCBI NGS Online Workshop – Available on the NCBI YouTube Channel!

Review of terminology and concepts: How Genomes are Mapped and Assembled

Page 7: Umcp cs talk_11_3_16_v1

BioProject

Page 8: Umcp cs talk_11_3_16_v1

BioProject

Page 9: Umcp cs talk_11_3_16_v1

dbGaP

Page 10: Umcp cs talk_11_3_16_v1

dbGaP

Subjects in dbGaP by year:

2007: 14,201
2008: 53,216
2009: 139,311
2010: 374,464
2011: 485,727
2012: 566,181
2013: 660,665
2014: 876,849
2015: 1,002,935
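The yearly values above can be turned into growth figures with simple arithmetic. This is a minimal sketch that assumes the chart values map to the years 2007 through 2015 in the order listed; it computes the overall fold change and the year-over-year increase in subjects.

```python
# Subjects per year, transcribed from the dbGaP chart (assumes one value per
# year, 2007 through 2015, in the order the chart lists them).
DBGAP_SUBJECTS = {
    2007: 14_201,
    2008: 53_216,
    2009: 139_311,
    2010: 374_464,
    2011: 485_727,
    2012: 566_181,
    2013: 660_665,
    2014: 876_849,
    2015: 1_002_935,
}


def fold_change(counts, start, end):
    """Ratio of the count in year `end` to the count in year `start`."""
    return counts[end] / counts[start]


def yearly_increase(counts):
    """Increase in subjects over the previous year, keyed by the later year."""
    years = sorted(counts)
    return {year: counts[year] - counts[prev] for prev, year in zip(years, years[1:])}


if __name__ == "__main__":
    print(f"2007 to 2015: {fold_change(DBGAP_SUBJECTS, 2007, 2015):.1f}x")
    for year, delta in yearly_increase(DBGAP_SUBJECTS).items():
        print(year, f"+{delta:,}")
```

Run as a script, it prints the overall fold change (about 70.6x) followed by one line per year.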

Page 11: Umcp cs talk_11_3_16_v1
Page 12: Umcp cs talk_11_3_16_v1

dbGaP – GWAS and PheGenI

Page 13: Umcp cs talk_11_3_16_v1

dbGaP – GWAS and PheGenI

Page 14: Umcp cs talk_11_3_16_v1

dbGaP – ClinVar

Page 15: Umcp cs talk_11_3_16_v1

ClinVar

Page 16: Umcp cs talk_11_3_16_v1

ClinVar

Page 17: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 18: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 19: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 20: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 21: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 22: Umcp cs talk_11_3_16_v1

ClinVar – Why Should We Care?

Page 23: Umcp cs talk_11_3_16_v1

SRA Data Structures

Page 24: Umcp cs talk_11_3_16_v1
Page 25: Umcp cs talk_11_3_16_v1
Page 26: Umcp cs talk_11_3_16_v1
Page 27: Umcp cs talk_11_3_16_v1
Page 28: Umcp cs talk_11_3_16_v1

Investigation of NGS: SRA BLAST!

Page 29: Umcp cs talk_11_3_16_v1

sra-search

Page 30: Umcp cs talk_11_3_16_v1

sra-search

Page 31: Umcp cs talk_11_3_16_v1

sra-search

Page 32: Umcp cs talk_11_3_16_v1

Investigation of NGS: SRA BLAST!

Page 33: Umcp cs talk_11_3_16_v1

Investigation of NGS: MagicBLAST!

Page 34: Umcp cs talk_11_3_16_v1

Why SRA Data Structures?

sam-dump.2.6.3 --aligned-region 17:41243452-41277500 SRR925743 > BRCA1.sam
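The --aligned-region argument above restricts the dump to alignments on chromosome 17 between the two coordinates. As a hedged illustration of what overlapping such a region means for a SAM record, this sketch parses the region string and tests an alignment line using its RNAME, POS, and CIGAR fields. It assumes the region uses 1-based, inclusive coordinates (check the sam-dump documentation to confirm), and the helper names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end); assumed 1-based, inclusive."""
    chrom, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def reference_end(pos, cigar):
    """1-based inclusive reference end of an alignment starting at `pos`."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + length - 1


def overlaps(sam_line, chrom, start, end):
    """True if a SAM alignment line overlaps chrom:start-end."""
    if sam_line.startswith("@"):  # header lines carry no alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if rname != chrom or cigar == "*":  # other sequence, or unaligned record
        return False
    return pos <= end and reference_end(pos, cigar) >= start
```

Soft clips (S) and insertions (I) do not extend the reference span, which is why only M, D, N, = and X are summed.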

Page 35: Umcp cs talk_11_3_16_v1

GATK (run it under screen or in the background with &)

Page 36: Umcp cs talk_11_3_16_v1

.vcf from GATK
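GATK writes its variant calls as VCF. This is a minimal sketch of reading the fixed columns of VCF data lines (CHROM through INFO) in plain Python, assuming a tab-delimited, uncompressed file; it skips the FORMAT and per-sample columns, and the function names are hypothetical.

```python
# The eight fixed VCF columns; FORMAT and sample columns are ignored here.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Split an INFO column into a dict; flag entries (no '=') map to True."""
    if info == ".":
        return {}
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf(lines):
    """Yield one dict per variant, skipping '##' meta lines, the '#CHROM' header and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        record["ALT"] = record["ALT"].split(",")  # multi-allelic sites list several ALTs
        record["INFO"] = parse_info(record["INFO"])
        yield record
```

With a file on disk, `parse_vcf(open("calls.vcf"))` (a hypothetical file name) yields the records lazily.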

Page 37: Umcp cs talk_11_3_16_v1

hisat2

Page 38: Umcp cs talk_11_3_16_v1

Read Count generator (spark_genes)
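The idea behind a read count generator is to tally, for each gene, the reads aligned inside it. This is a single-machine sketch of that tally, not the spark_genes code: it assumes gene intervals are given as (name, chrom, start, end) in 1-based inclusive coordinates, that genes on the same chromosome do not overlap, and that each read is represented only by its (chrom, start) position.

```python
from bisect import bisect_right


def count_reads_per_gene(genes, reads):
    """Count reads whose start falls inside each gene (non-overlapping genes assumed)."""
    by_chrom = {}
    for name, chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {chrom: [s for s, _, _ in iv] for chrom, iv in by_chrom.items()}

    counts = {name: 0 for name, *_ in genes}
    for chrom, pos in reads:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Index of the last gene that starts at or before this read.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            counts[intervals[i][2]] += 1
    return counts
```

Sorting the gene starts once and using binary search keeps each read lookup logarithmic in the number of genes on its chromosome.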

Page 39: Umcp cs talk_11_3_16_v1

GitHub Repositories

Page 40: Umcp cs talk_11_3_16_v1

Visualizing Data on Assemblies

Page 41: Umcp cs talk_11_3_16_v1

Visualizing SRA in the Context of RefSeq

http://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_000009.11&app_context=Variation_Viewer_1-1&srz=SRR1556217&v=21967751:21994490

https://goo.gl/8GPv8S
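The long link above is an ordinary query string. This sketch rebuilds it from its parts with urllib.parse.urlencode; the parameter names (id, app_context, srz, v) are copied from the link, while the interpretation in the comments (srz as the SRA run to overlay, v as the visible range) is inferred from the values rather than taken from Sequence Viewer documentation.

```python
from urllib.parse import urlencode

SVIEWER_BASE = "http://www.ncbi.nlm.nih.gov/projects/sviewer/"


def sviewer_url(accession, run, start, stop, app_context="Variation_Viewer_1-1"):
    """Build a Sequence Viewer link showing an SRA run against a RefSeq sequence."""
    params = {
        "id": accession,          # RefSeq sequence to display
        "app_context": app_context,
        "srz": run,               # inferred: SRA run accession to overlay
        "v": f"{start}:{stop}",   # inferred: visible range
    }
    # Keep ':' unescaped so the result matches the form of the link on the slide.
    return SVIEWER_BASE + "?" + urlencode(params, safe=":")
```

Swapping in a different run accession or range produces the corresponding link without hand-editing the URL.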

Page 42: Umcp cs talk_11_3_16_v1

Helping Investigators make reads into [good] genomes!

Page 43: Umcp cs talk_11_3_16_v1

The NCBI Eukaryotic Annotation Pipeline

Page 44: Umcp cs talk_11_3_16_v1

The NCBI Prokaryotic Annotation Pipeline

Page 45: Umcp cs talk_11_3_16_v1

Transcriptome Shotgun Assembly Database

Page 46: Umcp cs talk_11_3_16_v1

Type Strain Databases

Page 47: Umcp cs talk_11_3_16_v1

Targeted Locus Studies!

Page 48: Umcp cs talk_11_3_16_v1

Making OTUs from Metagenomic Data: MOLE-BLAST!

Page 49: Umcp cs talk_11_3_16_v1

“Superbankit!”

Page 50: Umcp cs talk_11_3_16_v1

Superbankit!

Page 51: Umcp cs talk_11_3_16_v1

Viral Genomes

Page 52: Umcp cs talk_11_3_16_v1

Virus Variation

Page 53: Umcp cs talk_11_3_16_v1

Virus Variation

Page 54: Umcp cs talk_11_3_16_v1

Virus Variation

Subscribe!

Page 55: Umcp cs talk_11_3_16_v1

Foodborne Pathogens

Page 56: Umcp cs talk_11_3_16_v1

Foodborne Pathogens

Page 57: Umcp cs talk_11_3_16_v1

Foodborne Pathogens

Page 58: Umcp cs talk_11_3_16_v1

Where to Get More Information!

Page 59: Umcp cs talk_11_3_16_v1

Where to Get More Information!

Page 60: Umcp cs talk_11_3_16_v1

E-Utilities (Eutils)

Video available at: http://www.ncbi.nlm.nih.gov/education/webinars/
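The E-utilities are plain HTTP endpoints. This is a minimal sketch of building an ESearch request URL and pulling the UIDs out of its XML reply; the function names are hypothetical, the network call is left as a comment, and the sample reply in the checks uses placeholder IDs rather than real Gene records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """URL for an ESearch query; its XML response lists matching UIDs in <IdList>."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def parse_id_list(xml_text):
    """Extract the UIDs from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


# To query for real (requires network access):
#   import urllib.request
#   xml_text = urllib.request.urlopen(esearch_url("gene", "foxp2[gene] AND human[orgn]")).read()
#   print(parse_id_list(xml_text))
```

The query syntax is the same Entrez syntax used in the web search box, URL-encoded.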

Page 61: Umcp cs talk_11_3_16_v1

E-Utilities (Eutils)

Page 62: Umcp cs talk_11_3_16_v1

Introducing… Entrez Direct: The E-utilities on the UNIX command line

esearch -db gene -query "foxp2[gene] AND human[orgn]" | \
elink -target protein -name gene_protein_refseq | \
efetch -format fasta

ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/
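Each stage of the EDirect pipeline corresponds roughly to one E-utility call. This hedged sketch shows ELink and EFetch URLs that the second and third stages map onto, passing explicit ID lists; EDirect manages the hand-off between stages itself, so treat this as a simplified illustration rather than the exact requests the tools send. The function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(gene_ids):
    """ELink from Gene records to their RefSeq proteins (link name gene_protein_refseq)."""
    params = {
        "dbfrom": "gene",
        "db": "protein",
        "linkname": "gene_protein_refseq",
        "id": ",".join(map(str, gene_ids)),
    }
    # Keep commas readable in the ID list.
    return EUTILS + "elink.fcgi?" + urlencode(params, safe=",")


def efetch_url(protein_ids):
    """EFetch protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(map(str, protein_ids)),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params, safe=",")
```

The ELink response (XML) lists the linked protein UIDs, which would then be passed to `efetch_url`.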

Page 63: Umcp cs talk_11_3_16_v1

EDirect Cookbook

Page 64: Umcp cs talk_11_3_16_v1

Moving from FTP-scraping cron jobs to on-demand APIs

Page 65: Umcp cs talk_11_3_16_v1

EDirect Cookbook (DRAFT)

Page 66: Umcp cs talk_11_3_16_v1

New APIs!

Page 67: Umcp cs talk_11_3_16_v1

Generating apps that work with our APIs and data structures, and improve metadata:

NCBI Hackathons!

Page 68: Umcp cs talk_11_3_16_v1

January 2015: 4 functional software products in 3 days

Page 69: Umcp cs talk_11_3_16_v1

Hackathons

Page 70: Umcp cs talk_11_3_16_v1

August 2015: 6 Functional Software Products in 3 Days

Page 71: Umcp cs talk_11_3_16_v1

August 2015: 6 Functional Software Products in 3 Days

Page 72: Umcp cs talk_11_3_16_v1

August 2015: 6 Functional Software Products in 3 Days

Page 73: Umcp cs talk_11_3_16_v1

Hackathons

www.iMetric.io

Page 74: Umcp cs talk_11_3_16_v1

An Educational Resource for RNA-seq, Available to Anyone on AWS

Page 75: Umcp cs talk_11_3_16_v1

Part of an Online Workshop

First 5 lectures now available on

Page 76: Umcp cs talk_11_3_16_v1

Community Tools

www.iMetric.io

Page 77: Umcp cs talk_11_3_16_v1

Community Tools

Page 78: Umcp cs talk_11_3_16_v1

Community Tools

Page 79: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 80: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 81: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 82: Umcp cs talk_11_3_16_v1

Hackathons

www.iMetric.io

Page 83: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 84: Umcp cs talk_11_3_16_v1

Hackathons

Page 85: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 86: Umcp cs talk_11_3_16_v1

Hackathons

Page 87: Umcp cs talk_11_3_16_v1

Hackathons

Page 88: Umcp cs talk_11_3_16_v1

Hackathons

Page 89: Umcp cs talk_11_3_16_v1

January 2016: 6 Functional Software Products in 3 Days

Page 90: Umcp cs talk_11_3_16_v1

Hackathons

January 2016: 6 functional software products in 3 days

Page 91: Umcp cs talk_11_3_16_v1

Hackathons

Page 92: Umcp cs talk_11_3_16_v1

Hackathons

Page 93: Umcp cs talk_11_3_16_v1

In April, July, August, and October 2016, we built on those projects.

Page 94: Umcp cs talk_11_3_16_v1

Hackathons

Page 95: Umcp cs talk_11_3_16_v1
Page 96: Umcp cs talk_11_3_16_v1

Finding immunogenic peptides from single RNA-seq samples

Page 97: Umcp cs talk_11_3_16_v1

DangerTrack: Difficult-to-Assess Regions

Combined score is the average of SVs, mappability, GC, etc.

NCBI region list

ENCODE blacklist
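The combined score described above is an average across per-region tracks. This is a minimal sketch, assuming each track (the names sv, mappability, and gc used in the checks are hypothetical) has already been scored per genomic bin on a common scale; how DangerTrack normalizes each track before averaging is not covered here.

```python
from statistics import mean


def combined_scores(tracks):
    """Average several per-bin track scores into one combined score per bin.

    tracks: dict mapping a track name to a list of per-bin scores. All lists
    must cover the same bins and are assumed to share a common scale.
    """
    lengths = {len(scores) for scores in tracks.values()}
    if len(lengths) != 1:
        raise ValueError("all tracks must cover the same bins")
    # zip(*...) walks the bins, giving one tuple of track scores per bin.
    return [mean(values) for values in zip(*tracks.values())]
```

An unweighted mean treats every track as equally informative; a weighted average would be the natural variation if some tracks should count more.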

Page 98: Umcp cs talk_11_3_16_v1

Get More Info!

On Twitter: @NCBI, @DCGenomics

Page 99: Umcp cs talk_11_3_16_v1

In 2017 we will Build on Those Projects!

Biomedical Informatics Hackathon: January 9th–11th, NIH Campus, Bethesda!

NCBI Genomics Hackathon: March 20th–22nd, NIH Campus, Bethesda