Sequence Tracking


Page 1: Sequence Tracking

Sequence Tracking

Deanna M. Church, Staff Scientist, NCBI

@deannachurch, Short Course in Medical Genetics 2013

Understanding your sequence context

Page 2: Sequence Tracking

What’s in a name?

Bob, Bob, Bob, Bob

Page 3: Sequence Tracking

What’s in a name?

Bob*

123-45-6789

*http://howmanyofme.com

Page 4: Sequence Tracking

What’s in a name?

Bob, Miranda, Lydia, Samantha

Need more than a unique identifier: track updates/improvements

Page 5: Sequence Tracking

chr1, Chr1, 1, Chrom1

Page 6: Sequence Tracking

Mouse chrX: 34,800,000-34,890,000

[Figure: successive versions of the mouse chrX sequence records NC_000086 (RefSeq) and CM001013 (GenBank) covering this region]

Page 7: Sequence Tracking

Mouse chrX: 35,000,000-36,000,000

[Figure: chromosome X region compared between assemblies MGSCv3 and MGSCv36]

Page 8: Sequence Tracking

GenBank

Data Archives

Data in a common format
Data in a single location (and mirrored)
Most data quality-checked prior to deposition
Robust data tracking mechanism (accession.version)
Data owned by submitter

Page 9: Sequence Tracking

Data tracking: clone ABC14-1065514J1

Gaps, Phase, Length, Date

FP565796.1   1   1   21-Oct-2009
FP565796.2   1   0   14-Oct-2010
FP565796.3   3   0   07-Nov-2010
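Tracking by accession.version means keeping one stable accession while the version number increases each time the sequence itself changes. The sketch below is hypothetical (the SequenceVersion type and the helper names are illustrative, not an NCBI API); it parses accession.version strings and picks the newest version of the three FP565796 records listed above, using the release dates shown on the slide.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SequenceVersion:
    """One archived version of a sequence record (hypothetical type)."""
    accession: str
    version: int
    released: date


def parse_accession_version(text: str) -> tuple[str, int]:
    """Split 'ACCESSION.VERSION' (e.g. 'FP565796.2') into ('FP565796', 2)."""
    accession, sep, version = text.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version: {text!r}")
    return accession, int(version)


def latest(records: list[SequenceVersion], accession: str) -> SequenceVersion:
    """Return the highest version recorded for an accession."""
    matching = [r for r in records if r.accession == accession]
    if not matching:
        raise KeyError(accession)
    return max(matching, key=lambda r: r.version)


# Versions of clone ABC14-1065514J1 with the dates shown on the slide.
history = []
for acc_ver, day in [
    ("FP565796.1", "21-Oct-2009"),
    ("FP565796.2", "14-Oct-2010"),
    ("FP565796.3", "07-Nov-2010"),
]:
    acc, ver = parse_accession_version(acc_ver)
    # %b expects English month abbreviations (default C locale).
    history.append(SequenceVersion(acc, ver, datetime.strptime(day, "%d-%b-%Y").date()))

newest = latest(history, "FP565796")
print(f"{newest.accession}.{newest.version} released {newest.released}")
# FP565796.3 released 2010-11-07
```

Keeping the accession and version as separate fields lets you ask both "is this the same sequence record?" (accession) and "is this the same sequence?" (accession plus version).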

Page 10: Sequence Tracking

Data Archives

Initial versions of the human and mouse reference assemblies were not in INSDC!*

First human version in INSDC: GRCh37
First mouse version in INSDC: NCBI36

*But they were tracked by RefSeq

Page 11: Sequence Tracking

Data Archives

INSDC archives track INDIVIDUAL sequences

An assembly is a COLLECTION of sequences
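Because the archives version each sequence separately, knowing what changed between two assemblies requires comparing the collections. A minimal sketch, assuming an assembly can be modelled as a mapping from sequence name to the accession.version of the record used for it; the accessions SEQA.1, SEQB.2 and so on are made-up placeholders, not real INSDC records, and diff_assemblies is a hypothetical helper.

```python
# Hypothetical assemblies: each maps a sequence name to the accession.version
# of the archived record used for it. The accessions are placeholders.
ASSEMBLY_1 = {"chr1": "SEQA.1", "chr2": "SEQB.1", "chrX": "SEQC.1"}
ASSEMBLY_2 = {"chr1": "SEQA.1", "chr2": "SEQB.2", "chrX": "SEQC.1", "chrMT": "SEQD.1"}


def diff_assemblies(old: dict, new: dict):
    """Compare two assemblies sequence by sequence."""
    shared = old.keys() & new.keys()
    # Same name, different accession.version: the underlying sequence was updated.
    changed = {name: (old[name], new[name]) for name in sorted(shared) if old[name] != new[name]}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return changed, added, removed


changed, added, removed = diff_assemblies(ASSEMBLY_1, ASSEMBLY_2)
print(changed)  # {'chr2': ('SEQB.1', 'SEQB.2')}
print(added)    # ['chrMT']
print(removed)  # []
```

Each archive record on its own only says that SEQB moved from version 1 to version 2; the statement "assembly 2 differs from assembly 1 in chr2 and adds chrMT" exists only at the collection level.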

Page 12: Sequence Tracking

More naming issues

hg19 = GRCh37

mm9 = MGSCv37 = NCBIM37

danRer5 = Zv7
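The same assembly circulates under several names (UCSC-style, GRC-style, Ensembl-style). A hypothetical alias lookup built only from the pairs on this slide is sketched below; the group list and function names are illustrative, not a published table. Even a correct alias table is only a convenience: UCSC's hg19 chrM, for example, is a different mitochondrial sequence from the one in the RefSeq version of GRCh37, which is why tracking by accession.version remains the safer check.

```python
# Hypothetical alias groups, taken from the name pairs on the slide.
ALIAS_GROUPS = [
    frozenset({"hg19", "GRCh37"}),
    frozenset({"MGSCv37", "NCBIM37"}),
    frozenset({"danRer5", "Zv7"}),
]


def canonical_group(name: str) -> frozenset:
    """Return the alias group containing name, or a singleton group if unknown."""
    for group in ALIAS_GROUPS:
        if name in group:
            return group
    return frozenset({name})


def same_assembly_name(a: str, b: str) -> bool:
    """True if two assembly names are listed as aliases (exact, case-sensitive match)."""
    return canonical_group(a) == canonical_group(b)


print(same_assembly_name("hg19", "GRCh37"))  # True
print(same_assembly_name("Zv7", "danRer5"))  # True
print(same_assembly_name("hg19", "Zv7"))     # False
```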

Page 13: Sequence Tracking

chr21:8,913,216-9,246,964

Zv7

Page 14: Sequence Tracking

Zv7 chr21:8,913,216-9,246,964 vs MGSCv36 chrX
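A coordinate such as chr21:8,913,216-9,246,964 is only meaningful together with the assembly it refers to. The sketch below uses a hypothetical Location type and parse_region helper (not an NCBI API) that make the assembly a required field, so the same coordinate string in two assemblies never compares equal. It assumes 1-based, fully closed intervals and comma-formatted numbers.

```python
import re
from dataclasses import dataclass

# Matches "chrom:start-end"; commas in numbers are allowed (e.g. "chr21:8,913,216-9,246,964").
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


@dataclass(frozen=True)
class Location:
    """A genomic interval that is always tied to a named assembly (hypothetical type)."""
    assembly: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_region(assembly: str, region: str) -> Location:
    """Parse 'chrom:start-end' and attach the assembly the caller says it came from."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or start > end:
        raise ValueError(f"invalid interval: {region!r}")
    return Location(assembly, match["chrom"], start, end)


zebrafish = parse_region("Zv7", "chr21:8,913,216-9,246,964")
print(zebrafish.length)  # 333749

# Same coordinate string, two mouse assemblies: not the same location.
old = parse_region("MGSCv3", "chrX:35,000,000-36,000,000")
new = parse_region("MGSCv36", "chrX:35,000,000-36,000,000")
print(old == new)  # False
```

Requiring the assembly at parse time means an unqualified coordinate cannot silently flow into downstream analysis.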

Page 15: Sequence Tracking

http://www.ncbi.nlm.nih.gov/genome/assembly

GRCh37 / hg19

Page 16: Sequence Tracking
Page 17: Sequence Tracking

Genome Browser Agreement

1. Submitter deposits assembly to GenBank/EMBL/DDBJ
2. Assembly QA
3. Submitter updates assembly based on QA results
4. Browsers pick up assembly from GenBank/EMBL/DDBJ

Assemblies must be in GenBank/EMBL/DDBJ.

Page 18: Sequence Tracking

GenBank vs RefSeq

                 GenBank                            RefSeq
Ownership        Submitter owned                    RefSeq owned
Redundancy       Redundant                          Non-redundant
Maintenance      Updated rarely                     Curated
Archive          INSDC                              Not INSDC
BRCA1 records    83 genomic, 31 mRNA, 27 protein    3 genomic, 5 mRNA, 1 RNA, 5 protein

Page 19: Sequence Tracking
Page 20: Sequence Tracking

RefSeq for Assemblies

Typical assembly edits:

Addition of non-nuclear (e.g. MT) assembly units

Removal of contamination:
  Drop contaminated unlocalized/unplaced scaffolds
  Mask contamination that is placed on a chromosome (while preserving coordinate space)
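Masking rather than deleting is what keeps coordinate space intact. A minimal sketch, assuming masking means overwriting the contaminating bases with N (a common convention; the slide does not name the mask character); mask_range and delete_range are hypothetical helpers, and the sequence is a toy example.

```python
def mask_range(seq: str, start: int, end: int, mask_char: str = "N") -> str:
    """Mask bases start..end (1-based, inclusive) without changing sequence length."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[: start - 1] + mask_char * (end - start + 1) + seq[end:]


def delete_range(seq: str, start: int, end: int) -> str:
    """For contrast: remove bases start..end, which shifts every downstream coordinate."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range outside sequence")
    return seq[: start - 1] + seq[end:]


chrom = "ACGTTTTTGGCC"  # toy sequence; positions 5-8 stand in for contamination
masked = mask_range(chrom, 5, 8)
deleted = delete_range(chrom, 5, 8)

print(masked)       # ACGTNNNNGGCC (length 12, same as chrom)
print(deleted)      # ACGTGGCC (length 8)
print(masked[8])    # G: position 9 still holds the same base after masking
print(deleted[4])   # G: after deletion that base has moved to position 5
```

After masking, any annotation that pointed at position 9 of the chromosome is still correct; after deletion, every feature downstream of the removed span would need new coordinates.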

Page 21: Sequence Tracking

http://www.ncbi.nlm.nih.gov/assembly/organism/9606/

Human assemblies in assembly database

Page 22: Sequence Tracking

Take home messages

Assemblies can (and do) update!

Know what assembly you are working on

Track by accession.version, not just name

Data in INSDC databases are mirrored

RefSeq is NCBI specific