Computational Analysis of Transcript Identification Using GenBank.

Date posted: 21-Dec-2015

Page 2: Computational Analysis of Transcript Identification Using GenBank.

Differentiation of hematopoietic cells

Pluripotent stem cell

Myeloid: Erythrocyte, Platelet, Monocyte, Neutrophil, Eosinophil, Basophil

Lymphoid: B cell, T cell

Page 6: Computational Analysis of Transcript Identification Using GenBank.

Genome-wide gene expression

Number of expressed genes    Level of expression
100                          > 500 mRNA / cell
900                          5-50 mRNA / cell
9,000                        < 5 mRNA / cell

Page 18: Computational Analysis of Transcript Identification Using GenBank.

SAGE (Serial Analysis of Gene Expression)

mRNA/cDNA

isolate SAGE tags

link tags together & sequencing

gene identification

Page 21: Computational Analysis of Transcript Identification Using GenBank.

SAGE & GLGI Overview

mRNA

SPGI

collect cDNA clones

SAGE: identify most of the expressed genes; quantitative analysis of expressed genes by collecting tags

GenBank: multi-match, single-match, no match

GLGI: extend tags into longer 3' cDNAs

match

Gene identification

Page 22: Computational Analysis of Transcript Identification Using GenBank.

SAGE tags match to many genes (Tags from Hashimoto S, et al. Blood 94:837, 1999)

Tag         Number of matched genes  Matched genes (only up to 10 shown)

CCTGTAATCC  405  Hs.267557, Hs.240615, Hs.231705, Hs.283045, Hs.236713, Hs.232277, Hs.181553, Hs.262716, Hs.181392, Hs.220696
GTGAAACCCC  305  Hs.282868, Hs.170225, Hs.184220, Hs.194021, Hs.231625, Hs.171830, Hs.270571, Hs.270572, Hs.272193, Hs.283921
CCACTGCACT  174  Hs.118778, Hs.256868, Hs.96023, Hs.31575, Hs.47517, Hs.200451, Hs.271222, Hs.253240, Hs.270018, Hs.270415
ACTTTTTCAA  44   Hs.16426, Hs.10669, Hs.75155, Hs.28166, Hs.13975, Hs.79136, Hs.111334, Hs.133430, Hs.79356, Hs.239100
TTGGGGTTTC  9    Hs.231375, Hs.273127, Hs.275603, Hs.175173, Hs.276612, Hs.224773, Hs.62954, Hs.182771, Hs.276326
TGCACGTTTT  8    Hs.199160, Hs.279943, Hs.36927, Hs.5338, Hs.169793, Hs.83450, Hs.173902, Hs.183506
TGTGTTGAGA  5    Hs.284136, Hs.275865, Hs.275221, Hs.274466, Hs.181165
CCCGTCCGGA  5    Hs.276353, Hs.277498, Hs.277573, Hs.276350, Hs.180842
TTGGTCCTCT  4    Hs.12328, Hs.108124, Hs.9739, Hs.112845
CTGACCTGTG  3    Hs.277477, Hs.181244, Hs.77961
TACCTGCAGA  3    Hs.100000, Hs.256957, Hs.253884
AGGCTACGGA  3    Hs.119122, Hs.211582, Hs.183297
GGGCTGGGGT  3    Hs.183698, Hs.118757, Hs.90436
CCCTGGGTTC  2    Hs.52891, Hs.111334
CACAAACGGT  2    Hs.2043, Hs.195453
GTGAAGGCAG  2    Hs.4221, Hs.77039
GGGCATCTCT  2    Hs.75061, Hs.76807
ATGGCTGGTA  2    Hs.254246, Hs.182426
CGCCGCCGGC  2    Hs.182825, Hs.132753
AGGGCTTCCA  2    Hs.29797, Hs.276544
TTGGTGAAGG  2    Hs.278674, Hs.75968
GTGGCCACGG  1    Hs.112405
GTTCACATTA  1    Hs.84298
TGGTGTTGAG  1    Hs.275865
CCCATCGTCC  1    Hs.151604
GTTGTGGTTA  1    Hs.75415
TTGTAATCGT  1    Hs.125078
CCCACAACCT  1    Hs.252136
GAGGGAGTTT  1    Hs.76064
CCAGAACAGA  1    Hs.111222
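
The multi-match, single-match, and no-match categories from the overview can be illustrated with a small lookup. This is a minimal sketch, assuming a hypothetical in-memory reference that maps each 10-base tag to its UniGene cluster IDs; the four entries are copied from the table above, while `REFERENCE`, `classify_tag`, and `frequency_groups` are illustrative names, not part of any tool named in the slides.

```python
from collections import Counter

# Hypothetical reference: tag -> UniGene cluster IDs (four rows from the table).
REFERENCE = {
    "CCCTGGGTTC": ["Hs.52891", "Hs.111334"],
    "CACAAACGGT": ["Hs.2043", "Hs.195453"],
    "GTGGCCACGG": ["Hs.112405"],
    "GTTCACATTA": ["Hs.84298"],
}


def classify_tag(tag, reference=REFERENCE):
    """Return 'no match', 'single-match', or 'multi-match' for a tag."""
    n_genes = len(reference.get(tag, []))
    if n_genes == 0:
        return "no match"
    return "single-match" if n_genes == 1 else "multi-match"


def frequency_groups(reference=REFERENCE):
    """Count how many tags match exactly 1, 2, 3, ... genes."""
    return Counter(len(genes) for genes in reference.values())


if __name__ == "__main__":
    # ACGTACGTAC is a made-up tag that is absent from the reference.
    for tag in ["CCCTGGGTTC", "GTGGCCACGG", "ACGTACGTAC"]:
        print(tag, classify_tag(tag))
    print(dict(frequency_groups()))
```

Grouping tags by the number of genes they match is one plausible reading of the "tag frequency groups" named on the following slide; the slides themselves do not define the term.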

Page 23: Computational Analysis of Transcript Identification Using GenBank.

Tag Frequency Groups for 10-base Tag Set Containing 878,938 Tags for UniGene Human

Page 24: Computational Analysis of Transcript Identification Using GenBank.

Unique Tags among 878,938 EST Derived Tags

Page 25: Computational Analysis of Transcript Identification Using GenBank.

Unique Tags among 32,851 Gene Derived Tags

Page 26: Computational Analysis of Transcript Identification Using GenBank.

Converting tag into longer 3’ sequence

Figure labels: 5' end, 3' end, SAGE tag, 3' longer sequence
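
A computational analogue of this conversion can be sketched as a string operation: locate the tag in a known transcript and return the sequence from the tag toward the 3' end. This is only an illustration under stated assumptions. The function name `extend_tag`, the choice of the 3'-most occurrence of the tag, and the example transcript are all hypothetical and do not come from the slides.

```python
def extend_tag(transcript, tag, extra=None):
    """Return the tag plus downstream bases toward the 3' end.

    Assumes the 3'-most occurrence of the tag is the relevant one.
    If extra is None, everything up to the 3' end is returned;
    otherwise at most `extra` bases after the tag are included.
    Returns None when the tag does not occur in the transcript.
    """
    start = transcript.rfind(tag)
    if start == -1:
        return None
    end = len(transcript) if extra is None else start + len(tag) + extra
    return transcript[start:end]


if __name__ == "__main__":
    # Made-up transcript for illustration only.
    transcript = "GGCATGCCCTGGGTTCAAGGTTAAAAAA"
    print(extend_tag(transcript, "CCCTGGGTTC"))
    print(extend_tag(transcript, "CCCTGGGTTC", extra=4))
```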

Page 27: Computational Analysis of Transcript Identification Using GenBank.

Generation of Longer 3' cDNA for Gene Identification (GLGI)

TAAAAAAAAAAACTCGCCGGCGAANNNNNNNNNNATTTTTTTTTTTGAGCGGCCGCTT

SAGE tag (NNNNNNNNNN): 10 bases

hundred bases

TAAAAAAAAAAACTCGCCGGCGAANNNNNNNNNN

Sense extension

Antisense extension: TGAGCGGCCGCTT

nnnnnnnnnn

TAAAAAAAAAAACTCGCCGGCGAA TGAGCGGCCGCTT

Page 28: Computational Analysis of Transcript Identification Using GenBank.

UniGene Human 3’ Part Length Distribution

Page 30: Computational Analysis of Transcript Identification Using GenBank.

Number of Tags Which Move from k to k+25

Page 31: Computational Analysis of Transcript Identification Using GenBank.

Unique Tags among 878,938 EST Derived Tags

Page 32: Computational Analysis of Transcript Identification Using GenBank.

Unique Tags among 32,851 Gene Derived Tags
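
The slide titles on unique tags and on moving from k to k+25 suggest asking how many tags are unique at a given tag length. The following is a minimal sketch of that count, assuming each input sequence already begins at the tag position so that the tag is simply its first k bases. The function name and the example sequences are made up for illustration and are not taken from GenBank or UniGene.

```python
from collections import Counter


def unique_tag_count(sequences, k):
    """Number of k-base tags that occur in exactly one sequence.

    Assumes each sequence starts at the tag position, so the tag is
    the first k bases; sequences shorter than k are skipped.
    """
    tags = Counter(seq[:k] for seq in sequences if len(seq) >= k)
    return sum(1 for count in tags.values() if count == 1)


if __name__ == "__main__":
    # Made-up sequences: three share the same first 10 bases.
    seqs = [
        "ACGTACGTACGGTTA",
        "ACGTACGTACCCATG",
        "ACGTACGTACGGTCC",
        "TTGACCATGAACGTA",
    ]
    for k in (10, 12, 14):
        print(k, unique_tag_count(seqs, k))
```

In this toy input, lengthening the tag separates sequences that collide at 10 bases, which is the effect the longer 3' sequence is meant to achieve.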

Page 33: Computational Analysis of Transcript Identification Using GenBank.

Idealized Construction

Page 34: Computational Analysis of Transcript Identification Using GenBank.

Random Model
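
The slide gives only the title, so the model it used is not known. One common baseline, assumed here purely for illustration, treats each of N tags as an independent uniform draw from the 4^k possible k-base sequences. Under that assumption the expected number of distinct tags is 4^k (1 - (1 - 4^-k)^N), and the expected number of tags that occur exactly once is N (1 - 4^-k)^(N-1). The function names below are hypothetical; N = 878,938 is the tag count quoted on the earlier slides.

```python
import math


def expected_distinct_tags(n_tags, k):
    """Expected number of distinct k-base tags among n_tags uniform draws."""
    m = 4 ** k  # number of possible k-mers over A, C, G, T
    # m * (1 - (1 - 1/m)**n_tags), written with log1p/expm1 so that it
    # stays accurate when 1/m is far below floating-point epsilon.
    return -m * math.expm1(n_tags * math.log1p(-1.0 / m))


def expected_singleton_tags(n_tags, k):
    """Expected number of tags that no other draw duplicates."""
    m = 4 ** k
    # n_tags * (1 - 1/m)**(n_tags - 1)
    return n_tags * math.exp((n_tags - 1) * math.log1p(-1.0 / m))


if __name__ == "__main__":
    n = 878_938
    for k in (10, 15, 20, 35):
        print(k, round(expected_distinct_tags(n, k)),
              round(expected_singleton_tags(n, k)))
```

Real transcript sequences are far from uniform, so this baseline is best read as an upper reference for uniqueness rather than a prediction.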

Page 35: Computational Analysis of Transcript Identification Using GenBank.

Ideal Case Tag Count Progression

Page 36: Computational Analysis of Transcript Identification Using GenBank.

Myeloid Tag Matches with UniGene Human SAGE Tag Reference Database

Page 37: Computational Analysis of Transcript Identification Using GenBank.

SAGE Tag Processing with GIST

Page 38: Computational Analysis of Transcript Identification Using GenBank.

k-mer tree
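
The slide names a k-mer tree without describing it. As a rough sketch of one possible structure, and not GIST's actual implementation, the class below stores tags in a nested-dictionary trie keyed by base, with the matching gene identifiers kept at the node where a tag ends. The class name, method names, and the `"ids"` key are assumptions made for this example; the two tags and their identifiers are taken from the tag table earlier in the slides.

```python
class KmerTree:
    """Trie over bases; each stored k-mer ends at a node holding gene IDs."""

    def __init__(self):
        self.root = {}

    def insert(self, kmer, gene_id):
        """Walk or create one child per base, then record the gene ID."""
        node = self.root
        for base in kmer:
            node = node.setdefault(base, {})
        # "ids" cannot collide with a child key because bases are single characters.
        node.setdefault("ids", set()).add(gene_id)

    def lookup(self, kmer):
        """Return the set of gene IDs stored for kmer (empty if absent)."""
        node = self.root
        for base in kmer:
            node = node.get(base)
            if node is None:
                return set()
        return set(node.get("ids", ()))


if __name__ == "__main__":
    tree = KmerTree()
    tree.insert("CCCTGGGTTC", "Hs.52891")
    tree.insert("CCCTGGGTTC", "Hs.111334")
    tree.insert("GTGGCCACGG", "Hs.112405")
    print(sorted(tree.lookup("CCCTGGGTTC")))
    print(sorted(tree.lookup("ACGTACGTAC")))
```

A trie makes lookup cost proportional to the tag length rather than to the number of stored tags, and tags that share a prefix share nodes, which is one reason such a structure suits lengthening tags from k to k+25.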

Page 40: Computational Analysis of Transcript Identification Using GenBank.

GIST Performance with Improved IO

Page 41: Computational Analysis of Transcript Identification Using GenBank.

Conspirators

Sanggyu Lee, Janet D. Rowley, San Ming Wang

Terry Clark, Andrew Huntwork, Josef Jurek, L. Ridgway Scott