Some Jolly Fun with Barley ESTs David Marshall & All the Folks in Computational Biology.


Page 1

Some Jolly Fun with Barley ESTs

David Marshall &

All the Folks in Computational Biology

Page 2

Summary of ESTs – Sep 13, 2002

Top Twelve Plants
Glycine max (soybean)                  274,840
Hordeum vulgare (barley)               262,138
Triticum aestivum (bread wheat)        205,506
Zea mays (maize)                       179,431
Arabidopsis thaliana (thale cress)     174,624
Medicago truncatula (barrel medic)     170,500
Lycopersicon esculentum (tomato)       148,346
Oryza sativa (rice)                    108,429
Solanum tuberosum (potato)              94,420
Sorghum bicolor (sorghum)               84,712
Lactuca sativa (lettuce)                68,188
Pinus taeda (loblolly pine)             60,226

Top Four Non-Plant
Homo sapiens (human)                 4,664,006
Mus musculus + domesticus (mouse)    2,691,077
Rattus sp. (rat)                       351,864
Drosophila melanogaster (fruit fly)    256,583

Page 3

BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)

Category                       #       %
High quality sequences   282,720
E. coli genome               507    0.18
Lambda genome                 39    0.01
rRNA                       6,075    2.15
Chloroplast                2,664    0.94
Mitochondrion                204    0.07
Fungal cDNA                  366    0.13
Repetitive Elements          289    0.10
Low complexity             1,194    0.42
Odd vector                    37    0.01
Both polyA & polyT            28    0.01

Total Good               271,317    96.0
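The tally on this slide can be re-derived in a few lines. The sketch below re-enters the counts from the table and assumes each flagged clone is counted in exactly one category. The variable and function names are illustrative and are not part of the original screening pipeline.

```python
# Counts copied from the slide; names are illustrative only.
high_quality = 282_720
undesirable = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Assumes the categories do not overlap, so the flagged counts can be summed.
total_flagged = sum(undesirable.values())
total_good = high_quality - total_flagged
percent_good = percent(total_good, high_quality)

for name, count in undesirable.items():
    print(f"{name:<22}{count:>8,}{percent(count, high_quality):>7.2f}")
print(f"{'Total good':<22}{total_good:>8,}{percent_good:>7.1f}")
```

Subtracting the flagged clones from the high-quality set reproduces the slide's "Total Good" figure, which suggests the categories were indeed treated as mutually exclusive.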

Page 4

Unigenes in ESTs in Current Assembly

Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.

Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:

Contigs      24,208
Singletons   24,899
Total        49,107

Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:

Contigs      14,589
Singletons    7,219
Total        21,880
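As a quick check of the two bounds, the sketch below sums the contig and singleton counts from the slide and expresses each bound as a fraction of the ~50,000 genes expected from rice. The names are illustrative. The minimum-bound components as printed sum to 21,808, whereas the slide states 21,880; it is not clear from the slide which figure carries the typo, so the code simply reports the sum of the components.

```python
# Figures copied from the slide; EXPECTED_GENES is the slide's rice-based estimate.
EXPECTED_GENES = 50_000


def unigene_bound(contigs, singletons):
    """A unigene bound is the number of contigs plus the number of singletons."""
    return contigs + singletons


# Upper bound: every contig and singleton in the assembly.
max_unigenes = unigene_bound(24_208, 24_899)
# Lower bound: only contigs and singletons with good 3' ends.
min_unigenes = unigene_bound(14_589, 7_219)

print(f"maximum: {max_unigenes:,} ({max_unigenes / EXPECTED_GENES:.1%} of expected)")
print(f"minimum: {min_unigenes:,} ({min_unigenes / EXPECTED_GENES:.1%} of expected)")
```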

Page 5

Microarray Chip Gene Expression Data

http://www.affymetrix.com/

The Immediate Objective

Page 6
Page 7
Page 8
Page 9

Barley 2H Caleosins

[Figure: comparative map linking barley and rice by EST alignment. Barley 2H (Steptoe x Morex map): Hvcal1, Hvcal2. Rice R4 gene map: Oscal1, Oscal2 on BAC OSJB0004. Position labels on the figure: <0cM>, <8kb>, 0cM, 77cM, 78.2cM.]

Page 10

TIGR Rice Caleosin Gene Models

OSCal01(R4)

OSCal03(R3)

OSCal02(R4)

Page 11

Comparison of Gene Structures of Barley and Rice Caleosins

[Figure: paired barley and rice gene models for Caleosin1, Caleosin2 and Caleosin3, drawn as Exon 1 to Exon 6 (one model splits Exon 1 into Exon 1a and Exon 1b). Numeric labels on the figure, in extraction order: 156, 156, 156, 156, 149, 150; 86 (six times); 96, 95, 96, 99, 95, 95; 125, 126, 125, 125, 126, 126.]

Page 12
Page 13
Page 14

Wheat Group 5 Deletions

Page 15

Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes

[Figure: plot of rice chromosome (y-axis, "Rice Chromosomes", 0 to 12) against wheat ESTs mapped to Group 3 deletion lines (x-axis, 1 to 141).]

Page 16
Page 17

General Conclusions

• EST sequence
• May lack polyA
• Reading frame may be ambiguous
• Exon/intron boundaries may not be obvious
• We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%)
• Value of comparative studies with rice
• BUT poor annotation (actually appalling)
• Rice genomic sequencing is work in progress
• Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!!!
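The point about ambiguous reading frames can be made concrete with a six-frame translation. The sketch below is a generic illustration, not the method used in the talk: it translates a nucleotide sequence in all six frames with the standard genetic code and picks the frame with the longest stop-free stretch. The function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return a dict of frame name ('+1'..'+3', '-1'..'-3') to translation."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def longest_open_stretch(protein):
    """Longest run of residues between stop codons ('*')."""
    return max(protein.split("*"), key=len)


def best_frame(seq):
    """Frame whose translation has the longest stop-free stretch."""
    frames = six_frames(seq)
    name = max(frames, key=lambda f: len(longest_open_stretch(frames[f])))
    return name, longest_open_stretch(frames[name])


# Hypothetical toy sequence: an obvious ORF on the forward strand.
toy = "ATGGCCAAATAG"
print(six_frames(toy))
print(best_frame(toy))
```

On a sequence this short the longest stop-free stretch falls on the reverse strand rather than on the intended forward ORF, which is the kind of ambiguity a short or partial EST can produce.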

Page 18

Major Issues

• Data validation
» Errors in public database sequence
» Errors in annotation
» ‘Chinese whispers’ – anchoring annotation in biochemistry

• Comparative Data
» Rice > wheat > maize – but also Arabidopsis
» When is homology actually orthology?
» Partial data sets
» % match only part of the story
» Need for domain/feature information – mammalian/bacterial bias
» Everything is work in progress?

• Where are the data sources?
» dbEST
» Nr nucleotide database at NCBI
» Gramene at CSHL
» TIGR
» GrainGenes/wEST at USDA, Albany
» CUGI > AGI
» Iowa State/USDA
» Harvest/Foxpro
» ContEST at SCRI
» The horse’s mouth
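One widely used heuristic for the "homology versus orthology" question is the reciprocal best hit: a barley sequence and a rice gene are treated as candidate orthologues only if each is the other's top-scoring match. The sketch below is a generic illustration of that heuristic, not a method described in the talk. The hit tuples, identifiers and scores are hypothetical, and with partial data sets such as ESTs a missing true orthologue can make an unrelated paralogue look like a reciprocal best hit.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples, e.g. parsed from
    tabular similarity-search output (the parsing itself is not shown).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
barley_vs_rice = [
    ("Hv_contig1", "Os_gene1", 250.0),
    ("Hv_contig1", "Os_gene2", 180.0),
    ("Hv_contig2", "Os_gene1", 210.0),
]
rice_vs_barley = [
    ("Os_gene1", "Hv_contig1", 248.0),
    ("Os_gene1", "Hv_contig2", 205.0),
    ("Os_gene2", "Hv_contig1", 175.0),
]

print(reciprocal_best_hits(barley_vs_rice, rice_vs_barley))
```

In this toy data Hv_contig2 matches Os_gene1 well, but Os_gene1 prefers Hv_contig1, so only one pair survives: a high % match alone does not settle orthology.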

Page 19

Phenotype <-> Sequence

• Sd1 – green revolution gene in rice. Mutation in a gibberellin 20-oxidase (plant hormone production pathway), one member of a small gene family; other members have subtly different patterns of expression and are able to partially compensate for the mutation.

• Rht1 – green revolution gene in wheat. Mutation in receptor response pathway. Copies in all 3 wheat genomes.

• Barley – commercially significant dwarfs from both of these and several other pathway or response genes.

Page 20

Acknowledgements

• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker

• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson