Jo Dicks, John Innes Centre
Analysis of crop plant genomes
jo.dicks@bbsrc.ac.uk

http://jic-bioinfo.bbsrc.ac.uk/bioinformatics-research/


Page 2

Data

We want to compare the genomes of crop plants (e.g. wheat, rice, maize, millets, barley, pea).

At present, we mainly compare:
- Whole genome sequences
- Genetic markers (comparative mapping)
- Transposable elements

Page 3

What can we learn from the data?

Understand evolutionary processes in crop plants.

Use comparative mapping to predict gene/marker location and function across species.

Use transposable elements to select a subset of a germplasm collection (a core collection) that maximises diversity.
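Selecting a core collection can be framed as picking k accessions that are as different from one another as possible. The sketch below is one simple way to do that, assuming each accession is a 0/1 presence/absence vector over retrotransposon sites, Hamming distance as the diversity measure, and a greedy max-min heuristic. None of these choices comes from the slides; the function names and data are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of sites at which two presence/absence barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_core_collection(barcodes: List[Sequence[int]], k: int) -> List[int]:
    """Pick k accession indices with a greedy max-min diversity heuristic.

    Each new pick is the accession whose distance to its nearest
    already-chosen accession is largest (ties go to the lowest index).
    """
    if not 0 < k <= len(barcodes):
        raise ValueError("k must be between 1 and the number of accessions")
    # Seed with the accession carrying the most insertions (an arbitrary choice).
    chosen = [max(range(len(barcodes)), key=lambda i: sum(barcodes[i]))]
    while len(chosen) < k:
        candidates = [i for i in range(len(barcodes)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: min(hamming(barcodes[i], barcodes[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical barcodes: one row per accession, one column per insertion site.
accessions = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
]
print(greedy_core_collection(accessions, 2))  # [0, 2]
```

Greedy max-min is fast but not guaranteed to find the most diverse subset; an exact search over all subsets quickly becomes impractical for large collections.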

Page 4

Whole genome sequences

Linear streams of data, where each element is represented by one of four letters (A, C, G or T).

Streams can be long: billions of letters.

Blocks of sequence can be meaningful (e.g. they encode genes or transposable elements) or may be deemed ‘junk’.

Species 1: caggaaaacacacactcacatacatgaacaatatctc
           ||||| ||  |||||   |||||||| |||| || ||
Species 2: caggataatgcacac---catacatgcacaaaat-tc

(| marks an identical position; - marks a gap.)
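The match line in a two-species alignment such as the example above can be computed column by column. A minimal sketch, assuming the two sequences are already aligned to equal length with '-' for gaps; percent identity is taken here as identical columns over all columns, which is only one of several conventions in use.

```python
def match_line(seq1: str, seq2: str) -> str:
    """Return '|' under each column where the aligned letters are identical."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    # A gap ('-') opposite anything is never counted as a match.
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(seq1, seq2))


def percent_identity(seq1: str, seq2: str) -> float:
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * match_line(seq1, seq2).count("|") / len(seq1)


species1 = "caggaaaacacacactcacatacatgaacaatatctc"
species2 = "caggataatgcacac---catacatgcacaaaat-tc"
print(species1)
print(match_line(species1, species2))
print(species2)
print(f"{percent_identity(species1, species2):.1f}% identity")  # 75.7% identity
```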

Page 5

Comparative mapping data

[Figure: markers 1 to 5 in Species 1 linked to their homologues (1, 2, 4, 5) in Species 2]

In most data sets, links (homologies) may be spread across chromosomes.

Markers have a location and an orientation.

When markers in two species are related by descent from a common ancestor, they are called homologues.

Comparative mapping data are combinatorial.
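Because markers have both a position and an orientation, a chromosome can be written as a signed order of marker IDs (+m forward, -m reversed), and two species can be compared by the marker adjacencies they do not share (breakpoints). A minimal sketch under that assumption: breakpoint counting is a standard comparative-genomics measure, not necessarily the one used in this work, chromosome ends are ignored for simplicity, and the marker orders are hypothetical.

```python
from typing import List, Set, Tuple

Chromosome = List[int]  # signed marker IDs: +m forward, -m reversed
Genome = List[Chromosome]


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    """Adjacent marker pairs, in a canonical form independent of reading direction."""
    adj = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            # Reading (a, b) from the other strand gives (-b, -a); store one of the two.
            adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoints(g1: Genome, g2: Genome) -> int:
    """Number of adjacencies in g1 that are not conserved in g2 (chromosome ends ignored)."""
    return len(adjacencies(g1) - adjacencies(g2))


# Hypothetical marker orders: species 2 differs by an inversion of markers 2 and 3.
species1 = [[1, 2, 3, 4, 5]]
species2 = [[1, -3, -2, 4, 5]]
print(breakpoints(species1, species2))  # 2
```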

Page 6

Retrotransposons

[Figure: retrotransposon insertion sites 1 to 4 compared between Accession 1 and Accession 2]

Retrotransposons are a type of transposable element.

There are various locations in a genome where they are either present or absent.

An entry in a germplasm collection (called an accession) is therefore essentially a barcode recording presence or absence at multiple retrotransposon locations.
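Treating each accession as a barcode makes comparisons between accessions straightforward. A minimal sketch, with hypothetical sites 1 to 4 and Jaccard similarity (insertions shared by both accessions over sites occupied in either) chosen as an illustrative measure; the slides do not specify one.

```python
from typing import Iterable


def to_barcode(present_sites: Iterable[int], n_sites: int) -> str:
    """Presence (1) / absence (0) of an insertion at sites 1..n_sites."""
    present = set(present_sites)
    return "".join("1" if site in present else "0" for site in range(1, n_sites + 1))


def jaccard(code1: str, code2: str) -> float:
    """Insertions shared by both accessions / sites occupied in either."""
    both = sum(a == b == "1" for a, b in zip(code1, code2))
    either = sum("1" in (a, b) for a, b in zip(code1, code2))
    return both / either if either else 1.0


# Hypothetical accessions scored at four insertion sites.
acc1 = to_barcode({1, 2, 3, 4}, 4)
acc2 = to_barcode({1, 2, 4}, 4)
print(acc1, acc2, jaccard(acc1, acc2))  # 1111 1101 0.75
```

Jaccard ignores sites that are empty in both accessions, on the reasoning that a shared absence says less about relatedness than a shared insertion; simple matching, which counts agreements of both kinds, is the obvious alternative.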

Page 7

Evolution

Data change over time due to errors known as mutations (there are several distinct types of mutation).

Differences between species are often quantified in terms of the number and type of such mutations.
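One standard way to turn observed differences into an estimate of the number of mutations is the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which allows for several substitutions having hit the same site. A minimal sketch for aligned DNA, assuming equal substitution rates between all four bases (the Jukes-Cantor model's own assumption); the slides do not say which model is used, and the sequences are the two short ones from the earlier example, aligned with '-' marking gaps (our reading of that alignment).

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site: d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


species1 = "caggaaaacacacactcacatacatgaacaatatctc"
species2 = "caggataatgcacac---catacatgcacaaaat-tc"
p = p_distance(species1, species2)  # 5 differences in 33 ungapped columns
print(round(p, 3), round(jukes_cantor(p), 3))
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge, because more of the mutations are hidden by later changes at the same site.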

The relationship between species is often represented as an evolutionary tree, known as a phylogenetic tree.
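Given pairwise distances between species, a tree can be built by repeatedly joining the closest pair. The sketch below uses UPGMA (average linkage), one of the simplest distance methods, chosen here only for illustration and not necessarily what the projects use. It assumes a roughly constant mutation rate across lineages, returns the topology without branch lengths, and the distances are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def upgma(names: List[str], dist: Dict[Tuple[str, str], float]) -> Tree:
    """Join the closest pair of clusters until one tree remains (UPGMA, topology only)."""
    def lookup(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    sizes: Dict[Tree, int] = {name: 1 for name in names}  # cluster -> number of leaves
    d: Dict[FrozenSet[Tree], float] = {
        frozenset(pair): lookup(*pair) for pair in combinations(names, 2)
    }
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for c in sizes:
            # Average linkage: the new distance is the size-weighted mean of the old ones.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances (e.g. estimated mutations per site) between four species.
species = ["S1", "S2", "S3", "S4"]
distances = {
    ("S1", "S2"): 0.02, ("S1", "S3"): 0.06, ("S1", "S4"): 0.06,
    ("S2", "S3"): 0.06, ("S2", "S4"): 0.06, ("S3", "S4"): 0.04,
}
print(upgma(species, distances))  # (('S1', 'S2'), ('S3', 'S4'))
```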

Page 8

An evolutionary tree

[Figure: evolutionary tree with an ancestral species at the root and Species 1, 2, 3 and 4 at the tips]

Mutations occur through time, along the tree branches.

Page 9

Data problems

In comparative mapping studies, there may be elements between the markers that are important but of which we know nothing (i.e. missing data), and there may be erroneous links between data items (i.e. data errors).

Missing data will be largely alleviated by whole genome sequences (though when will these be available?), but there will still be errors in the data.

Page 10

Projects

UK CropNet (data)
CHROMTREE (analysis)

GENE-MINE (data)
Germinate (analysis)

JIC are also involved in Arabidopsis and Brassica IGF projects.

Page 11

UK CropNet databases

UK CropNet curates and develops databases and data analysis tools for:
- Arabidopsis thaliana (AGR)
- Brassicas (BrassicaDB)
- Cereals (BarleyDB, CeResDB and MilletGenes)
- Forage grasses (FoggDB)
- Potato (SpudBase)

as well as developing a database for comparative mapping data (CropSeqDB and ComapDB).

Page 12

Problems

To obtain comparative mapping data from the crop plant community, we need to access disparate data sources of differing quality (not all of them electronic).

We need to link the data sources to form a single, queryable entity.
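Once the data sources share one schema, a cross-species question becomes a single query. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table names, marker names and positions are all hypothetical, and the sketch shows only the end state, not how the disparate sources are gathered and cleaned.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Load two hypothetical marker maps and the homology links between them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE map_a (marker TEXT PRIMARY KEY, chromosome TEXT, position REAL);
        CREATE TABLE map_b (marker TEXT PRIMARY KEY, chromosome TEXT, position REAL);
        CREATE TABLE links (marker_a TEXT, marker_b TEXT);
        INSERT INTO map_a VALUES ('a1', 'A1', 10.0), ('a2', 'A1', 25.0), ('a3', 'A2', 5.0);
        INSERT INTO map_b VALUES ('b1', 'B1', 40.0), ('b2', 'B3', 12.0);
        INSERT INTO links VALUES ('a1', 'b1'), ('a3', 'b2');
        """
    )
    return conn


def linked_positions(conn: sqlite3.Connection) -> list:
    """Place every homology link on both maps with a single query."""
    return conn.execute(
        """
        SELECT l.marker_a, a.chromosome, a.position, l.marker_b, b.chromosome, b.position
        FROM links AS l
        JOIN map_a AS a ON a.marker = l.marker_a
        JOIN map_b AS b ON b.marker = l.marker_b
        ORDER BY l.marker_a
        """
    ).fetchall()


for row in linked_positions(build_demo_db()):
    print(row)
```

Marker a2 has no homology link, so the inner joins leave it out of the result.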

Page 13

[Figure: the UK CropNet single- and related-species databases (BarleyDB, BrassicaDB, CerealsDB, FoggDB, MilletGenes, SpudBase and AGR), together with ComapDB and ARCADE]

Will the GRID be a better solution than ARCADE?

Page 14

Analysing chromosomal evolution

Page 15

Chromosomes evolve over time

[Figure: a chromosome changing through successive inversions and a translocation]

Mutation events can be modelled mathematically and used to construct a phylogenetic tree.
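Writing a chromosome as a signed order of marker IDs (+m forward, -m reversed), the two mutation types shown on this slide become simple list operations. A minimal sketch under that assumption: the reciprocal translocation swaps chromosome tails, which is one common way to model it, and the marker orders are hypothetical.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed marker IDs: +m forward, -m reversed


def invert(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Inversion: reverse chrom[i..j] (inclusive) and flip each marker's orientation."""
    return chrom[:i] + [-m for m in reversed(chrom[i:j + 1])] + chrom[j + 1:]


def translocate(c1: Chromosome, c2: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Reciprocal translocation: exchange the tail c1[i:] with the tail c2[j:]."""
    return c1[:i] + c2[j:], c2[:j] + c1[i:]


# Hypothetical ancestral genome with two chromosomes.
chrom_1, chrom_2 = [1, 2, 3, 4, 5], [6, 7, 8]
chrom_1 = invert(chrom_1, 1, 3)                         # [1, -4, -3, -2, 5]
chrom_1, chrom_2 = translocate(chrom_1, chrom_2, 3, 1)  # [1, -4, -3, 7, 8], [6, -2, 5]
print(chrom_1, chrom_2)
```

Applying the same inversion twice restores the original order, which is one reason signed orders are a convenient representation for rearrangement models.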

Page 16

Problems

Unlike DNA sequences, these data are combinatorial, not linear.

Algorithms are very slow (many require optimisation over a multi-dimensional space), and analysis of large data sets is not currently possible on JIC machines.
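Part of the difficulty is the size of the search space. For n species there are (2n - 5)!! = 3 × 5 × ... × (2n - 5) distinct unrooted binary tree shapes, before any branch lengths or rearrangement parameters are considered. A short sketch that counts them:

```python
def num_unrooted_trees(n: int) -> int:
    """Distinct unrooted binary tree topologies for n labelled species: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 10, 20):
    print(n, num_unrooted_trees(n))
```

For 20 species this is already about 2.2 × 10^20 topologies, so exhaustive search is out of the question and heuristic searches (possibly run in parallel) are needed.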

Parallelisation of algorithms may help, as it has done for DNA sequence phylogenetic analysis. However, is it the only answer?

In some cases (due to mutations such as allopolyploidy) we may wish to consider phylogenetic networks instead of trees, an even harder computational problem.

Page 17

Analysing germplasm collections

GENE-MINE and GERMINATE

Page 18

Germplasm projects

GENE-MINE: an EU-funded project to develop a data-management and analysis computer system for plant germplasm collections.

GERMINATE: a BBSRC-funded project, allied to GENE-MINE and to another EU project, TEGERM, to develop specialist tools for the analysis of TEGERM data.

The problems seen in these projects are essentially the same as those of UK CropNet and CHROMTREE.

Page 19

Retrotransposon insertion

[Figure: retrotransposon insertions at sites 1, 2 and 3]

Like chromosomal mutations, retrotransposon insertion can be modelled mathematically.
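A deliberately simple example of such a model: assume insertions arrive at each site as a Poisson process with rate λ per unit time and, once present, are never lost. The probability that a site is occupied after time t is then 1 - e^(-λt). A minimal sketch of that model and of simulating a barcode from it; the model, rate and time are assumptions for illustration, not the model used in GERMINATE.

```python
import math
import random
from typing import List


def p_occupied(rate: float, t: float) -> float:
    """P(a site carries an insertion after time t) when insertions arrive as a
    Poisson process with the given rate and are never lost: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


def simulate_barcode(n_sites: int, rate: float, t: float, rng: random.Random) -> List[int]:
    """Draw a presence (1) / absence (0) barcode, sites independent under the model."""
    p = p_occupied(rate, t)
    return [1 if rng.random() < p else 0 for _ in range(n_sites)]


rng = random.Random(42)  # fixed seed so the run is reproducible
barcode = simulate_barcode(20, rate=0.1, t=5.0, rng=rng)
print(round(p_occupied(0.1, 5.0), 3), barcode)
```

The "never lost" assumption makes insertions one-way changes, which is what makes shared insertions informative about common ancestry.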

Page 20

Relationship between accessions

[Figure: relationships between accessions, with insertion events (INS) marked on the branches]

Again, sometimes we may need to estimate a phylogenetic network (due to introgression between accessions).