Using Ontology Reasoning to Classify Protein Phosphatases

K. Wolstencroft, P. Lord, L. Tabernero, A. Brass, R. Stevens

University of Manchester

Introduction

Automated classification of proteins into protein subfamilies

1. Background

2. Architecture

3. Advantages

4. Results

5. Future directions

Motivation

Biological data production is fast

- High throughput techniques

- Large numbers of species being sequenced

- Large amount of data uncharacterised

Data analysis is now the rate-limiting step

Why Classify?

• Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism

• Classification enables comparative genomic studies - what is already known in other organisms

• The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology

Protein Classification

• Proteins divided into broad functional classes

“Protein Families”

- evolutionary relationships

- common domain architecture

• Relationship between sequence and structure allows searching for distinct structural (and functional) domains within the sequence

• Domains could be several amino acids long – or could span most of the protein

Example

A search of the linear sequence of protein tyrosine phosphatase type K – identified 9 functional domains

>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).

MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHVSAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNPGTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYIAIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..

Protein Family Classification

• Often diagnostic domains/motifs signify family membership

e.g. ALL proteins with a tyrosine protein kinase-specific active site (IPR008266) domain are types of tyrosine kinase
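The diagnostic-domain rule can be sketched as a simple lookup. This is a minimal illustration rather than the authors' implementation: the table name `DIAGNOSTIC_DOMAINS` and the function `families_for` are hypothetical, and only the IPR008266 rule comes from the slide.

```python
# Hypothetical rule table: diagnostic InterPro domain -> protein family.
# Only the IPR008266 rule is taken from the slide.
DIAGNOSTIC_DOMAINS = {
    "IPR008266": "tyrosine kinase",
}

def families_for(domains):
    """Return the families implied by the diagnostic domains found in a protein."""
    return sorted({DIAGNOSTIC_DOMAINS[d] for d in domains if d in DIAGNOSTIC_DOMAINS})

# "IPR_OTHER" is a placeholder for any non-diagnostic domain hit.
print(families_for(["IPR_OTHER", "IPR008266"]))  # ['tyrosine kinase']
```

A single diagnostic domain is enough here; the subfamily rules on later slides need counts and combinations of domains as well.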

Current Techniques

• Human expert classification

– gold standard

– human knowledge applied to results from bioinformatics analysis tools

• Automated use of bioinformatics analysis tools

– quick

– less detailed

Automated Methods

Bioinformatics analysis tools

• Top BLAST hit - annotating as ‘similar to’ other known proteins

- Could result in protein A being similar to protein B, which is similar to protein C, which is similar to protein D, and so on

• InterProScan analysis

- shows number and types of domains, but does not provide interpretations

Human Expert Annotation

• Same similarity searching tools used for domain/motif identification

• Humans use expert knowledge to classify proteins according to domain arrangements

Presence / order / number of each important domain

Can an ontology be used to capture this knowledge to the standard of a human annotator?

Ontology Approach

• Use ontology to capture the ‘rules’ for protein family membership in formal OWL representation

• Ontology contains the human expert knowledge

• Ontology reasoning can take the place of human analysis of the data

The Protein Phosphatases

• Large superfamily of proteins – involved in the removal of phosphate groups from molecules

• Important proteins in almost all cellular processes

• Involved in diseases – diabetes and cancer

• Human phosphatases well characterised

Phosphatase Functional Domains

Andersen et al (2001) Mol. Cell. Biol. 21 7117-36

Determining Class Definitions

R5

- Contains 2 protein tyrosine phosphatase domains

- Contains 1 transmembrane domain

- Contains 1 fibronectin domain

- Contains 1 carbonic anhydrase
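The R5 definition can be read as a set of per-domain counts. The sketch below checks a protein's domain list against it, assuming each "Contains N" means exactly N occurrences (in line with the cardinality restrictions used later in the talk). The domain labels are informal names chosen for the sketch, not InterPro identifiers, and `matches` is a hypothetical helper.

```python
from collections import Counter

# R5 definition transcribed from the slide; labels are informal names
# chosen for this sketch, not InterPro identifiers.
R5_DEFINITION = {
    "protein tyrosine phosphatase": 2,
    "transmembrane": 1,
    "fibronectin": 1,
    "carbonic anhydrase": 1,
}

def matches(definition, domains):
    """True if the protein has exactly the required count of each listed domain."""
    counts = Counter(domains)
    return all(counts[name] == n for name, n in definition.items())

protein = [
    "carbonic anhydrase", "fibronectin", "transmembrane",
    "protein tyrosine phosphatase", "protein tyrosine phosphatase",
]
print(matches(R5_DEFINITION, protein))  # True
```

A protein with only one phosphatase domain would fail the check, which is the kind of distinction a top BLAST hit alone does not make.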

Protégé OWL Modelling

Requirements

• Extract phosphatase sequences from rest of protein sequences from a whole genome

• Identify the domains present in each

• Compare these sequences to the formal ontology descriptions

• Classify each protein instance to a place in the hierarchy

Architecture

[Diagram components: Raw protein sequences; myGrid Services; Instance Store; OWL DL ontology; Reasoner (Racer); Classified Protein Phosphatases]

myGrid Services

• extract protein phosphatase sequences from whole genome using simple filtering

– patmatdb EMBOSS tool used to extract proteins with phosphatase diagnostic motifs

• perform InterProScan to determine domain architecture

• transform the InterProScan results into abstract OWL instance descriptions
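The first filtering step can be pictured as a motif search over every sequence in the genome. The sketch below is not patmatdb itself: it is a plain-Python stand-in that translates a small subset of PROSITE-style pattern syntax into a regular expression. The motif `C-x(5)-R` (the widely described catalytic signature of protein tyrosine phosphatases) is an illustrative assumption, since the slide does not say which motifs were used, and the toy sequences are made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a small PROSITE subset (x, x(n), [..], single residues) to a regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        residue, repeat = m.groups()
        token = "." if residue == "x" else residue
        parts.append(token + (f"{{{repeat}}}" if repeat else ""))
    return re.compile("".join(parts))

def filter_by_motif(sequences, pattern):
    """Keep only the sequences that contain the motif somewhere."""
    regex = prosite_to_regex(pattern)
    return {name: seq for name, seq in sequences.items() if regex.search(seq)}

# Made-up toy sequences, for illustration only.
toy = {"toy1": "MKVHCSAGVGRSGT", "toy2": "MKTAYIAKQR"}
print(list(filter_by_motif(toy, "C-x(5)-R")))  # ['toy1']
```

Filtering first keeps the more expensive InterProScan step to the candidate phosphatases only.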

InterproScan Results

Conversion to abstract OWL format

restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))

restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR001763> cardinality(1))

restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))
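A conversion of this kind can be sketched by counting how often each InterPro identifier occurs in a protein and emitting one restriction per identifier. This is an illustration only: it assumes the domain hits have already been parsed into a list of identifiers with one entry per domain occurrence, and `to_restrictions` is a hypothetical helper, not part of myGrid or InterProScan.

```python
from collections import Counter

# Property URI prefix as it appears on the slide.
BASE = "http://www.owl-ontologies.com/unnamed.owl#containsDomain"

def to_restrictions(interpro_ids):
    """Emit one cardinality restriction per distinct InterPro identifier."""
    counts = Counter(interpro_ids)  # keeps first-seen order
    return [
        f"restriction(<{BASE}{ipr}> cardinality({n}))"
        for ipr, n in counts.items()
    ]

for line in to_restrictions(["IPR000340", "IPR001763", "IPR000387"]):
    print(line)
```

For the three identifiers above this prints the same three restriction lines shown on the slide.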

Instance Store

• Instance Store enables reasoning over individuals

• Can support much higher numbers of individuals

• OWL ontology is loaded into the instance store

• A DL reasoner (Racer) is used to compare individuals to the OWL ontology definitions

Instance Store

Example Instances

• Protein Individual

Dual Specificity Phosphatase DUSE

restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))

restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))

• Ontology Definition of Dual Specificity Phosphatase

containsDomain IPR000340

Necessary and Sufficient for class membership

Also inherits

containsDomain IPR000387 from Parent Class PTP
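The matching of this individual to its class can be imitated in a few lines. The sketch below is plain Python, not Racer or the Instance Store: set containment stands in for description-logic subsumption, cardinalities are ignored, and the two-class hierarchy and the names `CLASSES`, `required_domains`, `depth` and `classify` are simplifications invented for the example.

```python
# Toy hierarchy based on the slide: DualSpecificityPhosphatase is a child of PTP.
CLASSES = {
    "PTP": {"parent": None, "domains": {"IPR000387"}},
    "DualSpecificityPhosphatase": {"parent": "PTP", "domains": {"IPR000340"}},
}

def required_domains(name):
    """Collect a class's own domains plus those inherited from its ancestors."""
    required = set()
    while name is not None:
        required |= CLASSES[name]["domains"]
        name = CLASSES[name]["parent"]
    return required

def depth(name):
    """Number of ancestors above a class in the hierarchy."""
    d = 0
    while CLASSES[name]["parent"] is not None:
        name = CLASSES[name]["parent"]
        d += 1
    return d

def classify(individual_domains):
    """Return the most specific class whose required domains are all present."""
    present = set(individual_domains)
    candidates = [c for c in CLASSES if required_domains(c) <= present]
    return max(candidates, key=depth, default=None)

print(classify({"IPR000340", "IPR000387"}))  # DualSpecificityPhosphatase
print(classify({"IPR000387"}))               # PTP
```

An individual with neither domain gets `None`, meaning it falls outside the toy hierarchy altogether.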

Results

• Human phosphatases have been classified using the system

• The ontology classification performed as well as expert classification

• The ontology system refined classification

- DUSC contains zinc finger domain

characterised and conserved – but not in classification

- DUSA contains a disintegrin domain

previously uncharacterised – evolutionarily conserved

Aspergillus fumigatus

• Phosphatase proteins very different from human

>100 human, <50 A. fumigatus

• Whole subfamilies ‘missing’

Different fungi-specific phosphorylation pathways?

No requirement for tissue-specific variations?

• Novel serine/threonine phosphatase with homeobox

conserved in Aspergillus and closely related species, but not in any other - virulence

Ongoing Work

• Phosphatases in other genomes

– Trypanosomes

– Plasmodium falciparum

• Other protein families

– Ion Channels

– ABC transporters

– Nuclear receptors

Conclusions

• Using ontology allows automated classification to reach the standard of human expert annotation

• Reasoning capabilities allow interpretation of domain organisation

• Highlights anomalies and variations from what is known

• Allows fast, efficient comparative genomics studies

Acknowledgements

PhD Supervisors: Andy Brass, Robert Stevens

Group: myGrid, Phil Lord, Carole Goble

Phosphatase Biologist: Lydia Tabernero

Medical Research Council