Amo amos amot amomus amotis amont. Happy birthday Swiss-Prot Fortaleza August 2006.

Three (Orthogonal) Ontologies

• Biological Process

– Goal or objective within cell, tissue, ...

• Molecular Function

– Elemental activity or task

• Cellular Component

– Location or complex

Content of GO

• molecular function 7,432 terms

• biological process 10,740 terms

• cellular component 1,772 terms

• all 19,994 terms

• definitions 19,042 (96%)

!version v.4.2
!date 4 November 1998
!author Michael Ashburner
$Gene Ontology ; GO:0000001 ; remark:
$function ; GO:0000002 ; remark:
%macromolecule ; GO:0000003 ; remark:
%protein ; GO:0000004 ; remark:
%enzyme ; GO:0000005 ; remark:
%alpha-alpha-trehalase ; GO:0000006 ; remark: ; EC:3.2.1.28
%alpha-alpha-trehalose-phosphate synthase (UDP-forming) ; GO:0000007 ; remark: ; EC:2.4.1.15
%alpha-L-fucosidase ; GO:0000008 ; remark: ; EC:3.2.1.51
%alpha-N-acetylglucosaminidase ; GO:0000009 ; remark: ; EC:3.2.1.50
%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1
%alpha-glucosidase II ; GO:0000011 ; remark: ; EC:3.1.2.20
%alpha-ketoacid dehydrogenase complex ; GO:0000012 ; remark:
<oxoglutarate dehydrogenase (lipoamide) ; GO:0000013 ; remark: ; EC:1.2.4.2

....

%DNA-directed DNA polymerase ; GO:0000054 ; remark: ; EC:2.7.7.7
%nuclear DNA-directed DNA polymerase ; GO:0000055 ; remark:
%alpha DNA polymerase ; GO:0000056 ; remark:
<alpha DNA polymerase, 180Kd-subunit ; GO:0000057 ; remark:

ma11> wc gene_ontology.v4.1
3081 22643 192480 gene_ontology.v4.1
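
The flat-file excerpt above can be read mechanically. What follows is a minimal Python sketch, assuming the convention of the early GO flat files in which a leading `$` marks a root, `%` an is_a child and `<` a part_of child (the slides do not spell this out). `parse_line` is a hypothetical helper, and parent tracking by indentation is left out because the indentation did not survive in the excerpt.

```python
# Assumed meaning of the line prefixes (not stated in the slides).
PREFIXES = {"$": "root", "%": "is_a", "<": "part_of"}

def parse_line(line):
    """Split one flat-file line into a dict; return None for '!' comments and blanks."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    relation = PREFIXES.get(line[0])
    if relation is None:
        raise ValueError(f"unknown prefix in {line!r}")
    # Fields are separated by ';': name, GO id, remark, optional EC number.
    fields = [f.strip() for f in line[1:].split(";")]
    go_id = next((f for f in fields if f.startswith("GO:")), None)
    ec = next((f for f in fields if f.startswith("EC:")), None)
    return {"relation": relation, "name": fields[0], "id": go_id, "ec": ec}

sample = """\
!version v.4.2
%enzyme ; GO:0000005 ; remark:
%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1
<oxoglutarate dehydrogenase (lipoamide) ; GO:0000013 ; remark: ; EC:1.2.4.2
"""

terms = [t for t in map(parse_line, sample.splitlines()) if t]
for t in terms:
    print(t["relation"], t["id"], t["name"], t["ec"])
```

Each parsed line keeps only the relation to its (unknown) parent, so rebuilding the full hierarchy would additionally need the original indentation.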

Banbury Center, CSH Labs, August 1998

The founding meeting of the Gene Ontology Consortium

Problems with the GO:

is_a and part_of relationships are poorly defined and not used consistently.

carries a baggage of implicit ontologies.

lack of relationships between the three GO ontologies.

Implicit ontologies within the GO:

• cysteine biosynthesis (ChEBI)

• myoblast fusion (Cell Type Ontology)

• hydrogen ion transporter activity (ChEBI)

• snoRNA catabolism (Sequence Ontology)

• wing disc pattern formation (Drosophila anatomy)

• epidermal cell differentiation (Cell Type Ontology)

• regulation of flower development (Plant anatomy)

• interleukin-18 receptor complex (not yet in OBO)

• B-cell differentiation (Cell Type Ontology)

[Figure: is_a hierarchies in GO (cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation) and in CL (blood cell, lymphocyte, B-cell)]

Integrating ontologies

Augmented GO

[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell

CELL Ontology

[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
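
To illustrate how the two stanzas above link GO to the Cell Ontology, here is a rough Python sketch that reads `[Term]` stanzas and follows the `has_participant` cross-reference. `parse_obo` is a hypothetical helper written for this example, not part of any OBO toolkit, and it handles only the simple tag-value lines shown on the slide.

```python
from collections import defaultdict

def parse_obo(text):
    """Parse [Term] stanzas into {id: {tag: [values]}}, dropping '! comment' suffixes."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = defaultdict(list)
        elif not line:
            current = None  # a blank line ends the stanza
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0].strip()
            current[tag].append(value)
            if tag == "id":
                terms[value] = current
    return terms

obo = """\
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell

[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
"""

terms = parse_obo(obo)
go = terms["GO:0030183"]
# Follow the has_participant link from the GO term into the cell ontology.
participants = [v.split()[1] for v in go["intersection_of"]
                if v.startswith("has_participant ")]
for cl_id in participants:
    print(go["name"][0], "has_participant", terms[cl_id]["name"][0])
```

The point of the sketch is that once the cell type is an explicit identifier rather than a substring of the GO term name, a program can walk from the process to the cell type and on to its own parents.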

molecular_function

biological_process

obo

obo.sf.net

www.bioontology.org

obofoundry.org

The OBO Foundry

• To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain.

• To introduce some of the features of scientific peer review into biomedical ontology development.

A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure

– intelligibility to biologist curators, annotators, users

– formal robustness

– stability

– compatibility

– interoperability

– support for logic-based reasoning

• The ontology is open and available to be used by all.

• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. The importance of community collaboration cannot be overstated.

• The ontology is in, or can be instantiated in, a common formal language.

• The ontology possesses a unique identifier space within OBO.

• The ontology provider has procedures for identifying distinct successive versions.

• The ontology has a clearly specified and clearly delineated content.

• The ontology includes textual definitions for all terms.

• The ontology is well-documented.

• The ontology has a plurality of independent users.

• The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.

Foundational relations: is_a, part_of

Spatial relations: located_in, contained_in, adjacent_to

Temporal relations: transformation_of, derives_from, preceded_by

Participation relations: has_participant, has_agent

regulates

Genome Biology 6:R46, 2005.

Good ontologies require:

Consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats

Coherent shared treatment of relations to allow cascading inference both within and between ontologies

Ontology = A Representation of Types

Each node of an ontology consists of:

• preferred term

• term identifier

• synonyms

• definition, glosses, comments

Nodes in an ontology are connected by relations:

primarily: is_a (= is subtype of) and part_of

designed to support search, reasoning and annotation
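
The node-and-relation picture above maps naturally onto a small data structure. The following Python sketch is one possible layout, not the representation used by GO or SO; the `Term` class and `ancestors` function are hypothetical names, and definitions are left empty rather than invented. The two is_a links are the ones given for GO:0030183 in the OBO stanza earlier in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    identifier: str
    preferred_term: str
    definition: str = ""
    synonyms: list = field(default_factory=list)
    # relation name -> list of target identifiers, e.g. {"is_a": [...], "part_of": [...]}
    relations: dict = field(default_factory=dict)

def ancestors(term_id, ontology, relation="is_a"):
    """Collect every term reachable by repeatedly following one relation."""
    seen, stack = set(), [term_id]
    while stack:
        for parent in ontology[stack.pop()].relations.get(relation, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

ontology = {
    "GO:0030183": Term("GO:0030183", "B-cell differentiation",
                       relations={"is_a": ["GO:0042113", "GO:0030098"]}),
    "GO:0042113": Term("GO:0042113", "B-cell activation"),
    "GO:0030098": Term("GO:0030098", "lymphocyte differentiation"),
}

print(sorted(ancestors("GO:0030183", ontology)))
```

Search ("give me everything that is a kind of X") and annotation checks both reduce to walks like `ancestors` over the chosen relation.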

The aims of SO

1. Develop a shared set of terms and concepts to annotate biological sequences.

2. Apply these in our separate projects to provide consistent query capabilities between them.

3. Provide a software resource to assist in the application and distribution of SO.

The scope of the SO

1. Features that can be located on a sequence with coordinates: exon, promoter, binding_site

2. Properties of these features:

– Sequence attributes

• Maternally_imprinted_gene

– Consequences of mutation

• mutation_affecting_editing

– Chromosome variation

• aneuploid

What is a pseudogene?

• Human

– Sequence similar to a known protein but containing frameshift(s) and/or stop codons which disrupt the ORF.

• Neisseria

– A gene that is inactive but may be activated by translocation (e.g. by gene conversion) to a new chromosome site.

– Note: such a gene would be called a “cassette” in yeast.

Give me all the dicistronic genes

• Define a dicistronic gene in terms of the cardinality of the transcript to open-reading-frame relationship and the spatial arrangement of open-reading frames.
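
As a rough illustration of such a query, the Python sketch below treats a transcript as dicistronic when it carries exactly two open reading frames that do not overlap. That reading of "cardinality" and "spatial arrangement" is an assumption made for this example; the slides do not give the precise rule, and the `ORF` class, `is_dicistronic` function and coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ORF:
    start: int  # inclusive coordinate on the transcript
    end: int    # exclusive coordinate on the transcript

def is_dicistronic(orfs):
    """Assumed rule: exactly two ORFs (cardinality) that do not overlap (arrangement)."""
    if len(orfs) != 2:
        return False
    first, second = sorted(orfs, key=lambda o: o.start)
    return first.end <= second.start

# Made-up transcripts for illustration only.
transcripts = {
    "t1": [ORF(10, 310)],                 # one ORF
    "t2": [ORF(10, 310), ORF(350, 650)],  # two ORFs, one after the other
    "t3": [ORF(10, 310), ORF(200, 500)],  # two overlapping ORFs
}

dicistronic = [name for name, orfs in transcripts.items() if is_dicistronic(orfs)]
print(dicistronic)
```

Because the rule is stated over relationships and coordinates rather than over a free-text label, the same query can be run unchanged against any annotation set that uses the shared terms.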

• ISA: 927 relationships

• PARTOF: 186 relationships

holonym meronym

Relationships allow reasoning.

• VALIDATION - We can check the internal consistency of an annotation against the ontology. We can also check that any topological assertions are true.

3’ UTR part_of mRNA

intron part_of mRNA
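
A validation pass of this kind can be sketched in a few lines of Python. The `PART_OF` set below is a toy ontology assumed for illustration, not an extract of SO: it places the 3' UTR in the mRNA and the intron in the primary transcript, so the first assertion above is accepted and the second is flagged.

```python
# Toy (part, whole) assertions; assumed for this example, not taken from SO itself.
PART_OF = {
    ("three_prime_UTR", "mRNA"),
    ("intron", "primary_transcript"),
}

def unsupported(annotations, part_of):
    """Return the annotated (part, whole) pairs that the ontology does not license."""
    return [pair for pair in annotations if pair not in part_of]

annotations = [
    ("three_prime_UTR", "mRNA"),
    ("intron", "mRNA"),
]

invalid = unsupported(annotations, PART_OF)
print(invalid)
```

A fuller check would also follow is_a and part_of transitively before rejecting an assertion; this sketch only compares against direct statements.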

• The formal properties of parts:

1. If A is a proper part of B then B is not a part of A (nothing is a proper part of itself)

2. If A is a part of B and B is a part of C then A is a part of C

• Because of these rules, we can apply functions to parts…
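
The two rules can be exercised directly. The Python sketch below is an illustration under simple assumptions: part_of is stored as a set of (part, whole) pairs, `part_of_closure` applies rule 2 until nothing new can be derived, and `violates_asymmetry` reports any pairs that break rule 1. Both function names are hypothetical.

```python
def part_of_closure(direct):
    """Transitive closure of a set of (part, whole) pairs (rule 2)."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def violates_asymmetry(closure):
    """Pairs that break rule 1: A part of B while B is also part of A (or A part of A)."""
    return {(a, b) for a, b in closure if (b, a) in closure}

# The A, B, C of the rules above.
closure = part_of_closure({("A", "B"), ("B", "C")})
print(sorted(closure))

# A deliberately inconsistent input: A part of B and B part of A.
bad = violates_asymmetry(part_of_closure({("A", "B"), ("B", "A")}))
print(sorted(bad))
```

Running the closure first matters: a cycle that spans several links only shows up as a rule 1 violation once the implied pairs have been added.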

Extensional Mereology (EM): a formal theory of parts

Classical Extensional Mereology

EM operation          Definition
Overlap (x○y)         x and y overlap if they have a part in common.
Disjoint (xιy)        x and y are disjoint if they share no parts in common.
Binary Product (x.y)  The parts that x and y share in common.
Difference (x–y)      The largest portion of x which has no part in common with y.
Binary Sum (x+y)      The set consisting of individuals x and y
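
One way to make these operations concrete is to model each individual as the set of its atomic parts, for example the sequence positions a feature covers. That modelling choice is an assumption made for this Python sketch, not something stated in the slides, and the feature coordinates are made up.

```python
def overlap(x, y):
    """x and y overlap if they have a part in common."""
    return bool(x & y)

def disjoint(x, y):
    """x and y are disjoint if they share no parts."""
    return not (x & y)

def product(x, y):
    """The parts that x and y share."""
    return x & y

def difference(x, y):
    """The largest portion of x with no part in common with y."""
    return x - y

def binary_sum(x, y):
    """The individual made up of x and y together."""
    return x | y

# Two hypothetical features as sets of sequence positions.
feature_a = frozenset(range(100, 200))
feature_b = frozenset(range(150, 250))

print(overlap(feature_a, feature_b))
print(len(product(feature_a, feature_b)))
print(len(difference(feature_a, feature_b)))
print(len(binary_sum(feature_a, feature_b)))
```

With this representation, each EM operation becomes a standard set operation, which is what allows functions to be applied to parts once the part_of rules hold.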

http://www.geneontology.org

Gene Ontology Consortium

The Pathogen Group

Schizosaccharomyces pombe Genome Sequencing Project

DictyBase