GeneWeaver: A prototype for bioinformatics

Michael Luck, University of Southampton, UK
Kevin Bryson and David Jones, UCL
Mike Joy, University of Warwick


The Structure of DNA


The Result of 15 Years' Hard Work

> contig 1
TAAGTTATTATTTAGTTAATACTTTTAACAATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATAGTTAAATACCTTCCTTAATACTGTTAAATTATATTCAATCAATACATATATAATATTATTAAAATACTTGATAAGTATTATTTAGATATTAGACAAATACTAATTTTATATTGCTTTAATACTTAATAAATACTACTTATGTATTAAGTAAATATTACTGTAATACTAATAACAATATTATTACAATATGCTAGAATAATATTGCTAGTATCAATAATTACTAATATAGTATTAGGAAAATACCATAATAATATTTCTACATAATACTAAGTTAATACTATGTGTAGAATAATAAATAATCAGATTAAAAAAATTTTATTTATCTGAAACATATTTAATCAATTGAACTGATTATTTTCAGCAGTAATAATTACATATGTACATAGTACATATGTAAAATATCATTAATTTCTGTTATATATAATAGTATCTATTTTAGAGAGTATTAATTATTACTATAATTAAGCATTTATGCTTAATTATAAGCTTTTTATGAACAAAATTATAGACATTTTAGTTCTTATAATAAATAATAGATATTAAAGAAAATAAAAAAATAGAAATAAATATCATAACCCTTGATAACCCAGAAATTAATACTTAATCAAAAATGAAAATATTAATTAATAAAAGTGAATTGAATAAAATTTTGGGAAAAAATGAATAACGTTATTATTTCCAATAACAAAATAAAACCACATCATTCATATTTTTTAATAGAGGCAAAAGAAAAAGAAATAAACTTTTATGCTAACAATGAATACTTTTCTGTCAAATGTAATTTAAATAAAAATATTGATATTCTTGAACAAGGCTCCTTAATTGTTAAAGGAAAAATTTTTAACGATCTTATTAATGGCATAAAAGAAGAGATTATTACTATTCAAGAAAAAGATCAAACACTTTTGGTTAAAACAAAAAAAACAAGTATTAATTTAAACACAATTAATGTGAATGAATTTCCAAGAATAAGGTTTAATGAAAAAAACGATTTAAGTGAATTTAATCAATTCAAAATAAATTATTCACTTTTAGTAAAAGGCATTAAAAAAATTTTTCACTCAGTTTCAAATAATCGTGAAATATCTTCTAAATTTAATGGAGTAAATTTCAATGGATCCAATGGAAAAGAAATATTTTTAGAAGCTTCTGACACTTATAAACTATCTGTTTTTGAGATAAAGCAAGAAACAGAACCATTTGATTTCATTTTGGAGAGTAATTTACTTAGTTTCATTAATTCTTTTAATCCTGAAGAAGATAAATCTATTGTTTTTTATTACAGAAAAGATAATAAAGATAGCTTTAGTACAGAAATGTTGATTTCAATGGATAACTTTATGATTAGTTACACATCGGTTAATGAAAAATTTCCAGAGGTAAACTACTTTTTTGAATTTGAACCTGAAACTAAAATAGTTGTTCAAAAAAATGAATTAAAAGATGCACTTCAAAGAATTCAAA
etc etc etc


Flow of Biological Data

DNA → Protein Sequence → Protein Structure → Protein Function

… ATG GAT TTC ... (DNA)
Met Asp Phe ... (protein sequence)
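The DNA-to-protein step reads the DNA three bases (one codon) at a time. The following is a minimal Python sketch, assuming the standard genetic code. It includes only the handful of codons that the slide's example needs; `CODON_TABLE` and `translate` are illustrative names, not part of GeneWeaver.

```python
# Partial standard genetic code: only the codons needed for the slide's example
# plus the three stop codons. A real translator would list all 64 codons.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "GAT": "Asp",
    "GAC": "Asp",
    "TTT": "Phe",
    "TTC": "Phe",
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate(dna: str) -> list[str]:
    """Translate one DNA reading frame into three-letter amino-acid codes."""
    dna = dna.replace(" ", "").upper()
    residues = []
    # Step through whole codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "???")  # "???" marks codons not in the table
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


print(" ".join(translate("ATG GAT TTC")))  # Met Asp Phe
```

The real pipeline starts from the protein sequence databases instead, so this step matters mainly for raw genome contigs like the one shown earlier.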


Data Analysis

Lots of primary data: need to discover gene function.
Scan databases for similar sequences.
Collect matching sequences and alignments.
Infer function from annotations of matched proteins.
Analysis by a range of existing programs.
Interpret results.

Additional factors: some programs/results are available over the WWW or by email; continual updates of the primary databases create a need for reassessment.


Biological Databases

DNA Databases (Genomes): GenBank, EMBL, DDBJ
Protein Sequence Databases: SwissProt, PIR
Protein Structure Databases: PDB, SCOP, CATH
Pattern Databases: PROSITE, PRINTS, BLOCKS


SwissProt Entry

ID PRIO_BOVIN STANDARD; PRT; 264 AA.
AC P10279;
DT 01-MAR-1989 (Rel. 10, Created)
DT 01-NOV-1991 (Rel. 20, Last sequence update)
DT 15-JUL-1998 (Rel. 36, Last annotation update)
DE MAJOR PRION PROTEIN 1 PRECURSOR (PRP) (MAJOR SCRAPIE-ASSOCIATED FIBRIL
DE PROTEIN 1).
GN PRNP.
OS Bos taurus (Bovine).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
...
CC -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED
CC "RODS".
CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
...
SQ SEQUENCE 264 AA; 28614 MW; DEA01B4E CRC32;
     MVKSHIGSWI LVLFVAMWSD VGLCKKRPKP GGGWNTGGSR YPGQGSPGGN RYPPQGGGGW
     GQPHGGGWGQ PHGGGWGQPH GGGWGQPHGG GWGQPHGGGG WGQGGTHGQW NKPSKPKTNM
     KHVAGAAAAG AVVGGLGGYM LGSAMSRPLI HFGSDYEDRY YRENMHRYPN QVYYRPVDQY
     SNQNNFVHDC VNITVKEHTV TTTTKGENFT ETDIKMMERV VEQMCITQYQ RESQAYYQRG
     ASVILFSSPP VILLISFLIF LIVG
//
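A SwissProt entry is line-oriented. Each line starts with a two-letter code (ID, AC, DE, OS, SQ, ...), the sequence lines follow SQ, and "//" ends the entry. That layout is what lets a primary database agent turn entries into structured records. The following is a minimal sketch under that assumption. `parse_swissprot` is a hypothetical helper, and the sample is an abridged copy of the entry above (DT, OC and CC lines omitted, with the usual column spacing rather than the transcript's collapsed spaces).

```python
from collections import defaultdict


def parse_swissprot(text: str) -> dict:
    """Group the lines of one SWISS-PROT flat-file entry by their two-letter code.

    Sequence lines after SQ start with blanks; "//" marks the end of the entry.
    """
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if in_sequence:
            # Residues come in blocks of ten separated by spaces; drop the spaces.
            sequence_parts.append(line.replace(" ", ""))
            continue
        code, data = line[:2], line[2:].strip()
        if code.strip():
            fields[code].append(data)
        in_sequence = code == "SQ"
    # Continuation lines (e.g. the two DE lines) are joined into one string per code.
    entry = {code: " ".join(values) for code, values in fields.items()}
    entry["sequence"] = "".join(sequence_parts)
    return entry


ENTRY = """\
ID   PRIO_BOVIN     STANDARD;      PRT;   264 AA.
AC   P10279;
DE   MAJOR PRION PROTEIN 1 PRECURSOR (PRP) (MAJOR SCRAPIE-ASSOCIATED FIBRIL
DE   PROTEIN 1).
GN   PRNP.
OS   Bos taurus (Bovine).
SQ   SEQUENCE   264 AA;  28614 MW;  DEA01B4E CRC32;
     MVKSHIGSWI LVLFVAMWSD VGLCKKRPKP GGGWNTGGSR YPGQGSPGGN RYPPQGGGGW
     GQPHGGGWGQ PHGGGWGQPH GGGWGQPHGG GWGQPHGGGG WGQGGTHGQW NKPSKPKTNM
     KHVAGAAAAG AVVGGLGGYM LGSAMSRPLI HFGSDYEDRY YRENMHRYPN QVYYRPVDQY
     SNQNNFVHDC VNITVKEHTV TTTTKGENFT ETDIKMMERV VEQMCITQYQ RESQAYYQRG
     ASVILFSSPP VILLISFLIF LIVG
//
"""

entry = parse_swissprot(ENTRY)
print(entry["AC"], len(entry["sequence"]))  # P10279; 264
```

The recovered sequence length agrees with the 264 AA stated on the ID and SQ lines, which gives a cheap consistency check when pulling entries from a remote copy of the database.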


PDB Entry

HEADER PRION PROTEIN 20-SEP-99 1QM3
TITLE HUMAN PRION PROTEIN FRAGMENT 121-230
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: PRION PROTEIN;
COMPND 3 CHAIN: A;
COMPND 4 SYNONYM: PRP, MAJOR PRION PROTEIN, PRP27-30, PRP33-35C,
COMPND 5 (ASCR).PRP;
COMPND 6 FRAGMENT: RESIDUES 121-230;
COMPND 7 ENGINEERED: YES;
COMPND 8 MUTATION: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 ORGAN: BRAIN;
...
ATOM 1 N LEU A 125 5.041 -9.143 -1.920 1.00 0.00 N
ATOM 2 CA LEU A 125 4.764 -7.837 -1.351 1.00 0.00 C
ATOM 3 C LEU A 125 5.308 -7.848 0.071 1.00 0.00 C
ATOM 4 O LEU A 125 4.554 -8.101 1.013 1.00 0.00 O
ATOM 5 CB LEU A 125 3.275 -7.484 -1.391 1.00 0.00 C
ATOM 6 CG LEU A 125 2.781 -7.205 -2.821 1.00 0.00 C
ATOM 7 CD1 LEU A 125 1.683 -8.197 -3.182 1.00 0.00 C
ATOM 8 CD2 LEU A 125 2.266 -5.774 -2.970 1.00 0.00 C
ATOM 9 H LEU A 125 4.919 -9.913 -1.281 1.00 0.00 H
ATOM 10 HA LEU A 125 5.307 -7.076 -1.916 1.00 0.00 H
ATOM 11 1HB LEU A 125 2.703 -8.290 -0.932 1.00 0.00 H
...
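In a real PDB file, the ATOM records are fixed-column: x, y and z occupy columns 31-38, 39-46 and 47-54. Above, the extracted text has collapsed that spacing. The following is a minimal sketch of a column-based parser. `parse_atom` is a hypothetical helper, and the record it parses is the first ATOM line of 1QM3 rebuilt with the fixed-width layout.

```python
def parse_atom(line: str) -> dict:
    """Parse one fixed-column PDB ATOM/HETATM record (0-based slices of the 1-based columns)."""
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21].strip(),         # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "temp_factor": float(line[60:66]), # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78
    }


# Rebuild "ATOM 1 N LEU A 125 5.041 -9.143 -1.920 1.00 0.00 N" with its column layout.
record = (
    "ATOM  " + f"{1:5d}" + " " + " N  " + " " + "LEU" + " " + "A"
    + f"{125:4d}" + " " + "   "
    + f"{5.041:8.3f}" + f"{-9.143:8.3f}" + f"{-1.920:8.3f}"
    + f"{1.00:6.2f}" + f"{0.00:6.2f}" + " " * 10 + f"{'N':>2}"
)
atom = parse_atom(record)
print(atom["name"], atom["res_name"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
# N LEU 125 5.041 -9.143 -1.92
```

Slicing by column instead of splitting on whitespace matters because, in real files, neighbouring fields can run together (for example, large negative coordinates).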


Analysis Tools

Homology/Similarity Searching: PSI-BLAST, BLAST, FASTA
Sequence Alignment: Clustal-W, GCG Pileup
Motif/Pattern Searching: PROSITE, HMMer
Secondary Structure Prediction: PSIPRED, PHD, DSC


BLAST Output

...
Database: pdb_seq
14,442 sequences; 3,011,261 total letters

Searching..................................................done

Sequences producing significant alignments:    Score (bits)    E Value
pdb|1NBD|1 cftr fragment: nbd1, first (or n-terminal) nucleotide-...    79    6e-16
pdb|1WAI|1 DNA polymerase (t4 gp43) DNA substrate (tttt) DNA    28    1.0
…

>pdb|1NBD|1 cftr fragment: nbd1, first (or n-terminal) nucleotide-binding domain; (cftr nbd1, cystic fibrosis transmembrane conductance regulator nucleotide-binding domain 1)
Length = 214

Score = 78.8 bits (191), Expect = 6e-16
Identities = 37/40 (92%), Positives = 39/40 (97%)

Query: 4   TTLLVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLRP 43
           T +LVTSKMEHLKKADKILILHEGSSYFYGTFSELQNL+P
Sbjct: 175 TRILVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLQP 214
...
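The "Identities = 37/40" figure comes from comparing the query and subject rows position by position. The following is a minimal sketch for a gapless pairwise alignment such as the 1NBD hit. Counting "Positives" as well would need a substitution matrix (for example BLOSUM62), which this sketch does not attempt. `alignment_identity` is an illustrative name.

```python
def alignment_identity(query: str, subject: str) -> tuple[int, int]:
    """Count identical positions in a gapless pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s)
    return identities, len(query)


query = "TTLLVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLRP"    # Query 4-43
subject = "TRILVTSKMEHLKKADKILILHEGSSYFYGTFSELQNLQP"  # Sbjct 175-214 (pdb|1NBD)
ident, length = alignment_identity(query, subject)
print(f"Identities = {ident}/{length} ({100 * ident // length}%)")
# Identities = 37/40 (92%)
```

The three mismatches (T/R, L/I, R/Q) are where the middle row of the BLAST output shows a blank or a "+".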


Alignment Output

*>>>TRANSGELIN : TRANSGELIN SEQUENCE
*P1;A60598 : actin-associated protein p27 - mouse
*>>>SM22_RAT : SMOOTH MUSCLE PROTEIN 22-ALPHA (SM22-ALPHA)
....

MANKGPSYGMSREVQSKIEKKYDEELEERLVEWIIVQCGPDVGRPDRGRLGFQVWLKNGVILSKLVNSLYPDGSKPVKVPMANKGPSYGMSREVQSKIEKKYDEELEERLVEWIVVQCGPDVGAPDRGRLGFQVWLKNGVILSKLVNSLYPEGSKPVKVP ANKGPSYGMSREVQSKIEKKYDEELEERLVEWIVMQCGPDVGRPDRGRLGFQVWLKNGVILSKLVNSLYPEGSKPVKVPMANKGPSYGMSREVQSKIEKKYDEELEERLVEWIVMQCGPDVGRPDRGRLGFQVWLKNGVILSKLVNSLYPEGSKPVKVP ANKGPAYGMSRDVQSKIEKKYDDELEDRLVEWIVAQCGSSVGRPDRGRLGFQVWLKNGIVLSQLVNSLYPDGSKPVKIPMANKGPAYGMSRDVQSKIEKKYDDELEDRLVEWIVAQCGSSVGRPDRGRLGFQVWLKNGIVLSQLVNSLYPDGSKPVKIP ANKGPSYGMSREVQSKIEKKYDEELEERLVEWIIVQCGPDVGRPDRGPLGFQVWLKNGVILSKLVNSLYPDGSKPVKVPMANKGPSYGMSREVQSKIEKKYDEELEERLVEWIIVQCGPDVGRPDRGRLGFQVWLKNGVILSKLVNSLYPEGSKPVKVPMANRGPAYGLSREVQQKIEKQYDADLEQILIQWITTQCRKDVGRPQPGRENFQNWLKDGTVLCELINALYPEGQAPVKKIMANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRTHFQKWLMDGTVLCKLINSLYPPGQEPIPKI MSLERAVRAKIAGKRNPEMDKEAQEWIEAIIAEKFPAGQS YEDVLKDGQVLCKLINVLSPNA VPKV EFPPSGLSYQVKKKLEGKRDKDQENEALEWIEALTGLKLDRSKL YEDILKDGTVLCKLMNSIKPGC IKKI MELWRQCTHWLIQCRVLPPSHRVTWDGAQVCELAQALRDGVLLCQLLNNLLPHAINLREVN MELWRQCTHWLIQCRVLPPSHRVTWEGAQVCELAQALRDGVLLCQLLNNLLPQAINLREVN MSMEGISYTNSNPSATPNMEDTLLTFSMGILPITMDCDPVTQLSQLFQQGAPLCILFNSVKPQF KLP

ENPPSMVFKQMEQVAQFLKAA EDYGVTKTDMFQTVDLFEGKDMAAVQRTVMALGSLAVTKNDGHYRGDPNWFMKKAQEHENPPSMVFKQMEQVAQFLKAA EDYGVIKTDMFQTVDLYEGKDMAAVQRTLMALGSLAVTKNDGNYRGDPNWFMKKAQEHENPPSMVFKQMEQVAQFLKAA EDYGVTKTDMFQTVDLFEGKDMAAVQRTVMALGSLAVTKNDGHYRGDPNWFMKKAQEHENPPSMVFKQMEQVAQFLKAA EDYGVTKTDMFQTVDLFEGKDMAAVQRTVMALGSLAVTKNDGHYRGDPNWFMKKAQEH...
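Clustal-style multiple alignments conventionally add a line under each block that marks fully conserved columns with "*". The following is a minimal sketch of that marker only; Clustal's ":" and "." similarity classes are not reproduced. The three 20-residue rows are copied from the start of three sequences in the excerpt above. Because the transcript lost the block layout, the assumption that these rows occupy the same alignment columns is ours.

```python
def conservation_line(rows: list[str]) -> str:
    """Mark columns where every aligned sequence has the same residue with '*'."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    marks = []
    for column in zip(*rows):
        # A gap column is never counted as conserved.
        fully_conserved = column[0] != "-" and all(c == column[0] for c in column)
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


rows = [
    "MANKGPSYGMSREVQSKIEK",
    "MANKGPAYGMSRDVQSKIEK",
    "MANRGPSYGLSREVQEKIEQ",
]
for row in rows:
    print(row)
print(conservation_line(rows))
```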


Determining Protein Function

Protein Sequence (Genome)
Remove regions of low complexity (SEG)
Rapid similarity search against all known proteins (PSI-BLAST)
Slower, more sensitive protein category search (HMMer)
Rapid protein analysis tools, e.g. motif search (ScanProsite)
Consistent and sensible (Human)
Annotate function.

E < 0.001
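The last stages of this pipeline keep only significant hits (E < 0.001). They then annotate when the evidence agrees and pass conflicts to a human. The following is a minimal sketch of that decision. It assumes that the SEG, PSI-BLAST, HMMer and ScanProsite stages have already produced a list of hits. `Hit`, `significant_functions` and `annotate` are hypothetical names, not a real tool API. The two hits reuse the accessions and E-values from the BLAST output slide, with shortened annotation strings.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # the slide's significance threshold, E < 0.001


@dataclass
class Hit:
    """One database match (hypothetical record, not a real BLAST/HMMer API)."""
    accession: str
    e_value: float
    annotation: str


def significant_functions(hits: list[Hit], cutoff: float = E_VALUE_CUTOFF) -> set[str]:
    """Keep hits below the E-value cutoff and collect their annotated functions."""
    return {hit.annotation for hit in hits if hit.e_value < cutoff}


def annotate(hits: list[Hit]) -> str:
    """Assign a function only when the significant hits agree on one."""
    functions = significant_functions(hits)
    if not functions:
        return "no significant homologue"
    if len(functions) > 1:
        # Conflicting annotations: on the slide this check is left to a human.
        return "needs human review: " + "; ".join(sorted(functions))
    return next(iter(functions))


hits = [
    Hit("1NBD", 6e-16, "nucleotide-binding domain"),
    Hit("1WAI", 1.0, "DNA polymerase"),
]
print(annotate(hits))  # nucleotide-binding domain
```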


Agent Classes

Primary database agents manage remote primary sequence databases, providing up-to-date data in various common formats.
Non-redundant database agents filter and combine data from various primary database agents into non-redundant data sources.
Calculation agents encapsulate pre-existing methods or tools for the analysis of data to determine function.
Genome agents manage genome information for a particular organism and use other agents to derive annotations.
Broker agents provide information about agents registered within the agent community.
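The five agent classes differ mainly in their role, and the broker is what lets the others find each other by class. The following is a minimal Python sketch of that idea: an enumeration of the classes, plus a broker registry that answers "which agents of this class are registered?". `AgentType`, `AgentInfo`, `Broker` and their methods are illustrative assumptions; they are not GeneWeaver's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentType(Enum):
    PRIMARY_DB = "PrimaryDB"      # manages a remote primary sequence database
    NRDB = "NRDB"                 # merges primary sources into a non-redundant one
    CALCULATION = "Calculation"   # wraps a pre-existing analysis tool
    GENOME = "Genome"             # manages one organism's genome and its annotations
    BROKER = "Broker"             # knows which agents are registered


@dataclass
class AgentInfo:
    agent_id: str
    agent_type: AgentType
    description: str = ""


@dataclass
class Broker:
    registry: dict[str, AgentInfo] = field(default_factory=dict)

    def register(self, info: AgentInfo) -> None:
        self.registry[info.agent_id] = info

    def unregister(self, agent_id: str) -> None:
        self.registry.pop(agent_id, None)

    def find(self, agent_type: AgentType) -> list[str]:
        """Return the ids of registered agents of the given class, sorted."""
        return sorted(a.agent_id for a in self.registry.values() if a.agent_type is agent_type)


broker = Broker()
for agent_id, kind in [("SwissAgent", AgentType.PRIMARY_DB), ("PDBAgent", AgentType.PRIMARY_DB),
                       ("BlastAgent", AgentType.CALCULATION), ("HInf", AgentType.GENOME)]:
    broker.register(AgentInfo(agent_id, kind))
print(broker.find(AgentType.PRIMARY_DB))  # ['PDBAgent', 'SwissAgent']
```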


GeneWeaver Agent Community

SwissAgent (PrimaryDB)
PDBAgent (PrimaryDB)
PIRAgent (PrimaryDB)
Non-redundant Protein Agent (NRDB)
BrokerAgent (Broker)
HInfAgent (Genome)
Web
HInfAgent (Genome)
HInfAgent (Genome)
BlastAgent (Calculation)
ClustalAgent (Calculation)


BAL Performatives

register: Register with a broker.
unregister: Cancel a registration with a broker.
ask: Ask about data.
derive: Request an agent to derive particular data.
tell: Inform another agent about data.
deny: Inform another agent about lack of data.
subscribe: Obtain regular updates of certain data.
unsubscribe: Stop receiving regular updates of data.
ok: Indicates success.
sorry: Indicates failure on the agent’s part.
error: Indicates a problem with the protocol or another error.


Example Types of BAL Data

Metadata
AgentInfo: General information about an agent.
ProviderInfo: Information about a provider protocol.
SkillInfo: Information about a skill.
PlanInfo: Information about a plan.

Data
Genome: A genome.
SeqFile: A sequence file.
SeqEntry: A sequence entry.


BAL Message Example

Sender: //localhost.localdomain/7/HInf
Receiver: //localhost.localdomain/0/Broker
Transport: rmi
Language: bal
Perform: register
Ref: hinf77f001_0
Content:
AgentInfo(TYPE = Genome,
          OWNER = hinf77f001,
          UPD_TIME = 962601420367,
          MOD_TIME = 962601420367,
          ID = HInf,
          DESCRIPTION = "H. Influenzae Genome Agent")
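A BAL message like the one above consists of "Key: value" header lines followed by a typed content record of the form `Type(KEY = value, ...)`. The following is a minimal parsing sketch. It assumes exactly this textual layout, that values contain no commas unless quoted, and that the performative must be one of those listed on the BAL Performatives slide. `parse_bal_message` is a hypothetical helper; the slides do not describe the real GeneWeaver parser or the RMI transport.

```python
import re

PERFORMATIVES = {
    "register", "unregister", "ask", "derive", "tell", "deny",
    "subscribe", "unsubscribe", "ok", "sorry", "error",
}


def parse_bal_message(text: str) -> dict:
    """Split a BAL message into header fields and a typed content record."""
    header_part, _, content = text.partition("Content:")
    message = {}
    for line in header_part.strip().splitlines():
        key, _, value = line.partition(":")  # split at the first colon only
        message[key.strip()] = value.strip()
    if message.get("Perform") not in PERFORMATIVES:
        raise ValueError(f"unknown performative: {message.get('Perform')!r}")
    # Content looks like TypeName(KEY = value, KEY = "quoted value", ...).
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", content, re.DOTALL)
    if match is None:
        raise ValueError("content is not of the form Type(...)")
    type_name, body = match.groups()
    pairs = re.findall(r'(\w+)\s*=\s*("[^"]*"|[^,]+)', body)
    fields = {key: value.strip().strip('"') for key, value in pairs}
    message["Content"] = {"type": type_name, "fields": fields}
    return message


raw = """Sender: //localhost.localdomain/7/HInf
Receiver: //localhost.localdomain/0/Broker
Transport: rmi
Language: bal
Perform: register
Ref: hinf77f001_0
Content:
AgentInfo(TYPE = Genome, OWNER = hinf77f001, UPD_TIME = 962601420367,
          MOD_TIME = 962601420367, ID = HInf,
          DESCRIPTION = "H. Influenzae Genome Agent")"""

msg = parse_bal_message(raw)
print(msg["Perform"], msg["Content"]["type"], msg["Content"]["fields"]["ID"])
# register AgentInfo HInf
```

Checking the performative before the content means an unknown verb is rejected early. That is roughly the situation the error performative covers.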


Register Conversation Class

(">" marks a message sent by that party, "<" a message it receives.)

Requester
States: RStart, RRegistering, RRegistered, RUnregistering, RDeclined, RDone, RError, RTimeout
Messages sent: > register, > unregister
Messages received: < ok, < sorry

Provider
States: PStart, PRegistering, PRegistered, PUnregistering, PDeclined, PDone, PError, PTimeout
Messages received: < register, < unregister
Messages sent: > ok, > sorry
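A conversation class like Register is a small state machine on each side. The following is a sketch of the requester side in Python. The state and message names come from the slide, but the transition table is an assumption reconstructed from those names, because the transcript does not preserve the arrows. Note in particular the assumed behaviour on a refused unregister and on timeouts. `RegisterRequester` is a hypothetical class. The provider side would mirror it with the P-prefixed states and the directions of the messages reversed.

```python
# Requester side of the Register conversation. State and message names are the
# slide's; the transition table below is an assumption reconstructed from them.
TRANSITIONS = {
    ("RStart", "> register"): "RRegistering",
    ("RRegistering", "< ok"): "RRegistered",
    ("RRegistering", "< sorry"): "RDeclined",
    ("RRegistered", "> unregister"): "RUnregistering",
    ("RUnregistering", "< ok"): "RDone",
    # Assumed: a refused unregister leaves the registration in place.
    ("RUnregistering", "< sorry"): "RRegistered",
}
AWAITING_REPLY = {"RRegistering", "RUnregistering"}
FINISHED = {"RDeclined", "RDone", "RError", "RTimeout"}


class RegisterRequester:
    def __init__(self) -> None:
        self.state = "RStart"

    def on(self, message: str) -> str:
        """Apply a sent ("> ...") or received ("< ...") message and return the new state."""
        if self.state in FINISHED:
            raise RuntimeError(f"conversation already finished in {self.state}")
        # Any message the table does not expect (including "< error") ends in RError.
        self.state = TRANSITIONS.get((self.state, message), "RError")
        return self.state

    def timeout(self) -> str:
        """No reply arrived in time; only meaningful while waiting for one."""
        if self.state in AWAITING_REPLY:
            self.state = "RTimeout"
        return self.state


conv = RegisterRequester()
for message in ["> register", "< ok", "> unregister", "< ok"]:
    conv.on(message)
print(conv.state)  # RDone
```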


Agent Interaction: Example

Initial state: the NonRedundant Protein Agent (NRDB) is subscribed to the BrokerAgent for PrimaryDB info and to the PrimaryDB agents for Sequences. It holds Swiss, PDB and PIR sequences, which SwissAgent, PDBAgent and PIRAgent (PrimaryDB) send to it with tell Sequences. The BrokerAgent (Broker) is subscribed to all agents for Info (subscribe Agent Info / tell Agent Info). It holds SwissAgent, PDBAgent, PIRAgent and NRDBAgent Info. The diagram also shows a Web element.

Adding a new database:
1. A new FlyDBAgent (PrimaryDB), holding Fly sequences, sends register FlyDBAgent to the BrokerAgent, which replies ok.
2. The BrokerAgent now also holds FlyDBAgent Info and tells the NRDB about it.
3. The NRDB sends subscribe Sequences to the FlyDBAgent, which replies with tell Sequences.
4. The NRDB now holds Swiss, PDB, PIR and Fly sequences.
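The interaction above is a publish/subscribe chain. The broker tells its subscribers about a newly registered primary database, and the NRDB then subscribes to that database for sequences. The following is a minimal in-process Python sketch of the same flow. Plain method calls stand in for BAL messages and the RMI transport, and the class and method names (`BrokerAgent.register`, `tell_agent_info`, `tell_sequences`, and so on) are illustrative assumptions rather than GeneWeaver's interfaces.

```python
class PrimaryDBAgent:
    """Holds one source's sequences and pushes them to subscribers."""

    def __init__(self, name: str, sequences: list[str]) -> None:
        self.name = name
        self.sequences = list(sequences)
        self.subscribers = []

    def subscribe(self, agent) -> None:
        # "subscribe Sequences" is answered at once with "tell Sequences".
        self.subscribers.append(agent)
        agent.tell_sequences(self.name, self.sequences)

    def add_sequences(self, new: list[str]) -> None:
        # Later updates are pushed to every subscriber, keeping it current.
        self.sequences.extend(new)
        for agent in self.subscribers:
            agent.tell_sequences(self.name, self.sequences)


class BrokerAgent:
    """Records registered agents and tells subscribers about new PrimaryDB agents."""

    def __init__(self) -> None:
        self.info = {}          # agent name -> agent type
        self.subscribers = []   # agents subscribed for PrimaryDB info

    def subscribe(self, agent) -> None:
        self.subscribers.append(agent)

    def register(self, agent, agent_type: str) -> str:
        self.info[agent.name] = agent_type
        if agent_type == "PrimaryDB":
            for subscriber in self.subscribers:
                subscriber.tell_agent_info(agent)  # e.g. "tell FlyDBAgent Info"
        return "ok"


class NRDBAgent:
    """Keeps a merged view of every primary database it has heard about."""

    def __init__(self, broker: BrokerAgent) -> None:
        self.sequences = {}     # source name -> that source's sequences
        broker.subscribe(self)  # subscribed to the broker for PrimaryDB info

    def tell_agent_info(self, primary_db: PrimaryDBAgent) -> None:
        primary_db.subscribe(self)  # "subscribe Sequences"

    def tell_sequences(self, source: str, sequences: list[str]) -> None:
        self.sequences[source] = list(sequences)


broker = BrokerAgent()
nrdb = NRDBAgent(broker)
for name in ["SwissAgent", "PDBAgent", "PIRAgent"]:
    broker.register(PrimaryDBAgent(name, [f"{name} sequence"]), "PrimaryDB")

fly = PrimaryDBAgent("FlyDBAgent", ["Fly sequence"])
print(broker.register(fly, "PrimaryDB"))   # ok
print(sorted(nrdb.sequences))              # ['FlyDBAgent', 'PDBAgent', 'PIRAgent', 'SwissAgent']
```

The NRDB never needs to know in advance which databases exist. It only reacts to what the broker tells it, which is the property the slide's example illustrates.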


Goals

Higher Level Goals
DeriveGoal: Agent should try to derive data with particular properties.
UpdateGoal: Agent should try to update data matching a given template.
RelationGoal: Agent should attempt to establish the given type of relationship with another agent.

Lower Level Goals
DoGoal, QueryGoal, TellGoal.


Agent Architecture

[Diagram] Components: Communication, Interaction, Control, Goal Manager, Plan Library, Meta-Store, Motivation, Action, Data-Store and Analysis Tools. Communication exchanges Messages with Other Agents. The other connections are labelled Goals, Interactions, MetaData, Actions and Data.


Annotate Function Example

Genome Agent
1. In response to a higher-level motivation, DeriveGoal(SeqFunction) is created to annotate any sequences whose annotated function has confidence < 0.5.
2. Using a plan from the plan library, DeriveGoal(SeqFunction) is decomposed into RelationGoal(derive), DeriveGoal(Homologue), and function assignment using the homologue if confidence > 0.5.
3. A suitable agent with a DeriveProvider and a ‘homology’ skill is located.
4. The derive requester interaction is used to accomplish RelationGoal(derive).

Blast Agent
5. The skill is used to satisfy DoGoal(Homologue).
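The five steps amount to a plan that selects the poorly annotated sequences and finds a provider with a homology skill. It then accepts the provider's answer only above the confidence threshold. The following is a much-simplified Python sketch of that control flow. Goals are flattened into ordinary function calls rather than managed by a goal manager, the provider's skill is a stand-in returning made-up results, and `Sequence`, `AgentEntry`, `find_agent`, `derive_seq_function` and `fake_homology` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 0.5  # confidence threshold used on the slide


@dataclass
class Sequence:
    seq_id: str
    function: Optional[str] = None
    confidence: float = 0.0


@dataclass
class AgentEntry:
    """What a broker might know about an agent (hypothetical structure)."""
    name: str
    providers: set
    skills: dict  # skill name -> callable returning (function, confidence)


def find_agent(community: list, provider: str, skill: str) -> Optional[AgentEntry]:
    """Step 3: locate an agent offering the provider protocol and the skill."""
    for agent in community:
        if provider in agent.providers and skill in agent.skills:
            return agent
    return None


def derive_seq_function(sequences: list, community: list) -> list:
    """Steps 1-5 for one genome agent; returns the ids that were newly annotated."""
    # Step 1: DeriveGoal(SeqFunction) targets sequences with confidence < 0.5.
    targets = [s for s in sequences if s.confidence < THRESHOLD]
    # Steps 2-3: the plan needs RelationGoal(derive), i.e. a homology provider.
    provider = find_agent(community, "DeriveProvider", "homology")
    if provider is None:
        return []
    annotated = []
    for seq in targets:
        # Steps 4-5: the derive interaction; the provider's skill finds a homologue.
        function, confidence = provider.skills["homology"](seq.seq_id)
        # Last part of the step-2 plan: assign the function only if confidence > 0.5.
        if confidence > THRESHOLD:
            seq.function, seq.confidence = function, confidence
            annotated.append(seq.seq_id)
    return annotated


def fake_homology(seq_id: str) -> tuple:
    # Stand-in for BlastAgent's skill; ids and results are made up.
    return {"seq1": ("function A", 0.9), "seq2": ("function B", 0.3)}.get(seq_id, (None, 0.0))


sequences = [Sequence("seq1"), Sequence("seq2"), Sequence("seq3", "known function", 0.8)]
community = [AgentEntry("BlastAgent", {"DeriveProvider"}, {"homology": fake_homology})]
print(derive_seq_function(sequences, community))  # ['seq1']
```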


Summary

Applications: the bioinformatics problem is not created by the technologies used to solve it; practical developments help inform the conceptual infrastructure.

Tensions between biological sciences and computer science.

Work remaining:
Consolidation of the existing prototype
Inclusion of multiple calculation agents
Evaluation of the implementation infrastructure
Staged and full deployment

Future work: an agent marketplace with calculation agents competing?