Eric Just – BOSC 2007 - July 20, 2007
Modware: An Object-Oriented Perl Interface to the Chado Schema
Eric Just
Senior Bioinformatics Scientist
dictyBase: http://dictybase.org
Northwestern University
Generic Model Organism Database Project (GMOD)
Agenda
• What?
  – What is Chado?
  – What is Modware?
• Why?
• How?
  – Example: a protein-coding gene
  – Storing in Chado
  – Writing a Web Page with Modware
• Try
  – Getting Modware
What is Chado?
• Standardized genomics database schema
• Developed by FlyBase
• Adopted and distributed by the GMOD project
• Extremely flexible and compact
  – Heavy ontology usage
  – Entity-Attribute-Value tables
• Modularized for different areas of bioinformatics
See the talk on Chado at ISMB (Paper 65)
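The "Entity-Attribute-Value" point can be made concrete with a small sketch. Chado's featureprop table stores arbitrary feature properties as rows of (feature_id, type_id, value, rank), where type_id points at a cvterm, so attribute names come from an ontology rather than from column names. The Perl below mimics that with in-memory rows; the cvterm ids 401 and 402, their names, and the property values are invented for illustration, and props_for is a hypothetical helper, not part of Chado or Modware.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cvterms: attribute names live in an ontology table.
my %cvterm = (
    401 => 'description',
    402 => 'note',
);

# EAV rows in the style of Chado's featureprop table:
# (feature_id, type_id, value, rank). All values are invented.
my @featureprop = (
    { feature_id => 102, type_id => 401, value => 'example description', rank => 0 },
    { feature_id => 102, type_id => 402, value => 'first note',          rank => 0 },
    { feature_id => 102, type_id => 402, value => 'second note',         rank => 1 },
);

# Pivot the EAV rows of one feature into { attribute => [values by rank] }.
sub props_for {
    my ($feature_id) = @_;
    my %props;
    for my $row (sort { $a->{rank} <=> $b->{rank} }
                 grep { $_->{feature_id} == $feature_id } @featureprop) {
        push @{ $props{ $cvterm{ $row->{type_id} } } }, $row->{value};
    }
    return \%props;
}

my $props = props_for(102);
print "$_: @{ $props->{$_} }\n" for sort keys %$props;
```

Adding a new kind of property only needs a new cvterm row, not a schema change, which is the flexibility the slide refers to.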
What is Modware?
• Open-source tool for programmers who write software to QUERY AND UPDATE Chado
• Object-oriented API with semantically sensible classes and methods
• Developed at dictyBase
• Working with GMOD
Why Modware Exists
• Chado has many business rules
• Modware encapsulates many of these business rules
• Faster, more efficient development
• More readable code
• The user interface can change while the underlying logic stays the same
• Leverages the GMOD/Chado/open-source community
A Simple Gene Example
A gene is a region on a chromosome that encloses one or more transcripts. An mRNA is a protein-coding transcript composed of one or more exons, each of which has coordinates on the chromosome.
[Figure: example gene on Chromosome 3]
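The containment described above can be written down as nested Perl data and checked mechanically. This is a minimal sketch, assuming Chado-style interbase coordinates (fmin, fmax); the numbers are the mlcE featureloc rows stored in Chado in this talk, while the hash layout and the encloses and model_is_consistent helpers are hypothetical and not Modware's internals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The mlcE gene model as nested data (interbase fmin/fmax on Chromosome 3).
my %gene = (
    name => 'mlcE', fmin => 67979, fmax => 68706,
    transcripts => [
        {
            name  => 'DDB0214813', type => 'mRNA',
            fmin  => 67979, fmax => 68706,
            exons => [
                { fmin => 67979, fmax => 67982 },
                { fmin => 68256, fmax => 68706 },
            ],
        },
    ],
);

# True if the inner interval lies entirely within the outer one.
sub encloses {
    my ($outer, $inner) = @_;
    return $outer->{fmin} <= $inner->{fmin} && $inner->{fmax} <= $outer->{fmax};
}

# The gene must enclose each transcript, and each transcript its exons.
sub model_is_consistent {
    my ($g) = @_;
    for my $t (@{ $g->{transcripts} }) {
        return 0 unless encloses($g, $t);
        for my $e (@{ $t->{exons} }) {
            return 0 unless encloses($t, $e);
        }
    }
    return 1;
}

print model_is_consistent(\%gene) ? "consistent\n" : "inconsistent\n";
```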
Storing mlcE in Chado
Feature
feature_id  type_id  name
100         250      Chr 3
101         251      mlcE
102         252      DDB0214813
103         253      _DDB0214813_exon_1
104         253      _DDB0214813_exon_2

Featureloc (each feature is located on Chr 3, feature_id 100)
srcfeature_id  feature_id  fmin   fmax   strand
100            101         67979  68706  1
100            102         67979  68706  1
100            103         67979  67982  1
100            104         68256  68706  1

CV (controlled vocabulary)
cv_id  name
1      Sequence Ontology
2      Relationship Ontology

CVterm
cvterm_id  name        cv_id
250        chromosome  1
251        gene        1
252        mRNA        1
253        exon        1
301        part_of     2

Feature_relationship (the mRNA is part of the gene; the exons are part of the mRNA)
subject_id  type_id  object_id
102         301      101
103         301      102
104         301      102
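Reading a gene model back out of these tables means joining them: follow feature_relationship rows of type part_of to find the exons of the mRNA, check each subject's type through cvterm, then look up its featureloc row. The sketch below performs that join over the rows above held as plain Perl data (against a live Chado database it would be a SQL join). children_of is a hypothetical helper; the spliced length uses fmax - fmin because Chado coordinates are interbase.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The rows from the tables above.
my %cvterm = (250 => 'chromosome', 251 => 'gene', 252 => 'mRNA',
              253 => 'exon',       301 => 'part_of');
my %feature = (
    100 => { type_id => 250, name => 'Chr 3' },
    101 => { type_id => 251, name => 'mlcE' },
    102 => { type_id => 252, name => 'DDB0214813' },
    103 => { type_id => 253, name => '_DDB0214813_exon_1' },
    104 => { type_id => 253, name => '_DDB0214813_exon_2' },
);
my @featureloc = (
    { srcfeature_id => 100, feature_id => 101, fmin => 67979, fmax => 68706, strand => 1 },
    { srcfeature_id => 100, feature_id => 102, fmin => 67979, fmax => 68706, strand => 1 },
    { srcfeature_id => 100, feature_id => 103, fmin => 67979, fmax => 67982, strand => 1 },
    { srcfeature_id => 100, feature_id => 104, fmin => 68256, fmax => 68706, strand => 1 },
);
my @feature_relationship = (
    { subject_id => 102, type_id => 301, object_id => 101 },
    { subject_id => 103, type_id => 301, object_id => 102 },
    { subject_id => 104, type_id => 301, object_id => 102 },
);

# Subjects related to $object_id by relationship $rel whose own type is $type.
sub children_of {
    my ($object_id, $rel, $type) = @_;
    return sort { $a <=> $b }
        map  { $_->{subject_id} }
        grep { $_->{object_id} == $object_id
               && $cvterm{ $_->{type_id} } eq $rel
               && $cvterm{ $feature{ $_->{subject_id} }{type_id} } eq $type }
        @feature_relationship;
}

# Index featureloc rows by feature_id.
my %loc = map { ($_->{feature_id} => $_) } @featureloc;

# List the exons of the mRNA and sum their interbase lengths.
my $spliced = 0;
for my $id (children_of(102, 'part_of', 'exon')) {
    my $l = $loc{$id};
    printf "%s %d..%d\n", $feature{$id}{name}, $l->{fmin}, $l->{fmax};
    $spliced += $l->{fmax} - $l->{fmin};
}
print "spliced length: $spliced\n";
```

With these rows the two exons contribute 3 and 450 bases, for a spliced length of 453. Modware's job is to hide joins like this behind methods such as exons().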
A Simple Gene Page
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use Modware::Feature;

my $q     = CGI->new;
my $id    = $q->param('primary_id');
my $count = 1;

# Get all data from the database
my $feature    = Modware::Feature->new( -primary_id => $id );
my $chromosome = $feature->reference_feature()->name();
my $gene       = $feature->gene()->name();
my @exons      = $feature->bioperl()->exons();
my $sequence   = $feature->sequence( -type => 'protein', -format => 'fasta' );

# Print the report (escape user-supplied text before echoing it into HTML)
print $q->header;
print "<pre>";
print $q->escapeHTML($id), " is on chromosome $chromosome";
print " and is the gene $gene\n";

# Print the number and position of each exon
foreach my $exon (@exons) {
    print "Exon $count. start=" . $exon->start() . " end=" . $exon->end() . "\n";
    $count++;
}
print $q->escapeHTML($sequence);
print "</pre>";
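The exon loop on the gene page only needs objects that answer start() and end(). One way to check that part of the page without a database or web server is to move it into a function and feed it stand-in objects. StubExon and exon_report are hypothetical names introduced here; on the real page the exons come from $feature->bioperl()->exons(), and the coordinates below are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an exon: only the two accessors the page uses.
package StubExon;
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

package main;

# The exon-listing part of the gene page, returning text instead of printing.
sub exon_report {
    my @exons = @_;
    my $out   = '';
    my $count = 1;
    for my $exon (@exons) {
        $out .= "Exon $count. start=" . $exon->start() . " end=" . $exon->end() . "\n";
        $count++;
    }
    return $out;
}

print exon_report(
    StubExon->new(start => 1,  end => 3),
    StubExon->new(start => 10, end => 20),
);
```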
Modware::Features
Modware Feature Classes
• Gene
• mRNA
• ncRNA
• Contig
• Chromosome
• EST
• Generic (catch-all)
Modware can manage the following annotations:
• Sequence
• Location
• Name
• Synonyms
• Description
• Public identifiers
• External identifiers (dbxrefs)
Modware::Search
These classes retrieve groups (iterators) of features.
Text searches: retrieve all genes whose name or synonym contains "kinase"
Modware::Search::Gene->Search_by_name_and_synonym('*kinase*');
Location searches: find all mRNAs overlapping bases 100,000 to 500,000 on Chromosome 3
Modware::Search::Feature->Search_overlapping_feats_by_range('Chr3', 100000, 500000, 'mRNA');
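A location search of this kind comes down to an interval-overlap test. The sketch assumes the query range is given as 1-based, inclusive base positions and that stored locations are Chado interbase [fmin, fmax) intervals; whether Search_overlapping_feats_by_range uses exactly these conventions is an assumption, not something stated in this talk. The overlaps function and the mRNA coordinates are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Does interbase [fmin, fmax) overlap the 1-based inclusive range start..end?
# Bases start..end correspond to interbase [start - 1, end).
sub overlaps {
    my ($fmin, $fmax, $start, $end) = @_;
    my ($qmin, $qmax) = ($start - 1, $end);
    return $fmin < $qmax && $qmin < $fmax;
}

# Invented mRNA locations: [name, fmin, fmax].
my @mrnas = (
    [ 'mRNA_A',  50_000, 100_000 ],   # last base is 100,000: overlaps
    [ 'mRNA_B', 200_000, 210_000 ],   # entirely inside the range
    [ 'mRNA_C', 500_000, 510_000 ],   # first base is 500,001: outside
);

my @hits = map  { $_->[0] }
           grep { overlaps(@$_[1, 2], 100_000, 500_000) } @mrnas;
print "@hits\n";   # mRNA_A mRNA_B
```

Two intervals overlap exactly when each one starts before the other ends; getting the off-by-one conversion between coordinate systems right is the part a library like Modware can hide from callers.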
Updating a Gene Name and Adding a Synonym
# Get the gene
my ($gene) = Modware::Search::Gene->Search_by_name( 'mlcA' );

# Change the name
$gene->name( 'newname' );

# Keep the old name as a synonym
$gene->add_synonym( 'mlcA' );

# Write the changes to the database
$gene->update();
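In the example above, the caller keeps the old name by calling add_synonym explicitly. As a hypothetical design alternative (not Modware's documented behavior), a rule like this can be folded into the name setter so that no caller forgets it. ToyGene is an invented in-memory class; it has no database and no update() method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy class: renaming keeps the previous name as a synonym.
package ToyGene;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, synonyms => [] }, $class;
}

sub synonyms { @{ $_[0]{synonyms} } }

# Add a synonym unless it is already present.
sub add_synonym {
    my ($self, $syn) = @_;
    push @{ $self->{synonyms} }, $syn
        unless grep { $_ eq $syn } @{ $self->{synonyms} };
    return;
}

# Getter/setter; a real rename records the old name first.
sub name {
    my ($self, $new) = @_;
    if (defined $new && $new ne $self->{name}) {
        $self->add_synonym($self->{name});
        $self->{name} = $new;
    }
    return $self->{name};
}

package main;

my $gene = ToyGene->new(name => 'mlcA');
$gene->name('newname');
print $gene->name, " (synonyms: ", join(', ', $gene->synonyms), ")\n";
# newname (synonyms: mlcA)
```

Because add_synonym skips duplicates, also calling add_synonym('mlcA') after the rename, as the slide does, still leaves a single synonym.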
Modware Goals
• Future releases will include:
  – Literature annotations
  – GO annotations
  – Phenotype annotations
• Incorporate feedback from users
Getting Modware
http://gmod-ware.sourceforge.net
• Download the NEW Virtual Machine
• Modware is preinstalled and ready for you!
• See me if you have any questions or want a demo
• Visit Poster N41 at ISMB
Online Documentation
http://gmod-ware.sourceforge.net/doc/
Thanks
The organizers of BOSC 2007
O|B|F (Open Bioinformatics Foundation)
dictyBase
• PIs
  – Rex Chisholm, PhD
  – Warren Kibbe, PhD
• Programmer
  – Sohel Merchant
• Curators
  – Petra Fey
  – Pascale Gaudet, PhD
Other Groups
• Funding
  – NIH (NIGMS and NHGRI)
• GMOD
  – Scott Cain
  – Brian O’Connor
• Chado developers
• Bioperl developers
# USE CASE: Add a description, a dbxref, and an exon
use strict;
use warnings;
use Modware::Feature;
use Bio::SeqFeature::Gene::Exon;

my $transcript = Modware::Feature->new( -primary_id => 'DDB0233595' );
$transcript->description( 'Gene model derived from AU12345' );
$transcript->add_external_id( -source => 'GenBank Accession Number', -id => 'AU12345' );

# Call the bioperl method to retrieve the BioPerl representation of the object;
# this is needed to view or edit the exon structure
my $bioperl = $transcript->bioperl();

# Here we are manipulating a Bio::SeqFeature::Gene::Transcript object:
# shift the last (third) exon back a little to drop the stop codon
my @exons = $bioperl->exons();
$exons[2]->start( 281050 );

# Create a new exon and add it to the feature
my $exon = Bio::SeqFeature::Gene::Exon->new( -start => 280921, -end => 280959, -strand => -1 );
$exon->is_coding(1);
$bioperl->add_exon($exon);

# update() writes everything to the database
$transcript->update();
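The comment about dropping the stop codon depends on strand. For a plus-strand transcript the stop codon sits at the high-coordinate end of the last exon; for a minus-strand transcript (the new exon above is created with -strand => -1) it sits at the low-coordinate end, which is why the use case moves start() rather than end(). A minimal sketch of that arithmetic, assuming 1-based inclusive coordinates as in BioPerl features and a last exon that ends exactly on the 3-base stop codon; trim_stop_codon and the coordinates are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove a 3-base stop codon from the last exon.
# Minus strand: the codon is at the low end, so start moves up by 3.
# Plus (or unstranded): the codon is at the high end, so end moves down by 3.
sub trim_stop_codon {
    my (%exon) = @_;
    if ($exon{strand} < 0) { $exon{start} += 3 }
    else                   { $exon{end}   -= 3 }
    return \%exon;
}

my $minus = trim_stop_codon(start => 1001, end => 1200, strand => -1);
print "$minus->{start}..$minus->{end}\n";   # 1004..1200
```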
[Figure: class diagram of Modware::Feature and its subclasses Modware::Feature::GENE, Modware::Feature::MRNA, and Modware::Feature::CHROMOSOME, shown with the BioPerl classes Bio::SeqFeature::Gene::Transcript, Bio::SeqFeature::Gene::Exon, and Bio::Seq]
[Figure: Feature class with subclasses ncRNA, mRNA, Contig, and Chromosome; methods getOverlappingFeatures() and getOverlappingAlignments(); BioPerl classes Bio::SeqFeature::Gene::Transcript, Bio::SeqFeature::Generic, and Bio::Seq]