The GUS 3.0 Perl Object Layer CBIL Jonathan Schug June 18 2002.


The GUS 3.0 Perl Object Layer

CBIL, Jonathan Schug, June 18 2002

Outline

• Overview
• Objects
• Using Objects
• Superclasses
• GA - GusApplication
• Plugins

A Programmer's View of GUS3.0

• The application programmer interacts with GUS via the GusApplication (GA) Perl program.

• The GA is a general framework for connecting to GUS30.
• Specific tasks are performed by individual plugins.
• Plugins use either table-specific classes or SQL access.
• Low-level database access is provided by DBI classes.

[Diagram: components of the object layer. Labels: GusApplication; Plugin; SQL; Class (one each for Core, SRes, DoTS, RAD, TESS); SuperClasses; DBI.]

A GUS3.0 Table

Primary key

Table-specific attributes

GUS overhead attributes

Parents - pointed to by this table

Children - point to this table

A GUS3.0 Perl Object Layer Class

GUS30/DoTS/Clone.pm

package DoTS::Clone;
use strict;
use GUS30::DoTS::gen::Clone_gen;
use vars qw (@ISA);
@ISA = qw (DoTS::Clone_gen);
1;

• Relies on the _gen class for accessor methods.
• This is a stub for hand-edited domain-specific methods.

The _gen Class - I

GUS30/DoTS/gen/Clone_gen.pm

package DoTS::Clone_gen;
use strict;
use GUS30::dbiperl_utils::RelationalRow;
use vars qw (@ISA);
@ISA = qw (RelationalRow);
sub setDefaultParams { ... }

• Inherits from RelationalRow

• setDefaultParams sets whether the class is versionable and updatable.

The _gen Class - II

GUS30/DoTS/gen/Clone_gen.pm

sub setCloneId { ... }
sub getCloneId { ... }
sub setLibraryId { ... }
sub getLibraryId { ... }
sub setImageId { ... }
sub getImageId { ... }
sub setDbestCloneUid { ... }
sub getDbestCloneUid { ... }
sub setWashuName { ... }
sub getWashuName { ... }
sub setGdbId { ... }
sub getGdbId { ... }
sub setMgiId { ... }
sub getMgiId { ... }
sub setDbestLength { ... }
sub getDbestLength { ... }
sub setWashuLength { ... }
sub getWashuLength { ... }

• There is an accessor for each column.

• Note the case change and loss of underscores.
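The naming rule can be sketched as a small Perl function. This only illustrates the mapping visible in the accessor list above (for example, column washu_length and accessor getWashuLength). The helper names column_to_accessor_stem and accessor_names are hypothetical and are not part of GUS.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a column name such as 'washu_length'
# into the mixed-case stem used by the accessors ('WashuLength').
sub column_to_accessor_stem {
    my ($column) = @_;
    return join '', map { ucfirst(lc($_)) } split /_/, $column;
}

# Build the getter and setter names for one column.
sub accessor_names {
    my ($column) = @_;
    my $stem = column_to_accessor_stem($column);
    return ("get$stem", "set$stem");
}

for my $column (qw(clone_id dbest_clone_uid washu_length)) {
    my ($getter, $setter) = accessor_names($column);
    print "$column: $getter, $setter\n";
}
```

The same rule appears to apply to the keys passed to new, which keep the column spelling (washu_length), while method names use the mixed-case form.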

The _gen Class - III

GUS30/DoTS/gen/Clone_gen.pm

sub setModificationDate { ... }
sub getModificationDate { ... }
sub setUserRead { ... }
sub getUserRead { ... }
sub setUserWrite { ... }
sub getUserWrite { ... }
sub setGroupRead { ... }
sub getGroupRead { ... }
sub setGroupWrite { ... }
sub getGroupWrite { ... }
sub setOtherRead { ... }
sub getOtherRead { ... }
sub setOtherWrite { ... }
sub getOtherWrite { ... }
sub setRowUserId { ... }
sub getRowUserId { ... }
sub setRowGroupId { ... }
sub getRowGroupId { ... }
sub setRowProjectId { ... }
sub getRowProjectId { ... }
sub setRowAlgInvocationId { ... }
sub getRowAlgInvocationId { ... }

• There is an accessor for each overhead column as well.

• Note the case change and loss of underscores.

Hand Edited Methods

• Edit main class file, e.g., GUS30/DoTS/Clone.pm
• Typically placed in GUS30/DoTS/hand_edited/
• Symlink in GUS30/DoTS.
• Mostly used in DoTS section.

DoTS/AAFeature.pm:4
DoTS/AASequence.pm:2
DoTS/Assembly.pm:76
DoTS/AssemblySequence.pm:29
DoTS/Evidence.pm:6
DoTS/GeneFeature.pm:4
DoTS/Gene.pm:9
DoTS/IndexWordSimLink.pm:2
DoTS/NAFeature.pm:5
DoTS/NASequence.pm:4
DoTS/RNAFeature.pm:4
DoTS/RNA.pm:3
DoTS/Similarity.pm:9
DoTS/SimilaritySpan.pm:6
DoTS/SplicedNASequence.pm:1
DoTS/TranslatedAAFeature.pm:3
DoTS/TranslatedAAFeatureSegment.pm:2
DoTS/TranslatedAASequence.pm:1
DoTS/VirtualSequence.pm:1
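The split between the generated accessor layer and the hand-edited layer can be sketched in a single file. Everything in the Sketch:: namespace is a stand-in written for this illustration: Sketch::RelationalRow assumes attribute values live in a hash keyed by column name, and getLengthDifference is a hypothetical domain-specific method, not one taken from GUS.

```perl
use strict;
use warnings;

# Stand-in for the real RelationalRow superclass (assumption:
# attribute values are kept in a hash keyed by column name).
package Sketch::RelationalRow;

sub new {
    my ($class, $values) = @_;
    return bless { values => { %{ $values || {} } } }, $class;
}
sub get { my ($self, $column) = @_; return $self->{values}{$column}; }
sub set {
    my ($self, $column, $value) = @_;
    $self->{values}{$column} = $value;
    return $self;
}

# Generated layer: one accessor pair per column, nothing else.
package Sketch::Clone_gen;
our @ISA = ('Sketch::RelationalRow');

sub getDbestLength { return $_[0]->get('dbest_length') }
sub setDbestLength { return $_[0]->set('dbest_length', $_[1]) }
sub getWashuLength { return $_[0]->get('washu_length') }
sub setWashuLength { return $_[0]->set('washu_length', $_[1]) }

# Hand-edited layer: inherits the accessors and adds domain logic.
package Sketch::Clone;
our @ISA = ('Sketch::Clone_gen');

# Hypothetical domain-specific method, not part of GUS.
sub getLengthDifference {
    my ($self) = @_;
    return $self->getWashuLength() - $self->getDbestLength();
}

package main;

my $clone = Sketch::Clone->new({ washu_length => 500, dbest_length => 480 });
print $clone->getLengthDifference(), "\n";
```

Keeping the hand-edited methods in the subclass means the _gen file can be regenerated without touching them, which is presumably why the layer is split this way.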

Creating Objects

# get the class
use GUS30::DoTS::Clone;
…
# create new object
my $clone_gus = DoTS::Clone->new({ washu_length => 5 });

# adjust a column value
$clone_gus->setDbestCloneUid('A123456');

# print some values.
print $clone_gus->getDbestCloneUid, "\n";
print $clone_gus->toXML, "\n";

# submit to database
$clone_gus->submit;

Connecting Objects

use GUS30::DoTS::Clone;
use GUS30::DoTS::CloneLibrary;

my $clone_lib_gus = DoTS::CloneLibrary->new({…});
while (<>) {
    chomp;
    my @parts = split /\t/;
    my $clone_gus = DoTS::Clone->new({…});
    # this
    $clone_lib_gus->addChild($clone_gus);
    # or this
    $clone_gus->setParent($clone_lib_gus);
}
$clone_lib_gus->submit;
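The slide presents addChild and setParent as two ways of making the same link. A minimal sketch of how that could work follows. Sketch::Row is a stand-in written for this illustration; the real behaviour lives in RelationalRow, whose internals are not shown in the slides, so the bookkeeping here is an assumption.

```perl
use strict;
use warnings;

# Stand-in row class (assumption: a row keeps one parent reference
# and a list of child references).
package Sketch::Row;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, parent => undef, children => [] }, $class;
}

# Link from the parent side; ignores a child that is already linked.
sub addChild {
    my ($self, $child) = @_;
    return $self if grep { $_ == $child } @{ $self->{children} };
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;
    return $self;
}

# Link from the child side; delegates so both ends stay consistent.
sub setParent {
    my ($self, $parent) = @_;
    $parent->addChild($self);
    return $self;
}

sub getParent   { return $_[0]{parent} }
sub getChildren { return @{ $_[0]{children} } }

package main;

my $library = Sketch::Row->new('library');
my $clone_a = Sketch::Row->new('clone A');
my $clone_b = Sketch::Row->new('clone B');

$library->addChild($clone_a);     # this
$clone_b->setParent($library);    # or this

my @children = $library->getChildren();
print scalar(@children), " children linked\n";
```

In this sketch either call leaves the library holding both clones, so a single submit on the library could reach every linked row.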

Retrieving Objects

use GUS30::DoTS::CloneLibrary;

my $clone_lib_gus = DoTS::CloneLibrary->new({ clone_library_id => 12345 });
if ($clone_lib_gus->retrieveFromDB) {
    $clone_lib_gus->set…(…);
    $clone_lib_gus->submit;
    print "found it!\n";
}
else {
    print "did not find any unique row!\n";
}
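The "unique row" wording suggests that retrieval succeeds only when the attributes set on the object identify exactly one row. The sketch below mimics that rule against an in-memory list instead of a database. The function names, the row data, and the matching logic are all assumptions made for illustration; the real retrieveFromDB may differ.

```perl
use strict;
use warnings;

# In-memory stand-in for a database table (invented data).
my @clone_library_rows = (
    { clone_library_id => 12345, name => 'lib one' },
    { clone_library_id => 12346, name => 'lib two' },
    { clone_library_id => 12347, name => 'lib two' },
);

# True when every attribute in $wanted has the same value in $row.
sub row_matches {
    my ($row, $wanted) = @_;
    for my $column (keys %$wanted) {
        return 0 unless defined $row->{$column}
            && $row->{$column} eq $wanted->{$column};
    }
    return 1;
}

# Hypothetical lookup: succeed only when the supplied attributes
# identify exactly one row, then copy that row's values into $object.
sub retrieve_unique {
    my ($object, $rows) = @_;
    my @matches = grep { row_matches($_, $object) } @$rows;
    return 0 unless @matches == 1;
    %$object = %{ $matches[0] };
    return 1;
}

for my $query ({ clone_library_id => 12345 }, { name => 'lib two' }) {
    my $found   = retrieve_unique($query, \@clone_library_rows);
    my $message = $found ? 'found it!' : 'did not find any unique row!';
    print "$message\n";
}
```

The second query matches two rows, so the sketch treats it the same as no match at all.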

Traversing Object Relations - I

use GUS30::DoTS::CloneLibrary;
use GUS30::DoTS::Clone;

my $clone_lib_gus = DoTS::CloneLibrary->new({ clone_library_id => 12345 });
if ($clone_lib_gus->retrieveFromDB) {
    my @clones = $clone_lib_gus->getChildren('DoTS.Clone', 1);
    foreach (@clones) { … }
}

Traversing Object Relations - II

use GUS30::DoTS::CloneLibrary;
use GUS30::DoTS::Clone;

my $clone_gus = DoTS::Clone->new({ clone_id => 12345 });
if ($clone_gus->retrieveFromDB) {
    my $clone_lib_gus = $clone_gus->getParent('DoTS.CloneLibrary', 1);
    . . .
}

Deleting Objects

use GUS30::DoTS::CloneLibrary;
use GUS30::DoTS::Clone;

my $clone_lib_gus = DoTS::CloneLibrary->new({ clone_library_id => 12345 });
if ($clone_lib_gus->retrieveFromDB) {
    $clone_lib_gus->markDeleted;
    my @clones = $clone_lib_gus->getChildren('DoTS.Clone', 1);
    foreach (@clones) {
        $_->markDeleted;
    }
    $clone_lib_gus->submit;
}

• Recursively deletes children as well.

The Object Cache

• A cache of objects is maintained so that getParents and getChildren always return the same instance of a row.

• Cache is limited in size to avoid large memory requirements.

• Cache is cleared with the undefPointerCache method on an object or plugin.

• Cache size is increased with setMaximumNumberOfObjects method.
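A bounded cache with these properties can be sketched as follows. Sketch::ObjectCache and its methods are hypothetical and do not show the real GUS implementation. In particular, the slides do not say what happens when the limit is reached, so this sketch simply dies; the key format (table name plus primary key) is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical bounded cache keyed by table name and primary key.
package Sketch::ObjectCache;

sub new {
    my ($class, $maximum) = @_;
    return bless { maximum => $maximum, objects => {} }, $class;
}

# Return the cached instance for a row, building it on first use.
sub fetch {
    my ($self, $table, $id, $builder) = @_;
    my $key = "$table:$id";
    return $self->{objects}{$key} if exists $self->{objects}{$key};
    die "cache is full; clear it or raise the maximum\n"
        if $self->count() >= $self->{maximum};
    $self->{objects}{$key} = $builder->();
    return $self->{objects}{$key};
}

sub clear      { $_[0]{objects} = {}; return }
sub setMaximum { $_[0]{maximum} = $_[1]; return }
sub count      { return scalar(keys %{ $_[0]{objects} }) }

package main;

my $cache  = Sketch::ObjectCache->new(2);
my $first  = $cache->fetch('DoTS.Clone', 1, sub { return +{ clone_id => 1 } });
my $second = $cache->fetch('DoTS.Clone', 1, sub { return +{ clone_id => 1 } });

# Both lookups yield the very same reference, not two copies.
print "same instance\n" if $first == $second;
```

Returning the same instance matters because a change made through one reference, such as marking a row deleted, is then visible through every other path that reaches that row.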

dbiperl_utils

Support and base classes for object classes.

• RelationalRow
• DbiRow
• DbiTable
• DbiDatabase

RelationalRow.pm

• Contains 176 methods in these categories:
  – Accessors for default overhead values
  – Accessors for debugging and verbose modes
  – Pointer cache maintenance
  – Class information
  – Parent/child information
  – XML management
  – Deletion marking
  – Submission management
  – Similarity and Evidence management

• Is a DbiRow (inherits from it)

DbiRow.pm

• Contains 43 methods in these categories:
  – Get/Set methods to support class-specific accessors
  – Accessors for table and class names
  – Attribute information
  – Tracking attribute value changes
  – retrieveFromDB
  – IdentityInsert management
  – Get DbHandle, MetaHandle, and Database

DbiTable.pm

• 76 methods for
  – Various table names
  – Attribute information
  – Relations information
  – Primary keys and ids
  – Others

DbiDatabase.pm

• 103 methods covering these areas:
  – Database handles
  – Login information
  – Database and section names
  – Transaction management
  – Table and view names
  – Object cache
  – Counters

Overhead Columns

Contain information about:

• History
• Ownership
• Access permissions
• Data provenance

Who manages these columns?

GusApplication (GA)

• Purpose is to standardize how applications access the database.

• Provides:
  – Database login
  – Default ownership and permissions
  – Algorithm and parameter tracking
  – Command line access

Algorithms & Stuff

Algorithm

AlgorithmImplementation

AlgorithmInvocation

AlgorithmParamKey

AlgorithmParamKeyType

AlgorithmParam

Tracks what programs implementing what algorithms were run with what parameters.

GA populates these tables.

GA Usage

ga [<mode>] [<plugin_class>] <plugin_class_options>

• <mode> is one of
  – +create : creates Algorithm, AlgorithmImplementation, and AlgorithmParamKey
  – +update : creates AlgorithmImplementation and AlgorithmParamKey
  – +history : lists invocations
  – +run : runs the plugin (default)

• <plugin_class>
  – From hierarchical namespace, e.g., Utils::UpdateGusFromXML

• <plugin_class_options>
  – Defined by plugin plus some generic GA options.
  – E.g., --file data.tab --commit --verbose
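The usage line can be read as "optional mode, optional plugin class, then options". A minimal sketch of splitting an argument list along those lines follows. The function parse_ga_arguments is hypothetical and is not how ga itself is implemented. It assumes a mode always starts with "+" and that anything starting with "-" belongs to the options.

```perl
use strict;
use warnings;

# Split a ga-style argument list into mode, plugin class, and the
# remaining options. The mode defaults to +run when omitted.
sub parse_ga_arguments {
    my @arguments = @_;
    my %known_mode = map { $_ => 1 } qw(+create +update +history +run);

    my $mode = '+run';
    if (@arguments && $arguments[0] =~ /^\+/) {
        $mode = shift @arguments;
        die "unknown mode '$mode'\n" unless $known_mode{$mode};
    }

    # The plugin class is taken only if the next word is not an option.
    my $plugin_class;
    $plugin_class = shift @arguments
        if @arguments && $arguments[0] !~ /^-/;

    return {
        mode         => $mode,
        plugin_class => $plugin_class,
        options      => [@arguments],
    };
}

my $parsed = parse_ga_arguments(
    qw(+create Utils::UpdateGusFromXML --file data.tab --commit --verbose)
);
print "mode: $parsed->{mode}\n";
print "plugin: $parsed->{plugin_class}\n";
print "options: @{ $parsed->{options} }\n";
```

With no leading "+" word, the same call falls back to +run, matching the default noted in the mode list.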

Plugins

• A plugin is just a package that inherits from GUS30::GA_plugins::Plugin.

package GUS30::GA_plugins::Utils::UpdateGusFromXML;
@ISA = qw(GUS30::GA_plugins::Plugin);

• It must have two methods:
  – new - to create and initialize the plugin object
  – run - perform actions of plugin

The new Method

• Must initialize certain important plugin attributes:

sub new {
    my $Class = shift;
    my $m = bless {}, $Class;
    $m->setUsage('what this algorithm does');
    $m->setVersion('2.0');
    $m->setRequiredDbVersion({ Core => '3', DoTS => '3' });
    $m->setDescription('what is new in implementation');
    $m->setEasyCspOptions(…); # command line options
    return $m;
}
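To show new and run together, here is a self-contained sketch of a complete plugin. Sketch::Plugin is a stand-in for GUS30::GA_plugins::Plugin and assumes the setters simply record their argument. Sketch::CountLines is a hypothetical plugin. What arguments the real framework passes to run, and what it expects back, are not stated in the slides, so both are assumptions here.

```perl
use strict;
use warnings;

# Stand-in for GUS30::GA_plugins::Plugin (assumption: each setter
# just stores its argument on the plugin object).
package Sketch::Plugin;

for my $attribute (qw(Usage Version RequiredDbVersion Description EasyCspOptions)) {
    no strict 'refs';
    *{"Sketch::Plugin::set$attribute"} = sub { $_[0]{$attribute} = $_[1]; return $_[0] };
    *{"Sketch::Plugin::get$attribute"} = sub { return $_[0]{$attribute} };
}

# Hypothetical plugin, not part of GUS.
package Sketch::CountLines;
our @ISA = ('Sketch::Plugin');

sub new {
    my $Class = shift;
    my $m = bless {}, $Class;
    $m->setUsage('counts the lines it is given');
    $m->setVersion('1.0');
    $m->setRequiredDbVersion({ Core => '3' });
    $m->setDescription('first version');
    $m->setEasyCspOptions({});
    return $m;
}

# Assumption: run receives the data to process and returns a summary.
sub run {
    my ($m, @lines) = @_;
    return 'processed ' . scalar(@lines) . ' lines';
}

package main;

my $plugin = Sketch::CountLines->new();
my $result = $plugin->run("first\n", "second\n");
print 'version ', $plugin->getVersion(), ': ', $result, "\n";
```

Because the metadata is set inside new, a framework can inspect the version and required database versions before it ever calls run.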

Command Line Options

• A hash describing a parameter:
  – h => hint for user
  – t => parameter data type (boolean, string, integer, float)
  – d => default value
  – l => is a list if true
  – e => list of legal reg-exps
  – r => required if true
  – o => command line flag

• E.g.,
  { h => 'start label ordinals with this value',
    t => 'integer',
    d => 0,
    o => 'FirstOrdinal',
  },
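One way to see how such descriptors could drive command line parsing is to translate them into Getopt::Long specifications. This is a sketch under stated assumptions. It does not show how GA or setEasyCspOptions processes the hashes, it handles only the h, t, d, r, and o keys (l and e are left out), and the File descriptor is invented for the example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Map the descriptor's data type (t) to a Getopt::Long type suffix.
my %type_suffix = (
    boolean => '!',
    string  => '=s',
    integer => '=i',
    float   => '=f',
);

# Hypothetical translation of option descriptors into parsed values.
sub parse_with_descriptors {
    my ($descriptors, @arguments) = @_;
    my (%values, @specs);

    for my $d (@$descriptors) {
        my $suffix = $type_suffix{ $d->{t} };
        die "unknown type '$d->{t}'\n" unless defined $suffix;
        push @specs, $d->{o} . $suffix;
        $values{ $d->{o} } = $d->{d} if defined $d->{d};    # default
    }

    GetOptionsFromArray(\@arguments, \%values, @specs)
        or die "could not parse options\n";

    for my $d (@$descriptors) {
        die "--$d->{o} is required ($d->{h})\n"
            if $d->{r} && !defined $values{ $d->{o} };
    }
    return \%values;
}

my @descriptors = (
    { h => 'start label ordinals with this value',
      t => 'integer', d => 0, o => 'FirstOrdinal' },
    # Invented descriptor, for illustration only.
    { h => 'input file', t => 'string', r => 1, o => 'File' },
);

my $values = parse_with_descriptors(
    \@descriptors, qw(--File data.tab --FirstOrdinal 5)
);
print "File=$values->{File} FirstOrdinal=$values->{FirstOrdinal}\n";
```

Describing options as data rather than code lets a framework generate usage text from the h hints and record the parameter values for each run from the same source.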

GA-Supplied Command-line Options

• GA adds these options:
  – commit
  – verbose
  – debug
  – user
  – group
  – project
  – comment
  – database
  – server
  – implementation
  – algoinvo

• Some of these (shown in pink on the original slide) are also read from the config file .gus30.cfg

Example: TESS::LoadMultinomialLabelSet

TESS::MultinomialLabelSet TESS::MultinomialLabel

• Task is to maintain entries in these two tables.

• MultinomialLabelSet stores sets of labels for multinomial observations, e.g., DNA, AA, or dimer gaps.

• Can also be DNA or AA dimers, trimers, etc.

• MultinomialLabel stores individual names.