Henrik Bengtsson
Mathematical Statistics, Centre for Mathematical Sciences
Lund University, Sweden

DSC-2003, Vienna. March 20-22, 2003

The R.oo package: Robust object-oriented design & implementation with support for references


2 of 22 http://www.maths.lth.se/help/R/

Outline

• Purpose and what the package is and is not.
• RCC: R Coding Conventions (draft).
• Reference variables.
• The root class Object.
• setMethodS3() & setConstructorS3().
• Rdoc comments.
• Static methods.
• Virtual fields.
• trycatch() - exception handling based on class.

3 of 22 http://www.maths.lth.se/help/R/

Purposes

• End user (the most important person at the end of the day!)
  – Provide consistent object-oriented APIs across different packages, e.g. by having a well defined naming convention for classes, methods, fields and variables.
  – Make class inheritance more explicit.
  – Provide a simpler API, e.g. fewer arguments.
  – More memory efficient packages.
• Developer / programmer
  – Provide reference variables to reduce memory requirements and data redundancy.
  – R Coding Conventions, e.g. naming conventions.
  – Create generic functions automatically.
  – Make code cleaner and remove the need for tedious code repetitions.
  – Minimize the risk of package conflicts.
  – More code checking when creating methods and classes to catch errors early on.
  – Catch rare but "classical" bugs, e.g. using reserved words in method names.
  – Keep help pages up to date with the source code by allowing Rd documentation to be placed together with the code in the source files.

4 of 22 http://www.maths.lth.se/help/R/

Real world example

# Read all GenePix Result files
gpr <- MicroarrayData$read(pattern="*.gpr")

# Extract the foreground & background signals of the red and
# the green channels. The slide layout is also included.
raw <- as.RawData(gpr)

# Get the background corrected signal as M=log(R/G) and A=log(RG)/2.
ma <- getSignal(raw, bgSubtract=TRUE)

normalizeWithinSlide(ma, method="p")  # print-tip normalization.

# highlight() highlights the data points from the
# correct slide in the correct space.
knownGenes <- c(50,194,3433,5541,6384)
plot(ma); highlight(ma, knownGenes)
plotPrintorder(ma); highlight(ma, knownGenes)
plotSpatial(ma); highlight(ma, knownGenes)
plotSpatial3d(gpr, field="area", col=getColor(ma))

# Write the normalized data to a tab-delimited file
write(ma, "NormalizedExpressions.dat")

5 of 22 http://www.maths.lth.se/help/R/

What the package is and isn’t

• Is not supposed to replace S3 or S4, but
• is an extra layer on top of S3 (eventually S4), to
• move the focus from S3 and S4 details to object-oriented design and implementation.

[Diagram: R.oo drawn as a layer on top of the R environment (S3 and eventually S4).]

• It has been tested and verified for > 2 years!

6 of 22 http://www.maths.lth.se/help/R/

RCC: R Coding Conventions (draft)

• Standardizes the coding style
  – Examples of the naming conventions:
    • Variables, objects, fields and methods should start with a lower case letter (method names should be verbs), e.g. shape$side and normalize().
    • Classes should be nouns starting with an upper case letter, e.g. MicroarrayData.
    • Constants should be in all upper case, e.g. Colors$RED.HUE.
    • Similar to Java.
• Standards
  – make the code (and the design) easier to read, share and maintain.
  – reduce the risk of bugs and misunderstandings.

http://www.maths.lth.se/help/R/RCC/

7 of 22 http://www.maths.lth.se/help/R/

Reference variables

• Memory efficient.
• Minimizes the amount of redundant data.
• Very useful for some data structures, e.g. graphs.
• References in R.oo are implemented using the environment data type.
  – Collected by the R garbage collector.
• (More user-friendly method interfaces, since methods can "communicate" with each other by updating the state of the object.)
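The difference between ordinary R values and environment-based references can be seen with base R alone. The following is a minimal sketch, not R.oo code; the names bumpList, bumpEnv, counterList and counterEnv are made up for illustration.

```r
# A list is copied when modified inside a function, so the caller's
# value is unchanged. (bumpList and bumpEnv are hypothetical names.)
bumpList <- function(x) {
  x$count <- x$count + 1
  x
}

# An environment is passed by reference, so the update is visible
# to the caller without returning anything.
bumpEnv <- function(e) {
  e$count <- e$count + 1
  invisible(NULL)
}

counterList <- list(count = 0)
bumpList(counterList)
counterList$count        # still 0: only a local copy was changed

counterEnv <- new.env()
counterEnv$count <- 0
bumpEnv(counterEnv)

alias <- counterEnv      # a second handle to the same data, no copy made
bumpEnv(alias)
counterEnv$count         # 2: both handles refer to one environment
```

Because no data is duplicated when an environment is passed around or aliased, large objects such as microarray data sets need to exist only once in memory.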

8 of 22 http://www.maths.lth.se/help/R/

A common root class: Object

1. All classes should have the common root class Object.
   – A similar idea exists in R today, e.g. print(), as.character() etc, but a common root class makes it more explicit.

Object
  $(name): ANY
  $<-(name, value)
  [[(name): ANY
  [[<-(name, value)
  as.character(): character
  attach(private=FALSE, pos=2)
  clone(): Object
  data.class(): character
  detach()
  equals(other): logical
  extend(this, ...className, ...): Object
  finalize()
  getFields(private=FALSE): character[]
  hashCode(): integer
  ll(...): data.frame
  static load(file): Object
  objectSize(): integer
  print()
  save(file=NULL, ...)

9 of 22 http://www.maths.lth.se/help/R/

Object – the common root class

[Figure: class hierarchy with Object as the common root. Classes shown, grouped by the package labels in the figure:]

R.oo: Object, Exception, RccViolationException
com.braju.sma: MicroarrayData, GenePixData, ImaGeneData, QuantArrayData, ScanAlyzeData, SpotData, SpotFinderData, MAData, RawData, RGData, TMAData, Layout, GalLayout
R.io: Reporter, HtmlReporter, LaTeXReporter, TextReporter, MultiReporter, File, FileFilter, RspEngine
R.graphics: BitmapImage, MonochromeImage, GrayImage, RGBImage, Color, Device

10 of 22 http://www.maths.lth.se/help/R/

A common root class: Object

1. All classes should have the common root class Object.
   – A similar idea exists in R today, e.g. print(), as.character() etc, but a common root class makes it more explicit.
2. Fields of an Object can be accessed as elements of a list, e.g.:
   – square$side and
   – square[["side"]] <- 23
3. Methods can also be called as
   – square$getArea()
4. The implementation of reference variables is taken care of within the Object class. Under the hood, we roughly have:

"$.Object" <- function(object, name) {
  get(name, envir=attr(object, ".env"))
}

"$<-.Object" <- function(object, name, value) {
  assign(name, value, envir=attr(object, ".env"))
  invisible(object)  # a replacement function must return the object
}
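The mechanism in item 4 can be tried out in plain R. The sketch below is only an illustration under assumptions: the class name "Ref", the constructor newRef() and the function doubleSide() are hypothetical and not part of R.oo, and inherits = FALSE is added so that only the object's own fields are looked up.

```r
# Hypothetical constructor: the object itself is a small placeholder
# value; the fields live in the environment stored in its ".env" attribute.
newRef <- function(...) {
  env <- list2env(list(...), envir = new.env(parent = emptyenv()))
  obj <- NA
  attr(obj, ".env") <- env
  class(obj) <- "Ref"
  obj
}

# Field access reads from the environment.
`$.Ref` <- function(object, name) {
  get(name, envir = attr(object, ".env"), inherits = FALSE)
}

# Field assignment writes to the environment and returns the object,
# as every replacement function must.
`$<-.Ref` <- function(object, name, value) {
  assign(name, value, envir = attr(object, ".env"))
  object
}

square <- newRef(side = 4)

# The function returns nothing, yet the caller sees the new value,
# because both handles share the same environment.
doubleSide <- function(sq) {
  sq$side <- 2 * sq$side
  invisible(NULL)
}
doubleSide(square)
square$side   # 8
```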


11 of 22 http://www.maths.lth.se/help/R/

setMethodS3()

• Defines a method of a class.
• Creates a generic function automatically iff missing.
• RCC:
  – Methods should start with a lower case letter.
  – Asserts that a correct method name is used; reserved words and names of basic functions that must not be overwritten or redefined are protected.

(Does not require the Object class.)

setMethodS3("plotPrintorder", "MAData", function(object, ...) {
  ...
})

setMethodS3("next", "Iterator", function(object, ...) {
  ...
})

Error: [2003-03-18 16:28:00] RccViolationException: Method names must not be same as a reserved keyword in R: next, cf. http://www.maths.lth.se/help/R/RCC/
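To make the two checks above concrete, here is a rough base-R sketch of a helper that refuses reserved words and creates the generic function only if it is missing. It is not the R.oo implementation: setMethodSketch(), getPerimeter() and the list of reserved words are assumptions made for this example, and the real setMethodS3() performs more checks.

```r
# Hypothetical helper illustrating the idea behind setMethodS3().
setMethodSketch <- function(name, class, fn, envir = globalenv()) {
  # Reject names that are reserved words in R.
  reserved <- c("if", "else", "repeat", "while", "function", "for",
                "next", "break", "TRUE", "FALSE", "NULL", "NA", "Inf", "NaN")
  if (name %in% reserved) {
    stop("Method names must not be the same as a reserved keyword in R: ", name)
  }

  # Create a generic only if no function of that name exists yet.
  if (!exists(name, envir = envir, mode = "function")) {
    generic <- function(...) NULL
    body(generic) <- call("UseMethod", name)
    environment(generic) <- envir
    assign(name, generic, envir = envir)
  }

  # Register the method under the S3 naming scheme <method>.<class>.
  assign(paste(name, class, sep = "."), fn, envir = envir)
  invisible(NULL)
}

# The generic getPerimeter() is created automatically.
setMethodSketch("getPerimeter", "Square", function(object, ...) 4 * object$side)
sq <- structure(list(side = 3), class = "Square")
getPerimeter(sq)   # 12

# A reserved word is rejected before anything is defined.
msg <- tryCatch(
  setMethodSketch("next", "Iterator", function(object, ...) NULL),
  error = function(e) conditionMessage(e)
)
```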

12 of 22 http://www.maths.lth.se/help/R/

Problems with generic functions

• Hard to check if a function (generic or not) already exists.
• Ad hoc solutions for creating generic functions "automatically".
• Under the S3 scheme, it is possible to create generic functions that are truly generic:

normalize <- function(...) UseMethod("normalize")

Note that the first argument is omitted. If not, it would be impossible to have default functions with no arguments, e.g. search().

• The R.oo package automatically creates generic functions as above.
• We are not aware of how to do the same in S4 (this is the main reason why R.oo is currently staying with S3).
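A generic of this form can be tried directly in base R. In the small sketch below the names whereAmI() and the class "Shape" are made up for illustration; the generic declares only "..." and still dispatches on the first argument when there is one, while a call without arguments falls through to the default method.

```r
# A truly generic function: no named first argument.
whereAmI <- function(...) UseMethod("whereAmI")

# Default method that needs no arguments at all.
whereAmI.default <- function(...) "default method"

# Class-specific method; dispatch uses the first argument passed via '...'.
whereAmI.Shape <- function(object, ...) "Shape method"

noArgs <- whereAmI()
withShape <- whereAmI(structure(list(), class = "Shape"))
```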

13 of 22 http://www.maths.lth.se/help/R/

setConstructorS3()

• Defines the constructor method of a class, but also the class.
• RCC:
  – Asserts that a correct class name is used; reserved words and names of basic functions that must not be overwritten or redefined are protected.
  – Class and constructor names should start with an UPPER CASE letter.
  – Constructors should be named the same as the class.

(Does not require the Object class.)

setConstructorS3("MAData", function(M, A, layout=NULL) {
  extend(MicroarrayData(layout=layout), "MAData",
    M = as.matrix(M),
    A = as.matrix(A)
  )
})

Constructor/class definition hybrid: Creates an object of the super class, which is then "extended" into an MAData object with additional fields.
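The "create the super class object, then extend it" pattern can be imitated in base R. The following is only a sketch under stated assumptions: extendSketch(), MicroarrayDataSketch() and MADataSketch() are hypothetical names, and the objects are plain lists with copy semantics rather than the environment-based references that R.oo's Object provides.

```r
# Hypothetical stand-in for extend(): add fields to the super class
# object and prepend the new class name to its class vector.
extendSketch <- function(super, className, ...) {
  fields <- list(...)
  for (name in names(fields)) {
    super[[name]] <- fields[[name]]
  }
  class(super) <- c(className, class(super))
  super
}

# Constructor of the super class.
MicroarrayDataSketch <- function(layout = NULL) {
  structure(list(layout = layout),
            class = c("MicroarrayDataSketch", "ObjectSketch"))
}

# Constructor/class definition hybrid, following the slide's pattern.
MADataSketch <- function(M, A, layout = NULL) {
  extendSketch(MicroarrayDataSketch(layout = layout), "MADataSketch",
    M = as.matrix(M),
    A = as.matrix(A)
  )
}

ma <- MADataSketch(M = 1:4, A = 5:8)
class(ma)   # subclass first, then the inherited classes
```

Because the class vector lists the subclass first, S3 dispatch tries MADataSketch methods before falling back on those of the super classes.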

14 of 22 http://www.maths.lth.se/help/R/

Quick inspection of a class

• print(<class name>) or simply type the class name at the prompt and press ENTER, e.g.

> MAData
MAData extends MicroarrayData, Object {
  public A
  public layout
  public M
  ...
  normalizeWithinSlide(...)
  ...
  public plot(what="MvsA", ...)
  public plot3d(...)
  public plotPrintorder(what="M", ...)
  ...
  public print(...)
  public save(file=NULL, path=NULL, ...)
}

[Figure: UML class diagram relating Object, MicroarrayData, MAData and Layout.]
MAData: fields A: matrix, M: matrix; methods as.RGData(): RGData, ..., normalizeWithinSlide(...), normalizeAcrossSlides(...), ...
Other methods shown: ..., plot(...), plot3d(...), plotPrintorder(...), ...
Layout: fields ngrid.c: integer, ngrid.r: integer, nspot.c: integer, nspot.r: integer; methods ..., getName(...): character, getId(...): character, ..., nbrOfSpots(): integer, nbrOfGrids(): integer, ...

15 of 22 http://www.maths.lth.se/help/R/

Quick inspection of an object

• print(<object>) or simply <object> and ENTER at the prompt, which by default is equal to print(as.character(<object>)), e.g.

> ma
[1] "MAData: M (5184x4), A (5184x4), Layout: Grids: 4x4 (=16), spots in grids:18x18 (=324), total number of spots: 5184. Spot name's are specified. Spot id's are specified."

• ll(<object>) gives detailed information about the (public) fields, e.g.

> ll(ma)
  member     data.class dimension object.size
1      A SpotSlideArray c(5184,4)      143940
2 layout         Layout         1         428
3      M SpotSlideArray c(5184,4)      143940

> ll(ma$layout)  # or ll(getLayout(ma))
         member data.class dimension object.size
1      geneGrps       NULL         0           0
2   geneSpotMap       NULL         0           0
3            id  character      5184       63868
4       ngrid.c    numeric         1          36
...
11 printtipGrps       NULL         0           0

16 of 22 http://www.maths.lth.se/help/R/

Rdoc: Source-to-Rd converter

• Rdoc comments are Rd documentation within the source files:
  – easy to generate complete Rd files from source files.
  – less risk of forgetting to update Rd files.
  – automatically generates class hierarchy and method lists.
  – extra tags to include external files, e.g. example code.

(Does not require the Object class.)

#####################################################################/**
# @Class Matlab
#
# \title{Matlab client for remote or local Matlab access}
#
# \description{
# @include "Matlab.declaration.Rdoc"
# }
#
# \usage{
# matlab <- Matlab(host="localhost", port=9999, remote=FALSE)
# }
#
# \arguments{
# \item{host}{Name of host to connect to.
# Default value is \code{localhost}.}
# \item{port}{Port number on host to connect to.
# Default value is \code{9999}.}
# \item{remote}{If \code{TRUE}, all data to and from the Matlab server will
# be transferred through the socket connection, otherwise the data will
# be transferred via a temporary file. Default value is \code{FALSE}.}
# }
#
# \section{Fields and Methods}{
# @include "Matlab.methods.Rdoc"
# @include "Matlab.inheritedMethods.Rdoc"
# }
#
# \examples{\dontrun{@include "Matlab.Rex"}}
#
# \author{Henrik Bengtsson, \url{http://www.braju.com/R/}}
#
# \seealso{
# Stand-alone methods \code{\link{readMAT}()} and \code{\link{writeMAT}()}
# for reading and writing MAT file structures.
# }
#
# @visibility public
#*/######################################################################
setConstructorS3("Matlab", function(host="localhost", port=9999, remote=FALSE) {
  extend(Object(), "Matlab",
  ...

17 of 22 http://www.maths.lth.se/help/R/

Static methods

• Methods that are specific to a class and do not belong to a certain object.
• Keeps the focus on classes/objects, not methods.
  – For instance, static method names are easy to remember for the end user ("first class then method"), e.g.
    • MicroarrayData$read("slide1.gpr")
    • Sound$read("chime.wav")
    • Colors$getHeatColors(1:10)
  instead of
    • readMicroarrayData("slide1.gpr")
    • readSound("chime.wav")
    • getHeatColors(1:10)
  which might not even be unique!

18 of 22 http://www.maths.lth.se/help/R/

Virtual fields

• Virtual fields are fields that do not exist, but appear to do so because of existing methods get<Field>() and set<Field>().
  – Example 1: The virtual field area of the Square class is defined by defining getArea() and setArea():
    • square$area will call getArea(square), which will return the area (calculated from the field side or in some other way).
    • square$area <- -12 will call setArea(square, -12), which then throws an OutOfRangeException.
  – Example 2: Private fields, e.g. side, can be protected by defining setSide(), which throws a NoSuchFieldException.
  – Example 3: The constant field RED.HUE can be write protected by defining setRED.HUE(), which throws an AssignmentException.
  – Example 4: Provide cached fields that can be calculated from the other fields, but are worth caching because they are accessed often and take a long time to calculate. The cache can be removed in case of low memory.
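Example 1 can be sketched in base R by letting the $ and $<- methods look for get<Field>() and set<Field>() functions before falling back on a stored field. This is an illustration under assumptions, not the R.oo implementation: the class "SquareSketch", newSquare() and capitalize() are hypothetical names, and a plain stop() stands in for throwing an OutOfRangeException.

```r
# Hypothetical helper: "area" -> "Area".
capitalize <- function(name) {
  paste0(toupper(substring(name, 1, 1)), substring(name, 2))
}

# Hypothetical constructor; fields live in an environment (see the
# reference-variable mechanism described earlier).
newSquare <- function(side = 0) {
  env <- new.env(parent = emptyenv())
  assign("side", side, envir = env)
  obj <- NA
  attr(obj, ".env") <- env
  class(obj) <- "SquareSketch"
  obj
}

# Reading: prefer get<Field>() if it exists, else read the stored field.
`$.SquareSketch` <- function(object, name) {
  getter <- paste0("get", capitalize(name))
  if (exists(getter, mode = "function")) {
    return(get(getter, mode = "function")(object))
  }
  get(name, envir = attr(object, ".env"), inherits = FALSE)
}

# Writing: prefer set<Field>() if it exists, else store the field.
`$<-.SquareSketch` <- function(object, name, value) {
  setter <- paste0("set", capitalize(name))
  if (exists(setter, mode = "function")) {
    get(setter, mode = "function")(object, value)
  } else {
    assign(name, value, envir = attr(object, ".env"))
  }
  object
}

# The virtual field 'area' is defined only through these two functions.
getArea <- function(object) object$side^2
setArea <- function(object, value) {
  if (!is.numeric(value) || value < 0) {
    stop("The area of a square must be zero or greater: ", value)
  }
  assign("side", sqrt(value), envir = attr(object, ".env"))
}

sq <- newSquare(side = 3)
sq$area          # 9, computed by getArea()
sq$area <- 16    # handled by setArea(), which stores side = 4
sq$side          # 4

# An invalid value is rejected and the object is left unchanged.
err <- tryCatch({ sq$area <- -12; NULL },
                error = function(e) conditionMessage(e))
```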


19 of 22 http://www.maths.lth.se/help/R/

Summary example

setConstructorS3("Square", function(side=0) {
  # Creates an object of class Square. Square, whose fields are
  # defined at the same time, extends the class Shape.
  extend(Shape(), "Square",
    side = side  # 'side' is public
  )
})

setMethodS3("setSide", "Square", function(this, side) {
  # sq$side <- "a" will throw a NonNumericException
  if (!is.numeric(side))
    throw(NonNumericException("Trying to set the side of a square to a non-numeric value: ", side))

  # sq$side <- -12 will throw an OutOfRangeException
  if (side < 0)
    throw(OutOfRangeException("The side of a square must be zero or greater: ", side))

  this$side <- side  # Assignment remains also after returning!
})

20 of 22 http://www.maths.lth.se/help/R/

Extended exception handling

• Throw Exception objects, which can be caught (quietly) based on class, e.g.

trycatch({
  # Calls setSide(), which throws an OutOfRangeException.
  sq$side <- -12
}, NonNumericException = {
  cat("The side of a square must be a numeric value.\n")
}, ANY = {
  # catches any other types of Exception (also try-error).
  print(Exception$getLastException())
}, finally = {
  # always double the side whatever happens.
  sq$side <- 2*sq$side
})

Error: [2003-03-08 12:11:43] OutOfRangeException: The side of a square must be zero or greater: -12

[Class diagram: Exception extends Object; RccViolationException (in R.oo), OutOfRangeException and NonNumericException extend Exception.]

Exception
  static getLastException(): Exception
  getMessage(): character
  getWhen(): POSIX time
  throw()

(Does not require the Object class.)
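The same catch-by-class idea can be sketched with base R's tryCatch() and classed conditions. This is a stand-in for illustration, not the R.oo trycatch() shown above: the constructor exception(), the function checkSide() and the variable events are hypothetical, and handlers here are functions rather than code blocks.

```r
# Hypothetical constructor for a classed error condition.
exception <- function(class, ...) {
  structure(
    list(message = paste0(...), call = NULL),
    class = c(class, "Exception", "error", "condition")
  )
}

# Validation that signals different exception classes.
checkSide <- function(side) {
  if (!is.numeric(side)) {
    stop(exception("NonNumericException",
                   "Trying to set the side of a square to a non-numeric value: ", side))
  }
  if (side < 0) {
    stop(exception("OutOfRangeException",
                   "The side of a square must be zero or greater: ", side))
  }
  side
}

events <- character(0)
result <- tryCatch({
  checkSide(-12)
}, NonNumericException = function(ex) {
  "The side of a square must be a numeric value."
}, Exception = function(ex) {
  # Catches any other Exception subclass, here OutOfRangeException.
  conditionMessage(ex)
}, finally = {
  # Always runs, whether or not an exception was thrown.
  events <- c(events, "finally ran")
})
result
```

Handlers are tried in the order given, so the more specific class is listed before the general Exception handler.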

21 of 22 http://www.maths.lth.se/help/R/

Future

• Make the API (even) more similar to the S4 API
  – Makes transitions to and from R.oo (and S4) easier.
  – Less confusing for beginners.
• Make an S4 version of the package
  – When the problem "generic functions are too restricted on matching argument" is solved.
• Make it easier to declare private fields or constants.
• Implement the mechanisms for field access in native code.
• Publish R.oo on CRAN
  – Requires a stable API. After 2+ years it is indeed very stable, but any major changes after v1.0 will be annoying for the user.

22 of 22 http://www.maths.lth.se/help/R/

Acknowledgements

• The R development team
• People on the r-help mailing list
• All users that have given feedback to the project

See http://www.maths.lth.se/help/R/ for RCC, more documentation, help, examples, and installation of

• R.classes bundle: R.audio, R.base, R.graphics, R.io, R.lang, R.matlab, R.oo, R.tcltk, R.ui
• cDNA microarray package: com.braju.sma