Bioinformatics is … - the use of computers and information technology to assist biological studies...

Bioinformatics is …

- the use of computers and information technology to assist biological studies

- a multi-dimensional and multi-lingual discipline

Chapters 1-4, Tisdall

Multiple platforms, multiple languages

• Windows, Mac, UNIX, Linux
– UNIX remains the standard for bioinformatics software development, while PCs and Macs are typically used by end users.

• Java, Python, CORBA, C++, Ruby, Perl
– There’s more than one way of doing things.
– Lack of uniformity continues to be one of the biggest problems faced in bioinformatics.

Why Perl?

• Ease of use by novice programmers

• Fast software prototyping
– Flexible language
– Compact code (sometimes)

• Powerful pattern matching via “regular expressions”

• Availability of programs and modules (BioPerl)

• Portability

• Open Source – easy to extend and customize

• No licensing fees

Perl is easy to get…

• Many computers come with Perl already installed
– Check by typing perl -v in a Unix, Linux, or Mac OS X shell, or in a Windows MS-DOS shell

• If not, go to www.perl.com or www.activestate.com to download a recent version of Perl (download a binary whenever possible; source code requires compiling)

• ActiveState provides several tools for Perl developers

• Although some think Perl is an “old” language, it is constantly undergoing revision and improvement

What is Perl?

• Practical Extraction and Report Language

• An interpreted programming language optimized for scanning text files, extracting information, and printing reports

• The string-based language of DNA and protein sequence data makes this an obvious choice

What is a Perl program?

• A program consists of a text file containing a series of Perl statements
– Perl programs can be written in a variety of text editors, including MS Word (saved as plain text), WordPad, NotePad, or Komodo from ActiveState, which you will use

• Perl statements are separated by semicolons (;)

• Multiple spaces, tabs, and blank lines are ignored

• Anything following a # up to the end of the line is ignored (comment)

• Perl is case sensitive
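
The rules above can be seen in a very small program. This is only a sketch: the variable names are made up for illustration, and the `use strict; use warnings;` lines and the `my` declarations are additions that the slides do not cover.

```perl
use strict;
use warnings;

# Everything after a '#' on a line is a comment and is ignored.
my $dna = 'ACGT';      # each statement ends with a semicolon

# Extra spaces, tabs, and blank lines between tokens do not matter.
my    $length   =
      length($dna);

# Perl is case sensitive: $DNA is a different variable from $dna.
my $DNA = 'TTTT';

print "$dna has $length bases\n";
print "$DNA is a separate variable\n";
```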

Perl has three data types

• $ - Scalar: holds a single value, which can be a number or a string, e.g. $EcoRI = 'GAATTC';

• @ - Array: stores an ordered list of scalar values, indexed from 0 (0, 1, 2, etc.)

• % - Hash: an associative array of keys and their values
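
A short sketch of the three data types side by side. The array and hash contents here are chosen for illustration and are not from Tisdall's examples.

```perl
use strict;
use warnings;

# Scalar: a single value (number or string)
my $EcoRI = 'GAATTC';

# Array: an ordered list of scalars, indexed from 0
my @bases = ('A', 'C', 'G', 'T');
my $first = $bases[0];              # the element at index 0
my $count = scalar @bases;          # number of elements

# Hash: keys mapped to values (a small illustrative enzyme table)
my %site = (
    EcoRI => 'GAATTC',
    BamHI => 'GGATCC',
);

print "EcoRI cuts at $site{EcoRI}\n";
print "First base: $first; number of bases: $count\n";
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with `$` (`$bases[0]`, `$site{EcoRI}`).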

Using Scalar Variables

• Example 4-1 in Tisdall provides a simple example; a thorough description of this exercise is supplied in the text

Some additional comments regarding strings:

• Quotes:
– 'XYZ' Text between a pair of single quotes is interpreted literally
– To get a single quote into a single-quoted string, precede it with a backslash
– To get a backslash into a single-quoted string, precede the backslash with a backslash

• 'hello' # hello

• 'can\'t' # can't

• 'http:\\\\www' # http:\\www
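
The single-quote rules can be checked with a few assignments. A minimal sketch; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $plain     = 'hello';          # hello
my $apostro   = 'can\'t';         # can't
my $backslash = 'http:\\\\www';   # http:\\www

# Inside single quotes, $name and \n are NOT special: both stay as typed.
my $literal   = 'no $interpolation or \n here';

print "$plain\n$apostro\n$backslash\n$literal\n";
```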

Double quotes interpolate variables

• "" Variable names within a double-quoted string are replaced by their current values
– $x = 1;
print '$x'; # will print out $x
print "$x"; # will print out 1
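
A slightly longer sketch of interpolation. The array example and the `${x}` brace form go beyond what the slide states; the variable names are illustrative.

```perl
use strict;
use warnings;

my $x    = 1;
my @frag = ('GAA', 'TTC');

my $single = '$x';        # single quotes: the two characters $ and x
my $double = "$x";        # double quotes: the value of $x
my $joined = "@frag";     # an array interpolates with spaces between elements
my $braces = "${x}st";    # braces mark where the variable name ends

print "$single $double $joined $braces\n";
```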

Arithmetic operators

• + Addition

• - Subtraction

• * Multiplication

• ** Exponentiation

• / Division

• % Modulus
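
Each operator in the list can be tried on the same pair of numbers. A minimal sketch with arbitrary values:

```perl
use strict;
use warnings;

my $sum        = 7 + 2;     # 9
my $difference = 7 - 2;     # 5
my $product    = 7 * 2;     # 14
my $power      = 7 ** 2;    # 49
my $quotient   = 7 / 2;     # 3.5 (division is not truncated to an integer)
my $remainder  = 7 % 2;     # 1

# Modulus is handy for reading frames: position 7 (counting from 0)
# is at offset 1 within its codon.
my $offset = 7 % 3;         # 1

print "$sum $difference $product $power $quotient $remainder $offset\n";
```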

Other important operators

• = is the assignment operator

• == tests whether two numbers are equal; eq tests whether two strings are equal

• += and -= are assignment operators that add or subtract, $a += 2; # means $a = $a + 2;

• ++ and -- are the autoincrement and autodecrement operators, which add one to or subtract one from a variable ($a++ means $a = $a + 1)
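
The difference between == and eq is easiest to see when a value can be read either as a number or as a string. A sketch with illustrative names; the `? :` conditional operator is used only to turn each comparison into a printable word.

```perl
use strict;
use warnings;

my $count = 5;          # = assigns
$count += 2;            # same as $count = $count + 2;  now 7
$count -= 3;            # same as $count = $count - 3;  now 4
$count++;               # add one;      now 5
$count--;               # subtract one; now 4

# == compares as numbers, eq compares as strings
my $same_number = ('1.0' == '1')         ? 'yes' : 'no';   # yes
my $same_string = ('1.0' eq '1')         ? 'yes' : 'no';   # no
my $same_site   = ('GAATTC' eq 'GAATTC') ? 'yes' : 'no';   # yes

print "$count $same_number $same_string $same_site\n";
```

For sequence data, which is text, eq is the comparison to reach for.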

\n = newline

• Often you would like to introduce some spacing into your output

• \n ends the current line, so whatever is printed next starts on a new line

• print "apple";
print "grape";
Output looks like: applegrape

• print "apple\n";
print "grape\n";
Output looks like:
apple
grape
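
The same comparison as a runnable sketch. The strings are built with the concatenation operator (.) first so that the effect of \n can be inspected; the variable names are illustrative.

```perl
use strict;
use warnings;

# What two print statements without \n send to the screen
my $no_newline   = "apple" . "grape";

# What two print statements with \n send to the screen
my $with_newline = "apple\n" . "grape\n";

print $no_newline, "\n";     # both words run together on one line
print $with_newline;         # each word on its own line

# Two newlines in a row end the line and then leave one empty line
print "apple\n\n", "grape\n";
```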

Chomp and Chop

• chop removes the last character from a string
– $a = "Dr. Barber is hip";
– chop($a); # $a is now "Dr. Barber is hi"

• chomp removes a trailing newline from the end of a string
– $a = "Dr. Barber is hip\n";
– chomp($a); # $a is now "Dr. Barber is hip"
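
A sketch of the two functions, including the case that makes chomp the safer choice for cleaning up input lines: when there is no trailing newline, chomp leaves the string alone, whereas chop always removes a character. Variable names are illustrative.

```perl
use strict;
use warnings;

my $chopped = "Dr. Barber is hip";
chop($chopped);                  # always removes the last character

my $chomped = "Dr. Barber is hip\n";
chomp($chomped);                 # removes the trailing newline

my $unchanged = "Dr. Barber is hip";
chomp($unchanged);               # no trailing newline, so nothing is removed

print "[$chopped]\n[$chomped]\n[$unchanged]\n";
```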

• Do examples 4-2, 4-3, 4-4

Working with Files

Biological data can come in a variety of file formats, and our job is to utilize these files and extract what we want

One such file format is FASTA
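
A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The sketch below separates the two parts. The record is made up for illustration (it is not a real database entry), and the loop and pattern match use Perl features covered later in the course.

```perl
use strict;
use warnings;

# A made-up FASTA record for illustration (not a real database entry)
my $record = <<'END_FASTA';
>example_seq hypothetical protein fragment
MKTAYIAKQR
QISFVKSHFS
END_FASTA

my ($header, $sequence) = ('', '');
foreach my $line (split /\n/, $record) {
    if ($line =~ /^>(.*)/) {
        $header = $1;           # text after the '>' describes the sequence
    } else {
        $sequence .= $line;     # sequence lines are joined into one string
    }
}

print "Header:   $header\n";
print "Sequence: $sequence\n";
print "Length:   ", length($sequence), "\n";
```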

Scalar vs. Array

• Example 4-5 provides a simple distinction between the use of a scalar variable and an array; read it, but you don’t necessarily need to do it

• It also shows how you use a filehandle in association with your file

• < > is the input operator; you will become better acquainted with it when we use <STDIN> later
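
The scalar-versus-array distinction can be sketched as follows. This is not Tisdall's code: to stay self-contained it reads from an in-memory filehandle over made-up data instead of a file on disk, and it uses the three-argument form of open with a lexical filehandle. With a real file, the `\$data` argument would be replaced by the file name.

```perl
use strict;
use warnings;

# Stand-in for a sequence file: made-up data read through an in-memory
# filehandle, so the sketch runs without any file on disk.
my $data = "MKTAYIAKQR\nQISFVKSHFS\nRQLEERLGLI\n";

# Assigning to a scalar: each <$in> returns just the next line
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";
my $first_line  = <$in>;
my $second_line = <$in>;
close($in);

# Assigning to an array: a single <$in_again> returns all the lines at once
open(my $in_again, '<', \$data) or die "Cannot open in-memory file: $!";
my @lines = <$in_again>;
close($in_again);

chomp($first_line);
chomp($second_line);
chomp(@lines);          # chomp on an array strips the newline from every element

print "First line:  $first_line\n";
print "Second line: $second_line\n";
print "Line count:  ", scalar(@lines), "\n";
```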

adhI.pep

• Replace NM_021964fragment.pep with adhI.pep, which can be downloaded from the web site into a folder called “BIOS482” that you need to create on your computer

• Do Example 4-7; if time permits, write code analogous to the code that follows this example to test out arrays