Bioinformatics is …
- the use of computers and information technology to assist biological studies
- a multi-dimensional and multi-lingual discipline
Chapters 1-4, Tisdall
Multiple platforms, multiple languages
• Windows, Mac, UNIX, Linux
  – UNIX remains the standard for bioinformatics software development, while PCs and Macs are typically used by end users.
• Java, Python, C++, Ruby, Perl (and the CORBA middleware standard)
  – There’s more than one way of doing things.
  – Lack of uniformity continues to be one of the biggest problems faced in bioinformatics.
Why Perl?
• Ease of use by novice programmers
• Fast software prototyping
  – Flexible language
  – Compact code (sometimes)
• Powerful pattern matching via “regular expressions”
• Availability of existing programs and modules (BioPerl)
• Portability
• Open source: easy to extend and customize
• No licensing fees
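As a small illustration of the regular-expression point above, the following sketch searches a made-up DNA string for the EcoRI site GAATTC and records the 1-based position where each match starts. The sequence and variable names are illustrative assumptions, not taken from Tisdall.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A short made-up DNA sequence (illustrative only)
my $dna   = 'ATGGAATTCCGTAGAATTCTTA';
my $EcoRI = 'GAATTC';    # EcoRI recognition site

# With the /g modifier in a while loop, the match resumes where the
# previous one ended. pos() returns the offset just past the last match,
# so subtract the site length and add 1 to get a 1-based start position.
my @sites;
while ( $dna =~ /$EcoRI/g ) {
    push @sites, pos($dna) - length($EcoRI) + 1;
}

print "EcoRI sites found at: @sites\n";    # EcoRI sites found at: 4 14
```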
Perl is easy to get…
• Many computers come with Perl already installed
  – Check by typing perl -v in a Unix, Linux, or Mac OS X shell, or in a Windows command prompt
• If not, simply go to www.perl.com or www.activestate.com to download a recent version of Perl (download a binary whenever possible; source code requires compiling)
• ActiveState provides several tools for Perl developers
• Although some think Perl is an “old” language, it is constantly undergoing revision and improvement
What is Perl?
• Practical Extraction and Report Language
• An interpreted programming language optimized for scanning text files, extracting information, and printing reports
• Because DNA and protein sequence data are represented as strings of letters, a string-oriented language like Perl is an obvious choice
What is a Perl program?
• A program consists of a text file containing a series of Perl statements
  – Perl programs can be written in a variety of text editors, such as NotePad or WordPad, or, as you will use, Komodo from ActiveState; a word processor such as MS Word works only if the file is saved as plain text
• Perl statements are separated by semicolons (;)
• Multiple spaces, tabs, and blank lines are ignored
• Anything following a # on a line is ignored (a comment)
• Perl is case sensitive
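A minimal sketch of a complete Perl program that exercises these rules: each statement ends with a semicolon, text after # is a comment, extra whitespace is ignored, and $dna and $DNA are two different variables. The variable names are illustrative. The use strict; and use warnings; lines are optional but commonly recommended, since they make Perl report misspelled variables and other likely mistakes.

```perl
#!/usr/bin/perl
# A minimal Perl program: every statement ends with a semicolon.
use strict;
use warnings;

my $dna = 'ACGT';    # everything after the # is a comment
my $DNA = 'TTTT';    # a different variable: Perl is case sensitive

# Extra spaces, tabs, and blank lines do not matter:
print    "lowercase: $dna\n"   ;

print "uppercase: $DNA\n";
```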
Perl has three data types
• $ - Scalar: holds a single value, which can be a number or a string, e.g. $EcoRI = 'GAATTC';
• @ - Array: stores an ordered list of scalar values, indexed from 0 ($array[0], $array[1], $array[2], etc.)
• % - Hash: an associative array of key/value pairs
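The sketch below declares one variable of each type using DNA-flavored values chosen for illustration. Note that a single element of an array or hash is itself a scalar, so it is accessed with the $ sigil.

```perl
use strict;
use warnings;

# Scalar: a single value (string or number)
my $EcoRI = 'GAATTC';

# Array: an ordered list of scalars, indexed from 0
my @bases = ('A', 'C', 'G', 'T');

# Hash: key/value pairs; here each base maps to its complement
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A');

print "$EcoRI\n";              # GAATTC
print "$bases[0]\n";           # A  (first element, accessed with $)
print "$complement{'G'}\n";    # C  (value stored under key 'G')
```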
Using Scalar Variables
• Example 4-1 in Tisdall provides a simple example; a thorough description of this exercise is supplied in the text
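The following is not Tisdall's Example 4-1 itself but a comparable sketch, assuming two short made-up DNA fragments: it stores each fragment in a scalar variable and joins them with Perl's string concatenation operator (.).

```perl
use strict;
use warnings;

# Two made-up DNA fragments stored in scalar variables
my $DNA1 = 'ACGTTGCA';
my $DNA2 = 'GGATCC';

# The dot (.) operator concatenates strings into a new scalar
my $DNA3 = $DNA1 . $DNA2;

print "Fragment 1: $DNA1\n";
print "Fragment 2: $DNA2\n";
print "Joined:     $DNA3\n";    # Joined:     ACGTTGCAGGATCC
```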
Some additional comments regarding strings:
• Quotes:
  – 'XYZ': text between a pair of single quotes is interpreted literally
  – To get a single quote into a single-quoted string, precede it with a backslash
  – To get a backslash into a single-quoted string, precede the backslash with another backslash
• 'hello'  # hello
• 'can\'t'  # can't
• 'http:\\\\www'  # http:\\www
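One way to check the single-quote rules above is to store each string in a variable and print it. This sketch uses illustrative variable names; the comments show what each string actually contains.

```perl
use strict;
use warnings;

my $greeting = 'hello';           # hello
my $cant     = 'can\'t';          # can't
my $url      = 'http:\\\\www';    # http:\\www  (each \\ becomes one \)
my $dollar   = '$EcoRI';          # $EcoRI  (no variable substitution)

print "$greeting\n$cant\n$url\n$dollar\n";
```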
Double quotes interpolate variables
• Inside double quotes ("..."), variable names are replaced by their current values
  – $x = 1;
  – print '$x';  # will print out $x
  – print "$x";  # will print out 1
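A minimal sketch contrasting the two kinds of quotes, with illustrative variable names: the single-quoted string keeps the characters $ and x, while the double-quoted string substitutes the value of $x. Double quotes also interpolate variables inside longer strings.

```perl
use strict;
use warnings;

my $x      = 1;
my $single = '$x';     # the two literal characters $ and x
my $double = "$x";     # interpolated: the string "1"

print $single, "\n";   # prints $x
print $double, "\n";   # prints 1

# Interpolation also works inside longer strings
my $EcoRI = 'GAATTC';
print "The EcoRI site is $EcoRI\n";    # The EcoRI site is GAATTC
```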
Arithmetic operators
• + Addition
• - Subtraction
• * Multiplication
• ** Exponentiation
• / Division
• % Modulus
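The sketch below applies each arithmetic operator to two small integers, then uses division and multiplication to compute the GC content of a made-up sequence. Two details worth noting: / performs floating-point division, so 7 / 3 is not truncated to 2, and the GC count relies on the tr/// operator (which returns how many characters it matched), an operator not covered on this slide.

```perl
use strict;
use warnings;

my $x = 7;
my $y = 3;

my $sum        = $x + $y;    # 10
my $difference = $x - $y;    # 4
my $product    = $x * $y;    # 21
my $power      = 2 ** 3;     # 8
my $quotient   = $x / $y;    # 2.333... (not truncated)
my $remainder  = $x % $y;    # 1

print "$sum $difference $product $power $remainder\n";
printf "%.2f\n", $quotient;    # 2.33

# GC content of a made-up sequence: tr/GC// counts G and C
# characters without changing the string
my $seq        = 'ATGCGCGATT';
my $gc_count   = ($seq =~ tr/GC//);              # 5
my $gc_percent = 100 * $gc_count / length($seq); # 50
print "GC content: $gc_percent%\n";
```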
Other important operators
• = is an assignment operator
• == tests whether two numbers are equal; eq tests whether two strings are equal
• += and -= are assignment operators that add or subtract, e.g. $a += 2; # means $a = $a + 2;
• ++ and -- are autoincrement and autodecrement operators that add or subtract one from a variable, e.g. $a++; # means $a = $a + 1;
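A minimal sketch of these operators on an illustrative counter variable. The last two lines show why == and eq are not interchangeable: '10' and '10.0' are equal as numbers but different as strings.

```perl
use strict;
use warnings;

my $count = 5;    # = assigns a value
$count += 2;      # same as $count = $count + 2;  now 7
$count -= 1;      # same as $count = $count - 1;  now 6
$count++;         # add one: now 7
$count--;         # subtract one: now 6

# == compares numbers, eq compares strings
print "numbers equal\n" if $count == 6;
print "strings equal\n" if 'GAATTC' eq 'GAATTC';

# Same value as numbers, different as strings
print "'10' == '10.0' is true\n"  if '10' == '10.0';
print "'10' eq '10.0' is false\n" unless '10' eq '10.0';
```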
\n = newline
• Oftentimes you would like to introduce some spacing into your output
• \n ends the current line, so whatever is printed next starts on a new line
• print "apple";
  print "grape";
  Output looks like: applegrape
• print "apple\n";
  print "grape\n";
  Output looks like:
  apple
  grape
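A short sketch reproducing both outputs from the slide and adding one related detail: \n is only treated as a newline inside double quotes. In single quotes it is two literal characters, a backslash and an n. The variable name is illustrative.

```perl
use strict;
use warnings;

# Without \n, consecutive prints run together on one line
print "apple";
print "grape";
print "\n";    # prints: applegrape

# With \n, each word ends its own line
print "apple\n";
print "grape\n";

# In single quotes, \n is a backslash followed by n, not a newline
my $literal = 'apple\n';
print length($literal), "\n";    # prints 7
```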
Chomp and Chop
• chop removes the last character from a string
  – $a = "Dr. Barber is hip";
  – chop($a);  # $a is now "Dr. Barber is hi"
• chomp removes a trailing newline from the end of a string
  – $a = "Dr. Barber is hip\n";
  – chomp($a);  # $a is now "Dr. Barber is hip"
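The sketch below repeats the two slide examples and adds a third case showing the practical difference: chomp leaves a string alone if it has no trailing newline, whereas chop always removes a character. That is why chomp is the usual choice after reading a line of input. Variable names are illustrative.

```perl
use strict;
use warnings;

my $line1 = "Dr. Barber is hip";
chop($line1);            # removes the last character, whatever it is
print "$line1\n";        # Dr. Barber is hi

my $line2 = "Dr. Barber is hip\n";
chomp($line2);           # removes the trailing newline
print "$line2\n";        # Dr. Barber is hip

my $line3 = "Dr. Barber is hip";
chomp($line3);           # no trailing newline, so nothing is removed
print "$line3\n";        # Dr. Barber is hip
```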
• Do examples 4-2, 4-3, 4-4
Working with Files
Biological data can come in a variety of file formats, and our job is to use these files and extract what we want
One such file format is FASTA
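A FASTA record consists of a header line starting with > followed by one or more lines of sequence. The sketch below, assuming a single made-up protein record held in a string (opened as an in-memory filehandle, supported in Perl 5.8 and later), separates the header from the sequence and joins the sequence lines. It handles only one record; a file with several records would need the loop extended to start a new sequence at each header.

```perl
use strict;
use warnings;

# One made-up FASTA record; in practice this would come from a file
my $fasta = <<'END';
>example_protein hypothetical record
ACDEFGHIKL
MNPQRSTVWY
END

# Open the string as if it were a file (in-memory filehandle)
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";

my $header   = '';
my $sequence = '';
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^>/) {
        $header = substr($line, 1);    # drop the leading '>'
    } else {
        $sequence .= $line;            # append this sequence line
    }
}
close($fh);

print "Header:   $header\n";
print "Sequence: $sequence\n";
print "Length:   ", length($sequence), "\n";    # Length:   20
```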
Scalar vs. Array
• Example 4-5 illustrates the difference between reading into a scalar variable and reading into an array; read it, but you don’t necessarily need to run it
• It also shows how to use a filehandle to read from your file
• < > is the input operator; you will become better acquainted with it when we use <STDIN> later
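The scalar/array distinction comes from context: assigned to a scalar, <$fh> returns only the next line; assigned to an array, it returns all remaining lines. This sketch is not Tisdall's Example 4-5; it assumes three made-up lines of protein data held in a string so it runs without a file.

```perl
use strict;
use warnings;

# Three made-up lines of protein data standing in for a file
my $data = "MKVL\nAAGT\nWWRS\n";

# Scalar context: <$fh> returns just the next line
open(my $fh, '<', \$data) or die "Cannot open: $!";
my $first_line = <$fh>;
close($fh);

# List (array) context: <$fh> returns all remaining lines
open($fh, '<', \$data) or die "Cannot open: $!";
my @all_lines = <$fh>;
close($fh);

print "Scalar got: $first_line";                      # Scalar got: MKVL
print "Array got ", scalar(@all_lines), " lines\n";   # Array got 3 lines
```

To read a real file instead, pass its name as the third argument, for example open(my $fh, '<', 'adhI.pep').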
adhI.pep
• Replace NM_021964fragment.pep with adhI.pep, which can be downloaded from the course website into a folder called “BIOS482” that you need to create on your computer
• Do Example 4-7; if time permits, write code analogous to the code that follows this example to test out arrays
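As a starting point for testing out arrays, this sketch (an illustration, not the code from the text) applies Perl's built-in array functions pop, shift, push, unshift, and reverse to an illustrative array of bases. It also shows that an array used in scalar context gives its number of elements.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $last  = pop @bases;      # removes 'T' from the end;   @bases is (A C G)
my $first = shift @bases;    # removes 'A' from the start; @bases is (C G)
push @bases, $first;         # adds 'A' to the end;        @bases is (C G A)
unshift @bases, $last;       # adds 'T' to the start;      @bases is (T C G A)

my @reversed = reverse @bases;    # (A G C T)
my $count    = scalar @bases;     # 4: array in scalar context gives its size

print "@bases\n";       # T C G A
print "@reversed\n";    # A G C T
print "$count\n";       # 4
```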