
A Crash Course on Perl programming

Presented by: Shailender Nagpal, Al Ritacco

Research Computing, UMASS Medical School

09/17/2012
Information Services


AGENDA

• Perl Basics: Scalars, Arrays, Expressions, Printing
• Built-in functions, Blocks, Branching, Loops
• Hash arrays, String and Array operations
• File reading and writing
• Writing custom functions
• Regular expressions: Find/replace/count
• Providing input to programs
• Using Perl scripts with the LSF cluster


What is Perl?

• Perl is a high-level, general-purpose, interpreted, dynamic programming language

• Provides a simple iterative, top-down, left-to-right programming environment for users to create small and largish programs

• Originally developed by Larry Wall in 1987, while he was working at Unisys

• The name is often expanded as the backronym “Practical Extraction and Report Language”


Features of Perl

• Perl code is portable between Linux, Mac and Windows
• Easy to use, and lots of resources (CPAN) are available
• Procedural programming, not strongly “typed”
• Programming syntax similar to other languages
    – if, if-then-else, while, for, functions, classes, etc.
• Provides several ways to organize data
    – Arrays, hash arrays, arrays of arrays, hashes of hashes
    – What does this mean?
        • We can store things easily and compare them easily


Advantages of Perl

• Perl is a general-purpose programming language like C, Java, etc., but it is “higher” level, which makes it advantageous in certain applications such as bioinformatics
    – Fewer lines of code than C or Java
    – No compilation necessary. Prototype and run!
    – Vast function library geared towards scientific computing
    – Saves coding time and automates computing tasks
    – Intuitive. Code is concise, but human readable

• CPAN is a vast repository of reusable Perl modules


First Perl program

• The obligatory “Hello World” program

    #!/usr/bin/perl
    # Comment: 1st program: variable, print
    $name = "World";
    print "Hello $name";

• Save these lines as a text file with the “.pl” extension (here hello.pl), then at the command prompt (Linux):

    $ perl hello.pl
    Hello World


Understanding the code

• The first line of a Perl script gives the interpreter location, which is the path to the Perl executable. It is needed when the script is run directly (./hello.pl) and is optional when it is run as “perl hello.pl”

    #!/path/to/perl

• 2nd line: A comment, beginning with “#”
• 3rd line: Assignment of a string to a scalar variable
• 4th line: Printing some text to the shell with a variable, whose value is interpolated
• The quotes are not printed, and $name is replaced by “World” in the output. All statements end in “;”


Second program

• Report summary statistics of a DNA sequence

    #!/usr/bin/perl
    $DNA = "ATAGCAGATAGCAGACGACGAGA";
    print "Length of DNA is ".length($DNA)."\n";
    print "Number of A bases is ".($DNA =~ tr/A//)."\n";
    print "Number of C bases is ".($DNA =~ tr/C//)."\n";
    print "Number of G bases is ".($DNA =~ tr/G//)."\n";
    print "Number of T bases is ".($DNA =~ tr/T//)."\n";
    print "Number of G+C bases is ".($DNA =~ tr/GC//)."\n";
    # Count matches in list context; s/GC//g would delete them from $DNA
    $gc_pairs = () = $DNA =~ /GC/g;
    print "Number of GC dinucleotides is $gc_pairs\n";
    print "G+C percent content is ".(($DNA =~ tr/GC//) / length($DNA) * 100)."\n";

• In about 10 lines of code, we can summarize our data!
• Can re-use this code to find motifs, RE sites, etc.
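The re-use mentioned on this slide can be sketched as a small subroutine. This is only an illustration, not code from the original course: the name count_motif is hypothetical, and the motifs searched for are arbitrary examples. It counts non-overlapping occurrences of a motif without modifying the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $motif in $seq.
sub count_motif {
    my ($seq, $motif) = @_;
    # \Q...\E makes the motif match as literal text, not as a pattern.
    # Assigning the match list to () and then to a scalar yields the count.
    my $count = () = $seq =~ /\Q$motif\E/g;
    return $count;
}

my $DNA = "ATAGCAGATAGCAGACGACGAGA";
print "GC occurs ", count_motif($DNA, "GC"), " times\n";
print "AGCAG occurs ", count_motif($DNA, "AGCAG"), " times\n";
```

Because the match operator only reads the string, $DNA is unchanged afterwards and can be passed to further counts.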


Perl Comments

• Use the “#” character to add comments to your code. Everything from “#” to the end of the line is ignored

• Helps you and others to understand your thought process

• Let's say you intend to sum up a list of numbers

    # (sum from 1 to 100 of X)

• The code would look like this:

    $sum = 0;                          # Initialize variable called “sum” to 0
    for ($x = 1; $x <= 100; $x++)      # Use “for” loop to iterate over 1 to 100
    { $sum = $sum + $x; }              # Add $x to the previous sum
    print "The sum of 1..100 is $sum\n";   # Report the result ($x is 101 after the loop)


Perl Variables

• Variables – Provide a location to “store” data we are interested in

• Strings, decimals, integers, characters, lists, …
    – What is a character: a single letter, digit or symbol
    – What is a string: a sequence of characters
    – What is an integer: a whole number such as 4. A number with a decimal point, such as 4.7, is a decimal (sometimes referred to as a real)

• Variables can be assigned or changed easily within a Perl script


Variables and built-in keywords

• Variable names should represent or describe the data they contain
    – Do not use meta-characters; stick to letters, digits and underscores. Begin the name with a letter

• Perl has keywords that should not be used as variable names. They are reserved for writing syntax and logical flow of the program
    – Examples include: my, if, elsif, else, for, foreach, while, do, unless, until, last, next, continue, sub, return


Scalar and Array variables

• Variables preceded by a “$” sign are called “Scalar” variables. They hold a single value, which could be a number, a string, etc.

    $score = 5.3;
    $dna = "ATAGGATAGCGA";
    $name = "Shailender";

• Variables preceded by a “@” sign are called “Array” variables. They hold a list of values, such as a list of students in a class, scores from a test, etc.

    @students = ("Alan", "Shailender", "Chris");
    @scores = (89.1, 65.9, 92.4);
    @binding_pos = (9439984, 114028942);


Printing text

• Text can be quoted with double quotes " " or single quotes ' '
• Double quotes process escape sequences and variables within them
    – Ex: print "This \t is a test\nwith text"
    – “\t” is a tab delimiter. “\n” is a newline character.
    – Output:
        This 	 is a test
        with text

• Single quotes are not processed at all, so all items are treated as actual text.
    – Ex: print 'This \t is a test\nwith text'
    – Output: This \t is a test\nwith text


Printing scalar variables

• Scalar variables can be printed easily within double quotes following a print statement. Variable names are “interpolated”, printing the values they contain

    $x = 5;
    $name = "John";
    print "$name has $x dollars\n";

• If you run this as a program, you get this output

    John has 5 dollars


Printing array variables

• Array variables can also be printed as a list with a default delimiter (a space), but another way to print arrays is to put them in a loop and print the elements as scalars

    @students = ("Alan", "Shailender", "Chris");
    print "@students\n";               # Method 1
    foreach $name (@students)          # Method 2
    { print "Student name is $name\n"; }

• If you run this as a program, you get this output:

    Alan Shailender Chris              (Method 1)
    Student name is Alan               (Method 2)
    Student name is Shailender         (Method 2)
    Student name is Chris              (Method 2)


Math Operators and Expressions

• Math operators
    – Eg: 3 + 2
    – + is the operator
    – We read this left to right
    – Basic operators such as + - / * and ** (exponentiation; in Perl, ^ is bitwise XOR, not a power operator)
    – Variables can be used

    print "Sum of 2 and 3 is ".(2+3);
    $x = 3;
    print "Sum of 2 and x is ".(2+$x);

• PEMDAS rules are followed to build mathematical expressions. Built-in math functions can be used
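A short sketch of the precedence rules mentioned on this slide may help. The variable names are arbitrary and chosen only for illustration; the example contrasts ** with ^ and shows the effect of parentheses.

```perl
use strict;
use warnings;

my $x = 3;

my $power       = 2 ** $x;        # exponentiation: 2 * 2 * 2
my $xor         = 2 ^ $x;         # bitwise XOR of 2 and 3, not a power
my $no_parens   = 2 + $x * 4;     # multiplication is done before addition
my $with_parens = (2 + $x) * 4;   # parentheses are evaluated first

print "2 ** 3      = $power\n";
print "2 ^ 3       = $xor\n";
print "2 + 3 * 4   = $no_parens\n";
print "(2 + 3) * 4 = $with_parens\n";
```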


Mathematical operations

• $x=3;
• $y=5;
• $z=$y+$x;
    – Is this the same as $z=$x+$y ?
    – Yes, the result is the same; only the order in which the operands are read differs (left to right)
• $x=$x*$z;
• $y=$y+1;
• $y++;


Perl built-in functions

• The Perl language comes with many built-in functions that can be used with variables to produce summary output, e.g.
    – length: return the length of a string
    – substr: return a sub-string from a string
    – uc: convert a string to upper case
    – reverse: reverse the contents of a list
    – sort: sort a list of numbers or strings
    – pop: return the last element of an array and remove it

• Many mathematical functions are also available
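The built-in functions listed above can be tried together in a few lines. This is a minimal sketch with made-up data; note that sort compares values as strings by default, so a numeric comparison block is assumed here for the scores.

```perl
use strict;
use warnings;

my $dna    = "atagga";
my $len    = length($dna);           # number of characters
my $first3 = substr($dna, 0, 3);     # 3 characters starting at index 0
my $upper  = uc($dna);               # upper-case copy; $dna is unchanged

my @scores   = (89.1, 65.9, 92.4);
my @sorted   = sort { $a <=> $b } @scores;   # numeric sort, ascending
my @reversed = reverse @sorted;              # descending order
my $last     = pop @scores;                  # removes 92.4 from @scores

print "Length: $len, first three: $first3, upper: $upper\n";
print "Sorted: @sorted\n";
print "Reversed: @reversed\n";
print "Popped $last, ", scalar(@scores), " scores remain\n";
```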


Array Indexing

• Arrays can be indexed by number to retrieve individual elements (scalars)

• Indexes have range 0 to (n-1), where 0 is the index of the first element and n-1 is the last item's index

    @array = ("A", "C", "G", "T", "U");
    @nucleotides = ("adenine", "cytosine", "guanine", "thymine", "uracil");

    $array[0] is equal to A
    $nucleotides[3] is equal to thymine
    $nucleotides[4] is equal to what?
    @array[0..2] is equal to what?

• Any element of an array can be re-assigned
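The two questions on this slide, and the re-assignment of an element, can be checked with a short script. This is an illustrative sketch; the replacement value "N" is an arbitrary choice.

```perl
use strict;
use warnings;

my @array       = ("A", "C", "G", "T", "U");
my @nucleotides = ("adenine", "cytosine", "guanine", "thymine", "uracil");

my $fifth       = $nucleotides[4];   # index 4 is the fifth (last) element
my @first_three = @array[0..2];      # a slice holding the first three elements

$array[3] = "N";                     # re-assign the fourth element

my $count      = scalar(@array);     # number of elements (n)
my $last_index = $#array;            # index of the last element (n-1)

print "nucleotides[4] is $fifth\n";
print "array[0..2] is @first_three\n";
print "array is now @array ($count elements, last index $last_index)\n";
```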


Array Operations

• Perl arrays are dynamic; they assume whatever values or size are needed
    – Can dynamically lengthen or shorten arrays
    – May be defined, but empty
    – No predefined size or "out of bounds" error
    – unshift and shift add to and remove from the front
    – push and pop add to and remove from the end


Array operations

    my @fruits;                               # Declared, but empty
    @fruits = qw(apples bananas cherries);    # Assigned
    @fruits = (@fruits, "dates");             # Lengthen
    @fruits = ();                             # Empty
    unshift @fruits, "acorn";                 # Add an item to the front
    my $nut = shift @fruits;                  # Remove from the front
    print "Well, a squirrel would think a $nut was a fruit!\n";
    push @fruits, "mango";                    # Add an item to the end
    my $food = pop @fruits;                   # Remove from the end
    print "My, that was a yummy $food!\n";

Output:
    Well, a squirrel would think a acorn was a fruit!
    My, that was a yummy mango!


Array operations

• A slice of an array (a sub-array) is itself a list of values
    – Take an array slice with @array[@indices]
    – $array[0] is a scalar, that is, a single value
    – @array[0] is a slice containing a single scalar
    – Scalars always begin with $ (a single item)
    – Arrays always begin with @ (the array itself)


String Operations: Split

• Perl is very useful in handling string variables
• The “split” command allows users to search for patterns in a string and use them as a delimiter to break the string apart
• For example, to extract the words in a sentence, we use the space delimiter to capture the words

    @words = split(/ /, "This is a sentence");

• Now @words contains the words

    $words[0] = "This"
    $words[1] = "is"
    $words[2] = "a"
    $words[3] = "sentence"


String Operations: Split (…contd)

• Another way to assign the results of a split command is to anticipate how many scalar variables will be created

    ($first, undef, $third, $fourth) = split(/ /, "This is a sentence");

• In this case, we anticipate 4 values to be returned by the split command, but we choose to ignore the second one by assigning it to “undef”. Only 3 variables will get created


String Operations: Join

• The join command concatenates multiple variables in the order they are entered into the join command
    – A delimiter must be specified, such as space, tab, comma, etc.
    – An empty delimiter "" results in full concatenation

• Syntax (here @names holds "Joe", "John" and "David"):

    $new_string = join(":", @names, "Paul", "Debbie");

• The value of $new_string will be "Joe:John:David:Paul:Debbie"

• Try out:

    perl split_and_join.pl
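Split and join are often used as a pair: break a delimited record apart, then reassemble selected pieces with a different delimiter. The sketch below is an illustration only; the record and the variable names are made up and are not the contents of split_and_join.pl.

```perl
use strict;
use warnings;

# Hypothetical colon-delimited record: last name, first name, age, city
my $record = "Smith:Al:18:Cambridge";

my @fields = split(/:/, $record);              # break on colons
my ($last, $first, undef, $city) = @fields;    # skip the age field

my $tabbed = join("\t", @fields);              # same fields, tab-delimited
my $name   = join(" ", $first, $last);         # "first last"

print "Name: $name, City: $city\n";
print "Tab-delimited: $tabbed\n";
```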


String operations: substr

• Perl allows the extraction of a substring from a scalar string variable

• Syntax:

    $part_string = substr($fullstring, $start, $length);
    $part_string = substr("This is a full string", 10, 4);

• The $part_string value is "full"
• Remember that indexing within string variables begins with 0, just like with arrays, but sub-strings cannot be extracted with array-style subscripts
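Because substr takes a start position and a length, it can walk along a string in fixed-size steps. The following sketch, with a made-up 9-base sequence, splits a DNA string into 3-base codons; it is one possible use of substr, not an example from the slides.

```perl
use strict;
use warnings;

my $dna = "ATGGCCTAA";   # illustrative sequence
my @codons;

# Step through the string 3 characters at a time, stopping before any
# incomplete codon at the end.
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    push @codons, substr($dna, $i, 3);
}

print join(" ", @codons), "\n";
```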


Iterating over Arrays with “foreach”

• OK, so we have these arrays, but how do we work with each element automatically?
    – How can we iterate over them and perform the same operation on each element?

• We use looping logic to work with the arrays
• We use Perl's “for”, more specifically foreach

    foreach $named_item (@array) {
        $named_item = <some expression>;
    }


Perl Arrays, foreach cont.

• Example:

    my @nucleotides = ("adenine", "cytosine", "guanine", "thymine", "uracil");
    foreach $nt (@nucleotides) {
        print "Nucleotide is: $nt\n";
    }

Output:
    Nucleotide is: adenine
    Nucleotide is: cytosine
    Nucleotide is: guanine
    Nucleotide is: thymine
    Nucleotide is: uracil


Iterating over Arrays with “for”

• Example:

    my @nucleotides = ("adenine", "cytosine", "guanine", "thymine", "uracil");
    # scalar(@nucleotides) is the number of elements; length() would not give the array size
    for ($i = 0; $i < scalar(@nucleotides); $i++) {
        print "Nucleotide is: $nucleotides[$i]\n";
    }

Output:
    Nucleotide is: adenine
    Nucleotide is: cytosine
    Nucleotide is: guanine
    Nucleotide is: thymine
    Nucleotide is: uracil


Iterating over Arrays with “while”

• Example:

    @nucleotides = ("adenine", "cytosine", "guanine", "thymine", "uracil");
    $i = 0;
    # scalar(@nucleotides) is the number of elements in the array
    while ($i < scalar(@nucleotides)) {
        print "Nucleotide is: $nucleotides[$i]\n";
        $i++;
    }

• Output:
    Nucleotide is: adenine
    Nucleotide is: cytosine
    Nucleotide is: guanine
    Nucleotide is: thymine
    Nucleotide is: uracil


The infamous “$_”

• $_ is Perl's default variable. In a loop with no named loop variable, it holds the current element

• Example

    @a = (1, 1, 2, 3, 5, 8, 11);
    foreach (@a) {
        print $_;      # “print” without $_ will work too
        print " ";
    }

• Output:
    1 1 2 3 5 8 11


Boolean Operations

• Boolean operators provide Boolean context
• Many types of operators are provided
    – Relational (<, > for numbers; lt, gt for strings)
    – Equality (==, != for numbers; eq, ne for strings)
    – Logical, high precedence (&&, ||, !)
    – Logical, low precedence (and, or, not)
    – Conditional (?:)
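The difference between the numeric and the string forms of these operators is easy to get wrong, so a small sketch may help. The values compared are arbitrary; each result is stored as "yes" or "no" using the conditional operator.

```perl
use strict;
use warnings;

# Numeric comparison treats both sides as numbers
my $numeric_equal = (10 == 10.0)     ? "yes" : "no";
# String comparison looks at the characters themselves
my $string_equal  = ("10" eq "10.0") ? "yes" : "no";

# 9 is less than 10 as a number, but "9" sorts after "10" as a string
my $numeric_less = (9 < 10)      ? "yes" : "no";
my $string_less  = ("9" lt "10") ? "yes" : "no";

# Logical operators combine conditions
my $x = 5;
my $in_range = ($x > 1 && $x < 10) ? "yes" : "no";

print "10 == 10.0:       $numeric_equal\n";
print "'10' eq '10.0':   $string_equal\n";
print "9 < 10:           $numeric_less\n";
print "'9' lt '10':      $string_less\n";
print "1 < x < 10:       $in_range\n";
```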


Command blocks in Perl

• A group of statements surrounded by braces {}
• Creates a new scope for statements and variables
• Starts with “{”
• Ends with “}”
• Ex:

    {
        print "Test\n";
    }


Conditional operations with “if-then-else”

• If-then-else syntax allows programmers to introduce logic in their programs (Perl writes it as if/else; there is no “then” keyword)

• Blocks of code can be branched to execute only when certain conditions are met

    if (condition1 is true) {
        <execute these statements if condition1 is true>;
    }
    else {
        <execute these statements if condition1 is false>;
    }
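A concrete version of the template above, extended with elsif for a third branch, might look like the sketch below. The subroutine name classify_gc and the 40% and 60% thresholds are assumptions made up for this illustration, not values from the course.

```perl
use strict;
use warnings;

# Hypothetical helper: label a sequence by its G+C percentage.
# The thresholds are arbitrary and only serve to show the branching.
sub classify_gc {
    my ($percent) = @_;
    if ($percent < 40) {
        return "AT-rich";
    }
    elsif ($percent <= 60) {
        return "balanced";
    }
    else {
        return "GC-rich";
    }
}

foreach my $gc (25, 50, 75) {
    print "$gc% G+C is ", classify_gc($gc), "\n";
}
```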


Perl nested blocks

• Blocks within blocks

    {
        if ($x > 1) {
            if ($y > 2) {
                print("y>2\n");
            }
            print("x>1\n");
        }
    }


Hash Arrays

• What is a Hash Array?
    – Associative arrays, also frequently called hashes, are the third major data type in Perl after scalars and arrays.

• Hashes work very similarly to a common data structure that programmers use in other languages: hash tables

• Hashes in Perl are a data type supported directly by the language

• All Perl hash variable names are prefixed with a % symbol, and can hold key-value pairs


Hash Arrays (…contd)

• Example data pairs that are suited to being stored in a hash array (as opposed to storing them in 2 separate arrays)
    – Words (key) and their meanings (value)
    – Gene symbols (key) and their full names (value)
    – Country names (key) and their capitals/currencies (value)

• Accessing a hash works the same way as accessing an array, except that instead of a numeric subscript you provide a key to retrieve the value

• Looking up items by key is faster than searching through an array
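Taking the country/capital pairing from the list above, a key lookup can be sketched as follows. The hash contents and the subroutine name capital_of are illustrative assumptions; exists is used so that a missing key gives a clear answer instead of an undefined value.

```perl
use strict;
use warnings;

my %capital = (
    France => "Paris",
    Japan  => "Tokyo",
    Kenya  => "Nairobi",
);

# Hypothetical helper: look up a capital by country name (the key).
sub capital_of {
    my ($country) = @_;
    return exists $capital{$country} ? $capital{$country} : "unknown";
}

print "Capital of Japan: ", capital_of("Japan"), "\n";
print "Capital of Atlantis: ", capital_of("Atlantis"), "\n";
```

With two separate arrays, the same lookup would need a loop over the country names to find the matching position first.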


Hash Arrays, cont.

• Example (each entry is a key => value pair)

    %wheels = (
        unicycle => 1,
        bike     => 2,
        tricycle => 3,
        car      => 4,
        semi     => 18
    );

• To print the number of wheels in a car, do

    print "A car has $wheels{'car'} wheels";


Hash Arrays, cont.

• Other ways to assign Hash Arrays:

    %dessert = ("pie", "apple", "cake", "carrot", "sorbet", "orange");      # Method 1
    %dessert = (pie => "apple", cake => "carrot", sorbet => "orange");      # Method 2
    $dessert{"pie"} = "apple";                                              # Method 3
    $dessert{"cake"} = "carrot";                                            # Method 3
    $dessert{"sorbet"} = "orange";                                          # Method 3
    print "I would like $dessert{pie} pie.\n";

Output:
    I would like apple pie.


Hash Array operations

• Certain key-value pairs can be deleted

    print "Before deleting, the Unicycle has $wheels{'unicycle'} wheels\n";
    delete $wheels{'unicycle'};       # delete unicycle entry
    print "After deleting, the Unicycle has $wheels{'unicycle'} wheels\n";

Output (the deleted value is undefined, so nothing is printed for it):
    Before deleting, the Unicycle has 1 wheels
    After deleting, the Unicycle has  wheels

• Keys and values of a hash array can be retrieved

    @vehicles = keys %wheels;
    @wheel_nums = values %wheels;


Hash Array iteration

• Easy to iterate
    – “each” returns key/value pairs in no particular order
    – A while loop can iterate over the entire hash
    – We can create entities such as:
        • ($vehicle, $count) = each %wheels
    – When called on a hash in list context, “each” returns a 2-element list consisting of the key and value for the next element of the hash
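When a predictable order matters, a common alternative to “each” is to sort the keys and loop over them with foreach. The sketch below reuses the wheels data from the earlier slide; the array names are chosen only for this illustration.

```perl
use strict;
use warnings;

my %wheels = (unicycle => 1, bike => 2, tricycle => 3, car => 4, semi => 18);

# Alphabetical order of the keys
my @lines;
foreach my $vehicle (sort keys %wheels) {
    push @lines, "$vehicle has $wheels{$vehicle} wheels";
}
print "$_\n" for @lines;

# Keys ordered by their values (fewest wheels first)
my @by_count = sort { $wheels{$a} <=> $wheels{$b} } keys %wheels;
print "By wheel count: @by_count\n";
```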


Hash Array iteration (...contd)

    my %sounds = (cow => "moooo", duck => "quack", horse => "whinny",
                  sheep => "baa", hen => "cluck", pig => "oink");
    my @barnyard_sounds = @sounds{"horse", "hen", "pig"};    # hash slice
    while (my ($animal, $noise) = each %sounds) {
        print "Old MacDonald had a $animal.";
        print " With a $noise! $noise! here...\n";
    }

Output (one line per animal; the order may vary between runs):
    Old MacDonald had a hen. With a cluck! cluck! here...
    Old MacDonald had a cow. With a moooo! moooo! here...
    Old MacDonald had a sheep. With a baa! baa! here...
    (and so on for the remaining animals)


Numerical Operators, (more)

• We can use shortcut operators for common operations such as $x = $x + 1 (written $x++ or $x += 1) and $x = $x / 2 (written $x /= 2)

• Numeric operators provide numeric context
• All common operators are provided
    – Increment and decrement (++, --)
    – Arithmetic (+, *)
    – Assignment (+=, *=)
    – Bitwise shifts (<<, >>)
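The operators listed above can be followed step by step in a short script. The starting value and the sequence of operations are arbitrary; the comments track the value of $x after each statement.

```perl
use strict;
use warnings;

my $x = 4;
$x++;        # same as $x = $x + 1   -> 5
$x += 10;    # same as $x = $x + 10  -> 15
$x *= 2;     # same as $x = $x * 2   -> 30
$x /= 3;     # same as $x = $x / 3   -> 10
$x--;        # same as $x = $x - 1   -> 9

my $shifted_left  = 1 << 3;    # shift bits left by 3: multiply by 2 three times
my $shifted_right = 16 >> 2;   # shift bits right by 2: divide by 2 twice

print "x is $x\n";
print "1 << 3 is $shifted_left, 16 >> 2 is $shifted_right\n";
```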


Perl File access

• What is file access?
    – A set of Perl commands/syntax to work with data files

• Why do we need it?
    – Makes reading data from files easy; we can also create new data files

• What different types are there?
    – Read, write, append


Perl File access

• Access to files is similar to shell redirection
    – open() allows access to the file
    – Redirect characters (<, >) define access type
    – Can read, write, append, read & write, etc.
    – Filehandle refers to opened file
    – close() stops access to the file
    – $! contains IO error messages (similar to $?)


Perl File access

• Example

    # Open file for reading
    open INPUT, "< datafile" or die "Can't open input file: $!";
    # Open file for writing
    open OUTPUT, "> outfile" or die "Can't open output file: $!";
    # Append
    open LOG, ">> logfile" or die "Can't open log file: $!";
    # Read and write
    open RWFILE, "+< myfile" or die "Can't open file: $!";
    close INPUT;


Perl Files access (reading)

• Reading from files
    – Input operator <> reads one line from the file, including the newline character
    – chomp will remove the newline if you want
    – Can modify the input record separator $/ to read characters, words, paragraphs, records, etc.
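The effect of chomp and of changing $/ can be seen without creating a file by opening a filehandle on a string. This is a sketch under a few assumptions: the text in $data is made up, and a lexical filehandle ($fh) with three-argument open is used instead of the bareword handles shown elsewhere in these slides.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n\npara two\n";   # illustrative text

# Default mode: one line per read, newline included until chomp removes it
open(my $fh, "<", \$data) or die "Can't open in-memory file: $!";
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;

# Paragraph mode: setting $/ to "" makes blank lines the record separator
open($fh, "<", \$data) or die "Can't open in-memory file: $!";
my @paragraphs;
{
    local $/ = "";                 # restored automatically after this block
    while (my $para = <$fh>) {
        chomp $para;               # in paragraph mode, strips all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

print scalar(@lines), " lines, ", scalar(@paragraphs), " paragraphs\n";
```

Using local keeps the change to $/ confined to the block, so later reads in the same program go back to line-by-line behaviour.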


Perl File access (reading)

• Reading from files
    – Easy to loop over entire file
    – Loops will assign to $_ by default
    – Be sure that the file is open for reading first

• Input file:

    Lastname:Firstname:Age:Address:Apartment:City:State:ZIP
    Smith:Al:18:123 Apple St.:Apt.#1:Cambridge:MA:02139


Perl File access Example

• Example

    open(CUSTOMERS, "< mailing_list") or die "Can't open input file: $!";
    while ($line = <CUSTOMERS>) {
        chomp $line;                          # Remove the trailing newline
        my @fields = split(":", $line);       # Fields separated by colons
        print "$fields[1] $fields[0]\n";      # Display selected fields
        print "$fields[3], $fields[4]\n";
        print "$fields[5], $fields[6] $fields[7]\n";
    }
    close CUSTOMERS;

• Output for the Smith record:

    Al Smith
    123 Apple St., Apt.#1
    Cambridge, MA 02139


Perl File access writing

• Writing to files
    – print writes to STDOUT by default
    – Give print a filehandle to write to a file instead
    – Be sure that the file is open for writing first

• Check for errors along the way


Perl File access writing

• Example writing to a file

    # Read file
    open CUSTOMERS, "< mailing_list" or die "Can't open input file: $!";
    # Output file
    open LABELS, "> labels" or die "Can't open output file: $!";
    while (my $line = <CUSTOMERS>) {
        my @fields = split(":", $line);
        print LABELS "$fields[1] $fields[0]\n";
        print LABELS "$fields[3], $fields[4]\n";
        print LABELS "$fields[5], $fields[6] $fields[7]\n";
    }


Subroutines/ Functions

• What is a subroutine?
    – Groups related statements into a single task
    – Segments code into logical blocks
    – Avoids code and variable name collisions
    – Can be “called” by other segments of code

• Perl allows
    – both declared and anonymous subs
    – various ways of handling arguments
    – various ways of calling subs


Perl Subroutines

• Subroutines return values
    – Explicitly with the return command
    – Implicitly as the value of the last executed statement

• Return values can be a scalar or a flat list
• Arguments are passed into the @_ array
    – @_ is the "fill in the blanks" array
    – The elements of @_ are aliases to the caller's variables, so you should usually copy @_ into local variables; the copies then behave as if passed by value
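The reason for copying @_ into local variables can be shown by comparing two subroutines. This is an illustrative sketch; the names double_copy and double_in_place are hypothetical. The first works on a private copy, while the second changes the caller's variable through the @_ array.

```perl
use strict;
use warnings;

# Works on a copy: the caller's variable is untouched
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

# Works on $_[0] directly: this modifies the caller's variable
sub double_in_place {
    $_[0] *= 2;
    return;
}

my $value  = 10;
my $result = double_copy($value);
print "After double_copy: value is $value, result is $result\n";

double_in_place($value);
print "After double_in_place: value is $value\n";
```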


Perl Subroutines

    sub add_one {
        my ($n) = @_;         # Copy first argument
        return ($n + 1);      # Return 1 more than input
    }
    my ($a, $b) = (10, 0);
    add_one($a);              # Return value is lost, nothing changes
    $b = add_one($a);         # $a is 10, $b is 11


Perl Subroutines

• Subroutine calls usually have arguments in parentheses
    – Parentheses are not needed if the sub is declared first
    – But using parentheses is often good style

• Subroutine calls may be recursive
• Subroutines are another data type
    – Name may be preceded by an & character
    – & is not needed when calling subs


Perl Subroutines

• Example function
    – fibonacci(n) = fibonacci(n-1) + fibonacci(n-2)

• Compute
    – 1, 1, 2, 3, 5, 8, 13, …

    sub fibonacci {
        my ($n) = @_;     # "my" keeps $n private to each recursive call
        die "Number must be positive" if $n <= 0;
        return 1 if $n <= 2;
        return (fibonacci($n-1) + fibonacci($n-2));
    }

    foreach my $i (1..5) {
        my $fib = fibonacci($i);
        print "fibonacci($i) is $fib\n";
    }

• Example Output:
    fibonacci(1) is 1
    fibonacci(2) is 1
    fibonacci(3) is 2
    fibonacci(4) is 3
    fibonacci(5) is 5


Regular Expressions

• There are three kinds of regular expression operations
    – Matching: returns true or false depending on whether a pattern is found in the string

        $search_string =~ m/pattern/

    – Substitution: matches a pattern, then substitutes it with another. Returns the count of substitutions

        $search_string =~ s/search_pattern/substitute_pattern/;

    – Transliteration: matches single characters and translates each into another single character. Returns the count of characters matched

        $search_string =~ tr/search_pattern/translate_pattern/;
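The return values of the three operations can be captured in variables, as in the following sketch. The sequence is made up, GAATTC (the EcoRI recognition site) is used as the search pattern, and the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my $seq = "GAATTCAGGAATTC";   # illustrative sequence with two GAATTC sites

# Matching: true or false
my $has_site = ($seq =~ m/GAATTC/) ? 1 : 0;

# Substitution: work on a copy, /g replaces every occurrence,
# and the return value is the number of replacements made
my $masked   = $seq;
my $replaced = ($masked =~ s/GAATTC/NNNNNN/g);

# Transliteration with an empty replacement list only counts characters
my $a_count = ($seq =~ tr/A//);

print "Site present: $has_site\n";
print "Sites masked: $replaced ($masked)\n";
print "Number of A bases: $a_count\n";
```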


Regular expressions (…contd)

• RE’s are useful to biologists looking for “patterns” in DNA and protein sequences, such as restriction enzyme sites, motifs and single AA’s or DNA bases

• Transcribe a DNA sequence to RNA
$dna = "ATTAGGACGAAGATTGA";
$dna =~ s/T/U/g;
print $dna;

• Obtain the reverse complement of a DNA sequence
$dna = "ATTAGGACGAAGATTGA";    # start again from DNA, not the RNA above
$dna =~ tr/ATCG/TAGC/;
print "Reverse complement is " . reverse($dna);
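The sequence operations above can be wrapped into a small, reusable sketch. The subroutine name reverse_complement and the sample sequence are assumptions made for this example; GAATTC is the recognition site of the EcoRI restriction enzyme.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string (upper-case A, C, G, T assumed).
sub reverse_complement {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ACGT/TGCA/;   # copy, then complement each base
    return scalar reverse $comp;          # scalar context reverses the string
}

my $dna = "ATTAGGACGAATTCGA";
my $rc  = reverse_complement($dna);

# tr with an empty replacement list counts characters without changing them.
my $gc = ($dna =~ tr/GC//);

# Collect the start position of every occurrence of a motif.
my @sites;
while ($dna =~ /GAATTC/g) {
    push @sites, $-[0];                   # 0-based start of the current match
}

print "Reverse complement: $rc\n";
print "G+C count: $gc\n";
print "EcoRI site(s) at: @sites\n";
```

Working on a copy inside the subroutine leaves the caller's $dna unchanged, which avoids the in-place surprises of s/// and tr///.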


Providing input to programs

• Perl programs can certainly have variables containing parameter data, but it is sometimes convenient not to have to edit a program to change some of the data

• There are 2 ways of doing this
  – Reading data from shell directly into program variables by requesting keyboard input
  – Command line arguments


Requesting keyboard input

• Example
print "What type of pet do you have? ";
my $pet = <STDIN>;    # Read a line from STDIN
chomp $pet;           # Remove newline
print "Enter your pet's name: ";
my $name = <>;        # <> reads from STDIN when no file arguments are given
chomp $name;
print "Your pet $pet is named $name.\n";

• Output:
What type of pet do you have? parrot
Enter your pet's name: Polly
Your pet parrot is named Polly.


Command Line Arguments

• Command line arguments are optional values that can be passed as input to the perl program
  – After the name of the program, string or numeric values are placed, with spaces separating them
  – These values can be accessed by the @ARGV array variable inside the program

• Examples:
perl arguments.pl arg1 arg2 10 20

• Why do you need Command Line Arguments?
  – Specify inputs at runtime without re-editing the program
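As a sketch of what a script such as arguments.pl might contain (the slides do not show its contents, so the helper summarize_args and its behavior are assumptions), the following reads @ARGV, separates numeric from non-numeric values, and reports both:

```perl
use strict;
use warnings;

# Hypothetical helper: add up the numeric arguments and collect the others.
sub summarize_args {
    my @args = @_;
    my $sum  = 0;
    my @words;
    for my $arg (@args) {
        if ($arg =~ /^-?\d+(?:\.\d+)?$/) {   # looks like a plain number
            $sum += $arg;
        }
        else {
            push @words, $arg;
        }
    }
    return ($sum, @words);
}

# @ARGV holds everything typed after the script name on the command line.
my ($sum, @words) = summarize_args(@ARGV);
print "Received ", scalar(@ARGV), " argument(s)\n";
print "Words: @words\n";
print "Sum of numbers: $sum\n";
```

Run as perl arguments.pl arg1 arg2 10 20, this would treat arg1 and arg2 as words and add 10 and 20.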


Using Perl programs on the cluster

• Perl scripts can easily be submitted as jobs to be run on the MGHPCC infrastructure

• A basic understanding of Linux commands and an account on the cluster are required

• Lots of useful information, including account registration, is available at www.umassrc.org

• Feel free to reach out to Research Computing for [email protected]


What is a computing “Job”?

• A computing “job” is an instruction to the HPC system to execute a command or script
  – Simple Linux commands or Perl/Python/R scripts that can be executed within milliseconds would probably not qualify to be submitted as a “job”
  – Any command that is expected to take up a big portion of CPU or memory for more than a few seconds on a node would qualify to be submitted as a “job”. Why? (Hint: multi-user environment)


How to submit a “job”

• The basic syntax is:
bsub <valid linux command>

• bsub: LSF command for submitting a job
• Let’s say a user wants to execute a Perl script. On a Linux PC, the command is
perl countDNA.pl

• To submit a job to do the work, do
bsub perl countDNA.pl


Specifying more “job” options

• Jobs can be marked with options for better job tracking and resource management
  – Job should be submitted with parameters such as queue name, estimated runtime, job name, memory required, output and error files, etc.

• These can be passed on in the bsub command
bsub -q short -W 1:00 -R rusage[mem=2048] -J "Myjob" -o hpc.out -e hpc.err perl countDNA.pl
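When many jobs need the same options, the bsub command line can be assembled from within Perl. The sketch below only builds and prints the command; the helper name build_bsub_command is hypothetical, and the option values are the ones shown on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: build a bsub command as a list of words.
sub build_bsub_command {
    my ($script, %opt) = @_;
    my @cmd = ('bsub');
    for my $flag (qw(-q -W -R -J -o -e)) {          # fixed, readable order
        push @cmd, $flag, $opt{$flag} if defined $opt{$flag};
    }
    return (@cmd, 'perl', $script);
}

my @cmd = build_bsub_command(
    'countDNA.pl',
    '-q' => 'short',
    '-W' => '1:00',
    '-R' => 'rusage[mem=2048]',
    '-J' => 'Myjob',
    '-o' => 'hpc.out',
    '-e' => 'hpc.err',
);

print join(' ', @cmd), "\n";

# On a cluster login node the job could then be submitted with, for example:
# system(@cmd) == 0 or die "bsub failed: $?";
```

Passing the command to system as a list avoids shell quoting problems with values such as rusage[mem=2048].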


Job submission “options”

Option flag or name    Description
-q    Name of queue to use. On our systems, possible values are “short” (<=4 hrs execution time), “long” and “interactive”
-W    Allocation of node time. Specify hours and minutes as HH:MM
-J    Job name. E.g. “Myjob”
-o    Output file. E.g. “hpc.out”
-e    Error file. E.g. “hpc.err”
-R    Resources requested from assigned node. E.g. “-R rusage[mem=1024]”, “-R span[hosts=1]”
-n    Number of cores to use on assigned node. E.g. “-n 8”


Why use the correct queue?

• Match requirements to resources
• Jobs dispatch quicker
• Better for entire cluster
• Help GHPCC staff determine when new resources are needed


A bioinformatics demo

• Log on to the UMass server using PuTTY on Windows or Terminal on Mac

• Request an interactive shell session on one of the compute nodes for this demo

$ bsub -q interactive -W 4:00 -Is bash

• Navigate to the training directory or copy the examples to your local directory

$ cd /project/umw_rcs/training/perl

OR

$ cp /project/umw_rcs/training/perl/* .


A bioinformatics demo (…contd)

• Let’s say we have microRNA data across 3 different files
  – microRNA sequence FASTA file
  – Data file containing abundance of microRNA in sample
  – Annotation file containing targets of microRNA

• Our goal in this exercise is to bring all of this data together into a single report or table, along with some analysis
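One possible way to combine the three sources is a hash keyed by microRNA name. The sketch below is an assumption about the approach, not the demo's actual script: the file formats (FASTA plus two tab-delimited tables), the names mirA and mirB, and all values are invented, and the data is kept inline so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical inline data standing in for the three files.
my $fasta     = ">mirA\nACGUACGU\nACGU\n>mirB\nUUGGCCAA\n";
my $abundance = "mirA\t120\nmirB\t950\n";
my $targets   = "mirA\tGENE_1\nmirB\tGENE_2\n";

my %report;    # one entry per microRNA name

# FASTA: a header line starts with '>', the lines after it are sequence.
my $current;
for my $line (split /\n/, $fasta) {
    if ($line =~ /^>(\S+)/) {
        $current = $1;
        $report{$current}{seq} = '';
    }
    elsif (defined $current) {
        $report{$current}{seq} .= $line;
    }
}

# Tab-delimited tables: name in column 1, value in column 2.
for my $line (split /\n/, $abundance) {
    my ($name, $count) = split /\t/, $line;
    $report{$name}{count} = $count;
}
for my $line (split /\n/, $targets) {
    my ($name, $gene) = split /\t/, $line;
    $report{$name}{target} = $gene;
}

# One row per microRNA, with sequence length as a simple derived value.
for my $name (sort keys %report) {
    my $r = $report{$name};
    print join("\t", $name, length($r->{seq}), $r->{count}, $r->{target}), "\n";
}
```

With real files, each split over an inline string would be replaced by reading lines from an opened filehandle.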


Questions?

• How can we help further?
• Please check out books we recommend as well as web references (next 2 slides)


Perl Books

• Perl books which may be helpful
  – Beginning Perl for Bioinformatics: http://shop.oreilly.com/product/9780596000806.do
  – Mastering Perl for Bioinformatics: http://shop.oreilly.com/product/9780596003074.do
  – Perl Cookbook: http://shop.oreilly.com/product/9780596003135.do
  – Programming Perl – The Reference: http://shop.oreilly.com/product/9780596004927.do
