Perl Scripting


Page 1: Perl Scripting

Introduction to PERL: Scripting for UNIX made simple and portable

Yuk Sham
MSI Consultant

Phone: (612) 626 0802 (help)
Email: [email protected]

Page 2: Perl Scripting

Outline

• What is PERL?
• Why would I use PERL instead of something else?
• PERL features
  – How to run PERL scripts
  – PERL syntax, variables, quotes
  – Flow control constructs
  – Subroutines
• Typical UNIX scripting tasks
  – Filter a file or a group of files
  – Searching/Matching
  – Naming file sequences
  – Executing applications
  – Parsing files
• More information

Page 3: Perl Scripting

What is PERL?

• Practical Extraction and Report Language
  – Written by Larry Wall
• Combines capabilities of Bourne shell, csh, awk, sed, grep, sort and C
• Assists with common tasks that are too heavy or too portability-sensitive for the shell, and yet too weird or too complicated to code in C or another programming language
• File or list processing: matching, extraction, formatting (text reports, HTML, mail, etc.)

Page 4: Perl Scripting

Why would I use PERL instead of something else?

• Interpreted language
• Commonly used for CGI programs
• Very flexible
• Very automatic
• Can be very simple for a variety of tasks
• WIDELY available
• HIGHLY portable

Page 5: Perl Scripting

PERL features

• C-style flow control (similar)
• Dynamic allocation
• Automatic allocation
• Numbers
• Lists
• Strings
• Arrays
• Associative arrays (hashes)

Page 6: Perl Scripting

PERL features

• Very large set of publicly available libraries for a wide range of applications
• Math functions (trig, complex)
• Automatic conversions as needed
• Pattern matching
• Standard I/O
• Process control
• System calls
• Can be object oriented

Page 7: Perl Scripting

How to run PERL scripts

% cat hello.pl
print "Hello world from PERL.\n";
%
% perl hello.pl
Hello world from PERL.

Page 8: Perl Scripting

How to run PERL scripts

OR ------------------

% which perl
/usr/bin/perl

% cat hello.pl
#!/usr/bin/perl
print "Hello world from PERL.\n";

% chmod a+rx hello.pl
% hello.pl
Hello world from PERL.

(The .pl suffix is just a convention; it has no special meaning to perl.)
/usr/local/bin/perl is another place perl might be linked at the Institute.

Page 9: Perl Scripting

PERL syntax

• Free form: whitespace and newlines are ignored, except as delimiters
• PERL statements may be continued across line boundaries
• All PERL statements end with a ; (semicolon)
• Comments begin with the # (pound sign) and end at a newline
  – no continuation
  – may be anywhere, not just beginning of line
• Comments may be embedded in a statement
  – see previous item

Page 10: Perl Scripting

Hello world

Example 1:

#!/usr/bin/perl
# This is how perl says hello
print "Hello world from PERL.\n";          # It says hello once
print "Hello world again from PERL.\n";    # It says hello twice

Example 2:

#!/usr/bin/perl
print"Hello world from PERL.\n";print"Hello world again from PERL.\n";

Example 3:

#!/usr/bin/perl
print "Hello world from PERL.\n";
print "Hello world again from PERL.\n";

All three examples print:

Hello world from PERL.
Hello world again from PERL.

Page 11: Perl Scripting

PERL variables

• Number or string
    $count
• Array
    List of numbers and/or strings
    Indexed by number starting at zero
    @an_array
• Associative array or hash
    List of numbers and/or strings
    Indexed by anything
    %a_hash

Page 12: Perl Scripting

Strings and arrays

$x = 27;
$y = 35;
$name = "john";
@a = ($x,$y,$name);
print "x = $x and y = $y\n";
print "The array is @a \n";

x = 27 and y = 35
The array is 27 35 john

@a = ("fred","barney","betty","wilma");
print "The names are @a \n";
print "The first name is $a[0] \n";
print "The last name is $a[3] \n";

The names are fred barney betty wilma
The first name is fred
The last name is wilma

Page 13: Perl Scripting

Associative arrays

$a{dad} = "fred";
$a{mom} = "wilma";
$a{child} = "pebble";
print "The mom is $a{mom} \n";

The mom is wilma

@keys = keys(%a);
@values = values(%a);
print "The keys are @keys \n";
print "The values are @values \n";

The keys are mom dad child
The values are wilma fred pebble

(The order of a hash is not guaranteed and may differ from run to run, but keys and values are returned in matching order.)
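Because hash order is not fixed, scripts usually sort the keys before printing. The following is a minimal sketch of that idiom; the names %family and family_report are illustrative and not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the same family hash in one statement (the name %family is illustrative).
my %family = (dad => "fred", mom => "wilma", child => "pebble");

# Return one "key: value" string per entry, sorted by key so the order is stable.
sub family_report {
    my (%h) = @_;
    return map { "$_: $h{$_}" } sort keys %h;
}

print "$_\n" for family_report(%family);

# exists() tests for a key without creating it; delete() removes an entry.
print "There is a mom\n" if exists $family{mom};
delete $family{child};
print "Entries left: ", scalar(keys %family), "\n";
```

Sorting the keys costs a little time on large hashes, but it makes the output reproducible, which matters when the report is compared between runs.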

Page 14: Perl Scripting

Operators and functions

• Increase or decrease an existing value by 1 (++, --)
• Modify an existing value by +, -, * or / with an assigned value (+=, -=, *=, /=)

Example 1

$a = 1;
$b = "a";
++$a;
++$b;
print "$a $b \n";

2 b

Example 2

$a = $b = $c = 1;
++$b;
$c *= 3;
print "$a $b $c\n";

1 2 3

Page 15: Perl Scripting

Operators and functions

• Numeric logical operators
    ==, !=, <, >, <=, >=
• String logical operators
    eq, ne, lt, gt, le, ge
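The two operator families give different answers on the same data, which is a common source of bugs. The sketch below is illustrative only (the sub name compare_both is an assumption, not from the slides); it compares a pair of values both ways and shows the matching sort operators <=> and cmp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two values numerically (==) and as strings (eq).
sub compare_both {
    my ($x, $y) = @_;
    my $numeric = ($x == $y) ? "equal" : "different";
    my $string  = ($x eq $y) ? "equal" : "different";
    return "numeric: $numeric, string: $string";
}

# "10" and "10.0" are the same number but not the same string.
print compare_both("10", "10.0"), "\n";

# <=> and cmp are the three-way versions that sort expects.
my @by_number = sort { $a <=> $b } (10, 9, 100);
my @by_string = sort { $a cmp $b } (10, 9, 100);
print "@by_number\n";
print "@by_string\n";
```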

Page 16: Perl Scripting

Operators and functions

• Add and remove elements from an existing array (push, pop, unshift, shift)
• Rearranging arrays (reverse, sort)

@a = qw(one two three four five six);
print "@a\n";

one two three four five six

unshift(@a,"zero");      # adds an element to the array
print "@a\n";            # from the left side

zero one two three four five six

shift(@a);               # removes an element from the array
print "@a\n";            # from the left side

one two three four five six

@a = reverse(@a);        # reverse the order of the array
print "@a\n";

six five four three two one

@a = sort(@a);           # sort the array in alphabetical order
print "@a\n";

five four one six three two
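The slide lists push and pop but only demonstrates unshift and shift. As a small sketch in the same spirit (the variable names are illustrative), push and pop work on the right end of the array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @queue = qw(one two three);

push(@queue, "four", "five");   # add elements on the right side
my $last  = pop(@queue);        # remove one element from the right side
my $first = shift(@queue);      # remove one element from the left side

print "@queue\n";
print "removed $first from the left and $last from the right\n";
print "The array now holds ", scalar(@queue), " elements\n";
```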

Page 17: Perl Scripting

Operators and functions

• Removes last character from a string (chop)
• Removes newline character, \n, from end of a string (chomp)
• Breaks a string into fields at a delimiter pattern (split) and joins the pieces back (join)

$a = "this is my expression\n";
print "$a";

this is my expression

chomp($a);
print "$a …. ";
@a = split(/ /,$a);                     # splits $a string into an array called @a
print "$a[3] $a[2] $a[1] $a[0]\n";

this is my expression …. expression my is this

$a = join(":",@a);       # create a string called $a by joining
print "$a \n";           # all the elements in the array @a and
                         # having ":" placed between them

this:is:my:expression

Page 18: Perl Scripting

Operators and functions

• Substituting a pattern (=~ s/…./…../)
• Transliteration (=~ tr/…./…./)

$_ = "this is my expression\n";
print "$_\n";

this is my expression

$_ =~ s/my/your/;
print "$_\n";

this is your expression

$_ =~ tr/a-z/A-Z/;
print "$_\n";

THIS IS YOUR EXPRESSION
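A few variations on these operators come up constantly in scripts. The sketch below uses an illustrative string and variable names that are not from the slides; it shows the /g modifier, copying before substituting so the original is kept, counting with tr, and capturing part of a match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "my cat and my dog";

# Copy first, then substitute, so $text itself is left unchanged.
(my $once = $text) =~ s/my/your/;     # replaces the first match only
(my $all  = $text) =~ s/my/your/g;    # /g replaces every match
(my $upper = $text) =~ tr/a-z/A-Z/;   # transliterate to upper case

# tr with an empty replacement list changes nothing and returns a count.
my $vowels = ($text =~ tr/aeiou//);

# Parentheses in a pattern capture the matched text.
my ($animal) = $text =~ /(\w+) and/;

print "$once\n$all\n$upper\n";
print "vowels: $vowels, first animal: $animal\n";
```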

Page 19: Perl Scripting

Flow control constructs

Control_operator (expression(s)) {
    statement_block;
}

Example:

if ( $i < $N ) {
    statement_block;
} else {
    statement_block;
}

foreach $i ( @list_of_items ) {
    statement_block;
}
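The templates above can be filled in as follows. This is only a sketch with made-up data; the sub name classify is hypothetical, and the while loop is added here as one more construct of the same shape.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# if / elsif / else: pick a label for a number.
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return "negative";
    } elsif ($n == 0) {
        return "zero";
    } else {
        return "positive";
    }
}

# foreach: visit every item of a list.
foreach my $i (-2, 0, 5) {
    print "$i is ", classify($i), "\n";
}

# while: repeat as long as the expression is true.
my $count = 3;
while ($count > 0) {
    print "countdown $count\n";
    --$count;
}
```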

Page 20: Perl Scripting

Subroutines

@a = qw(1 2 3 4);               # assigns an array "@a"
print summation(@a),"\n";       # prints results of subroutine
                                # summation using "@a" as input

sub summation {                 # summing every element in
    my $k = 0;                  # the array "@a" and return
    foreach $i (@_) {           # the value as $k
        $k += $i;
    }
    return($k);
}

10
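Arguments always arrive in the special array @_. A common idiom, sketched below with hypothetical sub names, is to copy them into named lexical variables at the top of the subroutine, which also allows a leading scalar argument followed by a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average of a list; returns 0 for an empty list to avoid dividing by zero.
sub average {
    my @values = @_;
    return 0 unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;      # @values in numeric context is its length
}

# First argument is a factor, the remaining arguments are the values to scale.
sub scale {
    my ($factor, @values) = @_;
    return map { $_ * $factor } @values;
}

print average(1, 2, 3, 4), "\n";
print join(" ", scale(2, 1, 2, 3)), "\n";
```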

Page 21: Perl Scripting

Command-line arguments

#!/usr/bin/perl

print "Command name: $0\n";
# $#ARGV is the index of the last argument, so the count is scalar(@ARGV)
print "Number of arguments: ", scalar(@ARGV), "\n";

for ($i=0; $i <= $#ARGV; $i++) {
    print "Arg $i is $ARGV[$i]\n";
}

% ./arguments.pl zero one two three

Command name: ./arguments.pl
Number of arguments: 4
Arg 0 is zero
Arg 1 is one
Arg 2 is two
Arg 3 is three

Page 22: Perl Scripting

Concatenating Strings with the . operator

$firstname = "George";
$midname = "walker";
$lastname = "Bush";

$fullname = $lastname . ", " . $firstname . " " . uc(substr $midname, 0, 1) . ".\n";

print $fullname;

Bush, George W.

Page 23: Perl Scripting

UNIX Environment Variables

print " your username is $ENV{'USER'} and \n";
print " your machine name is $ENV{'HOST'} and \n";
print " your display is set to $ENV{'DISPLAY'} and \n";
print " your shell is $ENV{'SHELL'} and \n";
print " your timezone is $ENV{'TZ'} etcetera.\n";

 your username is shamy and
 your machine name is cirrus.msi.umn.edu and
 your display is set to localhost:10.0 and
 your shell is /bin/tcsh and
 your timezone is CST6CDT etcetera.

Page 24: Perl Scripting

Typical UNIX scripting tasks

• Filter a file or a group of files
• Searching/Matching
• Naming file sequences
• Executing applications
• Parsing files

Page 25: Perl Scripting

Filtering standard input

#!/usr/bin/perl
while( <> ) {                   # read from stdin one line at a time
    print "line $. : $_" ;      # print current line to stdout
}

print.txt

Silicon Graphics' Info Search lets you find all the information
available on a topic using a keyword search. Info Search looks
begin
through all the release notes, man pages, and/online books you
done
have installed on your system or on a networked server. From
the Toolchest on your desktop, choose Help-Info Search.
begin

Quick Answers tells you how to connect to an Internet Service Provider (ISP).
done
From the Toolchest on your desktop, choose
Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
through all the release notes, man pages, and/online books you
Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Page 26: Perl Scripting

Filtering standard input

./printlines.pl print.txt

line 1 : Silicon Graphics' Info Search lets you find all the information
line 2 : available on a topic using a keyword search. Info Search looks
line 3 : begin
line 4 : through all the release notes, man pages, and/online books you
line 5 : done
line 6 : have installed on your system or on a networked server. From
line 7 : the Toolchest on your desktop, choose Help-Info Search.
line 8 : begin
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 11 : done
line 12 : From the Toolchest on your desktop, choose
line 13 : Help > Quick Answers > How Do I > Connect to an Internet Service Provider.
line 14 : through all the release notes, man pages, and/online books you
line 15 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).

Page 27: Perl Scripting

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    print "line $. : $_" unless $. % 2;     # print only the even lines
}

./printeven.pl print.txt

line 2 : available on a topic using a keyword search. Info Search looks
line 4 : through all the release notes, man pages, and/online books you
line 6 : have installed on your system or on a networked server. From
line 8 : begin
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 12 : From the Toolchest on your desktop, choose
line 14 : through all the release notes, man pages, and/online books you

Page 28: Perl Scripting

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    if( /begin/ .. /done/ ) {
        print "line $. : $_";       # prints any text that
    }                               # starts with "begin"
}                                   # and finishes with "done"

./printpattern.pl print.txt

line 3 : begin
line 4 : through all the release notes, man pages, and/online books you
line 5 : done
line 8 : begin
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
line 11 : done

Page 29: Perl Scripting

Filtering standard input

#!/usr/bin/perl
while( <> ) {
    if( /begin/ .. /done/ ) {
        unless( /begin/ || /done/ ) {
            print "line $. : $_";
        }
    }
}

./printpattern2.pl print.txt

line 4 : through all the release notes, man pages, and/online books you
line 9 :
line 10 : Quick Answers tells you how to connect to an Internet Service Provider (ISP).
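The .. in these scripts is Perl's range operator used in scalar context, where it acts as a flip-flop: it turns true when the left pattern matches and stays true until the right pattern has matched. The sketch below wraps the same logic in a subroutine so it can be tried on a list of strings; the sub name between_markers and the sample data are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines strictly between each begin/done pair.
# Note: the flip-flop keeps its state between calls, so the input
# should close every "begin" with a "done".
sub between_markers {
    my @lines = @_;
    my @kept;
    foreach (@lines) {
        if (/begin/ .. /done/) {
            push @kept, $_ unless /begin/ || /done/;
        }
    }
    return @kept;
}

my @sample = ("intro", "begin", "first", "done",
              "middle", "begin", "second", "done");
print join(",", between_markers(@sample)), "\n";
```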

Page 30: Perl Scripting

Naming files

• Files
• Reformatting files

Page 31: Perl Scripting

Files

#!/usr/bin/perl
# touch.pl

foreach $i ( 0 .. 50 ) {
    print "touch gifdir/$i.gif\n";
    system("touch gifdir/$i.gif");
}

./touch.pl

Perl executes the following in unix:

touch gifdir/0.gif
touch gifdir/1.gif
touch gifdir/2.gif
touch gifdir/3.gif
touch gifdir/4.gif
.
.
.
touch gifdir/48.gif
touch gifdir/49.gif
touch gifdir/50.gif

Page 32: Perl Scripting

Files

% ls -lt gifdir/*.gif

-rw------- 1 shamy support 995343 Oct 21 18:50 50.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 49.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 48.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 47.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 46.gif
.
.
.
-rw------- 1 shamy support 995343 Oct 21 18:50 4.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 3.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 2.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 1.gif
-rw------- 1 shamy support 995343 Oct 21 18:50 0.gif

Page 33: Perl Scripting

Files

#!/usr/bin/perl

foreach $i ( 0 .. 50 ) {
    $new = sprintf("step%3.3d.gif", $i);            # naming the gif file with
    print "mv gifdir2/$i.gif gifdir2/$new\n";       # a 3 digit numbering
    system "mv gifdir2/$i.gif gifdir2/$new";        # scheme
}

./rename.pl

Perl executes the following in unix:

mv gifdir2/0.gif gifdir2/step000.gif
mv gifdir2/1.gif gifdir2/step001.gif
mv gifdir2/2.gif gifdir2/step002.gif
mv gifdir2/3.gif gifdir2/step003.gif
mv gifdir2/4.gif gifdir2/step004.gif
.
.
mv gifdir2/47.gif gifdir2/step047.gif
mv gifdir2/48.gif gifdir2/step048.gif
mv gifdir2/49.gif gifdir2/step049.gif
mv gifdir2/50.gif gifdir2/step050.gif
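The same renaming can be done without starting a shell for every file, because Perl has a built-in rename function. The following is a sketch under that assumption rather than the script used on the slide; the sub names padded_name and rename_sequence are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the zero-padded name used on the slide, e.g. 7 becomes step007.gif.
sub padded_name {
    my ($i) = @_;
    return sprintf("step%03d.gif", $i);
}

# Rename N.gif to stepNNN.gif inside $dir for N in $first..$last.
# Files that do not exist are skipped; returns the number renamed.
sub rename_sequence {
    my ($dir, $first, $last) = @_;
    my $renamed = 0;
    foreach my $i ($first .. $last) {
        my $old = "$dir/$i.gif";
        next unless -e $old;
        my $new = "$dir/" . padded_name($i);
        rename($old, $new) or die "Cannot rename $old to $new: $!\n";
        ++$renamed;
    }
    return $renamed;
}

# Only show the names here; call rename_sequence("gifdir2", 0, 50) to rename.
print padded_name($_), "\n" for (0, 7, 50);
```

Checking the return value of rename and reporting $! makes a failed move visible, which the system "mv ..." form does not do unless its exit status is tested.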

Page 34: Perl Scripting

Files

ls gifdir2 (before)

gifdir2:

0.gif   14.gif  2.gif   25.gif  30.gif  36.gif  41.gif  47.gif  7.gif
1.gif   15.gif  20.gif  26.gif  31.gif  37.gif  42.gif  48.gif  8.gif
10.gif  16.gif  21.gif  27.gif  32.gif  38.gif  43.gif  49.gif  9.gif
11.gif  17.gif  22.gif  28.gif  33.gif  39.gif  44.gif  5.gif
12.gif  18.gif  23.gif  29.gif  34.gif  4.gif   45.gif  50.gif
13.gif  19.gif  24.gif  3.gif   35.gif  40.gif  46.gif  6.gif

ls gifdir2 (after)

gifdir2:

script       step008.gif  step017.gif  step026.gif  step035.gif  step044.gif
step000.gif  step009.gif  step018.gif  step027.gif  step036.gif  step045.gif
step001.gif  step010.gif  step019.gif  step028.gif  step037.gif  step046.gif
step002.gif  step011.gif  step020.gif  step029.gif  step038.gif  step047.gif
step003.gif  step012.gif  step021.gif  step030.gif  step039.gif  step048.gif
step004.gif  step013.gif  step022.gif  step031.gif  step040.gif  step049.gif
step005.gif  step014.gif  step023.gif  step032.gif  step041.gif  step050.gif
step006.gif  step015.gif  step024.gif  step033.gif  step042.gif
step007.gif  step016.gif  step025.gif  step034.gif  step043.gif

Page 35: Perl Scripting

Parsing and reformatting Files

HEADER CALCIUM-BINDING PROTEIN 29-SEP-92 1CLL 2
COMPND CALMODULIN (VERTEBRATE) 1CLL 3
REMARK 1 REFERENCE 1 1CLL 13
REMARK 1 AUTH W.E.MEADOR,A.R.MEANS,F.A.QUIOCHO 1CLL 14
RORIGX2 0.000000 0.018659 0.001155 0.00000 1CLL 143
.
.
.
ATOM 1 N  LEU 4 -6.873 21.082 25.312 1.00 49.53 1CLL 148
ATOM 2 CA LEU 4 -6.696 22.003 26.447 1.00 48.82 1CLL 149
ATOM 3 C  LEU 4 -6.318 23.391 25.929 1.00 46.50 1CLL 150
ATOM 4 O  LEU 4 -5.313 23.981 26.352 1.00 45.72 1CLL 151
ATOM 5 N  THR 5 -7.147 23.871 25.013 1.00 46.77 1CLL 152
ATOM 6 CA THR 5 -6.891 25.193 24.428 1.00 46.84 1CLL 153
.
.
.
CONECT 724 723 1137 1CLL1440
CONECT 736 735 1137 1CLL1441

Page 36: Perl Scripting

Parsing Files

#!/usr/bin/perl

$pdbfile = shift;
($pref = $pdbfile) =~ s/\.pdb//;

print "Converting $pdbfile to $pref.xyz \n";
# the die must sit outside the open(...) call, otherwise it can never run
open(FILIN, "<", $pdbfile) || die "Cannot open pdb file $pdbfile: $!\n";
open(FILOUT, ">", "$pref.xyz") || die "Cannot open output file $pref.xyz: $!\n";

while (<FILIN>) {
    if (/^ATOM/) {
        chomp;
        @f = split;         # split the line on whitespace into fields
        printf FILOUT "%5d %4s %8.3f%8.3f%8.3f\n", $f[1], substr($f[2], 0, 1), $f[5], $f[6], $f[7];
    }
}

close(FILIN);
close(FILOUT);

Page 37: Perl Scripting

Reformatting Files

./pdb2xyz.pl foo.pdb

more foo.xyz

1 N -6.873 21.082 25.312
2 C -6.696 22.003 26.447
3 C -6.318 23.391 25.929
4 O -5.313 23.981 26.352
5 N -7.147 23.871 25.013
6 C -6.891 25.193 24.428
.
.
.

Page 38: Perl Scripting

Executing applications

#!/usr/bin/perl

$pdbfile = shift(@ARGV);
($pref = $pdbfile) =~ s/\.pdb//;        # create a variable $pref using the prefix
                                        # of the pdb file
system ("rm -r $pref");
system ("mkdir $pref");                 # create a directory named after $pref
chdir ("$pref");                        # change directory into $pref
open(SCRIPT,">script");                 # create a file called script

print SCRIPT "zap\n";
print SCRIPT "load pdb ../$pdbfile\n";
print SCRIPT "background black\n";
print SCRIPT "wireframe off\n";
print SCRIPT "ribbons on\n";
print SCRIPT "color ribbons yellow\n";
for ($i = 0; $i <= 50; ++$i) {          # assigns a value from 0 to 50
    $name = sprintf("%3.3d",$i);        # create a file name based on this value
    print SCRIPT "rotate x 10\n";       # for every value, rotate 10 degrees
    print SCRIPT "write $name.gif\n";   # generate a gif file for each value
}
print SCRIPT "quit\n";
close SCRIPT;

system("/usr/local/bin/rasmol < script");                   # execute the rasmol program
system("dmconvert -f mpeg1v -p video ###.gif out.mpeg");    # execute dmconvert to make movie
chdir ("..");

Page 39: Perl Scripting

Executing applications

more foo/script

background black
wireframe off
ribbons on
color ribbons yellow
rotate x 10
write 000.gif
rotate x 10
write 001.gif
rotate x 10
write 002.gif
.
.

ls -lt foo

total 99699
-rw------- 1 shamy support 256504 Oct 21 18:34 out.mpeg
-rw------- 1 shamy support 995343 Oct 21 18:33 050.gif
-rw------- 1 shamy support 995343 Oct 21 18:33 049.gif
-rw------- 1 shamy support 995343 Oct 21 18:33 048.gif
.
.
-rw------- 1 shamy support 995343 Oct 21 18:32 002.gif
-rw------- 1 shamy support 995343 Oct 21 18:32 001.gif
-rw------- 1 shamy support 995343 Oct 21 18:32 000.gif
-rw------- 1 shamy support 1418 Oct 21 18:32 script

Page 40: Perl Scripting

Parsing a DNA sequence

>sequence1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGCACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAAATAGAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>sequence2
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXGGCGATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Page 41: Perl Scripting

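Parsing a DNA sequence

#!/usr/bin/perl

while (<>) {
    if (/^>/) {                             # a header line starts a new sequence
        print_sequence() if $count > 0;
        $seq = $_;
        @line = ();
    } else {
        chomp;
        ++$count;
        push(@line,$_);
    }
    print_sequence() if $count > 0 && eof;  # flush the last sequence without losing its final line
}

sub print_sequence {
    $line = join("",@line);
    print $seq;
    fixhead();                              # both subroutines work on the global $line
    fixtail();
    write STDOUT;
    $count = 0;
}

format STDOUT =
~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$line
.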

Page 42: Perl Scripting

Parsing a DNA sequence

sub fixhead {
    $length = length($line);
    for ($i = 0;$i <= $length; ++$i){
        if (substr($line,0,1) eq "X"){
            $line = substr($line,1,$length-1);
        } else {
            return;
        }
    }
}

sub fixtail {
    $length = length($line);
    for ($i = 0;$i <= $length; ++$i){
        if (substr($line,$length-($i+1),1) eq "X"){
            $line = substr($line,0,$length-($i+1));
        } else {
            return;
        }
    }
}
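The two loops strip leading and trailing X characters one at a time. Using the substitution operator shown earlier, the same effect can be sketched with two anchored patterns. This is an alternative written for illustration, not the code from the slides, and the sub name trim_x is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove any run of X at the start and at the end of a sequence;
# X characters in the middle are left alone.
sub trim_x {
    my ($s) = @_;
    $s =~ s/^X+//;      # ^ anchors the match at the start of the string
    $s =~ s/X+$//;      # $ anchors the match at the end of the string
    return $s;
}

print trim_x("XXXGGCAXXXTTGXX"), "\n";
```

Unlike fixhead and fixtail, this version takes the sequence as an argument and returns the result, so it does not depend on a global $line.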

Page 43: Perl Scripting

Parsing a DNA sequence

>sequence1

GGCACGAGGTGGAAAAGCAATATCTTAACATTTTAGGACTGATTTCAGAAATAGAAGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC

>sequence2

GGCGATGAGCAACAGAATATTCACAGTXXXXXXXXXXXXXXXAATGTCTGTGGTGGTGATTTTCCAAATTACTCTCTGTATGTXXXXXXXXXXACCTTTACCATGTTATTCTTCTGAGATTAAAAAGGAAAAAAAAATCATTGTCAAAATCAGCAATGTCTAGTGAGTGTGTATGCACAGGCTGTAACAGGCAATGGCAAGCAATAAAGTGATTAGCAAAGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC

Page 44: Perl Scripting

Creating DNA sequence fragments

>chr4
GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTTCGAGG
ATTCGTCGACCAGGTGTTGGAATCGTCGACCGAGTCTGAGAATTCGTAGACCAGGACGGC
GGAATCCTCGACAATGACGAGGTATGGTCGAGGAAAATCTATCGGGTTCGAGGATTCGTC
TACCAGGTGATGGAATCCTCGACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTT
GTTATTCCGATCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACG
GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGCCGATG
AGGATAAGAGGTCGATCGGATGGACGGAAGAGGTAGAGGAAGAGCCATGAAGCGGCGAGG
CATAGGAGGAGGATGAGCGAGAATGGGTGGGCGGGAAGAGAGAAACTGATGATCAGAGCG
ATGATGCAGACGTAATTCACCCTGAAATAAGAGGAGTTCTTCCAGAATCGCGTCATGGCC
TAAGGGTTAGGGGTTAAGGGTTAAGGGTTTAGGGTTAAGGGTTAAGGGTTTAGGGTTTAG
GGTTTAGG

Page 45: Perl Scripting

Parsing a DNA sequence

#!/usr/bin/perl
#

$infile = $ARGV[0];
$break_length = $ARGV[1];
$overlap_length = $ARGV[2];

$seq_count = 0;
$count = 0;
$fileflag = 0;

open (IN, "< $infile" ) || die "can not open input file for reading: $!\n";

while (<IN>) {
    if (!(/^\>/ )) {
        chomp;
        push(@line,$_);
    }
}

$seq = join("",@line);
$length = length($seq);
$nfrag = int($length/$break_length);
$frag_length = $break_length + $overlap_length;

print "The break length = $break_length\n";
print "The overlap length = $overlap_length\n";
print "The total length of the sequence = $length\n";
print "The total length of each fragment = $frag_length\n";
print "The total number of fragments = $nfrag\n\n";

Page 46: Perl Scripting

Parsing a DNA sequence

for ($i = 0;$i <= $nfrag; ++$i){
    $start = $i * $break_length;
    $stop = $i * $break_length+$frag_length;
    $frag = substr($seq,$start,$frag_length);
    # $outfile = $infile.sprintf("_%5.5d_%5.5d",$start,$stop);
    $outfile = $infile."_".$start."_".$stop;
    open( OUT, "> $outfile" ) || die "Can not open output file\n";
    print "Writing framgment from $start to $stop to fragment file $outfile\n";
    print OUT "$outfile $start $stop\n";
    write OUT;
}

format OUT =
~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$frag
.
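The core of the script is the arithmetic that turns a break length and an overlap length into start positions and substrings. The sketch below isolates that step in a subroutine that returns the fragments instead of writing files. The name fragments and the toy sequence are assumptions, and the loop stops at the end of the sequence rather than counting to $nfrag, so it never produces an empty fragment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut $seq into pieces that start every $break characters and are
# $break + $overlap characters long (the last piece may be shorter).
# Returns a list of [start, text] pairs.
sub fragments {
    my ($seq, $break, $overlap) = @_;
    die "break length must be positive\n" unless $break > 0;
    my @frags;
    for (my $start = 0; $start < length($seq); $start += $break) {
        push @frags, [ $start, substr($seq, $start, $break + $overlap) ];
    }
    return @frags;
}

foreach my $f (fragments("ABCDEFGHIJ", 4, 2)) {
    my ($start, $text) = @$f;
    print "$start: $text\n";
}
```

Because consecutive fragments share $overlap characters, a feature that straddles a break point still appears whole in at least one fragment, as long as it is no longer than the overlap.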

Page 47: Perl Scripting

Parsing a DNA sequence

yuk.pl short 50 5

The break length = 50
The overlap length = 5
The total length of the sequence = 608
The total length of each fragment = 55
The total number of fragments = 12

Writing framgment from 0 to 55 to fragment file short_0_55
Writing framgment from 50 to 105 to fragment file short_50_105
Writing framgment from 100 to 155 to fragment file short_100_155
Writing framgment from 150 to 205 to fragment file short_150_205
Writing framgment from 200 to 255 to fragment file short_200_255
Writing framgment from 250 to 305 to fragment file short_250_305
Writing framgment from 300 to 355 to fragment file short_300_355
Writing framgment from 350 to 405 to fragment file short_350_405
Writing framgment from 400 to 455 to fragment file short_400_455
Writing framgment from 450 to 505 to fragment file short_450_505
Writing framgment from 500 to 555 to fragment file short_500_555
Writing framgment from 550 to 605 to fragment file short_550_605
Writing framgment from 600 to 655 to fragment file short_600_655

Page 48: Perl Scripting

Parsing a DNA sequence

more short_*

short_0_55 0 55
GGACAGCGGAATCCTCGACCCGGTTGAGGAATGGTCGACGAAAATCTATCGGGTT
...skipping...
short_100_155 100 155
AATTCGTAGACCAGGACGGCGGAATCCTCGACAATGACGAGGTATGGTCGAGGAA
...skipping...
short_150_205 150 205
AGGAAAATCTATCGGGTTCGAGGATTCGTCTACCAGGTGATGGAATCCTCGACCA
...skipping...
short_200_255 200 255
GACCAGGACAAAGAATTCGTCGACCAGGGGTGGAATTGTTGTTATTCCGATCATG
...skipping...
short_250_305 250 305
TCATGAGAGCGGATATCAGTACAGATCCGACGCTGGTGAAAAAGATCACGGCGAT
...skipping...
short_300_355 300 355
GCGATCGTGGATAGTATCAAGCCACCGAGAGTCTCGTATTCGGAGAAAGATCGGC
...skipping...

Page 49: Perl Scripting

Convert seq to fasta format

ls *.seq

tc86660.seq tc86662.seq tc86664.seq
tc86666.seq tc86668.seq tc86661.seq
tc86663.seq tc86665.seq tc86667.seq
tc86669.seq

Page 50: Perl Scripting

Convert seq to fasta format

source /usr/local/gcg/gcgstartup

gcg

submitfasta.pl

Page 51: Perl Scripting

Convert seq to fasta format

#!/usr/bin/perl

@list = `ls -1 *.seq`;

foreach $i (@list) {
    chomp($i);
    system("/usr/local/gcg_10.3/solarisbin/gcgbin/execute/tofasta $i -Default");
}
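The file list can also be built inside Perl with the built-in glob function, which avoids running ls and removes the need for chomp. The sketch below only prints what would be converted; the sub name fasta_name is hypothetical, and the .tfa suffix is taken from the directory listing shown after the conversion, not from tofasta documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map an input name to the expected output name, e.g. tc86660.seq to tc86660.tfa.
sub fasta_name {
    my ($file) = @_;
    (my $out = $file) =~ s/\.seq$/.tfa/;
    return $out;
}

# glob expands the wildcard like the shell does and returns plain
# file names without trailing newlines.
my @list = glob("*.seq");

foreach my $file (@list) {
    print "$file -> ", fasta_name($file), "\n";
    # The conversion itself would go here, as in the script above:
    # system("/usr/local/gcg_10.3/solarisbin/gcgbin/execute/tofasta $file -Default");
}
print "Found ", scalar(@list), " .seq files\n";
```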

Page 52: Perl Scripting

Convert seq to fasta format

ls

README            tc86661.tfa  tc86664.tfa  tc86667.tfa
submitfasta.pl*   tc86662.seq  tc86665.seq  tc86668.seq
submitfasta.pl~*  tc86662.tfa  tc86665.tfa  tc86668.tfa
tc86660.seq       tc86663.seq  tc86666.seq  tc86669.seq
tc86660.tfa       tc86663.tfa  tc86666.tfa  tc86669.tfa
tc86661.seq       tc86664.seq  tc86667.seq

Page 53: Perl Scripting

Parsing Blast output

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>blastn 2.2.5 [Nov-16-2002]</BlastOutput_version>
  <BlastOutput_reference>~Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), ~&quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search~programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>ecoli.nt</BlastOutput_db>
  <BlastOutput_query-ID>lcl|QUERY</BlastOutput_query-ID>
  <BlastOutput_query-def>sequence1</BlastOutput_query-def>
  <BlastOutput_query-len>319</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_sc-match>1</Parameters_sc-match>
      <Parameters_sc-mismatch>-3</Parameters_sc-mismatch>
      <Parameters_gap-open>5</Parameters_gap-open>
      <Parameters_gap-extend>2</Parameters_gap-extend>
      <Parameters_filter>D</Parameters_filter>
    </Parameters>
  </BlastOutput_param>

Page 54: Perl Scripting

Parsing Blast output

  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|1789957|gb|AE000431.1|AE000431</Hit_id>
          <Hit_def>Escherichia coli K-12 MG1655 section 321 of 400 of the complete genome</Hit_def>
          <Hit_accession>AE000431</Hit_accession>
          <Hit_len>11575</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>30.2282</Hsp_bit-score>
              <Hsp_score>15</Hsp_score>
              <Hsp_evalue>1.12539</Hsp_evalue>
              <Hsp_query-from>267</Hsp_query-from>
              <Hsp_query-to>253</Hsp_query-to>
              <Hsp_hit-from>8485</Hsp_hit-from>
              <Hsp_hit-to>8499</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>-1</Hsp_hit-frame>
              <Hsp_identity>15</Hsp_identity>
              <Hsp_positive>15</Hsp_positive>
              <Hsp_align-len>15</Hsp_align-len>
              <Hsp_qseq>GCTAATCACTTTATT</Hsp_qseq>
              <Hsp_hseq>GCTAATCACTTTATT</Hsp_hseq>
              <Hsp_midline>|||||||||||||||</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_num>2</Hit_num>
          <Hit_id>gi|1789185|gb|AE000366.1|AE000366</Hit_id>

Page 55: Perl Scripting

Parsing Blast output

more test.out.1.xls

sequence1</BlastOutput_query-def> 319

more test.out.1.xls

sequence1</BlastOutput_query-def> 1 Esch … AE000431 30.2282 1.12539 11575 1
sequence1</BlastOutput_query-def> 2 Esch … AE000366 30.2282 1.12539 10405 1
sequence1</BlastOutput_query-def> 3 Esch … AE000467 28.2459 4.44683 15633 1
sequence1</BlastOutput_query-def> 4 Esch … AE000410 28.2459 4.44683 10826 1
sequence1</BlastOutput_query-def> 5 Esch … AE000300 28.2459 4.44683 16939 1
sequence1</BlastOutput_query-def> 6 Esch … AE000220 28.2459 4.44683 9780 1
sequence1</BlastOutput_query-def> 7 Esch … AE000170 28.2459 4.44683 10627 1
sequence1</BlastOutput_query-def> 8 Esch … AE000123 28.2459 4.44683 11093 1

more test.out.1.xls

sequence1</BlastOutput_query-def> 1 1 267 253 8485 8499 15 15 100.00 0 30.2282 1.12539
sequence1</BlastOutput_query-def> 2 1 59 73 7824 7838 15 15 100.00 0 30.2282 1.12539
sequence1</BlastOutput_query-def> 3 1 101 114 9628 9641 14 14 100.00 0 28.2459 4.44683
sequence1</BlastOutput_query-def> 4 1 160 147 2067 2080 14 14 100.00 0 28.2459 4.44683
sequence1</BlastOutput_query-def> 5 1 22 9 2971 2984 14 14 100.00 0 28.2459 4.44683
sequence1</BlastOutput_query-def> 6 1 95 108 5209 5222 14 14 100.00 0 28.2459 4.44683
sequence1</BlastOutput_query-def> 7 1 33 16 2344 2361 18 17 94.44 0 28.2459 4.44683
sequence1</BlastOutput_query-def> 8 1 40 53 2390 2403 14 14 100.00 0 28.2459 4.44683
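The slides do not show the script that produced these tables. As a rough sketch of how such values could be pulled out of the XML with pattern matching, the following extracts the text of a named element wherever it occurs. The sub name tag_values is hypothetical, the sample is cut down from the output above, and a real XML parser module would be more robust than a regular expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every <tag>...</tag> element found in $xml.
# Call it in list context; \Q...\E protects special characters in the tag name.
sub tag_values {
    my ($xml, $tag) = @_;
    return $xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}gs;
}

# A reduced sample using accession numbers and lengths from the tables above.
my $xml = <<'END';
<Hit>
  <Hit_num>1</Hit_num>
  <Hit_accession>AE000431</Hit_accession>
  <Hit_len>11575</Hit_len>
</Hit>
<Hit>
  <Hit_num>2</Hit_num>
  <Hit_accession>AE000366</Hit_accession>
  <Hit_len>10405</Hit_len>
</Hit>
END

my @acc = tag_values($xml, "Hit_accession");
my @len = tag_values($xml, "Hit_len");

# Print one tab-separated row per hit.
for my $i (0 .. $#acc) {
    print join("\t", $i + 1, $acc[$i], $len[$i]), "\n";
}
```

Capturing only the text between the tags also avoids the stray </BlastOutput_query-def> that appears after the query name in the tables above.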

Page 56: Perl Scripting

Executing applications in a queue

foreach $i (@files) {
    ++$count;
    print $i;
    chomp($i);
    ($prefix = $i) =~ s/\.pdb//;
    `cp $dir/$i .`;
    &wait();
    `./run.pl $i`;
}

sub wait {
    loop: $check = `llq -u duany | grep " [IR] " | wc`;
    @check = split(' ',$check);     # wc separates its counts with spaces, not tabs
    print "There are $check[0] in the queue\n";
    if ($check[0] > 5) {
        print "I am sleeping\n";
        sleep 60;
        goto loop;
    } else {
        print "I am awake\n";
        print "I am right now working on $i\n";
        return;
    }
}
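The same polling idea can be written with a while loop instead of goto, and with the job counter passed in as a code reference so the loop can be tried without a batch system. This is a sketch only: wait_for_slot and the stand-in counter are hypothetical, and in real use the counter would run the llq pipeline shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Poll until $count_jobs->() reports at most $max jobs.
# $pause is the number of seconds to sleep between polls.
# Returns how many times the queue was checked.
sub wait_for_slot {
    my ($count_jobs, $max, $pause) = @_;
    my $polls = 0;
    while (1) {
        ++$polls;
        my $n = $count_jobs->();
        last if $n <= $max;
        print "There are $n jobs in the queue, sleeping\n";
        sleep $pause;
    }
    return $polls;
}

# Stand-in counter for demonstration: pretends the queue drains from 7 to 5.
my @fake_counts = (7, 6, 5);
my $polls = wait_for_slot(sub { shift @fake_counts }, 5, 0);
print "Slot free after $polls polls\n";
```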

Page 57: Perl Scripting

More info

• CPAN - Comprehensive Perl Archive Network
  – http://www.cpan.org
  – Source, binaries, libs, scripts, FAQs, links
• Perl Resource Topics
  – http://www.perl.com/pub/q/resources
• Bioperl
  – http://bioperl.org/
• Others
  – http://www.netcat.co.uk/rob/perl/win32perltut.html
  – http://www.1001tutorials.com/perltut/index.html
  – http://www.perlmasters.com/tutorial
  – http://www-2.cs.cmu.edu/cgi-bin/perl-man
  – Countless more are available...

Page 58: Perl Scripting

Contact the Institute for additional help

Yuk Sham
Computational Biology/Biochemistry Consultant

Phone: (612) 624 7427 (direct)
Email: [email protected]