Perl II Part III: Motifs and Loops

Page 1: Perl II Part III: Motifs and Loops. Objectives Search for motifs in DNA or Proteins Interact with users at the keyboard Write data to files Use loops.

Perl II

Part III: Motifs and Loops

Objectives

• Search for motifs in DNA or proteins
• Interact with users at the keyboard
• Write data to files
• Use loops
• Use basic regular expressions
• Respond to conditional tests
• Examine sequence data in detail

Conditional Tests

if (1 == 1) { print "1 equals 1\n"; }                  # printed
if (1) { print "What does this evaluate to?\n"; }      # printed: 1 is true
if (1 == 0) { print "1 equals 0\n"; }                  # never printed
if (0) { print "0 evaluates to true\n"; }              # never printed: 0 is false

Conditional if/else

if (1 == 1) {
    print "1 equals 1\n\n";
} else {
    print "1 does not equal 1\n\n";
}

unless (1 == 0) {
    print "1 does not equal 0\n\n";
} else {
    print "1 equals 0?\n\n";
}

Numeric conditionals also use:

==, !=, >=, <=, >, <, <=>

For text:

Use eq, ne, ge, le, gt, lt and cmp. A string evaluates to true unless it is empty ("" or '') or is "0".
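The truth rules above can be checked with a short sketch. The helper name truthiness and the test values are made up for illustration; they are not part of the original slides.

```perl
use strict;
use warnings;

# Return "true" or "false" for a value, the way an if () test would see it.
sub truthiness {
    my ($value) = @_;
    return $value ? "true" : "false";
}

# Only "" and "0" (and the number 0) are false; "0.0", "00" and " " are true.
for my $value (1, 0, "", "0", "0.0", "00", " ", "MNIDDKL") {
    printf "%-10s is %s\n", "'$value'", truthiness($value);
}

# Numeric and string operators can disagree on the same operands.
print "numeric: 10 == 10.0\n"    if 10 == 10.0;
print "string:  '10' ne '10.0'\n" if "10" ne "10.0";
```

Mixing the two operator families is a common slip: == applied to sequence strings compares their numeric values, which are usually 0 for both sides.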

More conditionals …

#!/usr/bin/perl -w
# if-elsif-else
$word = "MNIDDKL";
if ($word eq 'QSTLV') {
    print "QSTLV\n";
} elsif ($word eq 'MSRQQNKISDH') {
    print "MSRQQNKISDH\n";
} else {
    print "What is \"$word\"?\n";
}
exit;

Using Loops to Open and Read Files

#!/usr/bin/perl -w
$proteinFilename = "NM_012345.pep";
# Open the file and catch the error
unless (open(MUPPETFILE, $proteinFilename)) {
    print "Could not open file $proteinFilename!\n";
    exit;
}
# Read data one line at a time using a while loop, and print
while ($protein = <MUPPETFILE>) {
    print "##### Here is the next line of the file:\t";
    print $protein, "\n";
}
close MUPPETFILE;
exit;
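The same read loop can be sketched with a lexical filehandle and the three-argument form of open, which many Perl programmers now prefer. The sub name read_lines and the temporary file contents are assumptions made so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file line by line and return the lines without trailing newlines.
sub read_lines {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Could not open file $filename: $!\n";
    my @lines;
    while (my $line = <$in>) {
        chomp $line;              # strip the trailing newline
        push @lines, $line;
    }
    close $in;
    return @lines;
}

# Create a small throwaway file (made-up peptide lines) to read back.
my ($tmp, $proteinFilename) = tempfile(UNLINK => 1);
print $tmp "MNIDDKLQST\nLVMSRQQNKI\n";
close $tmp;

my $count = 0;
for my $line (read_lines($proteinFilename)) {
    $count++;
    print "line $count: $line\n";
}
```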

Motif finding (www.expasy.ch/prosite/)

Something genuinely useful

Program flow:

• Reads in protein sequence from file

• Puts all sequence data into one string for easy searching

• Looks for motifs that the user types into the keyboard

#!/usr/bin/perl -w
# Searching for motifs
# Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$proteinFilename = <STDIN>;
chomp $proteinFilename;
# Open the file or exit
open(PROTEINFILE, $proteinFilename) or die("Error: $!");
# Read file into an array and close
@protein = <PROTEINFILE>;
close PROTEINFILE;
# Put data into a single string to make it easier to search
$protein = join('', @protein);
# \s already covers spaces, tabs, form feeds, returns and newlines
$protein =~ s/\s//g;

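Modes for open:

• Reading: "<filename", open an existing file for reading
• Writing: ">filename", create the file, discarding current contents if it already exists
• Append: ">>filename", open or create file for writing at end of file
• Update: "+<filename", open an existing file for update (reading and writing)
• New update: "+>filename", create a file for update, discarding current contents if it already exists

The <FILEHANDLE> operator reads data until it reaches the input record separator held in the special variable $/, which defaults to \n. In list context, as in @protein = <PROTEINFILE>, it returns all remaining lines at once.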
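Changing $/ changes what counts as one "line". As a sketch, assuming FASTA-style input where each record starts with ">", setting $/ to "\n>" makes each read return a whole record. The sub name read_fasta_records and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Split FASTA-formatted text into [header, sequence] pairs by changing $/.
sub read_fasta_records {
    my ($text) = @_;
    # Opening a reference to a scalar reads from the string as if it were a file.
    open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = "\n>";            # a record now ends where the next header begins
    my @records;
    while (my $record = <$in>) {
        chomp $record;           # chomp removes the current $/, here "\n>"
        $record =~ s/^>//;       # only the first record still carries its '>'
        my ($header, @sequence_lines) = split /\n/, $record;
        push @records, [$header, join('', @sequence_lines)];
    }
    close $in;
    return @records;
}

my $fasta = ">seq1 made-up example\nMNIDD\nKLQST\n>seq2 made-up example\nLVMSR\n";
for my $record (read_fasta_records($fasta)) {
    my ($header, $sequence) = @$record;
    print "$header => $sequence\n";
}
```

Using local restores the default separator as soon as the sub returns, so other reads in the program are unaffected.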

# Ask the user for a motif, search for it, and report
# if it was found. Exit if no motif was entered.
do {
    print "Enter a motif to search for: ";
    $motif = <STDIN>;
    chomp $motif;
    if ($motif =~ /^\s*$/) {
        # Nothing entered: skip the search, the loop ends below
    } elsif ($protein =~ m/$motif/) {
        print "I found it!\n\n";
    } else {
        print "I couldn't find it!\n\n";
    }
# exit on user prompt
} until ($motif =~ /^\s*$/);
exit;
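The loop above only reports whether a motif occurs. A small extension, sketched here with a hypothetical sub find_motif_positions and a made-up sequence, uses the /g modifier to walk through every non-overlapping match and record where it starts.

```perl
use strict;
use warnings;

# Return the 1-based start position of every match of $motif in $sequence.
sub find_motif_positions {
    my ($sequence, $motif) = @_;
    my @positions;
    while ($sequence =~ /$motif/g) {
        # $-[0] holds the 0-based offset where the last match started
        push @positions, $-[0] + 1;
    }
    return @positions;
}

my $protein = "MNIDDKLNISTQ";          # made-up sequence
my @hits = find_motif_positions($protein, "NI");
print "NI found at: @hits\n";          # NI found at: 2 8
```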

Regular Expressions

• Very powerful methods for matching patterns (wildcards) against strings
• Very cryptic
• Perl reads =~ /n/ as =~ m/n/
• The delimiter is flexible: after an explicit m it accepts any non-alphanumeric, non-whitespace character (e.g. # ( { [ , . ')
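A short sketch of why the flexible delimiter matters: patterns that contain slashes read better when another delimiter is chosen. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $url = "www.expasy.ch/prosite/";

# With the default delimiter every slash in the pattern must be escaped.
print "slashes escaped\n" if $url =~ /expasy\.ch\/prosite\//;

# An explicit m allows another delimiter, so the slashes need no escaping.
print "braces as delimiters\n" if $url =~ m{expasy\.ch/prosite/};
print "hashes as delimiters\n" if $url =~ m#expasy\.ch/prosite/#;

# The same freedom applies to substitution.
(my $path = $url) =~ s{^www\.expasy\.ch}{};
print "path: $path\n";
```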

Metasymbols

Symbol      Meaning
\0          Null character (ASCII NUL)
\NNN        Character given in octal, up to \377
\1, \2 ...  Nth previously captured string (backreference)
\a          Alarm (bell) character
\A          True at the beginning of a string
\b          Backspace character (inside a character class only)
\b          True at a word boundary
\B          True when not at a word boundary
\cX         Control character Control-X
\d          Any digit
\D          Any nondigit
\e          Escape character
\E          Ends a \L, \U or \Q sequence
\f          Form feed
\G          True where the previous m//g match on the string ended
\n          Newline
\r          Carriage return
\s          Any whitespace character
\S          Any nonwhitespace character
\t          Tab (HT) character
\w          Any "word" character (letters, digits, underscore)
\W          Any nonword character
\x{abcd}    Character given in hexadecimal
\z          True at the end of the string only
\Z          True at the end of the string or before a final newline
( )         Grouping and capturing brackets
[ ]         Character class: match any one of the listed characters
|           Alternation (or)
.           Any character except newline
^           Beginning of string
$           End of string
*           Zero or more times
+           One or more times
?           Zero or one time
{n}         Exactly n times
{n,m}       At least n and at most m times
!~          Operator: true when the pattern does not match
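As a worked use of character classes from the table, PROSITE writes the N-glycosylation site as N-{P}-[ST]-{P}, which translates to the Perl pattern N[^P][ST][^P]. The sub name find_sites and the sequence below are made up for illustration.

```perl
use strict;
use warnings;

# N, then anything but P, then S or T, then anything but P.
my $n_glyc = qr/N[^P][ST][^P]/;

# Return every non-overlapping substring of $protein that matches $pattern.
sub find_sites {
    my ($protein, $pattern) = @_;
    my @sites;
    while ($protein =~ /($pattern)/g) {
        push @sites, $1;
    }
    return @sites;
}

my $protein = "MKNGSALNPSTQNATW";   # made-up sequence
my @sites = find_sites($protein, $n_glyc);
print "sites: @sites\n";            # sites: NGSA NATW
```

The middle candidate, NPST, is rejected because a proline follows the asparagine.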

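Look-behind assertion

(?<=value1)value2 : look behind (value2 must be preceded by value1)

• $string = "English goodly spoken here";
• $string =~ s/(?<=English )goodly/well/;

value1(?=value2) : look ahead (value1 must be followed by value2)
value1(?!value2) : negative look ahead (value1 must not be followed by value2)
(?<!value1)value2 : negative look behind (value2 must not be preceded by value1)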
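The assertions test the surrounding text without consuming it. A minimal sketch, using the slide's own substitution plus a made-up protein sequence:

```perl
use strict;
use warnings;

my $string = "English goodly spoken here";

# Look-behind: replace "goodly" only where "English " comes right before it.
(my $fixed = $string) =~ s/(?<=English )goodly/well/;
print "$fixed\n";

my $protein = "MKNGSALNPSTQNATWNK";        # made-up sequence

# Look-ahead: count each N followed by a non-P residue and then S or T.
# Assigning the match list to () = ... yields the number of matches.
my $count = () = $protein =~ /N(?=[^P][ST])/g;
print "N followed by [^P][ST]: $count\n";

# Negative look-ahead: count each N that is not followed by P.
my $not_p = () = $protein =~ /N(?!P)/g;
print "N not followed by P: $not_p\n";
```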

Backreferences

$string = "2y 4 2 22y2y";
$string =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/;

Backreferences number the capturing brackets from left to right: \1 is "2y", \2 is "4" and \3 is "2", so \3\1\1 matches "22y2y".
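The slide's pattern can be run as is, and the same mechanism finds tandem repeats in a sequence. The sub first_tandem_repeat and the DNA string are hypothetical, chosen only to show a backreference that refers to an inner bracket.

```perl
use strict;
use warnings;

my $string = "2y 4 2 22y2y";
if ($string =~ /(\d\w)\s+(\d)\s+(\d)\s\3\1\1/) {
    # After a successful match the captures are also available as $1, $2, $3.
    print "matched: \\1=$1 \\2=$2 \\3=$3\n";
}

# Return the first run of a repeated three-base unit, or undef if there is none.
# Bracket 1 is the whole run, bracket 2 is the repeating unit.
sub first_tandem_repeat {
    my ($dna) = @_;
    return $dna =~ /((\w{3})\2+)/ ? $1 : undef;
}

my $repeat = first_tandem_repeat("TTACAGCAGCAGGT");   # made-up sequence
print "tandem repeat: $repeat\n";
```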

#!/usr/bin/perl -w
# Determining the frequency of nucleotides
# Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$dnaFilename = <STDIN>;
chomp $dnaFilename;
# Open the file or exit
open(DNA, $dnaFilename) or die("Error: $!");
# Read file into an array and close
@dna = <DNA>;
close DNA;
# Put data into a single string to make it easier to search
$dna = join('', @dna);
$dna =~ s/\s//g;

# Explode the $dna string into an array where it will be
# easier to iterate through the bases and count them
@dna = split('', $dna);
# Initialize the counts
$A_Number = 0;
$C_Number = 0;
$G_Number = 0;
$T_Number = 0;
$Errors = 0;

# Loop through the bases, examine each to determine which
# nucleotide it is and increment the appropriate count
foreach $base (@dna) {
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}
print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
print "G=\t$G_Number\nT=\t$T_Number\n\n";

foreach $base (@dna) {
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}

# The same loop written with the default variable $_
foreach (@dna) {
    if (/A/) { ++$A_Number; }
    elsif (/C/) { ++$C_Number; }
    elsif (/G/) { ++$G_Number; }
    elsif (/T/) { ++$T_Number; }
    else { print "Error when reading base\n"; ++$Errors; }
}
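The chain of elsif tests can also be replaced by a hash keyed on the base. This is a sketch of that alternative, not the slides' method; the sub name count_bases and the key 'other' are assumptions.

```perl
use strict;
use warnings;

# Count each base in a DNA string; anything other than A, C, G or T
# is tallied under the key 'other'. Returns a hash reference.
sub count_bases {
    my ($dna) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0, other => 0);
    foreach my $base (split //, uc $dna) {
        my $key = $base =~ /^[ACGT]$/ ? $base : 'other';
        $count{$key}++;
    }
    return \%count;
}

my $count = count_bases("AACGTNacg");     # made-up sequence
print "$_=\t$count->{$_}\n" for qw(A C G T other);
```

Adding a fifth symbol then means adding one key instead of another elsif branch.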

Tricky little ifs

if ($string =~ /\d{3,4}/) { print "the string contains at least three digits in a row\n"; }

is equivalent to

print "the string contains at least three digits in a row\n" if ($string =~ /\d{3,4}/);
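Without anchors, /\d{3,4}/ succeeds on any string that contains three digits in a row. The sketch below, with a hypothetical sub describe, uses statement modifiers to contrast the anchored and unanchored forms.

```perl
use strict;
use warnings;

# Classify a string with statement modifiers (the test follows the action).
sub describe {
    my ($string) = @_;
    return "exactly three or four digits"            if $string =~ /^\d{3,4}$/;
    return "contains a run of at least three digits" if $string =~ /\d{3,4}/;
    return "no run of three digits";
}

print "'$_': ", describe($_), "\n" for ("1234", "abc12345", "12ab");
```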

Let's do the same thing but save on some memory by not exploding the string into an array of bases

#!/usr/bin/perl -w
# Determining the frequency of nucleotides
# Ask the user for the filename of the data file
print "Please type the filename of the data file: ";
$dnaFilename = <STDIN>;
chomp $dnaFilename;
# See if the file exists, then open it
unless (-e $dnaFilename) { print "\"$dnaFilename\" does not exist\n"; exit; }
open(DNA, $dnaFilename) or die("File Error");
# Read file into an array and close
@dna = <DNA>;
close DNA;
# Put data into a single string to make it easier to search
$dna = join('', @dna);
$dna =~ s/\s//g;

#Initialize the counts

$A_Number = 0;

$C_Number = 0;

$G_Number = 0;

$T_Number = 0;

$Errors = 0;

# Loop through the bases, examine each to determine which
# nucleotide it is and increment the appropriate count
for ($position = 0; $position < length($dna); ++$position) {
    $base = substr($dna, $position, 1);
    if ($base eq 'A') { ++$A_Number; }
    elsif ($base eq 'C') { ++$C_Number; }
    elsif ($base eq 'G') { ++$G_Number; }
    elsif ($base eq 'T') { ++$T_Number; }
    else {
        print "Error: I don't recognize the base\n";
        ++$Errors;
    }
}
print "Base\tNumber\nA=\t$A_Number\nC=\t$C_Number\n";
print "G=\t$G_Number\nT=\t$T_Number\n\n";

# Alternative: count directly on the whole string with global,
# case-insensitive matches instead of looping over positions
while ($dna =~ /a/ig) { $a++ }
while ($dna =~ /c/ig) { $c++ }
while ($dna =~ /g/ig) { $g++ }
while ($dna =~ /t/ig) { $t++ }
while ($dna =~ /[^acgt]/ig) { $e++ }
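Another way to count without a loop is the tr operator, which returns the number of characters it matched. This is a sketch under the assumption that upper and lower case should be pooled; the sub name base_counts is made up.

```perl
use strict;
use warnings;

# Count bases with tr///. With an empty replacement list and no
# modifiers, tr leaves the string unchanged and just returns the count.
sub base_counts {
    my ($dna) = @_;
    my %count = (
        A => ($dna =~ tr/Aa//),
        C => ($dna =~ tr/Cc//),
        G => ($dna =~ tr/Gg//),
        T => ($dna =~ tr/Tt//),
    );
    # Whatever is left over is not a recognized base.
    $count{errors} = length($dna) - $count{A} - $count{C} - $count{G} - $count{T};
    return %count;
}

my %count = base_counts("ACGTTGcaNN");   # made-up sequence
print "$_=\t$count{$_}\n" for qw(A C G T errors);
```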

Writing to files

# All text data can be written to files
$outputfile = "results.txt";
open(RESULTS, ">$outputfile") or die("Error: $!");
print RESULTS "These results are overwriting everything that existed in the file results.txt\n";
close RESULTS;
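To see the difference between overwriting and appending, the following sketch writes to a file in a temporary directory and reads it back. The helper write_text is hypothetical, and the three-argument open is used in place of the slide's two-argument form.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write $text to $filename in the given mode: '>' overwrites, '>>' appends.
sub write_text {
    my ($mode, $filename, $text) = @_;
    open(my $out, $mode, $filename) or die "Error: $!";
    print {$out} $text;
    close $out or die "Error closing $filename: $!";
}

# Work in a throwaway directory so no real results.txt is touched.
my $outputfile = tempdir(CLEANUP => 1) . "/results.txt";

write_text('>',  $outputfile, "first run\n");
write_text('>',  $outputfile, "second run replaces the first\n");
write_text('>>', $outputfile, "appended line\n");

open(my $in, '<', $outputfile) or die "Error: $!";
my @lines = <$in>;
close $in;
print @lines;
```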

Command line arguments and subroutines

#!/usr/bin/perl -w
use strict;
# Arguments collected on the command line go into a special variable
# called @ARGV, and the program name resides in the variable $0
my($title) = "$0 DNA\n\n";
# With no arguments, print the usage line and stop
unless (@ARGV) {
    print $title;
    exit;
}
my($input) = $ARGV[0];
print $input, "\n\n";
exit;

Command line arguments and subroutines

#!/usr/bin/perl -w
use strict;
# Arguments collected on the command line go into a special variable
# called @ARGV, and the program name resides in the variable $0
my($title) = "$0 DNA\n\n";
unless (@ARGV) { print $title; exit; }
my($input) = $ARGV[0];
my($subRoutineResults) = Find_Length($input);
print "the length of your input is $subRoutineResults\n";
exit;

# Return the length of the string passed as the first argument
sub Find_Length {
    my($tmp) = @_;
    my $results = length($tmp);
    return $results;
}
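The same pattern, an argument taken from @ARGV and handed to a subroutine, can be sketched for a slightly more biological task. The sub gc_content and the default sequence are assumptions made so the example also runs without arguments.

```perl
use strict;
use warnings;

# Return the GC content of a DNA string as a fraction between 0 and 1.
sub gc_content {
    my ($dna) = @_;
    return 0 unless length($dna);          # avoid dividing by zero
    my $gc = ($dna =~ tr/GCgc//);          # tr returns the number of matches
    return $gc / length($dna);
}

# Use the first command-line argument, or a made-up default when none is given.
my $input = @ARGV ? $ARGV[0] : "ATGCGC";
printf "GC content of %s: %.2f\n", $input, gc_content($input);
```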

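Passing by value vs reference

Simple routines that copy their arguments out of @_ work on values. However, because every argument arrives in the single subroutine array @_, arrays, hashes and scalars get flattened into one list.

Ex.

my @i = (1..10);
my @j = (1..23);
reference_sub(@i, @j);

sub reference_sub {
    my (@i, @j) = @_;               # @i takes all 33 values, @j stays empty
    print scalar(@i) . "\n@j";      # prints 33, then nothing for @j
}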

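my @i = (1..10);
my @j = (1..23);
reference_sub(\@i, \@j);
# The arrays were passed by reference, so changes made inside
# the subroutine are visible in the caller's @i
print "@i\n";

sub reference_sub {
    my ($i, $j) = @_;
    print $$j[2];        # third element of the caller's @j
    push(@$i, '4');      # appends to the caller's @i
}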
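The two calling styles can be compared side by side. In this sketch the sub names sizes_flattened and sizes_by_reference are made up; each returns the sizes of the two arrays as the subroutine sees them.

```perl
use strict;
use warnings;

# Arrays passed directly are flattened into @_, so the first array
# in the list assignment takes every value.
sub sizes_flattened {
    my (@first, @second) = @_;
    return (scalar @first, scalar @second);
}

# Array references keep the two arrays apart.
sub sizes_by_reference {
    my ($first, $second) = @_;
    push @$first, 'extra';            # modifies the caller's array
    return (scalar @$first, scalar @$second);
}

my @i = (1 .. 10);
my @j = (1 .. 23);

my @flat = sizes_flattened(@i, @j);
print "flattened:    @flat\n";        # 33 0

my @by_ref = sizes_by_reference(\@i, \@j);
print "by reference: @by_ref\n";      # 11 23
print "last element of \@i is now $i[-1]\n";
```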