PERL Variables and data structures Andrew Emerson, High Performance Systems, CINECA.
The "Hello World" program
Consider the following:
#
# Hello World
#
$message="Ciao, Mondo";
print "$message \n";
exit;
Perl Variables
$message is called a variable, something with a name used to hold one or more pieces of information.
All computer languages have the ability to create variables to store and manipulate data.
Perl differs from many other languages in that you do not specify the "type" of a variable (integer, real, character, etc.), only the "complexity" of the data it holds.
Perl Variables
Perl has 3 ways of storing data:
1. Scalars: for single data items, like numbers or strings.
2. Arrays: for ordered lists of scalars. The scalars are indexed by numbers.
3. Associative arrays or "hashes": like arrays, but use "keys" to identify the scalars.
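The three kinds of variable can be put side by side in a minimal sketch. The variable names and values are illustrative and not from the original slides, and the sketch assumes `use strict`, which requires declaring variables with `my` (the slides themselves omit this).

```perl
use strict;
use warnings;

# Scalar: one value (number or string), prefixed with $
my $gene = 'BRCA1';

# Array: ordered list of scalars, prefixed with @, indexed from 0
my @exon_lengths = (120, 85, 203);

# Hash: scalars looked up by key, prefixed with %
my %codon_to_aa = (ATG => 'met', TGG => 'trp');

print "gene: $gene\n";
print "second exon length: $exon_lengths[1]\n";
print "ATG codes for: $codon_to_aa{ATG}\n";
```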
Scalar Variables
Examples
#
$no_of_chrs=24; # integer
$per_cent_identity=0; # also integer
$per_cent_identity=99.50; # redefined as real
$pi = 3.1415926535; # floating point (real)
$e_value=1e-40; # using scientific notation
$dna="GCCTACCGTTCCACCAAAAAAAA"; # string - double quotes
$dna='GCCTACCGTTCCACCAAAAAAAA'; # string - single quotes
Scalar Variables
Case is important: $DNA ≠ $dna (true for all variables).
Scalars must be prefixed with a $ whenever they are used (is there a $? Yes → it is a scalar). The next character must be a letter or an underscore, not a digit (true for all variables you name yourself).
Scalars can be happily redefined at any time (e.g. integer → real → string):
# unlikely example
$dna = 0; # integer
$dna = "GGCCTCGAACGTCCAGAAA"; # now it's a string
Doing things with scalars..
#
$a =1.5;
$b =2.0; $c=3;
$sum = $a+$b*$c; # multiply $b by $c, then add $a
#
$j=0;
while ($j<100) {
    $j++; # means $j=$j+1, i.e. add 1 to $j
    print "$j\n";
}
#
$dna1="GCCTAAACGTC";
$polyA="AAAAAAAAAAAAAAAA";
$dna1 .= $polyA; # add one string to another
# (equiv. $dna1 = $dna1.$polyA)
$no_of_bases = length($dna1); # length of a scalar
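A short sketch of how operator precedence and string concatenation behave in practice. The names `$x`, `$y`, `$z`, `$grouped` and `$tail` are illustrative assumptions; the `x` repetition operator is used only as a compact way to build the poly-A string.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1.5, 2.0, 3);
my $sum     = $x + $y * $z;      # multiplication binds tighter: 1.5 + (2.0 * 3)
my $grouped = ($x + $y) * $z;    # parentheses force the addition first

my $dna  = 'GCCTAAACGTC';        # 11 bases
my $tail = 'A' x 16;             # repetition operator: 16 copies of 'A'
$dna .= $tail;                   # append the tail to the sequence

print "sum = $sum, grouped = $grouped\n";    # sum = 7.5, grouped = 10.5
print "length = ", length($dna), "\n";       # length = 27
```

The sketch avoids the names `$a` and `$b` because Perl reserves them for use inside `sort` blocks.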
More about strings..
There is a difference between strings in single quotes (') and in double quotes ("):
#
$nchr = 24;
$message="chromosomes in human cell =$nchr"; # double quotes
print "$message\n";
$message = 'chromosomes in human cell =$nchr'; # single quotes
print "$message\n";
exit;
OUTPUT
chromosomes in human cell =24
chromosomes in human cell =$nchr
More about strings
Double quotes (") interpolate variables, single quotes (') do not:
$dna='GTTTCGGA';
print "sequence=$dna\n";
print 'sequence=$dna', "\n";
OUTPUT
sequence=GTTTCGGA
sequence=$dna
Normally you would want double quotes when using print.
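Two further details of double-quoted strings often matter when printing. This is a minimal sketch; the `TTT` suffix and the `$built` and `$dollar` variables are made up for illustration.

```perl
use strict;
use warnings;

my $dna = 'GTTTCGGA';

print "sequence=$dna\n";          # interpolated: sequence=GTTTCGGA
print 'sequence=$dna', "\n";      # literal:      sequence=$dna

# Braces mark where the variable name ends when text follows directly
my $built = "sequence=${dna}TTT";
print "$built\n";                 # sequence=GTTTCGGATTT

# A backslash keeps a literal $ inside double quotes
my $dollar = "name is \$dna";
print "$dollar\n";                # name is $dna
```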
Arrays
Collections of numbers, strings etc. can be stored in arrays. In Perl, arrays are ordered lists of scalars and are denoted by the @ character.
Initializing arrays with lists
@days_in_month=(31,28,31,30,31,30,31,31,30,31,30,31);
@days_of_the_week=('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun');
@bases = ('adenine', 'guanine', 'thymine', 'cytosine', 'uracil');
@GenBank_fields=( 'LOCUS',
                  'DEFINITION',
                  'ACCESSION',
                  ...
                );
Arrays - elements
To access the individual array elements you use [ and ] :
@poly_peptide=('gly','ser','gly','pro','pro','lys','ser','phe');
# now mutate the peptide
$poly_peptide[0]='val';
$i=0;
# print out what we have
while ($i<8) {
    print "$poly_peptide[$i] "; # the number inside [ ] is the array index
    $i++;
}
The numbers used to identify the elements are called indices.
Arrays - elements
When accessing array elements you use $. Why? Because each array element is a scalar, and scalars must have $:
@poly_peptide=(..);
$poly_peptide[0] = 'val';
This means that you can have a separate variable called $poly_peptide because $poly_peptide[0] is part of @poly_peptide, NOT $poly_peptide.
"This may seem a bit weird, but that's okay, because it is weird." (Unix Perl Manual)
Array elements
Array indices start from 0, not 1:
$poly_peptide[0]='val';
$poly_peptide[1]='ser';
$poly_peptide[7]='phe';
The last index of the array can be found from $#name_of_array, e.g. $#poly_peptide. You can also use negative indices: it means you count back from the end of the array. Therefore
$poly_peptide[-1], $poly_peptide[$#poly_peptide] and $poly_peptide[7] all refer to the same element.
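The last index and negative indices can be checked with a small sketch. It reuses the eight-residue peptide from the slides; `$last_index` is an illustrative name, and the `0 .. $#poly_peptide` range is one possible way to loop without hard-coding the length.

```perl
use strict;
use warnings;

my @poly_peptide = ('val','ser','gly','pro','pro','lys','ser','phe');

my $last_index = $#poly_peptide;               # 7 for an 8-element array
print "last index: $last_index\n";
print "last element: $poly_peptide[-1]\n";     # phe
print "second to last: $poly_peptide[-2]\n";   # ser

# Looping from 0 to the last index avoids hard-coding the length
for my $i (0 .. $#poly_peptide) {
    print "$i: $poly_peptide[$i]\n";
}
```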
Array properties
Length of an array:
$len = $#poly_peptide+1;
The size of the array does not need to be defined – it can grow dynamically:
# begin program
$i=0;
while ($i<100) {
    $polyA[$i]='A';
    $i++;
}
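A sketch of what dynamic growth looks like from the inside, using the `$#array + 1` length formula given above. Assigning to index 149 is an added illustration, not part of the original slides; it shows that skipped elements are left undefined.

```perl
use strict;
use warnings;

my @polyA;                                    # starts empty, no size declared
print "before: ", $#polyA + 1, " elements\n"; # 0 ($#polyA is -1 when empty)

for my $i (0 .. 99) {
    $polyA[$i] = 'A';                         # assigning to a new index grows the array
}
print "after: ", $#polyA + 1, " elements\n";  # 100

$polyA[149] = 'A';                            # skipping ahead also grows the array
print "now: ", $#polyA + 1, " elements\n";    # 150
print "element 120 is ",
      (defined $polyA[120] ? 'defined' : 'undefined'), "\n";   # undefined
```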
Useful Array functions
PUSH and POP
Functions commonly used for manipulating a stack:
F.I.L.O. = First In, Last Out
Very common in computer programs
Array functions – PUSH and POP
# part of a program that reads a database into an array
# open database etc first..
@dblines=(); # resets @dblines
while ($line=<DB>) {
    push @dblines,$line; # push $line onto array
}
...
while (@dblines) {
    $record = pop @dblines; # pop line off and use it
    # .... do something
}
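The first-in, last-out behaviour can be seen in a self-contained sketch that needs no database file. The `@stack`, `@popped` and `@queue` arrays are illustrative; `shift` is shown only for contrast, since it removes from the front of the array instead.

```perl
use strict;
use warnings;

my @stack;
push @stack, 'first';
push @stack, 'second';
push @stack, 'third';

my @popped;
while (@stack) {
    push @popped, pop @stack;       # pop removes the most recently pushed item
}
print "popped order: @popped\n";    # popped order: third second first

# shift removes from the front instead, giving first-in, first-out order
my @queue = ('first', 'second', 'third');
my $next = shift @queue;
print "shifted: $next\n";           # shifted: first
```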
Scalar Contexts
If you provide an expression (e.g. an array) where Perl expects a scalar, Perl evaluates the expression in a scalar context. For an array this gives the length of the array:
$length=@poly_peptide;
This is equivalent to
$length=$#poly_peptide+1;
Hence, in
while (@dblines) {
..
the array is in scalar context (= length of array), so the loop runs until the array is empty.
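A minimal sketch contrasting scalar and list context for the same array. The variable names `$count`, `$first` and `$explicit` are illustrative.

```perl
use strict;
use warnings;

my @poly_peptide = ('gly', 'ser', 'gly', 'pro');

my $count    = @poly_peptide;           # scalar assignment: number of elements
my ($first)  = @poly_peptide;           # parentheses give list context: first element
my $explicit = scalar(@poly_peptide);   # scalar() forces scalar context

print "count=$count first=$first explicit=$explicit\n";   # count=4 first=gly explicit=4

if (@poly_peptide > 3) {                # a numeric comparison is also scalar context
    print "more than three residues\n";
}
```

The parentheses around `$first` are easy to miss: with them the assignment copies the first element, and without them it copies the length.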
Special variables
Perl defines some variables for special purposes, including:
$_      Set in many situations, such as reading from a file or in a foreach loop.
$0      Name of the file currently being executed.
$]      Version of Perl being used.
@_      Contains the parameters passed to a subroutine.
@ARGV   Contains the command line arguments passed to the program.
Some are read-only and cannot be changed: see man perlvar for more details.
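A sketch of a few of these special variables in use. The `describe` subroutine and its arguments are hypothetical, invented only to show `@_`.

```perl
use strict;
use warnings;

# $_ : no loop variable is named, so each element is placed in $_
my @bases = ('A', 'C', 'G', 'T');
my $joined = '';
foreach (@bases) {
    $joined .= $_;
}
print "joined: $joined\n";            # joined: ACGT

# @_ : the arguments passed to a subroutine
sub describe {
    my ($name, $length) = @_;         # copy the arguments out of @_
    return "$name has $length bases";
}
my $text = describe('seq1', 8);
print "$text\n";                      # seq1 has 8 bases

# $0 and @ARGV : program name and command-line arguments
my $n_args = @ARGV;
print "program $0 was given $n_args argument(s)\n";
```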
Associative Arrays (Hashes)
Similar to normal arrays but the elements are identified by keys and not indices. The keys can be more complicated, such as strings of characters.
Hashes are indicated by % and can be initialized with lists like arrays:
%hash = (key1,val1,key2,val2,key3,val3..);
Associative Arrays (Hashes)
Examples
%months=('jan',31,'feb',28,'mar',31,'apr',30);
Alternatively,
%months=('jan'=> 31, # key => value
         'feb'=> 28,
         'mar'=> 31,
         'apr'=> 30);
=> is a synonym for ,
Associative Arrays (Hashes)
Further examples
#
%classification = ('dog' => 'mammal', 'robin' => 'bird', 'snake' => 'reptile');
%genetic_code = (
    'TCA' => 'ser',
    'TTC' => 'phe',
    'TTA' => 'leu',
    'TGA' => 'STOP',
    'CCC' => 'pro',
    ...
);
Associative Arrays (Hashes) - elements
The elements of a hash are accessed using curly brackets, { and }:
$genetic_code{TCA} = 'ser';
$genetic_code{CCC} = 'pro';
$genetic_code{TGA} = 'STOP';
Note the $ sign: the elements are scalars and so must be preceded by $, even though they belong to a % (just as for arrays).
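Reading, adding and overwriting hash elements all use the same `$hash{key}` form. The sketch below uses a small months hash; changing February to 29 days is an illustrative assumption (a leap year).

```perl
use strict;
use warnings;

my %months = ('jan' => 31, 'feb' => 28);

$months{mar} = 31;                    # assigning to a new key adds it
$months{feb} = 29;                    # assigning to an existing key overwrites it

print "feb has $months{feb} days\n";  # feb has 29 days

my $key = 'jan';                      # the key can also come from a variable
print "$key has $months{$key} days\n";   # jan has 31 days
```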
Associative Arrays (Hashes) – useful functions
exists: indicates whether a key exists in the hash
if (exists $genetic_code{$codon}) {
    ...
} else {
    print "Bad codon $codon\n";
    exit;
}
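A runnable sketch of the same check, applied to one valid and one invalid codon. The codon `XYZ` and the `@report` array are made up for illustration; `delete` is included as an assumption about what is useful alongside `exists`, since it removes a key so that `exists` then returns false.

```perl
use strict;
use warnings;

my %genetic_code = (TCA => 'ser', CCC => 'pro', TGA => 'STOP');

my @report;
for my $codon ('TCA', 'XYZ') {
    if (exists $genetic_code{$codon}) {
        push @report, "$codon => $genetic_code{$codon}";
    } else {
        push @report, "Bad codon $codon";
    }
}
print "$_\n" for @report;

delete $genetic_code{TGA};            # remove the key and its value
my $still_there = exists $genetic_code{TGA} ? 'yes' : 'no';
print "TGA present after delete: $still_there\n";   # no
```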
Associative Arrays (Hashes) – useful functions
keys and values: make arrays from the keys and values of a hash.
@codons = keys %genetic_code;
@amino_acids = values %genetic_code;
Often you will see code like the following:
foreach $codon (keys %genetic_code) {
    if ($genetic_code{$codon} eq 'STOP') {
        last; # i.e. stop translating
    } else {
        $protein .= $genetic_code{$codon};
    }
}
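One caveat when looping over a hash: Perl does not guarantee the order in which `keys` and `values` return their results. A common approach, sketched here with the months hash and illustrative variable names, is to sort the keys when a predictable order matters.

```perl
use strict;
use warnings;

my %months = (jan => 31, feb => 28, mar => 31, apr => 30);

# sort gives a predictable (alphabetical) order
my @names = sort keys %months;
foreach my $month (@names) {
    print "$month has $months{$month} days\n";
}

# values is handy when only the stored data matters, not the keys
my $total = 0;
foreach my $days (values %months) {
    $total += $days;
}
print "total days: $total\n";         # total days: 120
```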