CSB472H1: Computational Genomics and Bioinformatics
Tutorial #8
Alex Nguyen, 2014
ESC-4075
Copyright © 2014, Alex N. Nguyen Ba

What we have seen so far…

• Variables
  • A way to store values in memory.
• Functions
  • Print, string functions
• If/else, for, while
• Conditionals
• Comparisons
• Arrays and hashes
• Subroutines
• Regex
• Input/Output

Subroutines

• Code sharing is an important aspect of the community

Algorithm modifications (BLAST derivatives, etc.)

Problem: only a small number of programmers code in Perl. How can an algorithm be communicated effectively?

Logic

• Algorithm overview can be given by logic charts

[Figure: flowchart with the boxes Initialization, Input, Procedure, Condition (with True and False branches), and Output]

Formulas

• Algorithm description is often given as formulas

For example…

Formulas are often the fastest way to get the point across.

Formulas are preferred by the bioinformatics community, but they can lead to ugly notation. For x equal to 0 or 1, the conditional

if ($x == 1) { return 5; } elsif ($x == 0) { return 10; }

can be written as either of

$5x + 10(1 - x)$
$5^{x} \cdot 10^{1-x}$
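As a quick check, the sketch below compares the conditional with the two formulas for x = 0 and x = 1. The subroutine names are made up for this illustration, and the three forms agree only for those two values of x.

```perl
use strict;
use warnings;

# Hypothetical helper names; x is assumed to be 0 or 1.
sub by_condition {
    my ($x) = @_;
    if    ($x == 1) { return 5; }
    elsif ($x == 0) { return 10; }
    return;    # undefined for any other x, as in the original conditional
}
sub by_sum     { my ($x) = @_; return 5 * $x + 10 * (1 - $x); }
sub by_product { my ($x) = @_; return 5**$x * 10**(1 - $x); }

# Print the three results side by side for each allowed value of x.
for my $x (0, 1) {
    printf "x=%d: %d %d %d\n", $x, by_condition($x), by_sum($x), by_product($x);
}
```

For x = 0 all three give 10, and for x = 1 all three give 5.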

Pseudocodes

• Pseudocode is an abstraction of code

Pseudocode is structured like code, but is not meant to be interpreted by machines.

Language-specific syntax can, and should, be left out.

When reading pseudocode, I should not be able to say: “It looks like Perl.”

Do not write: open(FILE,$filename);

Spot all the problems

Pseudocodes

Input: string a of length m, string b of length n
F[0,0] := 0
d := penalty
for all i: F[i,0] := -i * d
for all j: F[0,j] := -j * d
for i = 1 to m:
    for j = 1 to n:
        F[i,j] := max(F[i-1,j-1] + S[a[i],b[j]], F[i-1,j] - d, F[i,j-1] - d)

There is no true standard for pseudocode syntax. KEEP IT CONSISTENT.

• The structure should be visible
• Mathematics does not have to be explained
• Blocks should be clear
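To see how such pseudocode maps onto a real language, here is a minimal Perl sketch of the matrix fill above. The scoring is an assumption made for this example only: S is taken to be +1 for a match and -1 for a mismatch, and the gap penalty d is passed in by the caller. The name nw_score is hypothetical. Because Perl strings are 0-indexed, a[i] in the pseudocode becomes substr($s1, $i - 1, 1).

```perl
use strict;
use warnings;
use List::Util qw(max);

# Returns the optimal global alignment score F[m,n].
# Assumed scoring (not from the slides): match +1, mismatch -1.
sub nw_score {
    my ($s1, $s2, $d) = @_;
    my ($m, $n) = (length $s1, length $s2);
    my @F;

    # First column and first row: aligning a prefix against gaps only.
    $F[$_][0] = -$_ * $d for 0 .. $m;
    $F[0][$_] = -$_ * $d for 0 .. $n;

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $s = substr($s1, $i - 1, 1) eq substr($s2, $j - 1, 1) ? 1 : -1;
            $F[$i][$j] = max(
                $F[$i - 1][$j - 1] + $s,    # align a[i] with b[j]
                $F[$i - 1][$j] - $d,        # gap in b
                $F[$i][$j - 1] - $d,        # gap in a
            );
        }
    }
    return $F[$m][$n];
}

print nw_score('AC', 'AGC', 1), "\n";    # prints 1 (two matches, one gap)
```

This only fills the score matrix. Recovering the alignment itself would need a traceback step, which the pseudocode above does not cover either.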

Pseudocodes

The pseudocode has to provide enough information for:
- Solving the problem in any programming language
- The complete understanding of every ‘variable’ used

Pseudocodes

Class exercise (previously an assignment)

diff := Pa - Pb
na := len(a)
nb := len(b)
if diff < 0
    swap(a,b)
    swap(na,nb)
    diff := abs(diff)
k := diff
output_read := a[0 .. diff-1]
while k < na
    r1 := a[k .. k]
    r2 := b[k-diff .. k-diff]
    if r1 != "."
        output_read := output_read + r1
    elsif r2 != "." and r2 != ""
        output_read := output_read + r2
    else
        output_read := output_read + "."
    k := k + 1
output_read := output_read + b[na-diff .. nb-1]

a[0 .. 9] corresponds to the first ten letters of read a
a[0 .. 0] corresponds to the first letter of read a only
a[na .. na] is empty, where na is the length of a (remember the 0th index)
Pa and Pb are 0-indexed here, but this is not required
abs() is the absolute value
len() is the length of the read
swap() exchanges the variables

[Figure: worked example]
read_1: CA.TG..C.GT.G, with Pa at the 3rd position (0-index)
read_2: .T..AC.G..GA, with Pb at the 1st position (0-index)
merged: CA.TG.AC.GT.GA
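One possible Perl translation of the exercise pseudocode is sketched below. The names merge_reads and slice are hypothetical, and the sketch assumes that Pa and Pb are 0-indexed positions that fall inside their reads. The helper slice mimics the a[i .. j] notation, including the rule that a range starting past the end of the read is empty.

```perl
use strict;
use warnings;

# Inclusive range s[from .. to]; empty if the range starts past the end.
sub slice {
    my ($s, $from, $to) = @_;
    return '' if $from > $to || $from >= length $s;
    return substr($s, $from, $to - $from + 1);
}

sub merge_reads {
    my ($read_a, $read_b, $pos_a, $pos_b) = @_;
    my $diff = $pos_a - $pos_b;
    if ($diff < 0) {
        # Make read_a the read that starts further to the left.
        ($read_a, $read_b) = ($read_b, $read_a);
        $diff = abs $diff;
    }
    my ($na, $nb) = (length $read_a, length $read_b);

    my $out = slice($read_a, 0, $diff - 1);    # part of a before b starts
    for my $k ($diff .. $na - 1) {
        my $r1 = slice($read_a, $k, $k);
        my $r2 = slice($read_b, $k - $diff, $k - $diff);
        if    ($r1 ne '.')              { $out .= $r1 }
        elsif ($r2 ne '.' && $r2 ne '') { $out .= $r2 }
        else                            { $out .= '.' }
    }
    $out .= slice($read_b, $na - $diff, $nb - 1);    # part of b after a ends
    return $out;
}

print merge_reads('CA.TG..C.GT.G', '.T..AC.G..GA', 3, 1), "\n";
# prints CA.TG.AC.GT.GA
```

Swapping the two reads (and their positions) in the call gives the same merged read, because the swap step makes the order of the arguments irrelevant.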

What we have seen so far…

• Variables
  • A way to store values in memory.
• Functions
  • Print, string functions
• If/else, for, while
• Conditionals
• Comparisons
• Arrays and hashes
• Subroutines
• Regex
• Input/Output
• Pseudocode

Modules

• While code is often shared as pseudocode, the vast majority of programming languages have libraries of code written by the community

Modules are pieces of code that other people have written in your language

In many cases, they are more efficient and cover a broader range of cases than code written from scratch.

Modules

• What can modules do?

The community has written a wide range of code that you can take advantage of:

Manipulating the Windows mouse…
Writing Excel files…
Drawing images…
Complex mathematical functions…

Modules

• Why shouldn’t you use a module?

Any module you use requires that anyone who uses your code have that module

Some modules may not be trivial to install
Some modules are incompatible with the user’s Perl version
If you code for specific cases, your code might be faster
No easy way of modifying the function
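Since anyone running your code needs the same modules, a script can check for a module before relying on it. The sketch below is one way to do that; the name have_module is hypothetical, and the check uses Perl's built-in require inside eval so that a missing module does not stop the program.

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded on this machine, 0 otherwise.
sub have_module {
    my ($name) = @_;
    # Only accept plain module names such as Excel::Writer::XLSX.
    return 0 unless $name =~ /\A\w+(?:::\w+)*\z/;
    return eval "require $name; 1" ? 1 : 0;
}

# List::Util ships with Perl; Excel::Writer::XLSX has to be installed.
for my $module ('List::Util', 'Excel::Writer::XLSX') {
    printf "%-20s %s\n", $module, have_module($module) ? 'available' : 'missing';
}
```

A script could use such a check to print a helpful message, or to fall back to a simpler output format, instead of failing with a "Can't locate ..." error.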

Modules

• How to install modules

Modules

It can take a while to find the module you want… It is probably best to browse some forums and see what other people have used

Modules

Perl modules require installation

On Windows, this installation is done through PPM

A nice interface should open called the “Perl Package Manager”

Modules

Perl modules require installation

On Linux-based operating systems

You will have to type commands instead… both do practically the same thing

Type: install packagename

Modules

Let’s install the Excel-Writer module

Modules

Excel-Writer module… done!

How to use it?

Modules

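use Excel::Writer::XLSX;

# Create a new workbook and add one worksheet to it.
my $workbook  = Excel::Writer::XLSX->new('test.xlsx');
my $worksheet = $workbook->add_worksheet();

# Write sqrt(0) to sqrt(99) down the first column: write(row, column, value).
for (my $i = 0; $i < 100; ++$i) {
    $worksheet->write($i, 0, sqrt($i));
}

# Close the workbook explicitly so the file is fully written.
$workbook->close();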

BioPerl Modules

BioPerl is a collection of Perl modules written by biologists

Many of the tools you have learned about in class can be used via BioPerl

You even have access to some algorithmic functions like the Needleman-Wunsch algorithm

Installing BioPerl should be fairly straightforward

BioPerl Modules

use Bio::SeqIO;

my $file = "CSB472-2012-assignment_1.fasta";

# Open the file; no format is given, so Bio::SeqIO guesses it.
my $seqio_object = Bio::SeqIO->new(-file => $file);

# Read the first sequence, then print its identifier and its residues.
my $seq_object = $seqio_object->next_seq;
print $seq_object->display_id, "\n";
print $seq_object->seq, "\n";

Automatic format recognition, automatic handling of files…

http://www.bioperl.org/wiki/Module:Bio::SeqIO

Object Oriented programming

OO programming vs procedural

my $sub = substr($string,1,2);

Here $string is the “subject”.

Object-oriented programming places the subject as the owner of the function:

my $sub = $string->substr(1,2);

Note that this is not valid Perl for a plain string. It is, however, how almost all modules work.
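To make the idea concrete, here is a minimal sketch of a class whose objects own their functions. The class MySeq and its method substring are invented for this illustration and are not part of Perl or BioPerl.

```perl
use strict;
use warnings;

package MySeq;    # hypothetical class, for illustration only

# Constructor: stores the sequence inside a blessed hash reference.
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}

# Method: the object itself arrives as the first argument ($self).
sub substring {
    my ($self, $offset, $length) = @_;
    return substr($self->{seq}, $offset, $length);
}

package main;

my $string = MySeq->new(seq => 'atgcggctg');
my $sub    = $string->substring(1, 2);    # the object is the "subject"
print "$sub\n";                           # prints tg
```

The call `$string->substring(1, 2)` is just another way of writing `MySeq::substring($string, 1, 2)`: the arrow passes the object as the first argument.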

Object Oriented programming

Let’s use BioPerl to create sequence objects.

use Bio::Seq;
use Bio::SeqIO;

my $seq_obj = Bio::Seq->new(-seq        => 'atgcggctg',
                            -display_id => 'new_dna',
                            -desc       => 'random_protein');

# The leading '>' opens test.fasta for writing instead of reading.
my $seqio_obj = Bio::SeqIO->new(-file   => '>test.fasta',
                                -format => 'fasta');
$seqio_obj->write_seq($seq_obj);

Web programming

Bioinformatics resources can usually be run online

[Diagram: User interface (inputs, parameters) → Scripts and programs → Output]

Web programming

A series of input/output programs

[Diagram: User interface (inputs, parameters) → Scripts and programs → Output]

The output is usually interpreted by your browser (HTML code)
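The "scripts and programs" stage can be as small as a subroutine that turns an input into HTML text. The sketch below is only an illustration: the name render_page is hypothetical, and in a real web setup the sequence would come from the user interface rather than being hard-coded.

```perl
use strict;
use warnings;

# Takes a DNA sequence as input and returns the text a web server
# would send back to the browser.
sub render_page {
    my ($seq) = @_;
    my $length = length $seq;
    my $gc     = ($seq =~ tr/GCgc//);    # counts G and C without changing $seq
    my $html   = "<html><body>\n"
               . "<p>Length: $length</p>\n"
               . "<p>GC count: $gc</p>\n"
               . "</body></html>\n";
    # A header line and a blank line come before the HTML itself.
    return "Content-Type: text/html\n\n" . $html;
}

print render_page('atgcggctg');
```

The browser never sees the Perl code, only the printed HTML, which is why the output stage of the diagram is described as HTML code.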

What we have seen this semester

• Variables
  • $scalars, @arrays, %hashes.
• Functions
• If/else, for, while
• Conditionals
• Comparisons
• Subroutines
• Regex
• Input/Output
• Pseudocode
• BioPerl
• Command line arguments
• Web programming
