Python – Essential characteristics
think Monty, not snakes!

Key Advantages:
• Open source & free (thank you Guido van Rossum!)
• Portable – works on Unix, Linux, Win32 & 64, MacOS etc.
• Easy to learn and logically consistent
• Lends itself to rapid development
• So, good for “quick and dirty” solutions & prototypes
• But also suitable for full-fledged applications
• Hides many low-level aspects of computer architecture
• Elegant support of object-orientation and data structures
• Extensive library support – a strong standard library
• Dynamic “duck typing” paradigm is very flexible
• Language is minimalistic, only 31 keywords

Python – Essential characteristics

Some Disadvantages:
• It's not very fast (but often better than PERL!)
• Relatively inefficient for number crunching
• Can have high memory overhead
• Being “far from the metal” has disadvantages – systems or kernel programming is impractical
• Dynamic typing can be both a blessing and a curse
• Some key libraries are still developing (e.g. BioPython)
• Version 3 breaks compatibility with prior versions
• Some find the whitespace conventions annoying
• Tends towards minimalism rather than expressiveness

Becoming a Pythonista

Windows and MacOS X installers available at: www.python.org/getit

Note that BNFO602 will be using version 2.7.3, not the more recent 3.x distributions

Even if your machine supports 64-bit, a 32-bit install is generally a safer choice for compatibility

Linux users may need to download a source tarball and compile it themselves

A Python IDE for BNFO602

Windows, MacOS X, and Linux installers at: www.jetbrains.com/pycharm

We are using the free Community edition

An IDE is an Integrated Development Environment

While not strictly required, IDEs ease and facilitate the creation and management of larger programs.

IDLE is the built-in IDE and is another option

Python can also be run interactively.

Documents for Python

For version 2.X, official documentation and tutorials are here:

docs.python.org/2

While a notable weakness of Python in the past, the online documentation and tutorials for Python are now quite good!

StackOverflow.com also has good information:

stackoverflow.com/tags/python/info

The Building Blocks of Python - Hello World!

    print "Hello World"

Keyword: print

Function argument: "Hello World"

No semicolon!

Python 2.7 has only 31 keywords in the language. It is minimalistic.

Hello World!

    if True:
        print "Hello"
        print "World"

The two indented print lines form a statement block

If statements are the sentences of Python, then statement blocks are analogous to paragraphs.

Unlike PERL, Python is somewhat fussy about how we use whitespace (spaces, tabs, line breaks).

Does NOT use curly brackets to delimit statement blocks!

Use a colon after the conditional statement

Statement blocks are nested using whitespace

    #Demo of nested blocks
    print "Outer level"
    if True:
        print "\tMiddle level #1"
        if True:
            print "\t\tInner level"
        print "\tMiddle level #2"
        pass
    print "Outer level #2"

Whitespace delimits statement blocks!

Preferred practice is to use exactly four spaces per level

Don't use tabs unless your editor maps these to spaces!

Comments begin with #

\t is the escape sequence for “tab” (but there is no variable interpolation as with PERL)

pass is a dummy statement that does nothing

Statement blocks can be nested

Output:

    Outer level
            Middle level #1
                    Inner level
            Middle level #2
    Outer level #2

Yes, this is a trivial example. Note: scoping within these simple blocks is a little different than in PERL, as there is no “my” statement for local variables

Data Types in Python
Some basic data types

    "Hello World!"    String
    42                Integer
    3.1459            Floating point
    2+6j              Complex
    False, True       Boolean
    None              Null

The quotation marks are the string delimiters

Some types, like strings, are hard-coded and cannot be directly changed! They are “immutable”
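A minimal sketch of what “immutable” means for strings. The variable names are illustrative only, and the code is written so that it should run under both 2.7 and 3.x:

```python
# Strings are immutable: item assignment fails, so we build a new string instead.
dna = "ACGT"

try:
    dna[0] = "T"              # not allowed for a string
    changed_in_place = True
except TypeError:
    changed_in_place = False

new_dna = "T" + dna[1:]       # a brand-new string object

print(changed_in_place)       # False
print(new_dna)                # TCGT
print(dna)                    # ACGT (the original is untouched)
```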

Data Types in Python
Some compound data types

    ["A", "C", "G", "T"]                        list
    ("A", "C", "G", "T")                        tuple
    {"A":"T", "C":"G", "G":"C", "T":"A"}        dict

The brackets [ ], ( ) and { } are the delimiters

A tuple is essentially an immutable list, whereas a dict is like a PERL hash

Variables in Python

    dna_sequence = "AGCTAGC"
    seq_len = 9
    symbols = ["A", "G", "C", "T"]
    empty_dict = {}
    symbols = {"A":"Adenine"}

Variables in Python are NOT associated with a type. They are just identifiers that name some object

Identifiers begin with a letter or underscore

Declaration and definition are usually coincident
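As a small illustration of identifiers not being tied to a type, the same name can be rebound to objects of different types. This is only a sketch; it reuses the symbols name from the slide:

```python
# The same identifier can name a list first and a dict later.
symbols = ["A", "G", "C", "T"]
first_type = type(symbols).__name__

symbols = {"A": "Adenine"}
second_type = type(symbols).__name__

print(first_type)     # list
print(second_type)    # dict
```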

Data Types and identifiers

    A = [42, 32, 64]
    print A
    print "The answer is", A[0]

Output:

    [42, 32, 64]
    The answer is 42

Data types are actually implemented as classes that know how to print their own instance objects. Later we'll see how to make our own classes and types

Index notation always uses square brackets, even for a tuple or a dict

Operators, Operands & Expressions

    var = 12 * 10

Here 12 and 10 are operands, * and = are operators, 12 * 10 is a subexpression, and the whole line is an expression

Expressions consist of valid combinations of operands and operators, and a sub-expression can act as an operand in an expression

Very similar to PERL, but some operators vary, especially the logical operators. Also, string concatenation uses "+", not "."

Expressions

Expressions can use the result of a function (or the result of a method of a class) as an operand

    foo = somefunction(foo2)
    foo = somefunc(foo2) * foo3
    foo = somefunc(foo2) + somefunc2(foo3)
    foo = somefunc(somefunc2(foo2))

All of the above are possibly legal Python expressions, depending on the functions

Some Python Operators

Common operators

    +    addition          4 + 2 = 6
    -    subtraction       4 - 2 = 2
    /    division          4 / 2 = 2
    *    multiplication    4 * 2 = 8
    +    concatenation     "4" + "2" = "42"

Operators follow a strict order of operations: e.g. 2 + 7 * 2 = 16
See documentation for complete details

=    assignment    Does NOT denote equivalence
Use == for testing equivalence!
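The order of operations can be checked directly. The sketch below also shows one detail worth knowing: in 2.7 the / operator applied to two integers discards the remainder, whereas in 3.x it does not, so // and a float operand are used here to get the same answer in both. The variable names are illustrative only:

```python
# Multiplication binds tighter than addition; parentheses override that.
total = 2 + 7 * 2        # 16
grouped = (2 + 7) * 2    # 18

# + concatenates when both operands are strings.
joined = "4" + "2"       # "42"

# Division that behaves the same in 2.7 and 3.x:
floor_div = 7 // 2       # 3   (floor division)
true_div = 7 / 2.0       # 3.5 (a float operand forces true division)

print(total)
print(grouped)
print(joined)
print(floor_div)
print(true_div)
```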

The Assignment Operator

Unlike in algebra, = does not imply that both sides of the equation are equal!

The following is a valid Python statement:

var = var + 1

This says “take the current value of var and add one to it, then store the result back in var”

This also does the same thing:

var += 1

*=, -=, /=, all work the same way.

Incrementing and Decrementing

The following are functionally equivalent statements:

    var = var + 1
    var += 1

Similarly:

    var = var - 1
    var -= 1

+= and -= increment or decrement by the amount shown

But NOT: var++, ++var or var--, --var
(var++ and var-- are syntax errors; ++var and --var run but leave var unchanged)

No PERL-style autoincrement/decrement!

The Equivalence Operator

Python does have an equivalence operator: ==

    print "Is 2 equal to 4:", 2 == 4
    print "Is 2 equal to 2:", 2 == 2

Output:

    Is 2 equal to 4: False
    Is 2 equal to 2: True

Python has a built-in Boolean type!

0, Boolean False, None, empty lists, null strings, and empty dicts are all evaluated as false
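The truth value of any object can be inspected with the built-in bool function. A short sketch, with arbitrary example values:

```python
# Values that evaluate as false, and a few that evaluate as true.
falsy_values = [0, False, None, [], "", {}]
truthy_values = [1, True, "ATG", ["A"], {"A": "T"}]

falsy_results = [bool(v) for v in falsy_values]
truthy_results = [bool(v) for v in truthy_values]

print(falsy_results)     # every entry is False
print(truthy_results)    # every entry is True
```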

Comparison Operators

The equivalence operator is just one of the comparison operators

    ==          equal to
    <           less than
    >           greater than
    <=          less than or equal to
    >=          greater than or equal to
    != or <>    not equal to

These are the comparison operators for everything

Use caution when testing floating point numbers, especially for exact equivalence!
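To see why exact equivalence is risky for floating point numbers, consider the sketch below. The tolerance value is an arbitrary assumption; choose one that suits the data:

```python
# 0.1 + 0.2 is not stored as exactly 0.3 in binary floating point.
a = 0.1 + 0.2
b = 0.3

exactly_equal = (a == b)                  # False

tolerance = 1e-9                          # assumed tolerance, for illustration only
nearly_equal = abs(a - b) < tolerance     # True

print(exactly_equal)
print(nearly_equal)
```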

Flow Control – if, else and conditional expressions

Comparison operators enable program flow control

    dna = "GATCTCTT"
    dna2 = "GATCTCCC"
    if dna == dna2:
        print "Sequences identical:", dna
    else:
        print "Sequences different"

dna == dna2 is the conditional expression; note the colon

Output:

    Sequences different

Flow Control – if, else and conditional expressions

Comparison operators at work #2

    dna = "ATGCATC"
    if dna:
        print "Sequence defined"
    else:
        print "Sequence not defined"

Output:

    Sequence defined

non-None, non-zero, non-False, & non-empty results are logically “true”

Flow Control – if, else and conditional expressions

Comparison operators at work

    dna = ""
    if dna == "ATG":
        print "Sequence is ATG start codon"
    else:
        print "Sequence not defined"

Output:

    Sequence not defined

Remember, empty lists and null strings are logically equivalent to “false”

Multi-way branching using elif

    dna = "ATG"
    if dna == "GGG":
        print "All Gs"
    elif dna == "AAA":
        print "All As"
    elif dna == "TTT":
        print "All Ts"
    elif dna == "CCC":
        print "All Cs"
    else:
        print "Something else:", dna

Output:

    Something else: ATG

Several elif blocks in a row is OK!

Loops with the while statement

    dna = "ATGCATC"
    while dna == "ATGCATC":
        print "The sequence is still", dna

dna == "ATGCATC" is the conditional expression

Output:

    The sequence is still ATGCATC
    The sequence is still ATGCATC
    The sequence is still ATGCATC
    etc…

while statements will execute their statement block forever unless the conditional expression becomes false.

Therefore the variable tested in the conditional expression is normally manipulated within the statement block.

Loops with the while statement

    dna = "ATGCATGC"
    while len(dna):
        print "The sequence is:", dna
        dna = dna[0:-1]
    print "done"

len(dna) is the conditional expression; len returns the length of a string

Output:

    The sequence is: ATGCATGC
    The sequence is: ATGCATG
    The sequence is: ATGCAT
    The sequence is: ATGCA
    The sequence is: ATGC
    The sequence is: ATG
    The sequence is: AT
    The sequence is: A
    done

More on “slice notation” later when discussing lists. Here we remove the last character of a string

Use break to simulate PERL until

    dna = "A"
    while True:
        if len(dna) > 3:
            break
        print "The sequence is:", dna
        dna += "A"
    print "done"

dna += "A" is string concatenation and assignment

len is one of several built-in functions

Output:

    The sequence is: A
    The sequence is: AA
    The sequence is: AAA
    done

There is no native “do-while” or “until” in Python
Python is minimalistic

Loops with the for statement

    nt_list = ("A", "C", "G", "T")

    for nt in nt_list:
        print "The nt is:", nt

Output:

    The nt is: A
    The nt is: C
    The nt is: G
    The nt is: T

for loops iterate over list-like (“iterable”) data types and are similar to PERL foreach, not the PERL or C for

Loops with the for statement

    dna = ("A", "C", "G", "T")

    for index in range(len(dna)):
        print "The nt is:", dna[index]

Output:

    The nt is: A
    The nt is: C
    The nt is: G
    The nt is: T

for loops can have a definite number of iterations, typically using the range or xrange built-in function

Try this example with a string instead of a tuple!

Caution! range in 2.x instantiates an actual list. Use xrange if the number of iterations is big

Data Types in Python - Strings

Strings are iterable sequences of characters with a rich collection of methods for their manipulation

Some useful methods are: join, split, strip, upper, lower, count

    dna = "ACGT"
    dna2 = dna.lower()    # will give "acgt"

“attribute” notation! These are methods specific to the string type, not of general utility like built-ins

Data Types in Python - Strings

    dna = "AACGTA"
    print dna.count("A")    # will give 3
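A short sketch chaining several of the string methods named above. The input string and variable names are made up for illustration:

```python
# Clean up a comma-separated line of sequences, then rebuild it.
raw = "  acgt,ttga,ccat \n"

cleaned = raw.strip()                          # drop leading/trailing whitespace
pieces = cleaned.split(",")                    # a list of three strings
upper_pieces = [p.upper() for p in pieces]     # upper-case each piece
joined = "-".join(upper_pieces)                # glue back together with "-"
a_count = joined.count("A")                    # count occurrences of "A"

print(pieces)
print(joined)
print(a_count)
```

Note that join is called on the separator string, not on the list.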

Data Types in Python - Lists

A list is simply a sequence of objects enclosed in square brackets that we can iterate through and access by index. They are array-like.

    ["A","G","C","T"]

Unlike PERL, pretty much anything can be put into a list, including other lists!! Mirabile dictu!

    [42,"groovy", dna, 3.14, var1-var2, ["A", "G", "C", "T"]]

Try printing item 5 from the above list… how does this differ from the result you would get in PERL?

Data Types in Python - lists

A list is a powerful type for manipulating ordered collections:

    bases = ["A","G","C","T"]

No “@” token to distinguish list variables!!

list elements can be accessed by an index:

    index = 2
    print bases[0], bases[index]

Output:

    A C

Note that the first element is index 0

Assigning to a non-existent element raises an error exception
There is no PERL-style “autovivification” (although we can fake this)
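A minimal sketch of that error and of the usual alternative, append. The "N" symbol is just an example value:

```python
bases = ["A", "G", "C", "T"]

# Index 4 does not exist yet, so assignment raises IndexError.
try:
    bases[4] = "N"
    assigned = True
except IndexError:
    assigned = False

# To grow a list, append to it instead.
bases.append("N")

print(assigned)    # False
print(bases)
```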

Data Types in Python - Lists

Lists also have a rich collection of methods

Some useful methods are: len, sort, reverse, in, max, min, count

    pi = 3.14
    my_list = ["ACGT", 0, pi]
    print min(my_list)    # will print 0 (2.x orders numbers before strings; 3.x raises TypeError here)

min and max are built-ins

Note that some are built-in functions while others use attribute notation

Data Types in Python - Lists

    my_list = ["A", "C", "G", "T"]
    my_list.reverse()
    print my_list    # will print ['T', 'G', 'C', 'A']

reverse uses attribute notation

Data Types in Python - Lists

    my_list = ["A"] * 4            # init with 4 "A"s
    print my_list.count("A")       # prints 4
    my_list.append("C")
    if "C" in my_list:
        print 'The list contained "C"\n'

testing for inclusion with in is a common operation with all iterable types

Lists and slice notation

    bases = ["A","G","C","T"]
    size = len(bases)    # will be equal to four

var1, var2, var3, var4 = bases #var1="A" & var2="G", etc.

subarray = bases[0:2] #subarray = ["A","G"]

Array “slices” can be assigned to a subarray

subarray = bases[0:-1] #subarray = ["A","G","C"]

subarray = bases[1:] #subarray = ["G","C","T"]

subarray = bases[1:len(bases)] #subarray = ["G","C","T"]

Slices allow us to specify subarrays

Slice indices refer to the space between elements!
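One way to picture “the space between elements” is to number the gaps rather than the items. The sketch below uses the same four-element list; the variable names are illustrative:

```python
bases = ["A", "G", "C", "T"]
# gap numbers:  0  "A"  1  "G"  2  "C"  3  "T"  4

first_two = bases[0:2]       # from gap 0 to gap 2
middle = bases[1:3]          # from gap 1 to gap 3
all_but_last = bases[:-1]    # -1 is the gap just before the last element
copy_all = bases[:]          # both ends omitted: a copy of the whole list
empty = bases[2:2]           # same gap twice: nothing in between

print(first_two)
print(middle)
print(all_but_last)
print(copy_all)
print(empty)
```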

Lists modification and methods

    bases = ["A","G","C"]
    bases.append("T")               # bases = ["A","G","C","T"]
    bases.sort()                    # bases = ["A","C","G","T"]
    num_of_As = bases.count("A")    # num_of_As = 1

Slice notation can be used to modify a list!
Try this on the previously defined bases list and see what happens

    bases[:0] = ["a","g","c","t"]

Some useful list methods are: append, insert, del, sort, remove, count, reverse, etc.

Data Types in Python - dictionaries a.k.a. dicts

dicts are associative arrays similar to PERL hashes:

    complement = {"A" : "T", "C" : "G", "G" : "C", "T" : "A"}

no PERL “%” token to distinguish hash identifiers!!

On the left hand is the dict key, which must be unique, “hashable”, and “immutable” (this will become clearer later)

On the right hand is the associated value. It can be almost ANY type of object! Nice.

Working with Dicts

    #A dict for complementing a DNA nucleotide
    comp = {"A" : "T",
            "C" : "G",
            "G" : "C",
            "T" : "A"}

    print "complement of A is:", comp["A"]
    print "complement of C is:", comp["C"]

Output:

    complement of A is: T
    complement of C is: G

It's easy to add new pairs to the hash:

    comp["g"] = "c"

Or to delete pairs in the hash:

    del comp["g"]

dicts are a preferred data type in Python

Other dict methods

Some useful dict methods are: keys, values, items, del, in, copy, etc.

    #A hash for complementing a DNA nucleotide
    comp = {"A" : "T",
            "C" : "G",
            "G" : "C",
            "T" : "A"}

    print comp.keys()    # might return.. ["A","C","G","T"]

No assertion is made as to order of key/value pairs!

Dicts are iterable

    #Iterating over hashes
    comp = {"A" : "T",
            "C" : "G",
            "G" : "C",
            "T" : "A"}
    for k, v in comp.items():
        print 'complement of', k, 'is', v

Output could be:

    complement of A is T
    complement of C is G
    complement of G is C
    complement of T is A

Or output could be:

    complement of C is G
    complement of A is T
    complement of T is A
    complement of G is C

The point is that dicts are unordered, and no guarantees are made!!

iterate over both keys and values together!

.items() returns two-element tuples, each of which is “unpacked” here into k and v
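When a predictable order matters, one common approach is to iterate over the sorted keys. A minimal sketch, using string formatting so that it should run under both 2.7 and 3.x; the lines list is only there to collect the output:

```python
comp = {"A": "T", "C": "G", "G": "C", "T": "A"}

# sorted() returns the keys as a new, alphabetically ordered list.
lines = []
for k in sorted(comp):
    lines.append("complement of %s is %s" % (k, comp[k]))

for line in lines:
    print(line)
```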

Tuples are essentially immutable lists

    nucleotides = ("A", "C", "G", "T")

    for NT in nucleotides:
        print NT, "is a nucleotide symbol"

tuples are delimited by ()

Why Tuples?

The immutable nature of tuples means they do not need to support all list operations. They can therefore be implemented differently, and are consequently more efficient for certain operations. And only immutable objects can serve as hash keys

In most read-only contexts, they work just like lists; you just can't change their value

Packing and unpacking:

    (one, two, three) = (1, 2, 3)
    print one    # prints 1

Sparse matrices

An example of tuples as dict keys

     3  0 -2  0
     0  9  0  0
     0  7  0  0
     0  0  0 -5

Standard multidimensional array:

    matrix = [ [3,0,-2,0], [0,9,0,0], [0,7,0,0], [0,0,0,-5] ]
    print matrix[0][2]    # This will print -2
    # Not very memory efficient if there are many zero valued
    # elements in a very large matrix!!!

Sparse matrix representation:

    matrix = { (0,0): 3, (0,2): -2, (1,1): 9, (2,1): 7, (3,3): -5 }
    print matrix.get( (0,2), 0)    # prints -2
    # The get method here returns 0 if the key is undefined
    # Much more memory efficient, since zero values not stored
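The two representations can be connected with a short loop. This is only a sketch of one way to build the dict form from the nested-list form; the names dense and sparse are illustrative:

```python
dense = [[3, 0, -2, 0],
         [0, 9, 0, 0],
         [0, 7, 0, 0],
         [0, 0, 0, -5]]

# Keep only the non-zero entries, keyed by (row, column) tuples.
sparse = {}
for r, row in enumerate(dense):
    for c, value in enumerate(row):
        if value != 0:
            sparse[(r, c)] = value

print(len(sparse))               # 5 stored values instead of 16
print(sparse.get((0, 2), 0))     # -2
print(sparse.get((1, 0), 0))     # 0, because the key is absent
```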

Functions

Q: Why do we need Functions?

A: Because we are lazy! Functions are the foundation of reusable code

Repeatedly typing out the code for a chore that is used over and over again (or even only a few times) would be a waste of time and space, and makes the code hard to read

Functions in Python are akin to subroutines in PERL as well as procedures in some other languages

Functions

Minimally, all we need is a statement block of Python code that we have named

Defining a function

    def I_dont_do_much():
        #any code you like!!
        pass
        return

Capital letters OK

A return value is optional; None is the default if no value is specified or there is no explicit final return statement

Once defined, a function is called (“invoked”) just by stating its name followed by parentheses, passing any required arguments:

    I_dont_do_much()

Functions

    def expand_name(amino_acid):
        convert = {"R" : "Arg", "A" : "Ala"}    # etc.
        if amino_acid in convert:
            three_letter = convert[amino_acid]
        else:
            three_letter = "Ukn"
        return three_letter

    print expand_name("R")

Output:

    Arg

Python has several flexible ways to pass arguments to a function. This example is just the most basic way!

No messing with @_ weirdness like in PERL

convert is local to the function (i.e. in lexical scope)

Note indentation – the last line is not part of the function definition, but rather is an invocation of the function

Warning! Python passes objects to functions by reference, never by copy.
Changes to mutable objects in the function change the starting object!!
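The warning can be demonstrated with a list argument. Both function names below are hypothetical and exist only for this sketch:

```python
def add_stop_codon(codons):
    # Mutates the caller's list: no copy is made when it is passed in.
    codons.append("TAA")

def add_stop_codon_safely(codons):
    # Works on a copy, so the caller's list is left alone.
    new_codons = list(codons)
    new_codons.append("TAA")
    return new_codons

original = ["ATG", "GCC"]

safe = add_stop_codon_safely(original)
after_safe_call = list(original)    # snapshot: still two codons

add_stop_codon(original)            # original itself now has three codons

print(safe)
print(after_safe_call)
print(original)
```

Copying inside the function (or before the call) is a simple way to avoid surprising the caller.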

Using external functions

In Python it's easy to use functions (or indeed other variables or objects) that are defined in some other file…

Python includes many useful libraries, or it can be code that you have written

Option 1:

    import module_name
    # use the module name when calling the function..
    # i.e. module_name.function(arg)

Option 2:

    from module_name import name1, name2, name3
    # imports just the names you want
    # no need to refer to module name when calling

Option 3:

    from module_name import *
    # imports all of the public names in a module
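As a concrete case, the first two options look like this with the standard math module. The variable names are illustrative only:

```python
# Option 1: import the module, then qualify each name with it.
import math
root_a = math.sqrt(16)

# Option 2: import selected names and use them directly.
from math import sqrt, pi
root_b = sqrt(16)
circumference = 2 * pi * 1.0    # circle of radius 1.0

print(root_a)
print(root_b)
print(circumference)
```

Option 3 is convenient at the interactive prompt, but in larger programs it can make it hard to tell where a name came from.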

Putting it all together - An in-class challenge

Get Python up and running, try “Hello world!” then…

Write a program that:

Defines a function that generates random DNA sequences of some specified length given a dict describing the probability distribution of A, C, G, T -- should be familiar from BNFO601

You'll need the random function from the random library!!

This is a real-world chore that is frequently encountered in bioinformatics
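For anyone who gets stuck, here is one possible starting point rather than a model answer. It assumes the standard random module, whose random() function returns a float in [0.0, 1.0); the function name random_dna and the example probabilities are made up for this sketch:

```python
import random

def random_dna(length, probs):
    """Return a random DNA string; probs maps each symbol to its probability."""
    symbols = sorted(probs)
    sequence = []
    for _ in range(length):
        r = random.random()         # uniform float in [0.0, 1.0)
        cumulative = 0.0
        chosen = symbols[-1]        # fallback in case of rounding error
        for s in symbols:
            cumulative += probs[s]
            if r < cumulative:      # r fell inside this symbol's share
                chosen = s
                break
        sequence.append(chosen)
    return "".join(sequence)

# Assumed example distribution; the values should sum to 1.
probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
seq = random_dna(20, probs)
print(seq)
```

The sequence differs on every run; calling random.seed with a fixed value first makes the output repeatable for testing.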