Programming in Python

1

Michael Schroeder

Andreas Henschel

{ms, ah}@biotec.tu-dresden.de

2

Motivation

mkdntvplkliallangefhsgeqlgetlgmsraainkhiqtlrdwgvdvftvpgkgyslpepmktvrqerlksivrilerskepvsgaqlaeelsvsrqvivqdiaylrslgynivatprgyvlaggkaltarqqevfdlirdhisqtgmpptraeiaqrlgfrspnaaeehlkalarkgvieivsgasrgirllqeemrssakqeelvkafkallkeekfssqgeivaalqeqgfdninqskvsrmltkfgavrtrnakmemvyclpaelgvpttgqrhikireiimsndietqdelvdrlreagfnvtqatvsrdikemqlvkvpmangrykyslpsdqrfnplqklkrkgqrhikireiitsneietqdelvdmlkqdgykvtqatvsrdikelhlvkvptnngsykyslpadqrfnplsklkrdvtgriaqtllnlakqpdamthpdgmqikitrqeigqivgcsretvgrilkmledqnlisahgktivvygtdikqriagffidhanttgrqtqggvivsvdftveeianligssrqttstalnslikegyisrqgrghytipnlvrlkaaaiderdkiileilekdartpfteiakklgisetavrkrvkaleekgiiegytikinpkklgelqaiapevaqslaeffavladpnrlrllsllarselcvgdlaqaigvsesavshqlrslrnlrlvsyrkqgrhvyyqlqdhhivalyqnaldhlqecmntlkkafeildfivknpgdvsvseiaekfnmsvsnaykymvvleekgfvlrkkdkryvpgyklieygsfvlrrflfneiiplgrlihmvnqkkdrllneylsplditaaqfkvlcsircaacitpvelkkvlsvdlgaltrmldrlvckgwverlpnpndkrgvlvklttggaaiceqchqlvgqdlhqeltknltadevatleyllkkvlpnypvnpdlmpalmavfqhvrtriqseldcqrldltppdvhvlklideqrglnlqdlgrqmcrdkalitrkirelegrnlvrrernpsdqrsfqlfltdeglaihqhaeaimsrvhdelfapltpveqatlvhlldqclaaqtdilreigmiaraldsisniefkelsltrgqylylvrvcenpgiiqekiaelikvdrttaaraikrleeqgfiyrqedasnkkikriyatekgknvypiivrenqhsnqvalqglseveisqladylvrmrknvsedwefvkkgmskindindlvnatfqvkkffrdtkkkfnlnyeeiyilnhilrsesneisskeiakcsefkpyyltkalqklkdlkllskkrslqdertvivyvtdtqkaniqkliseleeyiknaitkindcfellsmvtyadklkslikkefsisfeefavltyisenkekeyylkdiinhlnykqpqvvkavkilsqedyfdkkrnehdertvlilvnaqqrkkiesllsrvnkritmiimeeakkliielfselakihglnksvgavyailylsdkpltisdimeelkiskgnvsmslkkleelgfvrkvwikgerknyyeavdgfssikdiakrkhdliaktyedlkkleekcneeekefikqkikgiermkkisekilealndldaqspagfaeeyiiesiwnnrfppgtilpaerelseligvtrttlrevlqrlardgwltiqhgkptkvnnfwetseekrsstgflvkqraflklymitmteqerlyglkllevlrsefkeigfkpnhtevyrslhellddgilkqikvkkegaklqevvlyqfkdyeaaklykkqlkveldrckkliekalsdnfhmqaeilltlklqqklfadprrisllkhialsgsisqgakdagisyksawdainemnqlsehilveratggkggggavltrygqrliqlydllaqiqqkafdvlsdddalplnsllaaisrfslqtsskvtyiikasndvlnektatilitiakkdfitaaevrevhpdlgnavvnsnigvlikkglvek
sgdgliitgeaqdiisnaatlyaqenapellksprivqsndlteaayslsrdqkrmlylfvdqirksdgtlqehdgiceihvakyaeifgltsaeaskdirqalksfagkevvfyrpeedagdekgyesfpwfikpahspsrglysvhinpylipffiglqnrftqfrlsetkeitnpyamrlyeslcqyrkpdgsgivslkidwiieryqlpqsyqrmpdfrrrflqvcvneinsrtpmrlsyiekkkgrqtthivfsfrditlglekrdreilevlilrfgggpvglatlatalsedpgtleevhepylirqgllkrtprgrvatelarrhllglekrdreilevlilrfgggpvglatlatalsedpgtleevhepylirqgllkrtprgrvatelayrhlgypppvegldefdrkilktiieiyrggpvglnalaaslgveadtlsevyepyllqagflartprgrivtekaykhlkyevpiseevliglplheklfllaivrslkishtpyitfgdaeesykivceeygerprvhsqlwsylndlrekgivetrqnkrgegvrgrttlisigtepldtleavitklikeelrkyeltlqrslpfiegmltnlgamklhkihsflkitvpkdwgynritlqqlegylntladegrlkyiangsyeivpmkteqkqeqetthknieedrklliqaaivrimkmrkvlkhqqllgevltqlssrfkprvpvikkcidiliekeylervdgekdtysylagspekilaqiiqehregldwqeaatraslsleetrkllqsmaaagqvtllrvendlyaisteryqawwqavtraleefhsryplrpglareelrsryfsrlparvyqalleewsregrlqlaantvalagftpsfsetqkkllkdledkyrvsrwqppsfkevagsfnldpseleellhylvregvlvkindefywhrqalgeareviknlastgpfglaeardalgssrkyvlplleyldqvkftrrvgdkrvvvgnvpkrvywemlatnltdkeyvrtrralileilikagslkieqiqdnlkklgfdevietiendikglintgifieikgrfyqlkdhilqfvipnrgvtkqlvirtfgwvqnpgkfenlkrvvqvfdrnskvhnevknikiptlvkeskiqkelvaimnqhdliytykelvgtgtsirseapcdaiiqatiadqgnkkgyidnwssdgflrwahalgfieyinksdsfvitdvglaysksadgsaiekeilieaissyppairiltlledgqhltkfdlgknlgfsgesgftslpegilldtlanampkdkgeirnnwegssdkyarmiggwldklglvkqgkkefiiptlgkpdnkefishafkitgeglkvlrrakgstkftr

All these sequences are winged helix DNA binding domains. How can we group them into families?

3

Motivation: Let's rebuild SCOP families

• Given a SCOP superfamily and its sequences, how can we divide it into families?

• First, we need dynamic programming to determine the sequence similarity

• Then we do the following:

– For all pairs of sequences, call the sequence similarity algorithm and record the similarity in a distance matrix

– Next, run hierarchical clustering to cluster the sequences.
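The first two steps can be sketched in a few lines. This is only an illustration under stated assumptions: plain edit distance (Levenshtein) stands in for the similarity algorithm, the names edit_distance and distance_matrix are hypothetical, and the toy sequences are made up. It is written for Python 3, whereas the slides use Python 2 print statements.

```python
# Sketch only: edit distance stands in for the alignment-based similarity score.
def edit_distance(a, b):
    """Levenshtein distance computed by dynamic programming, row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def distance_matrix(sequences):
    """Call edit_distance for all pairs and record the results in a matrix."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(sequences[i], sequences[j])
            matrix[i][j] = matrix[j][i] = d   # the matrix is symmetric
    return matrix


# Hypothetical toy sequences, not taken from the slide above.
toy_sequences = ["mkdntv", "mkdnsv", "sgdgli"]
toy_matrix = distance_matrix(toy_sequences)
for row in toy_matrix:
    print(row)
```

A matrix like this is what a hierarchical clustering routine would take as input in the last step.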

4

Python for Bioinformatics
Lecture 1: Datatypes and Loops

Slides derived from Ian Holmes

Department of Statistics, University of Oxford

5

Goals of this course

• Concepts of computer programming

• Rudimentary Python (widely-used language)

• Introduction to Bioinformatics file formats

• Practical data-handling algorithms

• Exposure to Bioinformatics software

6

Literature/Material

• Textbook: Python in a Nutshell, Alex Martelli

• Textbook: Python Cookbook, Alex Martelli, David Ascher (both published by O'Reilly)

• Python Course in Bioinformatics, K. Schuerer/C. Letondal, Pasteur University (pdf)

• A lot of online material (see course homepage http://www.biotec.tu-dresden.de/schroeder/group/teaching/bioinfo2/python.html)

7

Style of this lecture

• The color scheme for programs, output and text files: the main program, the program output, and files (shown in yellow, with the filename given alongside)

• Interaction with the Python shell: very handy for quick tests. Helps beginners to overcome the psychological barrier: go ahead, try things out!

>>> (Python expression)
(immediate Python result)

>>> is the prompt (Python expects input here); press Enter to evaluate.

8

General principles of programming

• Make incremental changes

• Test everything you do
– use the Python shell for testing expressions/functions interactively
– the edit-run-revise cycle

• Write so that others can read it
– (when possible, write with others)

• Think before you write

• Use a good text editor (emacs)

9

Python/Emacs IDE

10

Python: Motivation

• Well suited for scripting (better syntax than Perl)

• However, capable of Object Orientation

• Hence complex data types and large projects are feasible, as is reuse of code (BioPython)

• Universal language, with applications in and beyond bioinformatics: Amber, ProHit, PyRat, PyMOL, Gene2EST/Google, CGI, Zope

• Compatible with most software technologies: GUI, MPI, OpenGL, Corba, RDB

• Test complicated expressions in the Python shell

11

Python basics

• Basic syntax of a Python program:

# Elementary Python program
print "Hello World"

Output:

Hello World

• The print statement tells Python to print what follows to the screen

• Single or double quotes enclose a "string literal"

• Lines beginning with "#" are comments, and are ignored by Python

12

Variables

• We can tell Python to "remember" a particular value, using the assignment operator "=":

• The x is referred to as a "scalar variable". Variable names can contain alphabetic characters, numbers (but not at the start of the name), and underscore symbols "_"

x = 3
print x

Output:

3

x = "ACGCGT"     # binding site for yeast transcription factor MCB
print x

Output:

ACGCGT

13

Variables and Objects

• Everything in Python is an object

• An object models a real-world entity

• Objects possess methods (also called functions) that are typically applied to the object, possibly parameterized

• Objects can also possess variables that describe their state

• e.g. x.upper() is a parameter-less method that works on the string object x

Syntax: object.method or object.variable

14

Arithmetic operations…

• Basic operators are + - / * %

x = 14
y = 3
print "Sum: ", x + y
print "Product: ", x * y
print "Remainder: ", x % y

Output:

Sum: 17
Product: 42
Remainder: 2

x = 5
print "x started as", x
x = x * 2            # could write x *= 2
print "Then x was", x
x = x + 1            # could write x += 1
print "Finally x was", x

Output:

x started as 5
Then x was 10
Finally x was 11

15

… Or interactively

>>> x = 14
>>> y = 3
>>> x + y
17
>>> x * y
42
>>> x % y
2
>>> x = 5
>>> print "x started as", x
x started as 5
>>> x *= 2
>>> print "Then x was", x
Then x was 10
>>> x += 1
>>> print "Finally x was", x
Finally x was 11
>>>

• This way, you can use Python as a calculator

• Can also use += -= /= *=

16

String operations

• Concatenation: + and +=

a = "pan"
b = "cake"
a = a + b
print a

Output:

pancake

a = "soap"
b = "dish"
a += b
print a

Output:

soapdish

• Can find the length of a string using the function len(x)

mcb = "ACGCGT"
print "Length of %s is " % mcb, len(mcb)

Output:

Length of ACGCGT is 6

17

String formatting

• Strings can be formatted with placeholders for inserted strings (%s) and numbers (%d for integers and %f for floats)

• Use the operator % on strings (formatted string % insertion tuple):

>>> import math
>>> "aaaa%saaaa%saaa" % ("gcgcgc", "tttt")
'aaaagcgcgcaaaattttaaa'
>>> "A range written like this: (%d - %d)" % (2, 5)
'A range written like this: (2 - 5)'
>>> "Or with preceding 0's: (%03d - %04d)" % (2, 5)
"Or with preceding 0's: (002 - 0005)"
>>> "Rounding floats %.3f" % math.pi
'Rounding floats 3.142'
>>> "Space holders: _%-7s_ and _%7s_" % ("left", "right")
'Space holders: _left   _ and _  right_'

18

More string operations

x = "A simple sentence"
print x
print x.upper()            # convert to upper case
print x.lower()            # convert to lower case
xl = list(x)               # convert the string to a list
xl.reverse()               # reverse the list
print "".join(xl)          # join all list members
x = x.replace("i", "a")    # translate "i"'s into "a"'s
print x
print len(x)               # calculate the length of the string

Output:

A simple sentence
A SIMPLE SENTENCE
a simple sentence
ecnetnes elpmis A
A sample sentence
17

19

Concatenating DNA fragments

dna1 = "accacgt"
dna2 = "taggtct"
print dna1 + dna2

Output:

accacgttaggtct

"Transcribing" DNA to RNA

dna = "accACgttAGGTct"                # DNA string is a mixture of upper & lower case
rna = dna.lower().replace("t", "u")   # make it all lower case, then replace "t" with "u"
print rna

Output:

accacguuaggucu

20

Conditional blocks

• The ability to execute an action contingent on some condition is what distinguishes a computer from a calculator. In Python, it looks like this:

if condition:
    action
else:
    alternative

x = 149
y = 100
if x > y:
    print x, "is greater than", y
else:
    print x, "is less than", y

Output:

149 is greater than 100

The indentation tells Python which piece of code is contingent on the condition. Consistent, level-wise indenting is important.

21

Conditional operators

• Numeric: > >= < <= != ==   (!= means "does not equal")

• The same operators work on strings as alphabetic comparisons

x = 5 * 4
y = 17 + 3
if x == y:
    print x, "equals", y

Output:

20 equals 20

Note that the test for "x equals y" is x == y, not x = y

(x, y) = ("Apple", "Banana")     # shorthand syntax for assigning more than one variable at a time
if y > x:
    print y, "after", x

Output:

Banana after Apple

22

Logical operators

• Logical operators: and and or

• The keyword not is used to negate what follows. Thus not x < y means the same as x >= y

• The keyword False (or the value zero) is used to represent falsehood, while True (or any non-zero value, e.g. 1) represents truth. Thus:

if True: print "True is true"
if False: print "False is true"
if -99: print "-99 is true"

Output:

True is true
-99 is true

x = 222
if x % 2 == 0 and x % 3 == 0:
    print x, "is an even multiple of 3"

Output:

222 is an even multiple of 3

23

Loops

• Here's how to print out the numbers 0 to 9:

x = 0
while x < 10:
    print x,
    x += 1        # equivalent to x = x + 1

Output:

0 1 2 3 4 5 6 7 8 9

• This is a while loop. The indented code is executed repeatedly as long as the condition x < 10 remains true.

24

A common kind of loop

• Let's dissect the code of the while loop again:

x = 0             # initialisation
while x < 10:     # test for completion
    print x,
    x += 1        # continuation

• Alternatively, the for loop construct iterates through a list:

for x in range(10):     # x is the iteration variable; range(10) generates the list [0, 1, …, 9]
    print x,

25

For loop features

• Loops can be used with all iterable types, i.e. lists, strings, tuples, iterators, sets, file handles

• Step sizes can be specified with the third argument of the slice syntax (negative values for iterating backwards)

>>> for nucleotide in "actgc":
...     print nucleotide,
a c t g c

>>> for number in range(50)[::7]:
...     print number,
0 7 14 21 28 35 42 49
>>> for nucleotide in "actgc"[::-1]:
...     print nucleotide,
c g t c a

26

Reading Data from Files

• To read from a file, we can conveniently iterate through it line by line with a for loop and the open function. Internally, a file handle is maintained during the loop.

for line in open("sequence.txt"):
    print line,

This code snippet opens a file called "sequence.txt" in the current directory and iterates through it line by line.

File sequence.txt:

>CG11604
TAGTTATAGCGTGAGTTAGTTGTAAAGGAACGTGAAAGATAAATACATTTTCAATACC

Output:

>CG11604
TAGTTATAGCGTGAGTTAGTTGTAAAGGAACGTGAAAGATAAATACATTTTCAATACC

The trailing comma suppresses print's automatic newline (each line read from the file already ends with one).
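The same loop can be written with a with block, which closes the file as soon as the block ends. A minimal sketch for Python 3 (the slides use Python 2); the demo file and its contents are made up here so that the example runs on its own:

```python
import os
import tempfile

# Hypothetical demo file, created here so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "sequence.txt")
with open(demo_path, "w") as handle:
    handle.write(">demo\nACGT\nTTGA\n")

lines_read = []
with open(demo_path) as handle:              # the file is closed when the block ends
    for line in handle:                      # iterate through the file line by line
        lines_read.append(line.rstrip("\n")) # drop the trailing newline
print(lines_read)
```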

27

Python for Bioinformatics
Lecture 2: Sequences and Lists

28

Summary: scalars and loops

• Assignment operator: x = 5

• Arithmetic operations: y = x * 3

• String operations: s = "Concatenating " + "strings"

• Conditional tests: if y > 10: print s

• Logical operators: if y > 10 and not s == "": print s

• Loops: for x in range(10): print x

• Reading a file: for line in open("sequence.txt"): print line,

29

Pattern-matching

• A very sophisticated kind of logical test is to ask whether a string contains a pattern

• e.g. does a yeast promoter sequence contain the MCB binding site, ACGCGT?

name = "YBR007C"
dna = "TAATAAAAAACGCGTTGTCG"     # 20 bases upstream of the yeast gene YBR007C
if "ACGCGT" in dna:              # the membership operator in; "ACGCGT" is the pattern for the MCB binding site
    print name, "has MCB!"

Output:

YBR007C has MCB!
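The in operator only says whether the site is present. To find out where it occurs, the string method find can be called repeatedly. A small sketch for Python 3; the helper name find_all is hypothetical and not part of the slides:

```python
def find_all(pattern, text):
    """Return the start index of every (possibly overlapping) occurrence."""
    positions = []
    start = text.find(pattern)          # find returns -1 when there is no match
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)   # continue just after the last hit
    return positions


dna = "TAATAAAAAACGCGTTGTCG"
mcb_positions = find_all("ACGCGT", dna)
print(mcb_positions)
```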

30

FASTA format

• A format for storing multiple named sequences in a single file

• This file contains 3' UTRs for Drosophila genes CG11604, CG11455 and CG11488

File fly3utr.txt:

>CG11604
TAGTTATAGCGTGAGTTAGTTGTAAAGGAACGTGAAAGATAAATACATTTTCAATACC
>CG11455
TAGACGGAGACCCGTTTTTCTTGGTTAGTTTCACATTGTAAAACTGCAAATTGTGTAAAAATAAAATGAGAAACAATTCTGGT
>CG11488
TAGAAGTCAAAAAAGTCAAGTTTGTTATATAACAAGAAATCAAAAATTATATAATTGTTTTTCACTCT

• The name of each sequence is preceded by a > symbol

• NB sequences can span multiple lines

31

Printing all sequence names in a FASTA database

for line in open("fly3utr.txt"):
    if line.startswith(">"):
        print line,

Output:

>CG11604
>CG11455
>CG11488

32

Finding all sequence lengths

length = 0
name = ""
for line in open("/home/bioinf/ah/tmp/sequence.txt"):
    line = line.rstrip()
    if line.startswith(">"):
        if name and length:
            print name, length
        name = line[1:]
        length = 0
    else:
        length += len(line)
print name, length

Output:

CG11604 58
CG11455 83
CG11488 69

The rstrip method trims the white space characters off the right end. Try it without this and see what happens, and if you can work out why.

Input: the FASTA file with the sequences CG11604, CG11455 and CG11488 from the FASTA format slide.

33

Reverse complementing DNA

• A common operation due to double-helix symmetry of DNA

def revcomp(dna):
    # Start by making the string lower case again (generally good practice),
    # then replace 'a' with 't', 'c' with 'g', 'g' with 'c' and 't' with 'a',
    # using "x" as a temporary placeholder
    replaced = list(dna.lower().
                    replace("a", "x").replace("t", "a").
                    replace("x", "t").replace("g", "x").
                    replace("c", "g").replace("x", "c"))
    replaced.reverse()        # reverse the list
    return "".join(replaced)

print revcomp("accACgttAGgtct")

Output:

agacctaacgtggt

34

Lists

• A list is an ordered collection of values

nucleotides = ['a', 'c', 'g', 't']
print "Nucleotides: ", nucleotides

Output:

Nucleotides: ['a', 'c', 'g', 't']

• We can think of this as a list with 4 entries: element 0 is 'a', element 1 is 'c', element 2 is 'g' and element 3 is 't'. The list is the set of all four elements.

• Note that the element indices start at zero.

35

List literals

• There are several, equally valid ways to assign an entire list at once.

a = [1,2,3,4,5]
print "a = ", a
b = ['a','c','g','t']
print "b = ", b
c = range(1,6)
print "c = ", c
d = "a c g t".split()
print "d = ", d

Output:

a = [1, 2, 3, 4, 5]
b = ['a', 'c', 'g', 't']
c = [1, 2, 3, 4, 5]
d = ['a', 'c', 'g', 't']

The first form is the most common: a comma-separated list, delimited by square brackets

36

Accessing lists

• To access list elements, use square brackets, e.g. x[0] means "element zero of list x"

• Remember, element indices start at zero!

• Negative indices refer to elements counting from the end, e.g. x[-1] means "last element of list x"

x = ['a', 'c', 'g', 't']
i = 2
print x[0], x[i], x[-1]

Output:

a g t

37

List operations

• You can sort and reverse lists...

x = ['a', 't', 'g', 'c']
print "x =", x
x.sort()
print "x =", x
x.reverse()
print "x =", x

Output:

x = ['a', 't', 'g', 'c']
x = ['a', 'c', 'g', 't']
x = ['t', 'g', 'c', 'a']

• You can read the entire contents of a file into a list (each line of the file becomes an element of the list)

seqfile = open("C:/sequence.txt")
x = seqfile.readlines()

38

Applying Methods to Objects

• Instances of lists, strings, etc. are objects with built-in methods

• Explore available methods using dir:

>>> dir("hello")
['__add__', … ,'__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'replace', 'rfind', 'rindex', 'rjust', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> help("hello".count)
(…)
Return the number of occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
>>> "hello".count("l")
2

• "hello" is a string object and count is a method; the . (dot) applies the method to the object. dir returns the list of applicable methods.

39

List operations

>>> x = [1,0]*5                      # multiplying lists with *
>>> x
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
>>> while 0 in x: print x.pop(),     # pop removes the last element of a list
...
0 1 0 1 0 1 0 1 0
>>> x
[1]
>>> x.append(2)                      # append adds an element to the end of a list
>>> x
[1, 2]
>>> x += x                           # concatenating lists with + or +=
>>> x
[1, 2, 1, 2]
>>> x.remove(2)                      # removing the first occurrence of an element
>>> x
[1, 1, 2]
>>> x.index(2)                       # position of an element
2

40

for loop revisited

• Finding the total of a list of numbers:

val = [4, 19, 1, 100, 125, 10]
total = 0
for x in val:          # the for statement loops through each entry in a list
    total += x
print total

Output:

259

• Equivalent to:

val = [4, 19, 1, 100, 125, 10]
total = 0
for i in range(len(val)):
    total += val[i]
print total

Output:

259

41

Modules

• Additional functionality that is not part of the core language is in modules like
– sys (system)
– re (regular expressions)
– math (mathematics)

• Load modules with import

>>> import math
>>> help(math)
Help on built-in module math:
…

• You can write your own modules and import them

42

The sys.argv list

• A special list is sys.argv

• This contains the command-line arguments if the program is invoked at the command line

• It's a way for the user to pass information into the program, if you don't have a graphical interface with which to do this

File args.py:

import sys
print sys.argv

Output at command line:

ah@studipool1> python args.py abc 123
['args.py', 'abc', '123']

43

Converting a sequence into a list

• A string is a sequence of characters (the underlying programming language C treats strings as arrays of characters), so it can be converted into a list

>>> dna = "acgtcgaga"
>>> list(dna)
['a', 'c', 'g', 't', 'c', 'g', 'a', 'g', 'a']
>>> """You can also make
... use of long strings
... and the
... split
... function""".split()
['You', 'can', 'also', 'make', 'use', 'of', 'long', 'strings', 'and', 'the', 'split', 'function']

Data types can be converted. Here the list function converts a string into a list.

Triple quotes allow for strings that stretch over several lines

44

Taking a slice of a list

• The syntax x[i:j] returns a list containing elements i,i+1,…,j-1 of list x

nucleotides = ['a', 'g', 'c', 't']
purines = nucleotides[0:2]        # nucleotides[:2] also works
pyrimidines = nucleotides[2:4]    # nucleotides[2:] also works
print "Nucleotides:", nucleotides
print "Purines:", purines
print "Pyrimidines:", pyrimidines

Output:

Nucleotides: ['a', 'g', 'c', 't']
Purines: ['a', 'g']
Pyrimidines: ['c', 't']

45

Applying a function to a list

• The map function applies a function to every element in a list

• Syntax: map(FUNCTION, LIST) applies FUNCTION to every element in LIST

• FUNCTION can be an arbitrary function, defined elsewhere, or a lambda expression

• Lambda expression: provides an "anonymous" function, constructed with the keyword lambda, a set of parameters, and an expression using them

• Example: multiply every number by 3

>>> map(lambda x: x*3, [1,2,3])
[3, 6, 9]
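For readers on a current interpreter: in Python 3, map returns a lazy iterator rather than a list, so the result has to be wrapped in list() to see it. The sketch below (Python 3, variable names chosen here for illustration) also shows a list comprehension, which is a common alternative to map with a lambda.

```python
numbers = [1, 2, 3]

# map with an anonymous function; list() materialises the iterator in Python 3
tripled_with_map = list(map(lambda x: x * 3, numbers))

# the same result written as a list comprehension
tripled_with_comprehension = [x * 3 for x in numbers]

# map also accepts a function defined elsewhere, here the built-in len
lengths = list(map(len, ["acgt", "tt", ""]))

print(tripled_with_map, tripled_with_comprehension, lengths)
```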

46

Python for Bioinformatics
Lecture 3: Patterns and Functions

47

Review: pattern-matching

• The following code:

if "ACGCGT" in dna:
    print "Found MCB binding site!"

prints the string "Found MCB binding site!" if the pattern "ACGCGT" is present in the string variable dna

• We can replace the first occurrence of ACGCGT with the string _MCB_ using the following syntax, whose arguments are pattern, replacement and count:

dna.replace("ACGCGT", "_MCB_", 1)

• We can replace all occurrences by omitting the optional count argument:

dna.replace("ACGCGT", "_MCB_")

48

Regular expressions

• Python provides a pattern-matching engine

• Patterns are called regular expressions

• They are extremely powerful

• Often called "regexps" for short

• import module re

49

Motivation: N-glycosylation motif

• Common post-translational modification

• Attachment of a sugar group

• Occurs at asparagine residues with the consensus sequence NX1X2, where

– X1 can be anything (but proline inhibits)

– X2 is serine or threonine

• Can we detect potential N-glycosylation sites in a protein sequence?

50

Building regexps

• In general, square brackets denote a set of alternative possibilities

• Use - to match a range of characters: [A-Z]

• . matches any character (except a newline, by default)

• \s matches any whitespace character (space, tab, newline)

• \S matches anything that is not whitespace

• [^X] matches anything but X
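These building blocks can be tried out with re.findall, which returns all non-overlapping matches as a list. A short sketch for Python 3; the test strings are made up for illustration:

```python
import re

purines = re.findall("[AG]", "ACGTTGA")            # a set of alternatives
in_range = re.findall("[A-C]", "ACGTB")            # a range of characters
not_dna = re.findall("[^ACGT]", "ACGNNT-A")        # anything but A, C, G or T
any_middle = re.findall("A.G", "ATGACGAG")         # . stands for any one character
fields = re.findall(r"\S+", "gene1\tgene2 gene3\n")  # runs of non-whitespace

print(purines, in_range, not_dna, any_middle, fields)
```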

51

Using Regular Expressions

• Compile a regular expression object (pattern) using re.compile

• pattern has a number of methods (try dir(pattern)), e.g.
– match (tries to match at the start of the string; in case of success returns a Match object, otherwise None)
– search (scans through a string looking for a match)
– findall (returns a list of all matches)

>>> import re
>>> pattern = re.compile('[ACGT]')
>>> if pattern.match("A"): print "Matched"      # successful match
...
Matched
>>> if pattern.match("a"): print "Matched"      # unsuccessful, returns None: by default case sensitive
...
>>>

52

Matching alternative strings

• (this|that) matches "this" or "that"

• ...and is equivalent to th(is|at)

>>> pattern = re.compile("(this|that|other)", re.IGNORECASE)     # case-insensitive search pattern
>>> pattern.search("Will match THIS")  ## success
<_sre.SRE_Match object at 0x00B52860>
>>> pattern.search("Will also match THat")  ## success
<_sre.SRE_Match object at 0x00B528A0>
>>> pattern.search("Will not match ot-her")  ## will return None
>>>

Python returns a description of the match object

53

Matching multiple characters

• x* matches zero or more x's
• x+ matches one or more x's
• x{n} matches n x's
• x{m,n} matches from m to n x's

Word and string boundaries

• ^ matches the start of a string
• $ matches the end of a string
• \b matches word boundaries
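A few quick checks of the repetition and boundary operators, written for Python 3 with made-up test strings:

```python
import re

one_or_more = re.findall("A+", "GAAATAG")        # runs of at least one A
zero_or_more = re.findall("CA*", "CGCAACA")      # a C followed by any number of A's
exactly_three = re.findall("A{3}", "AAAAAAA")    # non-overlapping groups of three A's
two_to_three = re.findall("T{2,3}", "TATTATTTT") # between two and three T's

starts_with_atg = re.search("^ATG", "ATGCCC") is not None   # ^ anchors at the start
atg_not_at_start = re.search("^ATG", "CATG") is not None
ends_with_stop = re.search("TAA$", "ATGTAA") is not None    # $ anchors at the end

# \b only matches where a word begins or ends
whole_words = re.findall(r"\bgene\b", "gene genes pseudogene gene")

print(one_or_more, zero_or_more, exactly_three, two_to_three)
print(starts_with_atg, atg_not_at_start, ends_with_stop, whole_words)
```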

54

"Escaping" special characters

• \ is used to "escape" characters that otherwise have meaning in a regexp

• so \[ matches the character "["
– if not escaped, "[" signifies the start of a list of alternative characters, as in [ACGT]

• All special characters: . ^ $ * + ? { [ ] \ | ( )
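The difference between an escaped and an unescaped bracket is easy to see on a small example. The sketch below is for Python 3 and uses a made-up text; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

text = "Score (bits): 781 [best]"

# Unescaped, "[best]" is a character class: it matches single letters b, e, s or t.
class_matches = re.findall("[best]", text)

# Escaped, the brackets are taken literally.
literal_matches = re.findall(r"\[best\]", text)

# re.escape builds a pattern that matches the given string literally.
escaped = re.escape("(bits)")
escaped_matches = re.findall(escaped, text)

print(class_matches, literal_matches, escaped_matches)
```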

55

Substitutions/Match Retrieval

• re functions can be used without compiling a pattern first (less efficient but easier to use)

• Example re.sub (substitution):

>>> re.sub("(red|blue|green)", "color", "blue socks and red shoes")
'color socks and color shoes'

• Example re.findall, where \d+ matches one or more digits:

>>> e, raw, frm, to = re.findall("\d+", \
... "E-value: 4, \
... Raw Bit Score: 165, \
... Match position: 362-419")
>>> print e, raw, frm, to
4 165 362 419

The result, a list of 4 strings, is assigned to 4 variables.

\ allows a command to continue over multiple lines; alternatively, construct multi-line strings using triple quotes """ … """

56

N-glycosylation site detector

>>> protein = """MGMFFNLRSNIKKKAMDNGLSLPISRNGSSNNIKDKRSEHNSNSLKGKYRYQPRSTPSKFQLTVSITSLIIIAVLSLYLFISFLSGMGIGVSTQNGRSLLGSSKSSENYKTIDLEDEEYYDYDFEDIDPEVISKFDDGVQHYLISQFGSEVLTPKDDEKYQRELNMLFDSTVEEYDLSNFEGAPNGLETRDHILLCIPLRNAADVLPLMFKHLMNLTYPHELIDLAFLVSDCSEGDTTLDALIAYSRHLQNGTLSQIFQEIDAVIDSQTKGTDKLYLKYMDEGYINRVHQAFSPPFHENYDKPFRSVQIFQKDFGQVIGQGFSDRHAVKVQGIRRKLMGRARNWLTANALKPYHSWVYWRDADVELCPGSVIQDLMSKNYDVI""".upper().replace("\n", "")     # multi-line string, upper case, line breaks removed
>>> for match in re.finditer("N[^P][ST]", protein):
...     print match.group(), match.span()
...
NGS (26, 29)
NLT (214, 217)
NGT (250, 253)

• N[^P][ST] is the main regular expression

• re.finditer provides an iterator over match objects

• match.group and match.span give the actual matched string and the position tuple. Alternatively, you can print protein[match.start():match.end()]

57

PROSITE and Pfam

PROSITE – a database of regular expressions for protein families, domains and motifs

Pfam – a database of Hidden Markov Models (HMMs) – equivalent to probabilistic regular expressions

58

Another Example:

• Ferredoxins are a group of iron-sulfur proteins which mediate electron transfer

• They share the motif C, then two residues, C, then two residues, C, then three residues, C, then either P, E, or G

• The 4 C's are 4Fe-4S ligands

• What is the corresponding Python regular expression?

• C.{2}C.{2}C.{3}C[PEG]
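The motif can be tried out with re.search. In this Python 3 sketch the two protein strings are invented for illustration (they are not real ferredoxin sequences); the first was built to contain the motif once, the second has the final residue changed so that it no longer fits.

```python
import re

ferredoxin_motif = re.compile("C.{2}C.{2}C.{3}C[PEG]")

# Hypothetical sequences, constructed only to exercise the pattern.
toy_protein = "MAYKIADSCVSCGACASECPVNAISQ"
toy_mutant = "MAYKIADSCVSCGACASECAVNAISQ"    # P replaced by A after the fourth C

hit = ferredoxin_motif.search(toy_protein)
miss = ferredoxin_motif.search(toy_mutant)

if hit:
    print(hit.group(), hit.span())
print(miss)
```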

60

Another Example:

Courtesy of Chris Bystroff

61


62

Another Example:


63

Another Example

• Regular expressions are useful to parse text

• Example: extract information from Blast output, such as
– species name
– E value
– Score
– ID

64

Another Example

BLASTP 2.2.6 [Apr-09-2003]

RID: 1062117117-16602-2157828.BLASTQ3

Query= gi|6174889|sp|P26367|PAX6_HUMAN Paired box protein Pax-6 (Oculorhombin) (Aniridia, type II protein). (422 letters)

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
1,509,571 sequences 486,132,453 total letters

Results of PSI-Blast iteration 1

Sequences with E-value BETTER than threshold
                                                                          Score  E
Sequences producing significant alignments:                              (bits) Value

gi|4505615|ref|NP_000271.1| paired box gene 6 isoform a Paired box h... 781 0.0
gi|189353|gb|AAA59962.1| oculorhombin >gi|189354|gb|AAA59963.1| oculo... 780 0.0
gi|6981334|ref|NP_037133.1| paired box homeotic gene 6 [Rattus norveg... 778 0.0
gi|26389393|dbj|BAC25729.1| unnamed protein product [Mus musculus] 776 0.0
gi|7305369|ref|NP_038655.1| paired box gene 6 small eye Dickie's sm... 776 0.0
gi|383296|prf||1902328A PAX6 gene 775 0.0
gi|4580424|ref|NP_001595.2| paired box gene 6 isoform b Paired box h... 775 0.0
gi|18138028|emb|CAC80516.1| paired box protein [Mus musculus] 773 0.0
gi|2576237|dbj|BAA23004.1| PAX6 protein [Gallus gallus] 770 0.0
gi|27469846|gb|AAH41712.1| Similar to paired box gene 6 [Xenopus laevis] 768 0.0
…
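One possible way to pull ID, species, score and E-value out of such hit lines is sketched below for Python 3. The function name parse_hit and the three small patterns are assumptions made for this illustration, not the approach taken in the lecture; the E-value is kept as a string because BLAST does not always print it in a form float() accepts.

```python
import re

# Three hit lines copied from the BLAST report above.
hit_lines = [
    "gi|26389393|dbj|BAC25729.1| unnamed protein product [Mus musculus] 776 0.0",
    "gi|2576237|dbj|BAA23004.1| PAX6 protein [Gallus gallus] 770 0.0",
    "gi|383296|prf||1902328A PAX6 gene 775 0.0",
]


def parse_hit(line):
    """Extract ID, species, bit score and E-value from one hit line."""
    gi = re.match(r"gi\|(\d+)\|", line)             # ID right after the leading gi|
    species = re.search(r"\[([^\]]+)\]", line)      # text inside [ ], if complete
    tail = re.search(r"(\d+)\s+(\S+)\s*$", line)    # last two columns: score, E-value
    return {
        "id": gi.group(1) if gi else None,
        "species": species.group(1) if species else None,
        "score": int(tail.group(1)) if tail else None,
        "evalue": tail.group(2) if tail else None,
    }


parsed_hits = [parse_hit(line) for line in hit_lines]
for hit in parsed_hits:
    print(hit)
```

Lines whose species name is cut off by "..." have no closing bracket, so the species field comes back as None for them.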

65

Functions

• Often, we can identify self-contained tasks that occur in so many different places we may want to separate their description from the rest of our program.

• Code for such a task is called a function

• Examples of such tasks:
– finding the length of a sequence (NB: Python provides the function len(x) to do this already)
– reverse complementing a sequence
– finding the mean of a list of numbers

66

Maximum element of a list

• Function to find the largest entry in a list

def find_max(data):             # function declaration
    largest = data[0]           # function body; start from the first entry without changing the list
    for x in data:
        if x > largest:
            largest = x
    return largest              # function result

data = [1, 5, 1, 12, 3, 4, 6]
print "Data:", data
print "Maximum:", find_max(data)     # function call

Output:

Data: [1, 5, 1, 12, 3, 4, 6]
Maximum: 12

67

Reverse complement

from string import maketrans

def revcomp(seq):
    translation = maketrans("agct", "tcga")
    comp = seq.translate(translation)
    rcomp = comp[::-1]     # reversing comp
    return rcomp

dna = "cggcgt"
rev = revcomp(dna)
print "Revcomp of %s is %s" % (dna, rev)     # string formatted with placeholders

Output:

Revcomp of cggcgt is acgccg

• The arguments follow the function name in parentheses (in this case seq, the sequence to be revcomp'd)

• By default, translation and comp are local variables, i.e. they "live" only inside the surrounding function

• return announces that the return value of this function is whatever's in rcomp

68

revcomp goes OO

from string import maketrans

class DNA:
    def __init__(self, sequence):
        # Class constructor: saves the input sequence as an object variable, in lower case
        self.seq = sequence.lower()
        self.rc = None

    def revcomp(self):
        # self refers to the current object and gives access to all its variables
        translation = maketrans("agct", "tcga")
        comp = self.seq.translate(translation)
        self.rc = comp[::-1]     # stored under a name different from the method, so revcomp stays callable

    def report(self):
        print "Revcomp of %s is %s" % \
              (self.seq, self.rc)

dna = DNA("accggcatg")    # creating a DNA object
dna.revcomp()             # method calls
dna.report()

Useful to structure code: add additional DNA sequence functionality to this class, e.g. a function that calculates GC content, translation to protein, etc.
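As an illustration of adding functionality to such a class, here is a sketch with a GC-content method. It is written for Python 3, where maketrans is a method of str rather than a function in the string module; the method name gc_content and the choice to return values instead of printing them are assumptions made for this example.

```python
class DNA:
    def __init__(self, sequence):
        self.seq = sequence.lower()      # stored in lower case, as on the slide

    def revcomp(self):
        """Return the reverse complement as a new string."""
        complement = self.seq.translate(str.maketrans("agct", "tcga"))
        return complement[::-1]

    def gc_content(self):
        """Return the fraction of bases that are g or c (0.0 for an empty sequence)."""
        if not self.seq:
            return 0.0
        gc = self.seq.count("g") + self.seq.count("c")
        return gc / len(self.seq)


dna = DNA("accggcatg")
print(dna.revcomp())
print(round(dna.gc_content(), 3))
```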

69

Mean & standard deviation

from math import sqrt     # importing square root function from module math

def mean_sd(data):        # function takes a list of n numeric arguments
    n = len(data)
    sum = 0.0             # floats, so that / below is not integer division
    sqSum = 0.0
    for x in data:
        sum += x
        sqSum += x * x
    mean = sum / n
    variance = sqSum / n - mean * mean
    sd = sqrt(variance)
    return (mean, sd)     # function returns a two-element tuple: (mean, sd)

data = [1, 5, 1, 12, 3, 4, 6]
(mean, sd) = mean_sd(data)
print "Data:", data
print "Mean:", mean
print "Standard deviation:", sd
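The formula used above, mean of the squares minus the square of the mean, gives the population standard deviation (dividing by n, not n - 1). A Python 3 sketch that cross-checks it against the standard-library statistics module; this compact rewrite of mean_sd is an illustration, not the slide's code:

```python
from math import sqrt
import statistics


def mean_sd(data):
    """Return (mean, population standard deviation) of a list of numbers."""
    n = len(data)
    mean = sum(data) / n                                # / is true division in Python 3
    variance = sum(x * x for x in data) / n - mean * mean
    return mean, sqrt(variance)


data = [1, 5, 1, 12, 3, 4, 6]
mean, sd = mean_sd(data)
print(mean, sd)
print(statistics.mean(data), statistics.pstdev(data))   # should agree with the line above
```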

70

Including variables in patterns

• Function to find the number of instances of a given binding site in a sequence

def count_matches(pattern, text):
    pos = text.rfind(pattern)      # finds the rightmost position where pattern matches in text
    if pos == -1:                  # no match
        return 0
    else:
        # call recursively with the text to the left of the rightmost match, count up one
        return count_matches(pattern, text[:pos]) + 1

print count_matches("ACGCGT", "ACGCGTAAGTCGGCACGCGTACGCGT")

Output:

3

NB: The built-in string method count also does the job:

text = "ACGCGTAAGTCGGCACGCGTACGCGT"
print text.count("ACGCGT")

71

Python for Bioinformatics
Lecture 4: Dictionaries

72

Data structures

• Suppose we have a file containing a table of Drosophila gene names and cellular compartments, one pair on each line:

Cyp12a5 Mitochondrion
MRG15 Nucleus
Cop Golgi
bor Cytoplasm
Bx42 Nucleus

Suppose this file is in "c:/genecomp.txt"

73

Reading a table of data

• We can split each line into a 2-element list using the split method.

• This breaks the line at each run of whitespace:

genes, comps = [], []
for line in open("C:/genecomp.txt"):
    gene, comp = line.split()
    genes.append(gene)
    comps.append(comp)
print "Genes:", " - ".join(genes)
print "Compartments:", " ".join(comps)

Output:

Genes: Cyp12a5 - MRG15 - Cop - bor - Bx42
Compartments: Mitochondrion Nucleus Golgi Cytoplasm Nucleus

• The opposite of split is join, which makes a string from a list of strings

74

Finding an entry in a table

• The following code assumes that we've already read in the table from the file:

import sys
geneToFind = sys.argv[1]
for i in range(len(genes)):
    if genes[i] == geneToFind:
        print "Gene:", genes[i]
        print "Compartment:", comps[i]
        sys.exit()
print "Couldn't find gene"

• Example: sys.argv[1] = "Cop"

Output when searching for gene Cop:

Gene: Cop
Compartment: Golgi

75

Binary search

• The previous algorithm is inefficient. If there are N entries in the list, then on average we have to search through ½N entries to find the one we want.

• For the full Drosophila genome, N=12,000. This is painfully slow.

• An alternative is the Binary Search algorithm:

– Start with a sorted list.

– Compare the middle element with the one we want. Pick the half of the list that contains our element.

– Iterate this procedure to "home in" on the right element. This takes about log2(N) steps.
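The three steps can be written out as a short function. A sketch for Python 3; the function name binary_search and the convention of returning -1 for a missing item are choices made here for illustration.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        middle = (low + high) // 2            # compare the middle element ...
        if sorted_items[middle] == target:
            return middle
        if sorted_items[middle] < target:     # ... and keep the half that can contain target
            low = middle + 1
        else:
            high = middle - 1
    return -1


# Start with a sorted list (gene names from the table in this lecture).
gene_names = sorted(["Cyp12a5", "MRG15", "Cop", "bor", "Bx42"])
print(gene_names)
print(binary_search(gene_names, "Cop"))
print(binary_search(gene_names, "Notch"))
```

Each pass through the loop halves the remaining range, which is where the log2(N) step count comes from.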

76

Dictionaries (hashes)

• Implementing algorithms like binary search is a common task in languages like C.

• Conveniently, Python provides a built-in type called a dictionary (also called a hash) that gives fast lookup by key without you having to write the search yourself.

• A dictionary is a set of key:value pairs (like our gene:compartment table)

comp["Cop"] = "Golgi"

Square brackets [] are used to index a dictionary

77

keys and values• keys returns the list of keys in the hash

– e.g. names, in the name2seq hash

• values returns the list of values– e.g. sequences, in the name2seq hashname2seq = read_FASTA ("C:/fly3utr.txt")print "Sequence names: ", " ".join(name2seq.keys()) print "Total length: ", len("".join(name2seq.values()))

Sequence names: CG11488 CG11604 CG11455
Total length: 210

78

Getting familiar with hashes

>>> tlf = {"Michael" : 40062, \
... "Bingding" : 40064, "Andreas": 40063 }
>>> tlf.keys()
['Bingding', 'Andreas', 'Michael']
>>> tlf.values()
[40064, 40063, 40062]
>>> tlf["Michael"]
40062
>>> tlf.has_key("Lars")
False
>>> tlf["Lars"] = 40070
>>> tlf.has_key("Lars") # now it's there
True
>>> for name in tlf.keys():
...     print name, tlf[name]
...
Lars 40070
Bingding 40064
Andreas 40063
Michael 40062

Creating an initial phone book

Asking for all keys

Asking for all values

Asking for a value, given a key

Checking whether a key is in the list

Inserting a single key:value pair

Looping through the dictionary
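For readers on Python 3, the same phone-book operations look slightly different: has_key() no longer exists there and the in operator is used instead. The sketch below is an illustrative translation, not part of the original slides; the key "Nobody" is made up for the example.

```python
tlf = {"Michael": 40062, "Bingding": 40064, "Andreas": 40063}

# Membership test: the `in` operator replaces has_key()
print("Lars" in tlf)

tlf["Lars"] = 40070            # insert a single key:value pair
print("Lars" in tlf)

# get() returns a default instead of raising KeyError for a missing key
print(tlf.get("Nobody", "unknown"))

# items() yields (key, value) pairs, so no second lookup is needed
for name, number in sorted(tlf.items()):
    print(name, number)
```

Sorting the items is optional; it is used here only to make the printed order predictable.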

79

Reading a table using hashes

import sys
comps = {}
for line in open("C:/genecomp.txt"):
    gene, comp = line.split()
    comps[gene] = comp

geneToFind = sys.argv[1]
print "Gene:", geneToFind
print "Compartment:", comps[geneToFind]

...with sys.argv[1] = "Cop" as before:

Gene: Cop
Compartment: Golgi

80

Reading a FASTA file into a hash

def read_fasta(filename):
    name = None
    name2seq = {}
    for line in open(filename):
        if line.startswith(">"):
            if name:
                name2seq[name] = seq
            name = line[1:].rstrip()
            seq = ""
        else:
            seq += line.rstrip()
    name2seq[name] = seq
    return name2seq

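The assignment name2seq[name]=seq after the loop stores the final entry.

"if name" is false only while name is still None, i.e. at the first header line, when there is no previous entry to store.

The new name is the header line from its second character on, with the trailing newline removed.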
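The parsing logic described above can be tried without a file by feeding it a list of lines. The following is a hedged variant in Python 3 syntax, not the lecture's own function: the name read_fasta_lines and the two sample records are assumptions, and the entry is created as soon as its header is seen, so empty input simply yields an empty dictionary.

```python
def read_fasta_lines(lines):
    """Build a {name: sequence} dict from an iterable of FASTA lines."""
    name2seq = {}
    name = None
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            name = line[1:]              # header: drop the leading ">"
            name2seq[name] = ""
        elif name is not None:
            name2seq[name] += line       # sequence line: extend current entry
    return name2seq


# Hypothetical two-record input, given as a list of lines instead of a file
example = [">CG11488\n", "ttcga\n", "ggat\n", ">CG11604\n", "acgt\n"]
name2seq = read_fasta_lines(example)
print(name2seq)
```

Because an open file is also an iterable of lines, read_fasta_lines(open(filename)) would work the same way on a real FASTA file.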

81

Formatted output of sequences

def print_seq(name, seq, width=50):
    print ">" + name
    i = 0
    while i < len(seq):
        print seq[i : i+width]
        i += width

print_seq("Tata-box1", "TA"*55)
print_seq("Tata-box2", "TA"*55, 30)

Default values are assigned in the parameter list, and parameters with defaults are placed rightmost. Here, width defaults to 50-column output.

>Tata-box1
TATATATATATATATATATATATATATATATATATATATATATATATATA
TATATATATATATATATATATATATATATATATATATATATATATATATA
TATATATATA
>Tata-box2
TATATATATATATATATATATATATATATA
TATATATATATATATATATATATATATATA
TATATATATATATATATATATATATATATA
TATATATATATATATATATA

82

Files of sequence names

• Easy way to specify a subset of a given FASTA database

• Each line is the name of a sequence in a given database

• e.g.
CG1167
CG685
CG1041
CG1043

83

Get named sequences

• Given a FASTA database and a "file of sequence names", print every named sequence:

import sys
(fasta, fosn) = sys.argv[1:3]
name2seq = read_fasta(fasta)
for name in open(fosn):
    name = name.rstrip()
    if name2seq.has_key(name):
        seq = name2seq[name]
        print_seq(name, seq)
    else:
        print "Can't find sequence: %s." % name, \
              "Known sequences: ", " ".join(name2seq.keys())

84

Common Set operations

• Two files of sequence names:
• What is the overlap/difference/union?
• Find e.g. the intersection using sets

C:/fosn1.txt:
CG1167
CG685
CG1041
CG1043

C:/fosn2.txt:
CG215
CG1041
CG483
CG1167
CG1163

from sets import Set
geneSet1 = Set([])
geneSet2 = Set([])
for line in open("C:/fosn1.txt"):
    geneSet1.add(line.rstrip())
for line in open("C:/fosn2.txt"):
    geneSet2.add(line.rstrip())

>>> geneSet1
Set(['CG1043', 'CG1041', 'CG1167', 'CG685'])
>>> geneSet1.intersection(geneSet2)
Set(['CG1041', 'CG1167'])
>>> geneSet2.difference(geneSet1)
Set(['CG483', 'CG215', 'CG1163'])
>>> geneSet1.union(geneSet2)
Set(['CG483', 'CG1043', 'CG1041', 'CG1167', 'CG685', 'CG1163', 'CG215'])

[Figure: Venn diagrams of sets A and B illustrating difference, intersection, and union]
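In current Python versions the set type is built in, and the separate sets module used on this slide is no longer available in Python 3. As a hedged illustration, the same three operations can be written with the built-in type and its operators; the two sets below hold the contents of the example files inline rather than reading them from disk, and the variable names are chosen for the example.

```python
# Contents of the two example files, given inline instead of read from disk
gene_set1 = {"CG1167", "CG685", "CG1041", "CG1043"}
gene_set2 = {"CG215", "CG1041", "CG483", "CG1167", "CG1163"}

print(sorted(gene_set1 & gene_set2))   # intersection
print(sorted(gene_set2 - gene_set1))   # difference
print(sorted(gene_set1 | gene_set2))   # union
```

The method forms (intersection, difference, union) exist on the built-in type as well; the operators are just a shorter spelling. The results are sorted only because sets have no defined order.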

85

More Set operations

• Since every element in a Set occurs only once, sets can be used to reduce redundancy

>>> from sets import Set
>>> Set([1,2,3,1,3,3])
Set([1, 2, 3])

• A is a superset of B when A fully contains B. Test: A.issuperset(B)

• A is a subset of B when A is fully contained in B. Test: A.issubset(B)

>>> pqs = Set("1kim 1dan 1bob".split())
>>> pdb = Set("1bob 3mad 1dan 2bad 1kim".split())
>>> pqs.issubset(pdb)
True

86

The genetic code as a hash

aa = {'ttt':'F', 'tct':'S', 'tat':'Y', 'tgt':'C',
      'ttc':'F', 'tcc':'S', 'tac':'Y', 'tgc':'C',
      'tta':'L', 'tca':'S', 'taa':'!', 'tga':'!',
      'ttg':'L', 'tcg':'S', 'tag':'!', 'tgg':'W',
      'ctt':'L', 'cct':'P', 'cat':'H', 'cgt':'R',
      'ctc':'L', 'ccc':'P', 'cac':'H', 'cgc':'R',
      'cta':'L', 'cca':'P', 'caa':'Q', 'cga':'R',
      'ctg':'L', 'ccg':'P', 'cag':'Q', 'cgg':'R',
      'att':'I', 'act':'T', 'aat':'N', 'agt':'S',
      'atc':'I', 'acc':'T', 'aac':'N', 'agc':'S',
      'ata':'I', 'aca':'T', 'aaa':'K', 'aga':'R',
      'atg':'M', 'acg':'T', 'aag':'K', 'agg':'R',
      'gtt':'V', 'gct':'A', 'gat':'D', 'ggt':'G',
      'gtc':'V', 'gcc':'A', 'gac':'D', 'ggc':'G',
      'gta':'V', 'gca':'A', 'gaa':'E', 'gga':'G',
      'gtg':'V', 'gcg':'A', 'gag':'E', 'ggg':'G' }

87

Translating: DNA to protein

import sys

def translate(dna):
    length = len(dna)
    if len(dna) % 3 != 0:
        print "Warning: Length is not a multiple of 3!"
        sys.exit()
    protein = ""
    i = 0
    while i < length:
        codon = dna[i:i+3]
        if not aa.has_key(codon):
            print "Codon %s is illegal" % codon
            sys.exit()
        protein += aa[codon]
        i += 3
    return protein

>>> translate("gatgacgaaagttgt")
'DDESC'
>>> translate("gatgacgaaagttgta")
Warning: Length is not a multiple of 3!
… (SystemExit)
>>> translate("gatgacgiaagttgt")
Codon gia is illegal
… (SystemExit)
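Calling sys.exit() inside a function ends the whole program, which is inconvenient when translate is used from other code. One possible alternative, sketched here in Python 3 syntax as an assumption rather than the lecture's approach, is to raise an exception that the caller can catch. The codon table is built compactly from the base order "tcag"; it is meant to contain the same 64 entries as the hash shown earlier, with '!' for stop codons.

```python
BASES = "tcag"
AMINO_ACIDS = ("FFLLSSSSYY!!CC!W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Build the 64-entry codon table; '!' marks a stop codon, as on the slide
codons = [a + b + c for a in BASES for b in BASES for c in BASES]
aa = dict(zip(codons, AMINO_ACIDS))


def translate(dna):
    """Translate a DNA string into a protein string, codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("Length is not a multiple of 3")
    protein = ""
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon not in aa:
            raise ValueError("Codon %s is illegal" % codon)
        protein += aa[codon]
    return protein


print(translate("gatgacgaaagttgt"))
try:
    translate("gatgacgiaagttgt")
except ValueError as error:
    print("Could not translate:", error)
```

With this design the caller decides what to do about bad input, for example skipping one sequence in a large file instead of stopping the whole run.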

88

Counting residue frequencies

def count_residues(seq):
    freq = {}
    seq = seq.lower()
    for i in range(len(seq)):
        if freq.has_key(seq[i]):
            freq[seq[i]] += 1
        else:
            freq[seq[i]] = 1
    return freq

freq = count_residues("gatgacgaaagttgt")
for residue in freq.keys():
    print residue, ":", freq[residue]

g : 5
a : 5
c : 1
t : 4

89

Counting N-mer frequencies

def count_nmers(seq, n):
    freq = {}
    seq = seq.lower()
    for i in range(len(seq)-n+1):
        nmer = seq[i : i+n]
        if freq.has_key(nmer):
            freq[nmer] += 1    # increment the corresponding counter
        else:
            freq[nmer] = 1     # first occurrence
    return freq

freq = count_nmers("gatgacgaaagttgt", 2)
for residue in freq.keys():
    print residue, ":", freq[residue]

cg : 1
tt : 1
ga : 3
tg : 2
gt : 2
aa : 2
ac : 1
at : 1
ag : 1
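The "first occurrence or increment" pattern in count_nmers is common enough that the standard library offers a ready-made counter. The sketch below uses collections.Counter in Python 3 syntax; it is offered as an alternative formulation, not as the version used in the lecture.

```python
from collections import Counter


def count_nmers(seq, n):
    """Count every overlapping n-mer in seq (case-insensitive)."""
    seq = seq.lower()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


freq = count_nmers("gatgacgaaagttgt", 2)
for nmer, count in sorted(freq.items()):
    print(nmer, ":", count)
```

A Counter behaves like a dictionary, so existing code that loops over freq.keys() keeps working; n-mers that never occur simply report a count of 0 when looked up.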

N-mer frequencies for a whole file

from read_fasta import read_fasta

def count_nmers(seq, n, freq):
    seq = seq.lower()
    for i in range(len(seq)-n+1):
        nmer = seq[i : i+n]
        if freq.has_key(nmer):
            freq[nmer] += 1
        else:
            freq[nmer] = 1
    return freq

name2seq = read_fasta("z:/tmp/fly3utr.txt")
freq = {}
## count for each sequence
for seq in name2seq.values():
    freq = count_nmers(seq, 2, freq)
## display statistics
for residue in freq.keys():
    print residue, ":", freq[residue]

ct : 5
tc : 9
tt : 26
cg : 4
ga : 11
tg : 12
gc : 2
gt : 17
aa : 39
ac : 10
gg : 4
at : 17
ca : 11
ag : 15
ta : 20
cc : 2

Note how we keep passing freq back into the count_nmers function, to get cumulative counts

We reuse a function we wrote earlier by importing it. In "from read_fasta import read_fasta", the first name is the module (the filename without .py), the second is the function name.

91

Files and filehandles

• Opening a file: fh = open(filename)   (this fh is the filehandle)
• Closing a file: fh.close()
• Reading a line: data = fh.readline()
• Reading the whole file: data = fh.read()   (returns one string; fh.readlines() returns a list of lines)
• Printing a line: fh.write(line)
• Read-only: fh = open(filename, "r")
• Write-only: fh = open(filename, "w")
• Test if file exists:

import os
if os.path.exists(filename):
    print "filename exists!"
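The individual filehandle operations listed above can be combined into a small round trip. This is a hedged sketch in Python 3 syntax; the file name genes.txt, its two lines, and the use of a temporary directory are assumptions chosen so that the example does not touch any real data.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory, so nothing is overwritten
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "genes.txt")

# Write two lines; the file is closed automatically at the end of the block
with open(filename, "w") as fh:
    fh.write("Cop Golgi\n")
    fh.write("bor Cytoplasm\n")

# Read the file back line by line
with open(filename) as fh:
    lines = [line.rstrip() for line in fh]

print(os.path.exists(filename))
print(lines)
```

The with statement is a convenience: it calls fh.close() for you, even if an error occurs while reading or writing.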

92

Database access from Python

# use the database package with all the DB-relevant subroutines
import MySQLdb

# import the cursor class that returns rows as dictionaries
from MySQLdb.cursors import DictCursor

# connect to the database with access credentials
conn = MySQLdb.connect(db="scop",         # name of database
                       host="myserver",   # name of server
                       user="guest",      # username
                       passwd="guest")    # password
# create a cursor that retrieves rows as dictionaries
cursor = conn.cursor(DictCursor)

# send a query
cursor.execute("SELECT * FROM cla LIMIT 10")

# retrieve all rows as a list of dictionaries
data = cursor.fetchall()

# close connection
conn.close()
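The same connect / cursor / execute / fetchall / close sequence can be tried without a MySQL server. The sketch below deliberately swaps in the standard library's sqlite3 module with an in-memory database; the table layout and the two rows are made up for illustration and do not reflect the real SCOP schema.

```python
import sqlite3

# In-memory database with a made-up two-column table (not the real SCOP schema)
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row          # rows can be indexed by column name
cursor = conn.cursor()
cursor.execute("CREATE TABLE cla (sid TEXT, sccs TEXT)")
cursor.executemany("INSERT INTO cla VALUES (?, ?)",
                   [("dom1", "a.1"), ("dom2", "b.2")])

# send a query and retrieve all rows as a list of dictionaries
cursor.execute("SELECT * FROM cla LIMIT 10")
data = [dict(row) for row in cursor.fetchall()]

conn.close()
print(data)
```

Setting row_factory plays a role similar to DictCursor in the MySQLdb example: each row can be addressed by column name rather than by position.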

93

Local vs. global variables

def foo():
    a = 3          # does not affect global a
    print a

a = 6
print a
foo()
print a

Output:
6
3
6

def foo():
    global a
    a = 3          # does affect global a
    print a

a = 6
print a
foo()
print a

Output:
6
3
3

def foo(a):        # parameters are local
    print a

a = 6
print a
foo(3)
print a

Output:
6
3
6

• Function variables and parameters are local by default
• Unless you declare them to be global

94

References in Python

• Lists, dictionaries and other data types are referenced, i.e. when assigning a variable, no data is copied:

>>> a = [1,2,3,4]
>>> b = a
>>> b[2] = 7
>>> a
[1, 2, 7, 4]

After assigning b=a, b points to the same list [1,2,3,4] as a.

• "Real copies" with the copy module:

>>> from copy import copy
>>> b = copy(a)
>>> b[2] = 3
>>> a
[1, 2, 7, 4]

• Don't worry about referencing, Python takes care of it. But be aware of it when you want to copy objects.
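The distinction matters most for nested data such as lists of lists, where copy only duplicates the outer list. The following Python 3 sketch is an added illustration, with variable names chosen for the example; it contrasts plain assignment, copy, and deepcopy from the same copy module.

```python
from copy import copy, deepcopy

a = [[1, 2], [3, 4]]      # a list of lists
b = a                     # no copy: b and a are the same object
c = copy(a)               # shallow copy: new outer list, shared inner lists
d = deepcopy(a)           # deep copy: inner lists are copied as well

a[0][0] = 99              # change an inner list through a

print(b[0][0])            # b is a, so it sees the change
print(c[0][0])            # c shares the inner list, so it sees it too
print(d[0][0])            # d is fully independent
```

For a flat list of numbers, as in the session above, copy is enough; deepcopy is only needed when the elements themselves are mutable objects.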

95

Matrices

• Easy solution: Lists of lists in core Python

• Access an element at position (i,j) in a list of lists: selecting from the i'th row the j'th element

• Disadvantages: operations like addition or multiplication on lists (of lists) would be slow and would have to be implemented by hand

• Luckily: a big library is already available, fast (since implemented in C), with rich functionality

>>> m = [[1, 2], [3, 4]]
>>> m[1]
[3, 4]
>>> m[1][1]
4
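To make the "would have to be implemented" point concrete, here is what addition and multiplication look like when written by hand for lists of lists. This is a minimal sketch in Python 3 syntax; the function names mat_add and mat_mul are hypothetical, and no size checking is done.

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices (lists of lists)."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    columns_of_b = list(zip(*b))          # transpose b to get its columns
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
            for row in a]


m = [[1, 2], [3, 4]]
print(mat_add(m, m))
print(mat_mul(m, m))
```

Every element is handled by interpreted Python code here, which is why a library implemented in C is much faster for large matrices.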

96

Matrices with numarray

• Faster, more calculations (reshaping, built-in matrix operations) with external package numarray

• Various matrix creation methods with numarray:
– from list of lists
– zeros/2
– ones/2
– identity/1
– from a function
– etc.

• Convenient access of multidimensional array elements

>>> from numarray import *
>>> m1 = array(m); m1
array([[1, 2],
       [3, 4]])
>>> m1.getshape()
(2, 2)
>>> zeros((3,5))
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> m2 = array(arange(8))
>>> m2.setshape((2,4))
>>> m2
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
>>> m2[1,1]
5

97

Matrices with numarray

• You can select rows and columns, or even submatrices (same "slicing" as with lists)

• You can apply a scalar operation like
– addition +
– multiplication *
– sine or cosine
to an array

>>> m1[:,1] # second column
array([2, 4])
>>> # arange produces a one-dim. array
>>> m = arange(9, shape=(3,3)); m
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

>>> m[1:,1:]
array([[4, 5],
       [7, 8]])
>>> m[1] + 3
array([6, 7, 8])
>>> m[1] * 3
array([ 9, 12, 15])
>>> m1[1] * 3
array([ 9, 12])
>>> sin(m1)
array([[ 0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 ]])

98

More Math

• Remember the mean and standard deviation from Lecture 3?
• Reuse of existing packages makes life easier:

>>> data = array([1, 5, 1, 12, 3, 4, 6])
>>> data.mean()
4.5714285714285712
>>> data.stddev()
3.7796447300922722

• Finding the maximum in a list now becomes:

>>> data[argmax(data)]
12

• numarray also provides functions for dot product, vector calculations etc.

>>> dot(array([1,2,3]), array([1,2,3]))
14
>>> array([1,2,3]) + array([4,5,6])
array([5, 7, 9])
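For plain Python lists, similar summary statistics are available without an array package. The sketch below uses the standard library's statistics module (Python 3) on the same numbers; it is an added illustration, not part of the numarray examples.

```python
import statistics

data = [1, 5, 1, 12, 3, 4, 6]

print(statistics.mean(data))     # arithmetic mean
print(statistics.stdev(data))    # sample standard deviation (divides by n-1)
print(max(data))                 # largest value
print(data.index(max(data)))     # its position, comparable to argmax
```

Note that stdev is the sample standard deviation; statistics.pstdev would divide by n instead of n-1 and give a slightly smaller value.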

99

Longest Common Subsequence

from numarray import *
seq1 = "ATCTGATC"
seq2 = "TGCATA"

len1 = len(seq1)
len2 = len(seq2)

def max3(a, b, c):
    return max(max(a, b), c)

# Create an array val of size (len1+1) x (len2+1)
val = zeros((len1+1, len2+1))

for i in range(1, len1+1):
    for j in range(1, len2+1):
        if seq1[i-1] == seq2[j-1]:
            val[i,j] = max3(val[i-1,j], val[i,j-1], val[i-1,j-1]+1)
        else:
            val[i,j] = max3(val[i-1,j], val[i,j-1], val[i-1,j-1])
print val
lcs = val[len1,len2]
print "The longest common subsequence of %s and %s is %d (%f)" % \
      (seq1, seq2, lcs, float(lcs) / max(len1, len2))

100

Longest Common Subsequence Output

Result of print val:

[[0 0 0 0 0]
 [0 1 1 1 1]
 [0 1 1 1 1]
 [0 1 1 1 1]
 [0 1 2 2 2]
 [0 1 2 3 3]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]

Final result:

The longest common subsequence of ATCTGATC and TGCATA is 4 (0.500000)
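The table-filling idea does not depend on an array package. As a hedged sketch in Python 3 syntax, the same dynamic programming can be done with a list of lists, and a traceback through the finished table recovers one actual common subsequence rather than only its length. The function names lcs_table and lcs_string are assumptions made for this example.

```python
def lcs_table(seq1, seq2):
    """Fill the dynamic-programming table of LCS lengths."""
    len1, len2 = len(seq1), len(seq2)
    val = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            if seq1[i - 1] == seq2[j - 1]:
                val[i][j] = val[i - 1][j - 1] + 1
            else:
                val[i][j] = max(val[i - 1][j], val[i][j - 1])
    return val


def lcs_string(seq1, seq2):
    """Trace back through the table to recover one longest common subsequence."""
    val = lcs_table(seq1, seq2)
    i, j = len(seq1), len(seq2)
    result = []
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])     # matching letter is part of the LCS
            i, j = i - 1, j - 1
        elif val[i - 1][j] >= val[i][j - 1]:
            i -= 1                         # move towards the larger neighbour
        else:
            j -= 1
    return "".join(reversed(result))


table = lcs_table("ATCTGATC", "TGCATA")
print(table[-1][-1])                       # length of the LCS
print(lcs_string("ATCTGATC", "TGCATA"))    # one subsequence of that length
```

The bottom-right cell of the table holds the length. Several different subsequences may share that length, and the traceback returns just one of them.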

101

Classes

• Define a class to store PDB residues. A residue has a name, a position in the sequence, and a list of atoms. An atom has a name and coordinates. Define two methods: add_residue and add_atom

class PDBStructure:
    def __init__(self):
        self.residues = []    # list of residue dictionaries

    def add_residue(self, name, posseq):
        residue = {'name': name, 'posseq': posseq, 'atoms': []}
        self.residues.append(residue)
        return residue

    def add_atom(self, residue, name, coord):
        # the atom is stored in the 'atoms' list of its residue
        atom = {'name': name, 'coord': coord}
        residue['atoms'].append(atom)
        return atom

102

Classes: Usage

struct = PDBStructure()
residue = struct.add_residue(name = "ILE", posseq = 1)
struct.add_atom(residue, name = "N", coord = (23.46800041, -8.01799965, -15.26200008))
struct.add_atom(residue, name = "CZ", coord = (125.50499725, 4.50500011, -19.14800072))
residue = struct.add_residue(name = "LYS", posseq = 2)
struct.add_atom(residue, name = "OE1", coord = (126.12000275, -1.78199995, -15.04199982))

print struct.residues

[{'name': 'ILE', 'posseq': 1, 'atoms': [
    {'name': 'N', 'coord': (23.468000409999998, -8.0179996500000001, -15.26200008)},
    {'name': 'CZ', 'coord': (125.50499725, 4.5050001100000001, -19.148000719999999)}]},
 {'name': 'LYS', 'posseq': 2, 'atoms': [
    {'name': 'OE1', 'coord': (126.12000275, -1.7819999500000001, -15.041999819999999)}]}]
