Searching (and manipulating) your data

>A06662 Synthetic nucleotide sequence of the human GSH transferase pi gene. : Location:1..1000
UGGGACCAGUCAGCAGAGGCAGCGUGUGUGCGCGUGCGUGUGCGUGUGUGUGCGUGUGUGUGUGUACGCUUGCAUUUGUGUCGGGUGGGUAAGGAGAUAGAGAUGGGCGGGCAGUAGGCCCAGGUCCCGAAGGCCUUGAACCCACUGGUUUGGAGUCUCCUAAGGGCAAUGGGGGCCAUUGAGAAGUCUGAACAGGGCUGUGUCUGAAUGUGAGGUCUAGAAGGAUCCUCCAGAGAAGCCAGCUCUAAAGCUUUUGCAAUCAUCUGGUGAGAGAACCCAGCAAGGAUGGACAGGCAGAAUGGAAUAGAGAUGAGUUGGCAGCUGAAGUGGACAGGAUUUGGUACUAGCCUGGUUGUGGGGAGCAAGCAGAGGAGAAUCUGGGACUCUGGUGGUCUGGCCUGGGGCAGACGGGGGUGUCUCAGGGGCUGGGAGGGAUGAGAGUAGGAUGAUACAUGGUGGUGUCUGGCAGGAGGCGGGCAAGGAUGACUAUGUGAAGGCACUGCCCGGGCAACUGAAGCCUUUUGAGACCCUGCUGUCCCAGAACCAGGGAGGCAAGACCUUCAUUGUGGGAGACCAGGUGAGCAUCUGGCC

UGG : W
>A06662_protein
W

GAC : D
>A06662_protein
WD

CAG : Q
>A06662_protein
WDQ

UCA : S
>A06662_protein
WDQS

GCA : A
>A06662_protein
WDQSA

GAG : E
>A06662_protein
WDQSAE

GCA : A
>A06662_protein
WDQSAEA

GCG : A
>A06662_protein
WDQSAEAA

UGU : C
>A06662_protein
WDQSAEAAC
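The slides above step through the sequence one codon at a time, growing the protein string by one letter per step. A minimal Python 3 sketch of that walk follows. The names `CODON_SUBSET` and `walk_codons` are made up for illustration, and the table is deliberately reduced to the codons that occur in the first 27 nucleotides.

```python
# Reduced codon table (an illustrative subset, not the full genetic code):
# only the codons that occur in the first 27 nucleotides of A06662.
CODON_SUBSET = {
    'UGG': 'W', 'GAC': 'D', 'CAG': 'Q', 'UCA': 'S',
    'GCA': 'A', 'GAG': 'E', 'GCG': 'A', 'UGU': 'C',
}

def walk_codons(rna, table):
    """Yield (codon, amino_acid, protein_so_far) for each complete codon."""
    protein = ''
    usable = len(rna) - len(rna) % 3      # ignore an incomplete trailing codon
    for i in range(0, usable, 3):
        codon = rna[i:i + 3]
        protein += table[codon]
        yield codon, table[codon], protein

start = 'UGGGACCAGUCAGCAGAGGCAGCGUGU'     # first 27 nucleotides of the sequence
steps = list(walk_codons(start, CODON_SUBSET))
for codon, amino, protein in steps:
    print(codon, ':', amino, '->', protein)
```

Each printed line corresponds to one slide of the walk-through: the codon read, the amino acid it maps to, and the protein built so far.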

codonAMINO = {
    'GCU':'A','GCC':'A','GCA':'A','GCG':'A',
    'CGU':'R','CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
    'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S','AGC':'S',
    'AUU':'I','AUC':'I','AUA':'I',
    'UUA':'L','UUG':'L','CUU':'L','CUC':'L','CUA':'L','CUG':'L',
    'GGU':'G','GGC':'G','GGA':'G','GGG':'G',
    'GUU':'V','GUC':'V','GUA':'V','GUG':'V',
    'ACU':'T','ACC':'T','ACA':'T','ACG':'T',
    'CCU':'P','CCC':'P','CCA':'P','CCG':'P',
    'AAU':'N','AAC':'N',
    'GAU':'D','GAC':'D',
    'UGU':'C','UGC':'C',
    'CAA':'Q','CAG':'Q',
    'GAA':'E','GAG':'E',
    'CAU':'H','CAC':'H',
    'AAA':'K','AAG':'K',
    'UUU':'F','UUC':'F',
    'UAU':'Y','UAC':'Y',
    'AUG':'M',
    'UGG':'W',
    'UAG':'STOP','UGA':'STOP','UAA':'STOP'
}

codonAMINO = {
    'GCU':'A','GCC':'A','GCA':'A','GCG':'A',
    'CGU':'R','CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
    'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S','AGC':'S',
    'AUU':'I','AUC':'I','AUA':'I',
    'UUA':'L','UUG':'L','CUU':'L','CUC':'L','CUA':'L','CUG':'L',
    'GGU':'G','GGC':'G','GGA':'G','GGG':'G','AAU':'N','AAC':'N',
    'GUU':'V','GUC':'V','GUA':'V','GUG':'V','GAU':'D','GAC':'D',
    'ACU':'T','ACC':'T','ACA':'T','ACG':'T','UGU':'C','UGC':'C',
    'CCU':'P','CCC':'P','CCA':'P','CCG':'P','CAA':'Q','CAG':'Q',
    'GAA':'E','GAG':'E','CAU':'H','CAC':'H','AAA':'K','AAG':'K',
    'UUU':'F','UUC':'F','UAU':'Y','UAC':'Y','AUG':'M','UGG':'W',
    # 'AUG' appears twice: the later entry wins, so codonAMINO['AUG'] is 'START'
    'AUG':'START','UAG':'STOP','UGA':'STOP','UAA':'STOP'
}

>>> codonAMINO['GCU']
'A'
>>> codonAMINO['AUG']
'START'
>>> for k in codonAMINO.keys():
...     print k, codonAMINO[k]
GUC V
AUA I
GUA V
GUG V
ACU T
AAC N
etc.

Dictionaries

Dictionaries are unordered collections of objects

Dictionaries are structures for mapping immutable objects (keys) to arbitrary objects (values)

d = {key1:value1, key2:value2, …, keyN:valueN}

Lists and dictionaries cannot be used as dictionary keys, because they are mutable

Keys must be unique, i.e. the same key cannot be associated with more than one value
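The two rules about keys can be checked directly. The following is a small Python 3 sketch with made-up keys and values; it only illustrates the rules and is not part of the original slides.

```python
# Keys must be immutable (hashable): strings, numbers and tuples all work.
d = {'pep1': 'MGSNK', ('chr1', 100): 'tuple key', 42: 'int key'}

# Assigning to an existing key replaces its value; no second entry is created.
d['pep1'] = 'RSLEP'

# A list is mutable, so using one as a key raises TypeError.
try:
    d[['chr1', 100]] = 'list key'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(len(d), d['pep1'], list_key_ok)
```

The dictionary still has three entries after the reassignment, and the list key is rejected.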

>>> d = {'pep1':'MGSNKSKPKDASQRRRSLEPAENVHGAGG',
...      'pep2':'RSLEPAENVHGAGGGAFPASQTPS'}
>>> len(d)
2

>>> d['pep1']
'MGSNKSKPKDASQRRRSLEPAENVHGAGG'

>>> d['pep3'] = 'ASADGHRGPSAAFAPAAA'
>>> d
{'pep1': 'MGSNKSKPKDASQRRRSLEPAENVHGAGG', 'pep2': 'RSLEPAENVHGAGGGAFPASQTPS', 'pep3': 'ASADGHRGPSAAFAPAAA'}

>>> del d['pep2']
>>> d
{'pep1': 'MGSNKSKPKDASQRRRSLEPAENVHGAGG', 'pep3': 'ASADGHRGPSAAFAPAAA'}

>>> d.clear()
>>> d
{}

>>> dict = {"a":1, "b":2, "c":3}
>>> dict.keys()            # list of dictionary keys
['a', 'c', 'b']

>>> keys = dict.keys()
>>> keys.sort()            # sort the keys in place
>>> keys
['a', 'b', 'c']

>>> dict.values()          # list of dictionary values
[1, 3, 2]

>>> dict.items()           # list of (key, value) tuples
[('a', 1), ('c', 3), ('b', 2)]

>>> dict.has_key("a")      # True if dict has key "a", else False
True
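The session above is Python 2. In Python 3, `has_key()` no longer exists, and `keys()`, `values()` and `items()` return views rather than lists, so `keys.sort()` fails. A hedged sketch of the equivalent Python 3 calls, using the variable name `counts` (chosen here to avoid shadowing the built-in `dict`):

```python
counts = {'a': 1, 'b': 2, 'c': 3}

keys = sorted(counts)               # sorted list of keys (views have no sort())
values = list(counts.values())      # copy the values view into a list
pairs = list(counts.items())        # list of (key, value) tuples
has_a = 'a' in counts               # replaces counts.has_key('a')
has_z = 'z' in counts

print(keys, values, pairs, has_a, has_z)
```

The `in` operator also works in Python 2, so it is the safer choice when a script may run under either version.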

Exercise

Using the codonAMINO dictionary from tgac.py, translate the sequence in rna_seq.fasta. Start with a single reading frame. Then try all reading frames.

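from tgac import codonAMINO

F = open('rna_seq.fasta')
Out = open('protein_seq.fasta', 'w')

# Read the FASTA file: write the new header, collect the sequence
seq = ''
for line in F:
    if line[0] == '>':
        header = line.split()
        geneID = header[0]
        Out.write(geneID + '_protein\n')
    else:
        seq = seq + line.strip()

# Translate one codon at a time, starting at position 0
prot = ''
for i in range(0, len(seq), 3):
    if seq[i:i+3] in codonAMINO:
        prot = prot + codonAMINO[seq[i:i+3]]
    else:
        prot = prot + '*'    # unknown or incomplete codon

Out.write(prot + '\n')
F.close()
Out.close()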

from tgac import codonAMINO

F = open('rna_seq.fasta')
Out = open('protein_seq.fasta', 'w')

# Read the FASTA file: write the new header, collect the sequence
seq = ''
for line in F:
    if line[0] == '>':
        header = line.split()
        geneID = header[0]
        Out.write(geneID + '_protein\n')
    else:
        seq = seq + line.strip()

# Translate in each of the three forward reading frames
for j in range(3):
    Out.write(str(j) + "-frame\n")
    prot = ''
    for i in range(j, len(seq), 3):
        if seq[i:i+3] in codonAMINO:
            prot = prot + codonAMINO[seq[i:i+3]]
        else:
            prot = prot + '*'    # unknown or incomplete codon
    Out.write(prot + '\n')

F.close()
Out.close()
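The same logic can be wrapped in a function so that it can be tried on a short string without any files. This is a Python 3 sketch under stated assumptions: `translate`, `DEMO_TABLE` and `demo` are hypothetical names, the table is a tiny stand-in for `codonAMINO`, and, unlike the script above, an incomplete trailing codon is skipped instead of being written as '*'.

```python
def translate(rna, table, frame=0, unknown='*'):
    """Translate rna codon by codon, starting at offset `frame` (0, 1 or 2)."""
    protein = []
    # len(rna) - 2 is the last index at which a complete codon can start
    for i in range(frame, len(rna) - 2, 3):
        protein.append(table.get(rna[i:i + 3], unknown))
    return ''.join(protein)

# Stand-in table covering only the codons of the demo string (illustrative subset)
DEMO_TABLE = {'UGG': 'W', 'GAC': 'D', 'CAG': 'Q',
              'GGG': 'G', 'ACC': 'T', 'GGA': 'G', 'CCA': 'P'}

demo = 'UGGGACCAG'          # first nine nucleotides of the sequence
frames = [translate(demo, DEMO_TABLE, frame) for frame in range(3)]
for frame, protein in enumerate(frames):
    print(str(frame) + '-frame', protein)
```

Using `table.get(codon, unknown)` folds the membership test and the lookup into one call; passing the full `codonAMINO` dictionary as `table` would translate the real sequence the same way.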

Remove redundancy

How many different objects?

How many unique objects?

Are the two groups identical?

What is the intersection of the two groups?

Q5XXA6 Q9Y5P2 Q14667 O75387 Q8WV07 Q8CH62 Q9GZY1 Q9NQQ7 Q8VCX2 Q7Z769 Q8CH62 Q14667 Q9NQQ7 Q14667 Q9Y5P2

Q7Z769 Q8CH62 Q9GZY1 Q9NQQ7 Q14667 Q5XXA6 Q9Y5P2 Q14667 O75387 Q9Y5P2 Q8WV07 Q8VCX2 Q8CH62 Q14667 Q9NQQ7

Sets

Sets are unordered collections of unique objects

• Sets do not support indexing and slicing
• The in and not in operators can be used to test an element for membership in a set
• Sets are useful for removing duplicates
• Set operations: intersection, union, difference, symmetric difference

Sets are not sequence-like objects, and they cannot contain identical elements
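The questions asked about the two groups of identifiers shown earlier can be answered with sets. A minimal Python 3 sketch, with the identifiers copied from the slides and the variable names (`group1`, `unique1`, and so on) chosen for illustration:

```python
group1 = ('Q5XXA6 Q9Y5P2 Q14667 O75387 Q8WV07 Q8CH62 Q9GZY1 Q9NQQ7 '
          'Q8VCX2 Q7Z769 Q8CH62 Q14667 Q9NQQ7 Q14667 Q9Y5P2').split()
group2 = ('Q7Z769 Q8CH62 Q9GZY1 Q9NQQ7 Q14667 Q5XXA6 Q9Y5P2 Q14667 '
          'O75387 Q9Y5P2 Q8WV07 Q8VCX2 Q8CH62 Q14667 Q9NQQ7').split()

unique1 = set(group1)               # converting to a set drops the duplicates
unique2 = set(group2)

print(len(group1), len(unique1))    # how many objects, how many unique ones
print(unique1 == unique2)           # do both groups hold the same identifiers?
print(sorted(unique1 & unique2))    # identifiers common to both groups
```

Each group lists 15 identifiers, of which 10 are unique, and both groups reduce to the same set.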

Create a new set

To create a set, call the built-in set(x), where x is a sequence-like object (string, tuple, list)

add(x)
update(x)
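A short Python 3 sketch of set creation and of the two methods named above. `add(x)` inserts the single element `x`, while `update(x)` inserts every element of the iterable `x`. The example values are made up for illustration.

```python
s = set('GATTACA')          # from a string: one element per distinct character
s.add('U')                  # add(x): insert the single element x
s.update(['N', 'A'])        # update(x): insert each element of the iterable x

t = set()                   # an empty set
t.add('pep1')               # adds the whole string as one element
t.update('pep1')            # a string is iterable: adds 'p', 'e' and '1'

print(sorted(s))
print(sorted(t))
```

The difference between the last two calls is a common source of confusion: `add` treats a string as one object, `update` iterates over its characters.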

S1.union(S2)

The union of two sets S1 and S2 creates a new set with the elements from both S1 and S2.

>>> S1 = set(['a','b','c'])
>>> S2 = set(['c','d','e'])
>>> S1.union(S2)
set(['a', 'c', 'b', 'e', 'd'])
>>> S1 | S2
set(['a', 'c', 'b', 'e', 'd'])

S1.intersection(S2)

The intersection of 2 sets S1 and S2 creates a new set with the elements common to S1 and S2

>>> S1 = set(['a','b','c'])
>>> S2 = set(['c','d','e'])
>>> S1.intersection(S2)
set(['c'])
>>> S1 & S2
set(['c'])

S1.symmetric_difference(S2) or S1 ^ S2

The symmetric difference of two sets S1 and S2 creates a new set with the elements in either S1 or S2 but not both

>>> S1 = set(['a','b','c'])
>>> S2 = set(['c','d','e'])
>>> S1.symmetric_difference(S2)
set(['a', 'b', 'e', 'd'])
>>> S1 ^ S2
set(['a', 'b', 'e', 'd'])

S1.difference(S2) or S1 - S2

The difference of two sets S1 and S2 creates a new set with the elements in S1 but not in S2

>>> S1 = set(['a','b','c'])
>>> S2 = set(['c','d','e'])
>>> S1.difference(S2)
set(['a', 'b'])
>>> S1 - S2
set(['a', 'b'])
>>> S2 - S1
set(['e', 'd'])
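The four operations are related to one another, which the following Python 3 sketch checks on the same S1 and S2. Python 3 prints sets as `{'a', 'b'}` rather than `set(['a', 'b'])`, and the element order is arbitrary, so the results are sorted before printing. The names `union`, `common` and `only_one` are chosen here for illustration.

```python
S1 = set(['a', 'b', 'c'])
S2 = set(['c', 'd', 'e'])

union = S1 | S2             # elements in S1 or S2
common = S1 & S2            # elements in both
only_one = S1 ^ S2          # elements in exactly one of the two sets

print(sorted(union), sorted(common), sorted(only_one))

# The symmetric difference is the union minus the intersection ...
print(only_one == union - common)
# ... and also the union of the two one-sided differences.
print(only_one == (S1 - S2) | (S2 - S1))
# Difference is not symmetric: S1 - S2 and S2 - S1 are different sets.
print(S1 - S2 == S2 - S1)
```

Union, intersection and symmetric difference give the same result whichever set comes first; difference does not.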