Protein Sequences
• From NCBI, search either the Protein or the Genome database, using keywords, etc.
• From Genome, search for HIV-1
• What links do you have from there?
• Choose NC_001802, then its coding regions
• From that entry, save the protein sequences in FASTA format
• Identify the gag-pol and env sequences
HIV-1 Gag-Pol AA Sequence
>gi|28872819|ref|NP_057849.4| Gag-Pol; Gag-Pol polyprotein [Human immunodeficiency virus 1]
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG
QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA
EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT
SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC
QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG
HQMKDCTERQANFLREDLAFLQGKAREFSSEQTRANSPTRRELQVWGRDNNSPSEAGADRQGTVSFNFPQ
VTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI
GTVLVGPTPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEK
EGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGD
AYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQY
MDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWT
VNDIQKLVGKLNWASQIYPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSK
DLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPI
QKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGR
QKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYL
AWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKC
QLKGEAMHGQVDCSPGIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTI
HTDNGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIH
NFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQNFRVYYRDSRNPLWKGPAKLLWKGEGAVVIQD
NSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
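The saved FASTA records can also be read programmatically. The Python sketch below is an illustration, not part of the NCBI workflow: it assumes the proteins were saved as plain FASTA text, splits them into header/sequence pairs (joining the wrapped sequence lines), and counts residues.

```python
from collections import Counter


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for every '>' record in FASTA text."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are wrapped; join them at the end
    if header is not None:
        records[header] = "".join(chunks)
    return records


def composition(sequence: str) -> Counter:
    """Count how often each amino-acid letter occurs (case-insensitive)."""
    return Counter(sequence.upper())
```

For example, `parse_fasta(open("hiv1_proteins.fasta").read())` would return one entry per saved protein; the file name here is hypothetical.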
BLASTing PDB
• Open two browser windows
• Open NCBI BLASTP (protein BLAST)
• BLAST the PDB database with the gag-pol amino acid sequence, and then with env
• Go to http://us.expasy.org/tools/blast/
• BLAST the PDB database as above
• What are the top structures at each site?
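The NCBI PDB search can also be scripted. This sketch assumes NCBI's BLAST Common URL API (`Blast.cgi` with `CMD=Put` to submit a query and `CMD=Get` with the returned request ID, `RID`, to retrieve results). It only builds the URLs and extracts the RID from a response; check NCBI's current usage guidelines (polling limits, POST for long queries) before sending real requests.

```python
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_url(sequence: str, database: str = "pdb", program: str = "blastp") -> str:
    """URL that submits a protein query against the PDB database."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    return f"{BLAST_URL}?{urlencode(params)}"


def get_url(rid: str, format_type: str = "Text") -> str:
    """URL that retrieves results for a previously submitted request ID."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return f"{BLAST_URL}?{urlencode(params)}"


def extract_rid(put_response: str) -> Optional[str]:
    """Pull the 'RID = ...' value out of the text returned by a Put request."""
    match = re.search(r"RID = (\S+)", put_response)
    return match.group(1) if match else None


def fetch(url: str) -> str:
    """Send the request and return the body (needs network access; not called here)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

A script would call `fetch(put_url(seq))`, read the RID, wait, and then poll `fetch(get_url(rid))` until the results are ready.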
Digging Deeper into Sequence
• From the ExPASy PDB BLAST results:
• Choose another sequence (close or far) and do a Multiple Sequence Alignment
• Choose BLOSUM or PAM matrices
• View the alignments in HTML format
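Progressive multiple-alignment programs such as ClustalW start from pairwise alignments scored with a substitution matrix. The sketch below shows only that pairwise step (Needleman-Wunsch global alignment with a linear gap penalty). The +1/-1 scoring and the gap penalty of -1 are placeholders, not BLOSUM or PAM values; a BLOSUM62 or PAM250 lookup would be passed in as `score`, and real tools typically use affine gap penalties as well.

```python
from typing import Callable


def simple_score(x: str, y: str) -> int:
    """Placeholder scoring: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


def needleman_wunsch(a: str, b: str,
                     score: Callable[[str, str], int] = simple_score,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment; returns (aligned_a, aligned_b, total_score)."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                            # gap in b
                          F[i][j - 1] + gap)                            # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j = n, m
    out_a: list[str] = []
    out_b: list[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]
```

For example, `needleman_wunsch("AC", "A")` aligns to `("AC", "A-", 0)`.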
NCBI BLASTp of PDB
• After running BLASTP against PDB:
• Click on related structures to see more
• Follow the PDB links to the MMDB
  – Hint: you can use some of these structures at VAST for structure comparisons
• How can you display the structure?
• Use a viewer such as RasMol, SPDBV (Swiss-PdbViewer), or Cn3D
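Structure viewers read coordinate files, most classically in PDB format, whose ATOM records use fixed columns. The sketch below is not how RasMol or Cn3D work internally; it simply pulls alpha-carbon (CA) atoms out of ATOM lines to recover each chain's sequence and measure CA-CA distances. It ignores alternate locations, insertion codes, HETATM records, and multiple models, all of which a real parser must handle.

```python
import math
from collections import namedtuple

CAAtom = namedtuple("CAAtom", "chain resseq resname xyz")

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def read_ca_atoms(lines):
    """Yield alpha-carbon records from PDB-format ATOM lines (fixed columns)."""
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            yield CAAtom(
                chain=line[21],                     # column 22
                resseq=int(line[22:26]),            # columns 23-26
                resname=line[17:20].strip(),        # columns 18-20
                xyz=(float(line[30:38]),            # x: columns 31-38
                     float(line[38:46]),            # y: columns 39-46
                     float(line[46:54])),           # z: columns 47-54
            )


def chain_sequences(lines) -> dict:
    """One-letter sequence per chain from CA atoms ('X' for non-standard residues)."""
    seqs: dict = {}
    for atom in read_ca_atoms(lines):
        seqs[atom.chain] = seqs.get(atom.chain, "") + THREE_TO_ONE.get(atom.resname, "X")
    return seqs


def ca_distance(p: CAAtom, q: CAAtom) -> float:
    """Euclidean distance between two CA atoms (in angstroms for PDB files)."""
    return math.dist(p.xyz, q.xyz)
```

Usage: `chain_sequences(open("structure.pdb"))` for a downloaded coordinate file (the file name is hypothetical).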
BLAST P – Other Data
• In the NCBI BLASTP results, which conserved domains are detected?
• Click on each to find the Pfam entries
• Show domain relatives (CDART)
• (The next two images show results for gag-pol and env proteins – try both)
• Path is from CDD to CDART – explore!
Conserved Domain Databases
• NCBI maintains a database of conserved domains (CDD), linked by sequence to BLAST and other tools.
• Conserved domains represent “functional folds” in nature’s playbook.
• You can compare your sequence, by alignment, to other protein folds (e.g., Pfam families).
• Use CDART for graphical domain display.
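Conserved-domain searches score a query against profiles built from domain alignments: NCBI's CD-Search uses position-specific scoring matrices (PSSMs) with RPS-BLAST, and Pfam uses profile HMMs. The toy sketch below only illustrates position-specific scoring: each profile column has its own residue scores, and every window of the query is summed. The three-column profile in the usage note is invented for illustration, not a real CDD or Pfam model, and real searches also handle gaps and statistical significance.

```python
def profile_scores(sequence: str, pssm: list, default: int = -1) -> list:
    """Score every window of len(pssm) residues; unlisted residues get `default`."""
    width = len(pssm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append(sum(column.get(aa, default) for column, aa in zip(pssm, window)))
    return scores


def best_hit(sequence: str, pssm: list, default: int = -1):
    """Return (1-based start, score) of the best window, or None if the query is too short."""
    scores = profile_scores(sequence, pssm, default)
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)  # first best window wins ties
    return best + 1, scores[best]
```

With a hypothetical profile `TOY = [{"C": 3}, {"K": 2, "R": 2}, {"C": 3}]`, `best_hit("ACKCAC", TOY)` returns `(2, 8)`.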
ExPASy Proteomics Tools
• http://us.expasy.org/tools/
• Protein identification and characterization
• DNA -> Protein
• Similarity searches, pattern and profile searches
• Post-translational modifications
• Topology prediction
• Primary structure analysis
• Secondary structure prediction
• Tertiary structure
• Sequence alignment
• Biological text analysis
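Several of these categories (primary structure analysis, topology prediction) rest on simple per-residue scales. As an illustration, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile, the kind of plot ExPASy's ProtScale can produce. The window size of 9 and scoring unknown residues as 0.0 are assumptions of this sketch. Long, strongly positive stretches are a classic (if crude) hint of membrane-spanning segments.

```python
# Kyte & Doolittle hydropathy scale (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean hydropathy of each full window; unknown letters count as 0.0 (assumption)."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Each value describes the window starting at that position, so a sequence of length L yields L - window + 1 points.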
ExPASy ScanProsite
• Go to ExPASy ScanProsite
• http://www.expasy.ch/tools/scanprosite/
• Enter either HIV sequence (gag-pol or env) into the search box
• You can choose to have the results returned by email here
• What are the post-translational modifications? Click on the references.
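ScanProsite matches PROSITE patterns such as the N-glycosylation site N-{P}-[ST]-{P}. The sketch below translates the basic pattern syntax (x for any residue, [..] allowed residues, {..} forbidden residues, repeat counts in parentheses, < and > terminal anchors, trailing period) into a Python regular expression and reports overlapping matches. It is a simplification: it does not handle special cases such as '>' inside brackets, and ScanProsite also searches PROSITE profiles, which this does not attempt.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert basic PROSITE pattern syntax to a Python regular expression."""
    pattern = pattern.strip().rstrip(".")          # patterns end with a period
    parts = []
    for element in pattern.split("-"):             # elements are joined by '-'
        prefix = suffix = ""
        if element.startswith("<"):                # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):                  # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core == "x":                            # x = any residue
            core = "."
        elif core.startswith("{"):                 # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # single letters and [ABC] are already valid regex syntax
        if low and high:
            core += "{" + low + "," + high + "}"  # x(2,4) = two to four residues
        elif low:
            core += "{" + low + "}"               # x(3) = exactly three residues
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list:
    """1-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence.upper())]
```

For example, `scan("ANASTNPSA", "N-{P}-[ST]-{P}.")` returns `[2]`: the second asparagine is followed by a proline and is skipped.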
PIR – Georgetown University
• Go to http://pir.georgetown.edu/
• Choose the iProClass database: http://pir.georgetown.edu/iproclass/
• Paste in the gag-pol sequence
• Look at the BLAST hits
• Try the links to domain display and pattern match. What do you see?
Pfam
• Go to the Pfam home page at: http://www.sanger.ac.uk/Software/Pfam
• Choose Protein search
• Enter the HIV-1 gag-pol sequence
• The search may take 3 to 5 minutes
• The returned page will show protein families and conserved domains
SMART
• Simple Modular Architecture Research Tool
• Sequence analysis
• Architecture analysis
• Search with a sequence or an accession number
• Don’t forget to check the additional databases:
  – Pfam
  – Signal peptides
  – Internal repeats
NCBI Structure Tools
• http://www.ncbi.nlm.nih.gov/Structure/
• Modeling tools for MMDB and PDB
• MMDB – Molecular Modeling Data Base
• PDB – Protein Data Bank
• Search by keyword (HIV-1 or gag-pol)
• Follow links in and out of MMDB / PDB
• RasMol, Chime, Cn3D structure viewers
MMDB
• Molecular Modeling Database
• http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
• Contains weekly updates from PDB
• “The structure database is considerably smaller than Entrez's protein or nucleotide databases, but a large fraction of all known protein sequences have homologs in this set”
Cn3D Structure Viewer
• Structure viewer
  – PC, Unix / Linux, Mac OS X, etc.
• Helper application
  – Structure view / sequence view
  – Can align and show multiple sequences
• Has a great online tutorial
  – Read it carefully and try it out!
• Exports images as PNG files, handy for presentations too!
VAST and VAST Search
• Vector Alignment Search Tool
• VAST Search is a service that allows searching for structural neighbors starting with a set of 3D coordinates specified by the user.
• Type in a PDB structure code, view similar structural alignments, and click to import them.
• (Start by BLASTing the PDB database).
CDD and CDD Search
• http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
• Enter a sequence, accession number, or search by keyword
• CDD is linked from BLAST so you may enter it while doing sequence analysis
• Where sequences share conserved domains, they are often homologs, or close cousins (UniGene)
The Protein Machine
• http://www2.ebi.ac.uk/translate/
• For translating nucleotide sequences into protein in three different modes
  – You can choose the sense strand or its complement, and any reading frame
  – You can start and end at any position
  – You can select any translation table
• Or enter an accession number
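Translating either strand in any reading frame comes down to a codon lookup. The sketch below performs a six-frame translation with the standard genetic code (NCBI translation table 1) only; other translation tables, custom start/end positions, and the EBI tool's own modes are not modeled. Stop codons appear as '*', codons with ambiguous bases as 'X', and the frame labels follow one common convention.

```python
from itertools import product

BASES = "TCAG"
# Standard code (table 1); codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is dropped."""
    dna = dna.upper().replace("U", "T")  # accept RNA input
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Frames +1..+3 on the given strand, -1..-3 on its reverse complement."""
    dna = dna.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frames("ATGGCCTAA")["+1"]` is `"MA*"` and the `"-1"` frame is `"LGH"`.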