Bioinfo1 ht03 dbsummary · bio.lundberg.gu.se/courses/ht03/bio1/Bioinfo1_ht03... · 2011. 8. 8.



Page 1 · Bioinfo1_ht03_dbsummary.PDF · Author: MagnusAR · Created Date: 10/6/2003 10:37:37 AM

Database exercises
-- Some examples from the first exercise (FA9)
-- Advanced searching (mouse, exons, introns)
-- HIV questions (genomes, proteins, structures)

Q1 How many exons and introns are there in the Factor IX gene?

Exon: number=1

1. Count the exons and introns, or look at the last exon/intron and its number (the /number qualifier). #introns = #exons - 1

2. Look at the CDS location and count the exon segments in join(aa..bb,cc..dd). Each comma ',' between two segments corresponds to an intron.

→ The problem with the last method is that the UTR might span more than one exon, so you must compare the end of exon 1 with the start of the CDS! Here exon 1 is 2966..3082 and the first CDS segment is 2995..3082, so the UTR is 2966..2994 and the CDS starts at 2995.
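Method 2 and its caveat can be sketched in Python. This is a minimal sketch, assuming a forward-strand location made of plain ranges (no complement() or remote references); the function names are hypothetical, and apart from 2966..3082 and 2995..3082 the coordinates in the example are made up.

```python
import re

# A plain range such as "2995..3082"; '<' / '>' partial-end markers are skipped.
RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def segments(location):
    """Return (start, end) pairs from a simple GenBank location string."""
    return [(int(a), int(b)) for a, b in RANGE.findall(location)]


def exons_introns_from_cds(location):
    """Method 2: one exon per joined segment, one intron per comma."""
    n = len(segments(location))
    return n, n - 1


def check_first_exon(exon1, cds_location):
    """Compare the end of exon 1 with the start of the CDS.

    Returns (cds_starts_in_exon1, utr_range). If the CDS starts in a later exon,
    all of exon 1 is UTR and counting CDS segments misses that exon.
    """
    ex_start, ex_end = exon1
    cds_start = segments(cds_location)[0][0]
    inside = ex_start <= cds_start <= ex_end
    utr = (ex_start, cds_start - 1) if inside and cds_start > ex_start else None
    return inside, utr


# Hypothetical CDS: only the first segment (2995..3082) comes from the slide.
cds = "join(2995..3082,4001..4100,5001..5200)"
print(exons_introns_from_cds(cds))        # (3, 2)
print(check_first_exon((2966, 3082), cds))  # (True, (2966, 2994))
```

The same comparison can be made between the last exon and the end of the CDS, since a 3' UTR can also span exons.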

CDS: Protein coding parts of exons.

Intron: number=1

Page 2

SRS entry FA9_PIG

Links in the image display the selected feature type in the text area above.

Links to databases looked at in this exercise.

Q4 Which proteins other than Factor IX that are related to coagulation have the PROSITE TRYPSIN_SER motif?

Page 3

Q4 Which proteins other than FA9 that are related to coagulation have the PROSITE trypsin motif?

The SRS page is for TRYPSIN_SER (the motif that FA7_PIG had). It lists all proteins in Swiss-Prot that have the motif (pattern).

Swiss-Prot entry names are made up of an abbreviated protein name and an abbreviated organism name, joined by an underscore (e.g. FA9_PIG).

Coagulation factor proteins are named FA*_ORG, where * is 7, 9, 11, etc. and ORG is the organism.
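Because the naming scheme is regular, picking the coagulation factors out of a list of entry names can be sketched with a regular expression. This is a minimal sketch: the pattern only recognises the FA*_ORG convention described here, so coagulation-related proteins named differently are missed, and the input list is illustrative rather than the actual SRS result.

```python
import re

# FA*_ORG: "FA", the factor number, "_", the organism code.
COAG_FACTOR = re.compile(r"FA(\d+)_([A-Z0-9]+)")


def split_entry_name(name):
    """Split a Swiss-Prot entry name such as 'FA9_PIG' into ('FA9', 'PIG')."""
    protein, organism = name.split("_", 1)
    return protein, organism


def coagulation_factors(entry_names):
    """Keep names following the FA*_ORG convention as (name, factor, organism)."""
    hits = []
    for name in entry_names:
        m = COAG_FACTOR.fullmatch(name)
        if m:
            hits.append((name, int(m.group(1)), m.group(2)))
    return hits


# Illustrative input, not the actual SRS result list.
names = ["FA9_PIG", "FA7_PIG", "TRY1_BOVIN", "FA11_HUMAN"]
print(split_entry_name("FA9_PIG"))  # ('FA9', 'PIG')
print(coagulation_factors(names))
# [('FA9_PIG', 9, 'PIG'), ('FA7_PIG', 7, 'PIG'), ('FA11_HUMAN', 11, 'HUMAN')]
```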

A PROSITE pattern either matches or it does not (a yes/no answer). Pfam is different: its families are profile HMMs, and matches are scored.
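The yes/no character of a PROSITE pattern shows up clearly if the pattern syntax is translated into a regular expression. This is a minimal sketch, assuming only the common syntax elements (x, [..], {..}, repeat counts and the < / > anchors); the pattern C-x(2,4)-[ST]-{P}-H. and the sequences are made up and are not the real TRYPSIN_SER pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern such as 'C-x(2,4)-[ST]-{P}-H.' into a Python regex.

    Handles x, [..], {..}, repeats (n) / (n,m) and the < / > anchors; not every
    PROSITE corner case (e.g. '>' inside brackets) is covered.
    """
    parts = []
    for el in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if el.startswith("<"):
            prefix, el = "^", el[1:]
        if el.endswith(">"):
            suffix, el = "$", el[:-1]
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", el)
        core, rep = (m.group(1), m.group(2)) if m else (el, None)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        # Single letters and [..] classes are already valid regex syntax.
        parts.append(prefix + core + ("{" + rep + "}" if rep else "") + suffix)
    return "".join(parts)


def matches(pattern, sequence):
    """A PROSITE pattern either matches a sequence or it does not."""
    return re.search(prosite_to_regex(pattern), sequence) is not None


pattern = "C-x(2,4)-[ST]-{P}-H."  # made-up example pattern
print(prosite_to_regex(pattern))     # C.{2,4}[ST][^P]H
print(matches(pattern, "MKCAASGH"))  # True
print(matches(pattern, "MKCAASPH"))  # False
```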

Page 4

Extra molecule.

Extra atom (calcium ion).

p-Aminobenzamidine

Human Coagulation Factor IXa in Complex with ...

2 protein chains

Page 5

Advanced search

Write the expression to retrieve mouse nucleotide sequences that were introduced into the database during the year 2000 and that have feature table information on exons and introns.

mouse[ORGN] AND exon[FKEY] AND intron[FKEY] AND 2000[PDAT] → Items 1-20 of 208
mouse[ORGN] AND exon [FKEY] AND intron [FKEY] AND 2000[MDAT] → Items 1-20 of 286
mouse[ORGN] AND (exon [FKEY] AND intron [FKEY]) AND 2000[MDAT] → Items 1-20 of 286
mouse[ORGN] AND (exon [FKEY] intron [FKEY]) AND 2000[MDAT] → Items 1-20 of 286

mouse[ORGN] AND (exon AND intron)[FKEY] AND 2000[MDAT] → Items 1-20 of 344, with the message "Field text [FKEY] without text, ignored."
mouse[ORGN] AND (exon intron) AND 2000[MDAT] → Items 1-20 of 344
mouse[ORGN] AND exon AND intron AND 2000[MDAT] → Items 1-20 of 344
mouse[ORGN] AND exon [FT] AND intron [FT] AND 2000[MDAT] → "Field is not supported in this database."; query translation: ((("Mus musculus"[Organism] AND exon[All Fields]) AND intron[All Fields]) AND 2000[MDAT])

mouse[ORGN] AND exons AND introns AND 2000[MDAT] → Items 1-6 of 6
mouse[ORGN] AND exons[FKEY] AND introns[FKEY] AND 2000[MDAT] → No items found

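Most students used MDAT (modification date), but PDAT (publication date) matches "introduced in the database during the year 2000" more closely.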
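The same searches can be run from a script through NCBI's E-utilities. This is a minimal sketch, assuming the esearch endpoint with its db, term and rettype=count parameters; build_query is a hypothetical helper, and counts returned today will differ from the 2003 numbers above. Giving each feature key its own [FKEY] tag avoids the (exon AND intron)[FKEY] form that Entrez ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(organism, feature_keys, year, date_field="PDAT"):
    """AND together field-tagged terms, giving every feature key its own [FKEY]."""
    terms = [f"{organism}[ORGN]"]
    terms += [f"{key}[FKEY]" for key in feature_keys]
    terms.append(f"{year}[{date_field}]")
    return " AND ".join(terms)


def esearch_url(term, db="nuccore"):
    """URL asking esearch for the number of hits only."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "rettype": "count"})


def count_hits(term, db="nuccore"):
    """Run the search over the network and read <Count> from the XML reply."""
    with urlopen(esearch_url(term, db)) as reply:
        return int(ET.parse(reply).getroot().findtext("Count"))


query = build_query("mouse", ["exon", "intron"], 2000)
print(query)  # mouse[ORGN] AND exon[FKEY] AND intron[FKEY] AND 2000[PDAT]
# print(count_hits(query))  # needs network access
```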

Pages 6-7

Remember: there are help pages and files available! NCBI Entrez Help

Example: NCBI Entrez Limits

Page 8

Introns (on the other strand)

Exons are shown as thick lines (blocks) and introns as thin lines with arrows showing the direction.

The first four exons matched two Canis mRNAs.

A sequence from the beginning of the gene was used to find the gene in the genome with a BLAT search.
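Reading exons and introns off an alignment like this one comes down to looking at the gaps between aligned blocks. This is a minimal sketch, assuming the blocks are given as genome start positions and block lengths, like the tStarts and blockSizes columns of BLAT's PSL output; min_intron is a hypothetical cutoff and the numbers are invented.

```python
def parse_psl_list(field):
    """PSL list columns are comma-separated with a trailing comma, e.g. '100,50,80,'."""
    return [int(x) for x in field.split(",") if x]


def exons_and_introns(block_sizes, t_starts, min_intron=30):
    """Return exon and intron intervals (0-based, half-open) on the genome.

    Blocks closer together than min_intron are merged, on the assumption that
    such a short gap is an indel rather than an intron.
    """
    exons = []
    for start, size in sorted(zip(t_starts, block_sizes)):
        end = start + size
        if exons and start - exons[-1][1] < min_intron:
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))  # merge short gap
        else:
            exons.append((start, end))
    introns = [(left[1], right[0]) for left, right in zip(exons, exons[1:])]
    return exons, introns


sizes = parse_psl_list("100,50,80,")  # invented example values
starts = parse_psl_list("1000,1500,1555,")
print(exons_and_introns(sizes, starts))
# ([(1000, 1100), (1500, 1635)], [(1100, 1500)])
```

Raising or lowering min_intron changes whether a short unaligned stretch is reported as an intron, which is one way a spurious extra "exon" can appear or disappear.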

Page 9

GAaca in genome at exon 3 end

AGaca in mRNA

A new “exon” appears because this differing part of the mRNA matches a site further on, in the intronic region!

Extra: one of the mRNAs that match the genome has an extra exon. Is this interesting?

Page 10

HIV proteins in PDB:

Search in Entrez Protein with limits.

1. HIV-1[ORGN]
2. protein name (tat, gag ...)
3. limited to PDB

Note: polyproteins show up as their mature products; for instance, reverse transcriptase is a product of the Pol polyprotein.

Example: if we use all the protein names we get 167 hits in PDB. But some of these are structures of antibodies against HIV, etc. Use “NOT” to get rid of those ...
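Combining many protein names and excluding unwanted hits is easy to get wrong by hand, so a small helper can assemble the query string. This is a minimal sketch: the helper is hypothetical, the exclusion word "antibody" is only an example, and the PDB limit is assumed to be set on the Limits page as described above rather than in the query string.

```python
def protein_query(organism_term, names, exclude=()):
    """OR the protein names together, AND with the organism, then NOT each exclusion."""
    query = f"{organism_term} AND ({' OR '.join(names)})"
    for word in exclude:
        query += f" NOT {word}"
    return query


print(protein_query("HIV-1[ORGN]", ["tat", "gag", "rev"], exclude=["antibody"]))
# HIV-1[ORGN] AND (tat OR gag OR rev) NOT antibody
```

Entrez processes Boolean operators from left to right, so the NOT clauses are applied to the combined organism-and-names result.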

Page 11

Most HIV proteins have few PDB entries, but the Pol products have many.

Examples:
Tat: 3 structures
Pol: 115 structures, but no integrase
Vif: none

HIV-1[ORGN] tat gives 1958 hits; limited to PDB it gives 3.

Page 12

Another way:

Searching Pfam with the Tat protein (its Swiss-Prot accession) gives links to PDB.

Page 13

HIV

Q List all the genes encoded by the HIV genome.

Q How many isolates of HIV representing fully sequenced genomes can you find in the nucleotide database?

Nucleotide: HIV-1 [ORGN] isolate complete genome NOT papilloma NOT partial NOT proviral NOT clone ... → >600 hits → 300 (look through the list ...)
Genome DB: HIV [ORGN] → 3

Q Some HIV genes encode polyproteins that are proteolytically cleaved to generate the mature proteins. What are these proteins and their mature products?

Q What HIV proteins have been studied by X-ray crystallography or NMR?

Q Two different HIV proteins regulate gene expression by interacting with a piece of retroviral RNA. What are these two proteins? Mention one PDB entry for each protein category that reveals the structure of the protein in complex with RNA.

1. Find out which proteins these are by searching PubMed, Books and the web.
2. Search with these protein names in Entrez Protein and PDB.

Pages 14-15

Genes are grey. Proteins are green.

Why is there one Gag-Pol and one Gag? In 10% of cases the ribosome “slips” (a -1 frameshift) and translates the Gag-Pol fusion polyprotein instead of Gag alone. The “fusion protein” is then cleaved.

But Gag and Pol are themselves polyproteins, so both are cleaved to give the mature proteins (products). Most of the products can be seen in this graphical display.

Gag: matrix, capsid, nucleocapsid + small proteins
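The idea of a polyprotein being cut into its mature products can be illustrated with a toy example. This is a minimal sketch: the sequence and cut positions are invented, only three Gag products are named, and the small proteins are left out.

```python
def cleave(polyprotein, cut_after, names):
    """Cut after each 1-based position in cut_after and label the products in order."""
    if len(names) != len(cut_after) + 1:
        raise ValueError("need exactly one name per product")
    bounds = [0, *sorted(cut_after), len(polyprotein)]
    return {name: polyprotein[a:b] for name, a, b in zip(names, bounds, bounds[1:])}


toy_gag = "MAAAACCCCCNNN"  # invented sequence, not real Gag
print(cleave(toy_gag, [5, 10], ["matrix", "capsid", "nucleocapsid"]))
# {'matrix': 'MAAAA', 'capsid': 'CCCCC', 'nucleocapsid': 'NNN'}
```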

Page 16

Click on “Protein coding genes” in the display to get a list:

gene 166..1656 /gene="gag"
CDS 166..1656 /gene="gag" /product="gag protein"

gene <1455..4460 /gene="pol"
CDS <1455..4460 /gene="pol" /product="pol protein"

gene 4405..4992 /gene="vif"
CDS 4405..4992 /gene="vif" /product="vif protein"

gene 4923..5213 /gene="vpr"
CDS 4923..5213 /gene="vpr" /product="vpr protein"
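The coordinates in this list show that neighbouring HIV genes overlap. This is a minimal sketch that parses the CDS lines (coordinates copied from the list) and reports the overlap between each gene and the next one along the genome; the regular expression only handles simple ranges, and '<' (a start that is not exactly known) is simply dropped.

```python
import re

# CDS lines from the list; '<' marks a start that is not exactly known.
CDS_LINES = """\
CDS 166..1656 /gene="gag" /product="gag protein"
CDS <1455..4460 /gene="pol" /product="pol protein"
CDS 4405..4992 /gene="vif" /product="vif protein"
CDS 4923..5213 /gene="vpr" /product="vpr protein"
"""

LOCATION = re.compile(r'<?(\d+)\.\.>?(\d+)\s*/gene="([^"]+)"')


def parse_cds(text):
    """Return (gene, start, end) for every CDS location in the text."""
    return [(name, int(a), int(b)) for a, b, name in LOCATION.findall(text)]


def overlaps(genes):
    """Overlap (gene1, gene2, start, end) between each gene and the next one."""
    ordered = sorted(genes, key=lambda g: g[1])
    result = []
    for (n1, _, e1), (n2, s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            result.append((n1, n2, s2, min(e1, e2)))
    return result


print(overlaps(parse_cds(CDS_LINES)))
# [('gag', 'pol', 1455, 1656), ('pol', 'vif', 4405, 4460), ('vif', 'vpr', 4923, 4992)]
```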

Or look in a HIV entry:

Page 17

Advanced search at PDB.

At PDB one can search by the experimental technique used (e.g. X-ray crystallography or NMR).

Page 18

PDB entry of Rev bound to RNA, viewed at NCBI

Page 19

Tat and Rev in complex with RNA

PDB search with

Contains Chain Type: Protein RNA

Text Search: HIV rev

No Tat complex was found, which is not surprising since we found only three Tat structures in PDB and none of them was in complex with RNA.