[email protected]€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins,...

118
Common tools, useful databases, and tricks of the trade. Bioinformatics bioteach.ubc.ca/bioinfo2008 [email protected] 1

Transcript of [email protected]€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins,...

Page 1: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Common tools, useful databases, and tricks of the trade.

Bioinformatics

bioteach.ubc.ca/bioinfo2008

[email protected]

1

Page 2: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Workshop Schedule

• Laptops, available here for your use 9am - 4:30pm

• wireless login

mslguest

4myguest

• Vancouver guide books available

2

Page 3: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Today’s Topics

• BLAST - Finding Function by Sequence Similarity

• GUIDED TOUR - Advanced Tips & Tricks for Using BLAST

• PRACTICAL EXERCISES - The Jurassic Park Detective Story

• Genome Browsers - Accessing Genome Annotations

• PRACTICAL EXERCISES - Three different views of the BRCA1 gene

3

Page 4: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Finding Function By Sequence Similarity

BLAST

4

Page 5: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Concepts of Sequence Similarity Searching

• The premise:

One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypothesis concerning relatives and function.

5

Page 6: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

The BLAST algorithm

• The BLAST programs (Basic Local Alignment Search Tools) are a set of sequence comparison algorithms introduced in 1990 that are used to search sequence databases for optimal local alignments to a query.

• Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) “Basic local alignment search tool.” J. Mol. Biol. 215:403-410.

• Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” NAR 25:3389-3402.

6

Page 7: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST

db

BLAST

server

ASN.1

BLAST

Web Page

fetch sequencefetch ASN.1

Display Results

Submit Query

Request Results

Return Formatted Results

7

Page 8: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

What BLAST tells you ...

• BLAST reports surprising alignments

- Different than chance

• Assumptions

- Random sequences

- Constant composition

• Conclusions

- Surprising similarities imply evolutionary homology

Evolutionary Homology: descent from a common ancestor Does not always imply similar function

8

Page 9: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Basic Local Alignment Search Tool

• Widely used similarity search tool

• Heuristic approach based on Smith Waterman algorithm

• Finds best local alignments

• Provides statistical significance

• www, standalone, and network clients

9

Page 10: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

blastp Compares an amino acid query sequence against a protein sequence database.

blastn Compares a nucleotide query sequence against a nucleotide sequence database.

blastxCompares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown

nucleotide sequence.

tblastn Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.

tblastxCompares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide

sequence database.

BLAST programs

10

Page 11: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

more BLAST programs

MegablastContiguous Nearly identical sequences

Discontiguous Cross-species comparison

Position Specific

PSI-BLAST Automatically generates a position specific score matrix (PSSM)

RPS-BLAST Searches a database of PSI-BLAST PSSMs

nucleotide only

protein only11

Page 12: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Algorithm

• Scoring of matches done using scoring matrices

• Sequences are split into words (default n=3)

• Speed, computational efficiency

• BLAST algorithm extends the initial “seed” hit into an HSP

• HSP = high scoring segment pair = Local optimal alignment

12

Page 13: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Sequence Similarity Searching – The statistics are important

Discriminating between real and artifactual matches is done using an estimate of probability that the match might

occur by chance.

We’ll talk more about the meaning of the scores (S) and e-values (E) that are associated with BLAST hits

13

Page 14: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Where does the score (S) come from?

• The quality of each pair-wise alignment is represented as a score and the scores are ranked.

• Scoring matrices are used to calculate the score of the alignment base by base (DNA) or amino acid by amino acid (protein).

• The alignment score will be the sum of the scores for each position.

14

Page 15: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

What’s a scoring matrix?

• Substitution matrices are used for amino acid alignments.

• each possible residue substitution is given a score

• A simpler unitary matrix is used for DNA pairs (+1 for match, -2 mismatch)

15

6

Page 16: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

16

Page 17: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLOSUM vs PAM

• BLOSUM 62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.

BLOSUM 45 BLOSUM 62 BLOSUM 90

PAM 250 PAM 160 PAM 100

More Divergent Less Divergent

17

Page 18: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

What do the Score and the e-value really mean?

• The quality of the alignment is represented by the Score (S).The score of an alignment is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (PAM, BLOSUM) whereas gap scores are assigned empirically .

• The significance of each alignment is computed as an E value (E).Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.

18

Page 19: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Notes on E-values• Low E-values suggest that sequences are

homologous

Can’t show non-homology

• Statistical significance depends on both the size of the alignments and the size of the sequence database

Important consideration for comparing results across different searches

E-value increases as database gets bigger

E-value decreases as alignments get longer19

Page 20: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Homology: Some Guidelines

• Similarity can be indicative of homology

• Generally, if two sequences are significantly similar over entire length they are likely homologous

• Low complexity regions can be highly similar without being homologous

• Homologous sequences not always highly similar

20

Page 21: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Suggested BLAST Cutoffs

• Source: Chapter 11 – Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins

• For nucleotide based searches, one should look for hits with E-values of 10-6 or less and sequence identity of 70% or more

• For protein based searches, one should look for hits with E-values of 10-3 or less and sequence identity of 25% or more

Take Home Message:

Always look at your alignments

21

Page 22: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Algorithm

• Scoring of matches done using scoring matrices

• Sequences are split into words (default n=3)

- Speed, computational efficiency

• BLAST algorithm extends the initial “seed” hit into an HSP

- HSP = high scoring segment pair = Local optimal alignment

22

Page 23: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

How Does BLAST Really Work?

• The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words"), and initially seeking matches between fragments.

• Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".

23

Page 24: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Algorithm

24

Page 25: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

How Does BLAST Really Work?

• The BLAST programs improved the overall speed of searches while retaining good sensitivity (important as databases continue to grow) by breaking the query and database sequences into fragments ("words"), and initially seeking matches between fragments.

• Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S".

25

Page 26: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Algorithm

26

Page 27: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Extending the High Scoring Segment Pair (HSP)

Minimum Score (S)

Neighborhood Score Threshold (T)

27

Page 28: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

28

Page 29: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Algorithm

• Scoring of matches done using scoring matrices

• Sequences are split into words (default n=3)

- Speed, computational efficiency

• BLAST algorithm extends the initial “seed” hit into an HSP

- HSP = high scoring segment pair = Local optimal alignment

29

Page 30: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Credits

• Materials for this presentation have been adapted from the following sources:

NCBI HelpDesk - Field Guide Course Materials

Bioinformatics: A practical guide to the analysis of genes and proteins

• Questions? Please contact:

Dr. Joanne FoxMichael Smith [email protected]

30

Page 31: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

31

Page 32: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

GUIDED TOUR: Advanced Tips & Tricks for Using BLAST

BLAST

32

Page 33: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

http://www.ncbi.nlm.nih.gov/BLAST/

33

Page 34: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

New BLAST homepage

34

Page 35: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST

db

BLAST

server

ASN.1

BLAST

Web Page

fetch sequencefetch ASN.1

Display Results

Submit Query

Request Results

Return Formatted Results

35

Page 36: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Consider your research question ...

• Are you looking for an particular gene in a particular species?

• Are you looking for additional members of a protein family across all species?

• Are you looking to annotate genes in your species of interest?

36

Page 37: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Know your reagents

• Changing your choice of database is changing your search space

• Database size affects the BLAST statistics

• Databases change rapidly and are updated frequently

37

Page 38: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Protein Databases: nr

• nr (non-redundant protein sequences)

- GenBank CDS translations

- NP_ RefSeqs

- Outside Protein

- PIR, Swiss-Prot, PRF

- PDB (sequences from structures)

• pat protein patents

• env_nr environmental samples

Servicesblastpblastx

38

Page 39: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Nucleotide Databases: Human and Mouse

• Human and mouse genomic + transcript default

• Separate sections in output for mRNA and genomic

• Direct links to Map Viewer for genomic sequences

Megablast, blastn service39

Page 40: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Nucleotide Databases: Traditional

Servicesblastntblastntblastx

40

Page 41: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Nucleotide Databases: Traditional

• nr (nt)

- Traditional GenBank

- NM_ and XM_ RefSeqs

• refseq_rna

• refseq_genomic

- NC_ RefSeqs

• dbest

- EST Division

• est_human, mouse, others

• htgs

- HTG division

• gss

- GSS division

• wgs

- whole genome shotgun

• env_nt

- environmental samples

Databases are mostly non-overlapping41

Page 42: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

http://www.ncbi.nlm.nih.gov/BLAST/

Program SelectionGuide

42

Page 43: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

43

Page 44: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

44

Page 45: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

45

Page 46: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

46

Page 47: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

231571

47

Page 48: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Context Specific Help

48

Page 49: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Limiting Database: Organism

Organism autocomplete49

Page 50: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Limiting Database: Entrez Query

all[filter] NOT mammals[organism]

gene_in_mitochondrion[Properties]2006:2007 [Modification Date]

Nucleotidebiomol_mrna[Properties]biomol_genomic[Properties]

50

Page 51: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

51

Page 52: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Adjust to set stringency

May limit results

Default statistics adjustmentfor compositional bias

Off now by default. Conflicts withcomp-based stats

Expand

Algorithm parameters: Protein

52

Page 53: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Automatic Short Sequence Adjustment

e-value 20000Word Size 2Matrix PAM30Comp Stats OffLow Comp Filter Off

53

Page 54: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

54

Page 55: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

55

Page 56: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

A graphical view

56

Page 57: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

The BLAST hit list

57

Page 58: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Alignments

Identical match

positive score(conservative)

Negative or zero

gap

58

Page 59: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST Alignments

59

Page 60: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

• SimilarityThe extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST similarity refers to a positive matrix score.

• Identity

The extent to which two (nucleotide or amino acid) sequences are invariant.

• Homology

Similarity attributed to descent from a common ancestor.

It is your responsibility as an informed bioinformatician to use these terms correctly: A sequence is either homologous or not. Don’t use % with this term!

60

Page 61: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

BLAST statistics to record in your bioinformatics labbook

It can be helpful to record the statistics that are found at

bottom of your BLAST results

61

Page 62: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Sorting BLAST by Taxonomy

62

Page 63: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

63

Page 64: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Nucleotide BLAST

64

Page 65: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Algorithm parameters: Nucleotide

•Masks species-specific interspersed repeats•Essential for genomic query sequences

•Prevents starting alignment in masked region•Allows extensions through masked regions

Masks LC sequence (simple repeats)

65

Page 66: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

nt BLAST: New Output

AB168636

66

Page 67: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Sortable Results

Pseudogene on Chromosome 9 Functional Gene on

Chromosome 1

Separate Sections for Transcript

and Genome

67

Page 68: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Total Score: All Segments

Functional Gene Now First

68

Page 69: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Default Sorting Order: ScoreLongest exon usually first

Query start position

Exon order

Sorting in Exon Order

69

Page 70: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

70

Page 71: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Links to Map Viewer

Chromosome 1 Chromosome 971

Page 72: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Recent and Saved Strategies

Login to My NCBI to

save search strategies

72

Page 73: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Genomic and Specialized BLAST pages

73

Page 74: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Service Addresses

•General Help [email protected]

•BLAST [email protected]

Telephone support: 301- 496- 2475

74

Page 75: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

75

Page 76: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

PRACTICAL EXERCISE: The Jurassic Park Detective Story

BLAST

76

Page 77: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

navigate to:bioteach.ubc.ca/bioinfo2008

Get the sequences from the webpage and carry out BLASTsearches

Search #1: Jurassic Park sequence

use blastn

Let’s compareour results

Search #2: The Lost World sequence

use blastx

Can you identify the Dinosaur sequences?

77

Page 78: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Try some BLAST searches with your own sequence of interest…

Explore what happens when you change advanced parameters…

78

Page 79: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Search #1 - blastn against nr

• Most common use of blastn

Sequence identification

Establish whether an exact match for a sequence is already present in the database

79

Page 80: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

80

Page 81: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Search #2 - blastx against nr

• Translating BLAST programs (blastx, tblastn, tblastx)

Look for similar proteins

Identify potential homologs in other species

81

Page 82: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Mark was here, NIH

82

Page 83: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Credits

• Materials for this presentation have been adapted with permission from the following NCBI HelpDesk course materials:

Field Guide Course Materials

Advanced Workshop for Bioinformatics Information Specialists

• NCBI BLAST

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

83

Page 84: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Accessing Genome Annotations & PRACTICAL EXERCISE: Three Different

Views of the BRCA1 Gene

Genome Browsers

84

Page 85: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

February 2001: Completion of the Draft Human GenomePublic HGP Celera Genomics

April 14, 2003:

The Human Genome is completed – agai

n!October 2004 - present:

The Human Genome is now really finished!

The Human Genome Project

85

Page 86: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

86

Page 87: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Technology

87

Page 88: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

What is Bioinformatics?

88

Page 89: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

89

Page 90: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

maps.google.ca

90

Page 91: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Let’s Look at the Human Genome...

91

Page 92: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Objectives• By the end of this module:

You will be able to describe the following concepts: genome annotation, genome builds, and genome browsers.

You will view the genomic location that contains the BRCA1 gene in the human genome using three different genome browsers.

You will be able to compare and contrast the UCSC, Ensembl and MapViewer systems for visualizing genome information.

92

Page 93: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Genome Browsers• What is a Genome Browser?

- System for displaying, viewing, and accessing genome annotation data

• Genome annotations = knowledge attached to raw genome sequence.

- Annotation information comes from many different sources

Computational pipelines

Research groups

Databases 93

Page 94: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

The “Neopolitan Ice Cream” World of Genome Browsing:

• UCSC Genome Browser

http://genome.cse.ucsc.edu/

• Ensembl

http://www.ensembl.org/

• NCBI Map Viewer

http://www.ncbi.nlm.nih.gov/mapview/

94

Page 95: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

The underlying data is common for all three “flavors” of Genome

Browsers.

95

Page 96: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

• NCBI, UCSC and Ensembl use the same human genome assembly that is generated by NCBI

- release timing is different between sites.

• Note the version of genome assembly to which you are referring

- available precomputed info and locations of features will be different between different assemblies.

96

Page 97: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Let’s compare the view of the BRCA1 gene in all

three genome browsers.

97

Page 98: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Viewing the genomic region containing BRCA1

• Common features:

Coordinate system is based on the build

Zoom in and out

Annotations displayed – ie. Gene features

• Major Differences:

Each Browser has a very different look and feel

Annotation information displayed differently

Different ways to navigate through the information

98

Page 99: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

http://genome.cse.ucsc.edu/

99

Click onGenome Browser

link

Page 100: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

100

Search for BRCA1;

Note samplequeries

Page 101: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

101

The Search Results

• Many BRCA1 isoforms

All located on chr 17

same chr coordinates

different gene structures

Page 102: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

102

Page 103: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

103

Two tasks

• What genes are on either side of BRCA1 on chr 17?

• Can you figure out how to download the genomic sequence for the BRCA1 region?

Page 104: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

104

Zoom inZoom out

DNA linkDownload Sequence

Page 105: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

105

http://www.ensembl.org/

Click on Human

Page 106: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

106

Page 107: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

107

Click on ENSG00000012048

Page 108: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

108

GeneView showsyou informationabout the gene

click here to view genomic

location

Page 109: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

109

Two tasks

• Using GeneView, can you figure out how many different alternatively spliced isoforms exist for BRCA1?

• Using ContigView, can you figure out how to download the genomic sequence for the BRCA1 region?

Page 110: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

110

GeneView showsyou information

about the transcripts

ExportView gives you access to sequence data

Page 111: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

111

http://www.ncbi.nlm.nih.gov/mapview/

Two builds of human;Note many genomes

available

Page 112: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

112

Page 113: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

113

Quick FilterGene

Page 114: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

114

Page 115: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

115

Page 116: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

116

Two tasks

• Can you figure out how to LinkOut to the OMIM and/or Homologene entries for BRCA1?

• Can you figure out how to download the genomic sequence for the BRCA1 region?

Page 117: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

117

LinkOut OMIM = diseasesv = sequence viewpr = protein recorddl = downloadhm = Homologene

Page 118: joanne@msl.ubc€¦ · 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives

Credits

• UCSC Genome Browser

http://genome.cse.ucsc.edu/

• Ensembl Genome Browser

http://www.ensembl.org/index.html

• NCBI MapViewer

http://www.ncbi.nlm.nih.gov/mapview/index.html

118