BITS: UCSC genome browser - Part 1
Paco Hulpiau
UCSC genome browsing
http://www.bits.vib.be
Introduction
§ Browse genes in their genomic context
§ See features in and around a specific gene
§ Investigate genome organization and explore larger
chromosome regions
§ Search and retrieve information on a gene- and
genome-scale
§ Compare genomes
Introduction
§ Collaboration between main genome browsers
Ensembl, UCSC and NCBI
» use same genome assemblies
» interlinking between sites
§ Ensembl Genome Browser: http://www.ensembl.org/
§ NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
§ UCSC Genome Browser: http://genome.ucsc.edu/
Introduction
§ Other genome browsers and genome databases:
http://genome.jgi-psf.org Eukaryotic (143) and prokaryotic (505) genomes
http://www.xenbase.org Xenopus tropicalis
http://flybase.org Drosophila genes & genomes
http://www.wormbase.org C. elegans and some related nematodes
http://www.tigr.org => http://www.jcvi.org/ Comprehensive Microbial Resource (CMR) => http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi
http://genolist.pasteur.fr Microbial genomes
Introduction
§ The UCSC Genome Browser was created by the
Genome Bioinformatics Group
at the University of California Santa Cruz (UCSC).
http://genome.ucsc.edu/
§ The Genome Browser zooms and scrolls
over chromosomes, showing the work of
annotators worldwide.
§ BLAT quickly maps your sequence to the genome.
BLAT is not BLAST!
BLAT works by keeping an index of the entire genome in memory.
The index consists of all non-overlapping DNA 11-mers or protein 4-mers.
The index is used to find areas of probable homology, which are then
loaded into memory for a detailed alignment.
BLAT on DNA can quickly find sequences of 95% and greater similarity
of length 40 bases or more.
BLAT on proteins finds sequences of 80% and greater similarity of length
20 amino acids or more.
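The indexing idea described above can be illustrated with a minimal Python sketch. It is not BLAT's actual implementation: the function names `build_index` and `find_seeds` are hypothetical, the "genome" is a short string, and the detailed alignment step is left out. It only shows how an index of non-overlapping 11-mers lets a query locate candidate regions.

```python
from collections import defaultdict


def build_index(genome: str, k: int = 11) -> dict:
    """Index the non-overlapping k-mers of the genome by start offset."""
    index = defaultdict(list)
    # Step by k so that consecutive tiles do not overlap.
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_seeds(query: str, index: dict, k: int = 11) -> list:
    """Look up every overlapping k-mer of the query in the genome index.

    Returns (query_offset, genome_offset) pairs. Regions where several
    pairs cluster are the candidates for a detailed alignment.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


# Toy data (made up for illustration): three 11-base tiles.
genome = "ACGTACGTAGC" + "TTTTTGGGGGC" + "CATGCATGCAT"
seeds = find_seeds("GGTTTTTGGGGGCCA", build_index(genome))
```

Because the genome is tiled without overlap while the query is scanned at every offset, a sufficiently long matching stretch is expected to cover at least one complete tile.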
§ The Table Browser provides convenient
access to the underlying database.
§ The Gene Sorter displays a sorted table of genes
that are related to one another.
The relationship can be one of several types, including
protein-level homology,
similarity of gene expression profiles,
or genomic proximity.
§ In-Silico PCR searches a sequence database with a pair of PCR
primers, using an indexing strategy for fast performance.
§ When successful, the search returns a file (fasta) containing all
sequences in the database that lie between and include the
primer pair.
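As a rough illustration of what such a search returns, the sketch below scans a single template sequence for a primer pair and reports the region between and including both primers. It assumes exact primer matches on the forward strand only and does not reproduce the indexing strategy of the real tool; `in_silico_pcr` and its `max_size` parameter are hypothetical names used for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str, max_size: int = 4000) -> list:
    """Find products bounded by the forward primer and the reverse primer site.

    Returns (start_offset, product_sequence) pairs; offsets are 0-based.
    Only exact matches are considered (an assumption of this sketch).
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)  # how the reverse primer appears on this strand
    products = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        while end != -1:
            stop = end + len(rev_site)
            if stop - start > max_size:
                break
            products.append((start, template[start:stop]))
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return products


# Toy template and primers (made up for illustration).
products = in_silico_pcr("GGGACGTACTTTTTGGATTCAAA", "ACGTAC", "GAATCC")
```

Each product could then be written out as one FASTA record, which is the form in which the search result is delivered.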
§ Genome Graphs is a tool for displaying
genome-wide data sets such as the results
of genome-wide SNP association studies,
linkage studies and homozygosity mapping.
§ Galaxy allows you to do analyses you cannot do
anywhere else without the need to install or
download anything.
§ You can analyze multiple alignments, compare
genomic annotations and much more...
§ VisiGene lets you browse through a large
collection of in situ mouse and frog images.
§ The Proteome Browser provides a wealth of
protein information presented in the form of
graphical images of tracks and histograms
and links to other sites.
§ The Utilities page contains links to some tools
created by the UCSC Genome Bioinformatics Group.
§ DNA Duster & Protein Duster remove non-sequence
related characters from an input sequence.
Clade - Genome - Assembly
GENOME BROWSER DISPLAY
POSITION CONTROL
TRACK CONTROL
Navigation: position control
§ Click the zoom in and zoom out buttons on top
to zoom in or out 1.5, 3 or 10-fold
on the center of the window
Navigation: position control
§ Zoom in 3-fold by clicking anywhere
on the base position track
§ Zoom to a specific region using “drag and zoom”
Navigation: position control
§ To scroll the view of the display horizontally
by set increments of 10%, 50% or 95%
of the displayed size (as given in base pairs)
click the corresponding move arrow
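The zoom and scroll buttons amount to simple arithmetic on the start and end of the displayed window. The sketch below works through that arithmetic under stated assumptions: coordinates are 1-based, the helper names `zoom` and `scroll` are hypothetical, and the browser's own rounding and chromosome-end handling may differ.

```python
def zoom(start: int, end: int, factor: float) -> tuple:
    """Zoom in (factor > 1) or out (factor < 1) around the window centre."""
    centre = (start + end) / 2
    half = (end - start) / (2 * factor)
    # Never report a position before base 1.
    return max(1, round(centre - half)), round(centre + half)


def scroll(start: int, end: int, fraction: float) -> tuple:
    """Shift the window by a fraction of its size; negative values move left."""
    shift = round((end - start) * fraction)
    shift = max(shift, 1 - start)  # stop at the start of the chromosome
    return start + shift, end + shift


# A 3,000 bp window zoomed in 3-fold, then moved right by 50% of its new size.
zoomed = zoom(1000, 4000, 3)
moved = scroll(*zoomed, 0.5)
```

The sketch does not clamp against the chromosome length, which would require knowing the size of each chromosome in the chosen assembly.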
Navigation: position control
§ To scroll the left or right side by a specified number of
vertical gridlines while keeping the opposite side fixed
click the appropriate move start or move end
arrow
Navigation: position control
§ To display a (completely) different position
enter the new location in the position/search text
box
§ You can also jump to another gene location
Annotation Tracks
TRACK CONTROL
HIDE = removes a track from view
FULL = each item on a separate line
DENSE = all items collapsed into single line
SQUISH = items packed onto several lines at 50% height
PACK = each item separate and efficiently stacked (full height)
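The same display modes can also be requested through a browser link. The sketch below builds such a link, assuming the `hgTracks` URL accepts `db`, `position` and one `trackName=mode` pair per track. The track identifier `refGene` (RefSeq Genes on hg19) and the example position are assumptions to check against the assembly you are using, and `browser_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://genome.ucsc.edu/cgi-bin/hgTracks"
MODES = {"hide", "dense", "squish", "pack", "full"}


def browser_url(db: str, position: str, tracks: dict) -> str:
    """Build a browser link that sets a display mode for each named track."""
    for name, mode in tracks.items():
        if mode not in MODES:
            raise ValueError(f"unknown display mode for {name}: {mode}")
    params = {"db": db, "position": position, **tracks}
    return BASE + "?" + urlencode(params)


# Hypothetical region; 'refGene' is assumed to be the RefSeq Genes track name.
url = browser_url("hg19", "chr1:1000000-1010000", {"refGene": "pack"})
```

Tracks that are not named in the link keep whatever mode they already have in the current session.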
Annotation Tracks
§ Different genome/assembly => different tracks!
Annotation Tracks
§ Now try to change the tracks as follows
Annotation Tracks
§ and...
[Figure: the same tracks shown in SQUISH, PACK, FULL and DENSE modes. Gene structure labels: UTR, EXON, INTRON, direction of transcription]
Browser graphics in PDF
TABLE BROWSER
GET DNA
CLICK LINE
CURRENT BROWSER GRAPHIC IN PDF
TO GET OTHER DATA
Exercises (I)
1) Search for your gene of interest
on Human Feb. 2009 (GRCh37/hg19) Assembly
» Include 1000 base pairs up- and downstream
» Only show the tracks:
RefSeq Genes (pack)
Conservation (full, primates only)
» Save graphical view as PDF (exercises1_1)
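For the step that adds 1000 base pairs on each side, the new coordinates can be worked out by hand or with a small helper like the one below. It is only a sketch: `pad_position` is a hypothetical name, the input is assumed to be in the `chrN:start-end` form shown in the position box, and the result is not clamped to the chromosome length.

```python
import re


def pad_position(position: str, flank: int = 1000) -> str:
    """Extend a 'chrom:start-end' position by `flank` bases on both sides."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = match.groups()
    start = int(start.replace(",", ""))  # the browser shows thousands separators
    end = int(end.replace(",", ""))
    return f"{chrom}:{max(1, start - flank)}-{end + flank}"


# Made-up coordinates for illustration.
padded = pad_position("chr1:5,000-6,000")
```

The padded string can be pasted back into the position/search text box.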
Exercises (I)
2) How many transcripts are there?
» Compare UCSC Genes with RefSeq and Ensembl genes!
» Save graphical view as PDF (exercises1_2)
Exercises (I)
3) What are the flanking genes?
Are these conserved outside mammals?
» Zoom out until you can see at least
two or three flanking genes
(may need to hide some tracks, leave RefSeq on)
» Now have a look in the chicken genome
» Save graphical view as PDF
(exercises1_3a and exercises1_3b)
Exercises (I)
4) Is there any regulatory information available?
» Change the view to see the genomic region upstream
(exon 1 and ~2000 bp upstream) and open some regulatory tracks
e.g. ORegAnno, TFBS Conserved, TS miRNA sites
» Save graphical view as PDF (exercises1_4)