[Bejerano Fall10/11] 1 Primer, Friday 10am, Beckman B-302 Ex. 1 is coming.

Date posted: 20-Jan-2016

http://cs273a.stanford.edu [Bejerano Fall10/11] 1

Primer, Friday 10am, Beckman B-302

Ex. 1 is coming.


Lecture 4

Our place in the tree of life

Genome Size

Genome Content:

Repetitive Sequences

Genes


Our Place in the Tree of Life

[Human Molecular Genetics, 3rd Edition]

you are here


Metazoans (multi-cellular organisms)

[Human Molecular Genetics, 3rd Edition]

you are here


Vertebrates

[Human Molecular Genetics, 3rd Edition]

you are here

Opossum, Lizard, Stickleback

Figure from Ryan Gregory (2005)

INTERSPECIES VARIATION IN GENOME SIZE WITHIN VARIOUS GROUPS OF ORGANISMS


Meet Your Genome Continues

[Human Molecular Genetics, 3rd Edition]


Repeats / Mobile Elements ("selfish DNA")

Human Genome: 3*10^9 letters

1.5% known function

>50% junk
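A rough back-of-the-envelope sketch of what these percentages mean in absolute terms. It assumes the slide's figures are read as "1.5% known function" and ">50% junk" out of 3*10^9 letters; the function name is only illustrative.

```python
GENOME_LETTERS = 3e9  # human genome length in letters, as stated on the slide

def letters_for_fraction(fraction, genome_letters=GENOME_LETTERS):
    """Return how many letters a given fraction of the genome covers."""
    return fraction * genome_letters

known_function = letters_for_fraction(0.015)  # the 1.5% with known function
junk_lower_bound = letters_for_fraction(0.50)  # ">50%" is only a lower bound

print(f"known function: about {known_function / 1e6:.0f} Mb")
print(f"junk: more than {junk_lower_bound / 1e9:.1f} Gb")
```

So even the small "known function" slice is tens of megabases, while the repeat-derived portion is well over a gigabase.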


[Adapted from Lunter]


TE composition and assortment vary among eukaryotic genomes

[Figure: percentage composition (axis 20% to 100%) of three TE classes (DNA transposons, LTR retrotransposons, non-LTR retrotransposons) for each genome: slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, rice, nematode, Drosophila, mosquito, Fugu, mouse, human.]

Feschotte & Pritham 2006


Assembly Challenges


Inferring Phylogeny Using Repeats

[Nishihara et al, 2006]


Functional elements from Mobile Elements

[Yass is a small town in New South Wales, Australia.]

Co-option event, probably due to favorable genomic context

[Bejerano et al., Nature 2006]

The amount of TE DNA correlates positively with genome size

[Figure: genomic DNA, TE DNA, and protein-coding DNA in Mb (axis 0 to 3000) for each genome: Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, human.]

Feschotte & Pritham 2006


TEs

Protein-coding genes

The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size

Gregory, Nat Rev Genet 2005


Genome Size Variability

1pg = 978 Mb
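The conversion above can be sketched as a pair of helper functions. The function names are illustrative, and the 3000 Mb example value is taken from the 3*10^9-letter genome size quoted earlier in the lecture.

```python
MB_PER_PG = 978  # conversion factor from the slide: 1 pg = 978 Mb

def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return picograms * MB_PER_PG

def mb_to_pg(megabases):
    """Convert a genome size in megabases to a DNA mass in picograms."""
    return megabases / MB_PER_PG

# A genome of roughly 3000 Mb weighs a little over 3 pg.
print(f"{mb_to_pg(3000):.2f} pg")
```

This is handy when comparing C-values reported in picograms with assembly sizes reported in base pairs.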


Simple Repeats

• Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.

• These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.

• Highly polymorphic in the human population.

• Highly heterozygous in a single individual.

• As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.

• There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be.

• Highly variable between genomes: e.g., using the same search criteria, the mouse & rat genomes have 2-3 times more microsatellites than the human genome. They're also longer in mouse & rat.
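Since there is no agreed definition, any microsatellite search has to pick its own criteria. The sketch below is one minimal choice, not a standard tool: it reports only perfect tandem repeats, with a unit of 1 to 4 letters and at least `min_repeats` copies. Both thresholds and the function name are assumptions made for illustration.

```python
import re

def find_microsatellites(seq, max_unit=4, min_repeats=5):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    The lazy quantifier makes the regex try the shortest unit first,
    so 'CACACACACA' is reported with unit 'CA' rather than 'CACA'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

print(find_microsatellites("GGCACACACACATT"))
```

Changing `min_repeats` or allowing mismatches changes the counts, which is why cross-species comparisons need the same search criteria.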


Restriction enzymes recognize and cut within specific palindromic sequences in the DNA, known as restriction sites. A site is usually a 4- or 6-base-pair sequence.

blunt end

sticky end
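"Palindromic" here means that the site reads the same as its own reverse complement. A minimal sketch of that check, plus a site search, follows. The EcoRI site GAATTC is used as a familiar example and does not come from the slides; the function names and the test sequence are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """A restriction site is palindromic if it equals its reverse complement."""
    return site == reverse_complement(site)

def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

print(is_palindromic("GAATTC"))                      # EcoRI recognition site
print(find_sites("AAGAATTCGGGAATTCTT", "GAATTC"))  # hypothetical sequence
```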


DNA Fingerprint Basics

DNA fragments of different size will be produced by a restriction enzyme that cuts at the points shown by the arrows.


DNA fragments are then separated based on size using gel electrophoresis.
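The digest-then-separate idea can be sketched in a few lines. This is a toy model under stated assumptions: the two "individuals" are invented sequences that differ by one letter, the enzyme is EcoRI (site GAATTC, cutting after the first G), and a gel is reduced to a list of fragment lengths.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site.

    cut_offset is where the cut falls inside the site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments, previous = [], 0
    start = seq.find(site)
    while start != -1:
        cut = start + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
        start = seq.find(site, start + 1)
    fragments.append(seq[previous:])
    return fragments

def band_pattern(fragments):
    """Fragment lengths, longest first (shorter fragments migrate farther)."""
    return sorted((len(f) for f in fragments), reverse=True)

individual_a = "AAGAATTCGGGAATTCTT"  # hypothetical: two sites
individual_b = "AAGAATTCGGGAGTTCTT"  # hypothetical: one site lost to a mutation

print(band_pattern(digest(individual_a, "GAATTC", 1)))
print(band_pattern(digest(individual_b, "GAATTC", 1)))
```

The two sequences give different band patterns, which is the basis of the fingerprint.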


DNA fingerprinting can be used in paternity testing or murder cases.


From an evolutionary point of view transposons and simple repeats are very different.

Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).

Different instances of the same simple repeat most often do not.


The Gene-ome makes up < 2% of the H.G.

[Human Molecular Genetics, 3rd Edition]


Gene Structure

Signal – a string of DNA recognized by the cellular machinery
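To make "signal" concrete, the sketch below scans a sequence for a few short, well-known motifs: the ATG start codon, the three stop codons, and the canonical GT/AG dinucleotides at intron boundaries. These particular motifs are not listed on the slide; they are chosen here as textbook examples, and the names `SIGNALS` and `scan_signals` are hypothetical.

```python
import re

# Textbook DNA signals, written as regular expressions (an illustrative choice).
SIGNALS = {
    "start codon": "ATG",
    "stop codon": "TAA|TAG|TGA",
    "splice donor": "GT",
    "splice acceptor": "AG",
}

def scan_signals(seq):
    """Return {signal name: [0-based positions]} for every occurrence in seq."""
    hits = {}
    for name, pattern in SIGNALS.items():
        # The lookahead lets overlapping occurrences all be reported.
        finder = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() for m in finder.finditer(seq.upper())]
    return hits

for name, positions in scan_signals("ATGGTAAGTAG").items():
    print(name, positions)
```

Motifs this short occur by chance all over a genome, so a match alone says little; this is part of why gene finding needs more than signal scanning.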


Gene Processing

Eukaryotic Gene Structure


Gene Finding – The Practice

Challenge:

“The genes, the whole genes, and nothing but the genes”

Problems:

spliced ESTs: legitimate gene isoform?

predicting gene isoforms

tissue/condition-specific genes / gene isoforms

single exon genes

pseudogenes

Practice:


Evolution of Gene Finding Tools

[Figure: timeline of gene finding tools. Year markers: 1982, 1996, 2000, 2002, 2004. Approach labels: intrinsic, extrinsic, hybrid; ab initio, alignment-based, comparative genomics; informant, HMM-based, pair-HMM, phylo-HMM. Evidence labels: DNA, protein, cDNA. Tools shown: Procrustes, Genie, GenieEST, ExoFish, Rosetta, Slam, DoubleScan, Siepel-Haussler, Jojic-Haussler, Twinscan (2001), Genscan (1997), GenieESTHOM (2000), etc.]


The Human Gene Set

[HGC, 2001]


[Celera, 2001]


wrong!


Signal Transduction


Ancient Origins of Important Gene Families


Multigene families due to:

• Single gene duplication

• Segment duplication: tandem duplication or duplicative transposition

    a b c d e f g

    a b c d e f b c d g

• Horizontal gene transfer

• Genome-wide doubling event
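The segment duplication example (a b c d e f g becoming a b c d e f b c d g) can be sketched as a list operation. The function name and index convention are illustrative assumptions; real duplications are of course not this tidy.

```python
def duplicate_segment(genes, start, end, insert_at):
    """Copy genes[start:end] and insert the copy before index insert_at."""
    segment = genes[start:end]
    return genes[:insert_at] + segment + genes[insert_at:]

genes = list("abcdefg")

# Tandem duplication: the copy of "b c d" lands right next to the original.
tandem = duplicate_segment(genes, 1, 4, 4)

# Transposition of the duplicate: the copy lands elsewhere, here after "f".
transposed = duplicate_segment(genes, 1, 4, 6)

print(" ".join(tandem))
print(" ".join(transposed))
```

The second call reproduces the arrangement shown on the slide, where the duplicated block sits away from its source.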


Horizontal Gene Transfer


Horizontal Gene Transfer in the H.G.

[HGC, 2001]


Or is it?

[Kurland et al., 2003]


HGT between fish & their parasites


Retroposed Genes and Pseudogenes