The GENOME: structure, function & evolution
Date posted: 22-Dec-2015
Definitions
From genome to cell biochemistry
GENOME
TRANSCRIPTOME
PROTEOME
Biochemistry of the cell
Full genome: Total amount of DNA of an organism
Cellular genome: Haploid DNA content of a cell of an organism
Transcription
Translation
Proteome activity
Full transcriptome: Total amount of RNA of an organism
Cellular transcriptome: Total amount of RNA of a cell of an organism
Full proteome: Total amount of proteins of an organism
Cellular proteome: Total amount of proteins of a cell of an organism
Metabolome, lipome, phosphorylome, methylome, etc.
Disciplines of genome biology
Metabolome, lipome, methylome, phosphorylome, interactome, etc.
Proteome activity
GENOMICS
TRANSCRIPTOMICS
PROTEOMICS
Metabolomics, lipomics, methylomics, phosphorylomics…
Structural genomics
Functional genomics
(Functional genomics)
(Functional genomics)
GENOME
TRANSCRIPTOME
PROTEOME
Transcription
Translation
Remarks
- Structural genomics catalogs the elements of the DNA and of the full transcriptome and proteome, but does not deal with their functions
- The scope of functional genomics:
(A) Changes in the transcriptome and proteome
(1) in different cell types (2) in healthy/diseased tissues and cells (3) in treated/untreated tissues and cells
(B) Interactions between the elements of the transcriptome and proteome: interaction maps
Yeast protein interaction map
Each dot represents a protein, with connecting lines indicating interactions between pairs of proteins.
Red dots: essential proteins (an inactivating mutation is lethal)
Green dots: non-essential proteins (mutation is nonlethal)
Orange dots: non-essential proteins (mutation leads to slow growth)
Yeast protein interaction map
Red: cell cycle
Dark green: signaling
Dark blue: transcription, chromatin structure
Pink: protein and RNA transport
Orange: RNA metabolism
Light green: protein synthesis and turnover
Brown: cell polarity
Violet: intermediate or energy metabolism
Light blue: membrane biogenesis and/or traffic
Each oval represents a protein complex, with connections shown between complexes that share at least 1 protein.
Yeast protein interaction map - the complete network
Hubs: proteins with many interactions. A much larger number of proteins have only a few individual connections.
- This architecture is thought to minimize the effect on the proteome of mutations that might inactivate individual proteins
Yeast protein interaction map - removal of party hubs
Party hubs: interact with all their partners simultaneously
- their removal has little effect on the overall structure of the network
Yeast protein interaction map - removal of date hubs
Date hubs: interact with different partners at different times
- their removal breaks the network into small subnetworks
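The effect of hub removal on network structure can be sketched with a toy graph. The network below is hypothetical (it is not yeast data), and the names `P`, `D`, `p1`..`p4`, `d1`..`d3` are made up for illustration. Real party and date hubs are distinguished by the timing of their interactions; this sketch only shows the topological consequence described on the slides, assuming a party-like hub whose partners are also linked to each other and a date-like hub whose partners are not.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def hubs(graph, min_degree):
    """Proteins with at least `min_degree` interaction partners."""
    return sorted(n for n, partners in graph.items() if len(partners) >= min_degree)

def components_without(graph, removed):
    """Count the connected components left after deleting one protein."""
    seen = {removed}
    count = 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for partner in graph[node]:
                if partner not in seen:
                    seen.add(partner)
                    stack.append(partner)
    return count

# Hypothetical toy network: P sits inside a tightly linked module,
# D links otherwise unconnected partners to the rest of the network.
edges = [
    ("P", "p1"), ("P", "p2"), ("P", "p3"), ("P", "p4"),
    ("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p1"),
    ("P", "D"), ("D", "p1"),
    ("D", "d1"), ("D", "d2"), ("D", "d3"),
]
graph = build_graph(edges)

print(hubs(graph, 5))                  # the two high-degree proteins
print(components_without(graph, "P"))  # network stays in one piece
print(components_without(graph, "D"))  # network falls apart
```

In this toy case removing `P` leaves a single connected network, while removing `D` isolates each of its private partners.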
Phosphorylome of the yeast
Red dots: kinases
Blue dots: substrates
Green lines: connections
Gene network
[Figure: gene network spanning the nucleus, mitochondrion, ER, cytoplasm and extracellular space]
In a strict sense, only transcription factors are components of the gene network
Structure and operation of the genome
Two genomes in a cell
Nuclear genome, mitochondrial genome
The mitochondrial genome
16,569 nucleotides
[Figure: mitochondrial DNA strands]
Mitochondrion: arose 1.5 × 10^9 years ago from a purple bacterium species by endosymbiosis
Mammalian mitochondrion:
- most of the genes have been lost or transferred to the chromosomes
- 13 polypeptides (all of them are enzymes of oxidative phosphorylation)
- 12S and 16S rRNA genes
- 22 tRNA genes
Mitochondrial DNA: several thousand copies/cell
Deviations from the universal code:
codon    amino acid normally    amino acid in mitochondrion
UGA      stop                   Trp (mammals, insects, yeast, fungi)
AGA      Arg                    stop (mammals, insects)
AGG      Arg                    stop (mammals)
AUA      Ile                    Met (mammals, insects, yeast)
CUN      Leu                    Thr (yeast)
CGG      Arg                    Trp (maize)
Remark: the genetic code of the genomic DNA is also altered in some species (prokaryotes and eukaryotes)
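The mammalian deviations can be sketched as overrides on the universal codon table. This is a minimal sketch: the table names and the example RNA are made up for illustration, and AGG is used for the second Arg-to-stop reassignment as an assumption, because ACG encodes Thr (not Arg) in the universal code.

```python
BASES = "UCAG"
# Universal code in the conventional U, C, A, G ordering of all 64 codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

UNIVERSAL = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Mammalian mitochondrial deviations (AGG is an assumption, see above).
MAMMALIAN_MITO = {**UNIVERSAL, "UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}

def translate(rna, table):
    """Translate codon by codon; '*' marks a stop codon (translation is not cut off there)."""
    usable = len(rna) - len(rna) % 3
    return "".join(table[rna[i:i + 3]] for i in range(0, usable, 3))

rna = "AUGUGAAUAAGA"  # hypothetical sequence: AUG UGA AUA AGA
print(translate(rna, UNIVERSAL))
print(translate(rna, MAMMALIAN_MITO))
```

The same four codons read as Met, stop, Ile, Arg with the universal code and as Met, Trp, Met, stop with the mitochondrial overrides.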
The nuclear genome
chromatin
Genome size
The ratio of noncoding sequences
[Chart: % of noncoding sequences in prokaryotes, unicellular organisms, plants/fungi, invertebrates, protochordata, vertebrates and human]
The human genome project
Human genome:
2001: raw (draft) version (90%)
2004: full version (99%)
The missing 1%: repetitive sequences near the centromeres
Pictured: Craig Venter, Bill Clinton, Francis Collins
The human variome project
Collection of variable sequences from different individuals; primary focus on medical applications
Pictured: Richard Cotton
Genome programs
Relatives of humans: chimp, orangutan
Model organisms of science: E. coli, yeast, C. elegans, fruit fly, Arabidopsis, mouse
Pathogens and their vectors: viruses, bacteria, Plasmodium malariae + malaria mosquito
Agricultural animals and plants: wheat, chicken, cow, pig
Pets: dog
Others: archaebacteria, amoeba, wallaby, etc.
Ascertaining the sequence of DNA is not enough to understand its operation!
DNA sequencing
Frederick Sanger
Initiation of DNA synthesis with an "A" dideoxynucleotide (template DNA, base A)
In ddATP, the 3'-OH group of dATP is replaced by -H
A dATP/ddATP mixture (with less ddATP) is added for the synthesis, so synthesis stops at the "T"s of the template
Synthesis is terminated upon incorporation of a ddNTP
The different ddNTPs are labeled with distinct colors
[Figure: synthesis, then gel electrophoresis; fragments of 10 to 50 nucleotides pass the detector]
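The chain-termination idea can be sketched as a small simulation. This is a simplified model under stated assumptions: the template is hypothetical and written 3'→5' so that the new strand reads left to right, every possible termination is assumed to occur, and the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_reactions(template):
    """For each ddNTP, list the lengths of the fragments it terminates.

    A fragment of length n ends with the ddNTP that pairs with
    position n of the template.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    reactions = {dd: [] for dd in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        reactions[base].append(length)
    return reactions

def read_sequence(reactions):
    """Order all fragments by length (as the gel does) and read the terminal labels."""
    label_by_length = {
        length: dd for dd, lengths in reactions.items() for length in lengths
    }
    return "".join(label_by_length[n] for n in sorted(label_by_length))

template = "TACGGTCA"  # hypothetical template, written 3'->5'
reactions = sanger_reactions(template)
print(reactions["A"])            # fragment lengths ending in ddA, i.e. template "T"s
print(read_sequence(reactions))  # sequence of the newly synthesized strand
```

The ddA fragments end exactly where the template carries a T, and sorting all fragments by length recovers the sequence of the synthesized strand.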
Human genome: 3.2 Gb (3.2 billion base pairs)
[Chart of genome components: exons, introns, LINEs, SINEs, LTR retrotransposons, DNA transposons, simple repeats, large duplications, miscellaneous heterochromatin, miscellaneous unique sequences; LINEs, SINEs and LTR retrotransposons are grouped as retrotransposons]
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements
                 Coding sequences    Gene-related sequences    Intergenic sequences
Share            1.5%                36%                       62.5%
Size             48 Mb               1152 Mb                   2000 Mb
Non-coding RNA coding "genes"
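The three sizes follow directly from the percentages and the 3.2 billion base pair genome. A quick check, assuming 3200 Mb as the total and using made-up variable names:

```python
GENOME_MB = 3200  # 3.2 billion base pairs, expressed in megabases

shares_percent = {
    "coding sequences": 1.5,
    "gene-related sequences": 36.0,
    "intergenic sequences": 62.5,
}

# Size of each fraction in megabases.
sizes_mb = {name: GENOME_MB * pct / 100 for name, pct in shares_percent.items()}

for name, size in sizes_mb.items():
    print(f"{name}: {size:.0f} Mb")
print("total share:", sum(shares_percent.values()), "%")
```

The three fractions add up to 100% of the genome.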
Human genome
Coding sequences: 1.5%
Gene-related sequences: 36% (pseudogenes, gene fragments, introns, UTRs; sub-shares shown in the figure: 1.5%, 24%, 10.5%)
Intergenic sequences: 62.5%
Human genome
Intergenic sequences (62.5%): others 11%, repeated sequences 51.5%
Repeated sequences: retroposons 41%, DNA transposons 2.8%, simple repeats 2.8%, large repeats 5%
Retroposons and DNA transposons together are the transposons
Coding sequences
pre-mRNA: leader (5'-UTR), E1, I1, E2, I2, E3, trailer (3'-UTR) with polyA signal
mRNA: leader (5'-UTR), E1, E2, E3, trailer (3'-UTR) with polyA signal
Coding sequences: the amino acid coding parts of exons, from the AUG to the Stop codon
10,000-12,000 genes; the functions of the remaining 10,000 genes are unknown!
Introns and UTRs
UTR: regulation of translation and of the half-life of mRNAs
Intron:
1. genetic junk
2. it can contain regulatory elements
3. in the case of alternative splicing it can serve as an exon
Pseudogenes & gene fragments
Fossils in the genetic cemetery
Pseudogenes
2 types:
1. intron-containing: chromosomal segment duplication
2. intronless: reverse transcription, then reinsertion
Function:
1. In some cases, regulation of the original gene by means of antisense interaction
2. Genetic junk
Gene fragments
Genetic junk
Transposable elements in the human genome
[Table: class, family, copy number, occurrence (%); rows grouped under retrotransposons]
Transposable elements
Class I: retrotransposons
I/1. LTR transposons (LTR retrotransposons: 8%)
- Retrovirus: LTR - gag - pol - env - LTR
- LTR retrotransposon: LTR - gag - pol - LTR
- Endogenous retroviruses: all inactive
I/2. Non-LTR transposons (33%)
- I/2.1. LINEs: gag? - pol (RT, RNaseH) - polyA
- I/2.2. SINEs: A and B boxes - polyA
Class II: DNA transposons (3%)
- IR - transposase - IR
gag: CP (capsid), NC (nucleocapsid)
pol: Pr (protease), RT (reverse transcriptase), RNaseH (ribonuclease H), Int (integrase)
env: envelope
IR: inverted repeat
Further values shown in the figure: 1%, 7%, 1%
Transposons
Repeated sequences: retroposons 41% (LINE 20%, SINE 13%, LTR retrotransposons 8%), DNA transposons 2.8%, simple repeats 2.8%, large repeats 5%
Retrotransposons: "copy and paste"; DNA transposons: "cut and paste"
LTR retrotransposons
- Endogenous retroviruses (more than 20 families; 450,000 copies)
- degenerated virus genes
Non-LTR retrotransposons
- 850,000 LINEs, 1,500,000 SINEs
- SINEs: derived from the 7S RNA "gene"
DNA transposons
- Colonized the genome by horizontal gene transfer
- Vector is unknown
LTR: long terminal repeat
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements
Retrovirus infection
[Figure: retrovirus particle with envelope, capsid and virus RNA]
Human endogenous retroviruses (HERVs) & LTR-transposons
Structure: LTR - gag - pol - env - LTR
gag: capsid (structural element)
pol: polymerase: reverse transcriptase, integrase, protease, RNase H
env: envelope (structural element)
LTR (long terminal repeat): promoter
LTR retrotransposons make up 8% of the genome, but only 1% of them have a structure similar to that of retroviruses; the others are degenerated. All of them are mutant: they are not able to form infective virions, but some of them can move using the enzymes of other elements.
The genomes of the chimp and of monkeys contain infective retroviruses.
Retroviruses and their fossils
Wild type retroviruses
Human endogenous retroviruses
Solitary LTR
The effect of endogenous retroviruses on gene expression
HERV: human endogenous retroviruses
[Figure: a HERV next to a cellular gene]
1. No effect
2. Transcription from the LTR (the HERV splice donor site can also be active)
3. The activity of the LTR can be modulated
- methylation
- polymorphism
- cell-specific activation/inhibition
Non-LTR retrotransposons
LINEs: autonomous transposons (gag? - pol [RT, RNaseH] - polyA)
SINEs: non-autonomous transposons (A and B boxes - polyA)
[For comparison, LTR retrotransposons: LTR - gag (CP, NC) - pol (Pr, RT, RNaseH, Int) - LTR]
LINEs
- 21% of the human genome (850,000 copies); L1: 17% (500,000 copies), of which 10,000 are full-length (6.1 kb), but only 50-100 are functional
- Some of the rest can jump with the help of the enzymes of the intact ones.
- LINE mobilization occurs in both germ line and somatic cells
LINE: long interspersed nuclear elements
SINE: short interspersed nuclear elements
IRES: internal ribosome entry site
LINE-1 "propagation"
LINE-1 elements shown: promoter, ORF1, IRES, ORF2 (RT, RNaseH), polyA
[Figure labels: DNA, RNA, ribosome, protease, copying]
Copies: perfect, 5'-deleted, or 5'-deleted + inverted
The effect of LINE-1 on the genome: formation of pseudogenes
[Figure: gene, intronless pseudogene]
The effect of LINE-1 on the genome: gene inactivation
[Figure: L1 mRNA; insertion into an exon or into an intron; *: stop codon]
The effect of LINE-1 on the genome: transduction
The polyA signal of the LINE is weak: readthrough into an adjacent gene exon
[Figure: insertion of a piece of the LINE together with an exon of gene "A", or of only the exon of gene "A", into an intron of gene "B"; mRNA]
SINEs
Structure: A and B boxes - polyA
- 13% of the genome, of which 11% are Alu sequences; non-protein coding
- Alu sequences contain an AluI restriction enzyme recognition site
- An average SINE repeat unit is 100-400 bp (Alu: 300 bp: 280 bp + pol III promoter)
- More than 1 million copies; the most successful transposon in humans
- Ancestor: the RNA component (7SL RNA) of the SRP (signal recognition particle; a ribonucleoprotein)
[Figure: 7SL RNA with its Alu domain and S domain]
The hyperparasite Alu sequences
DNA transposons
Structure: IR - transposase - IR
- The infection mechanism is not known; what could be the vector?
- The transposase executes the jumping by a "cut and paste" mechanism; how do they multiply?
- More than 60 families: Charlie, mariner, Tigger, THE1, etc.
- The mariner family resembles insect transposons: horizontal gene transfer?
IR: inverted repeat
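The difference between the two transposition modes, and why the multiplication of DNA transposons is a puzzle, can be sketched with a genome modeled as a list. This is a toy model: the element name `TE`, the gene names and both function names are hypothetical.

```python
def copy_and_paste(genome, element, target):
    """Retrotransposon-like move: the source copy stays, a new copy is inserted."""
    return genome[:target] + [element] + genome[target:]

def cut_and_paste(genome, element, target):
    """DNA-transposon-like move: the element is excised, then reinserted.

    `target` is an index into the genome after excision.
    """
    remaining = list(genome)
    remaining.remove(element)  # excision of the first copy found
    return remaining[:target] + [element] + remaining[target:]

genome = ["g1", "TE", "g2", "g3"]  # hypothetical gene order with one element

copied = copy_and_paste(genome, "TE", 3)
moved = cut_and_paste(genome, "TE", 3)

print(copied, copied.count("TE"))  # copy number grows
print(moved, moved.count("TE"))    # copy number is unchanged
```

In this model a pure cut-and-paste move never raises the copy number, which is the question the slide raises for DNA transposons.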
Defense by the host
1. Heterochromatinization (methylation): inhibition of transcription
2. RNA interference: inhibition of transcription & translation
3. Local increase of the mutation rate: inactivation
Benefits of transposons to the host
1. Variability of genes encoding antibodies and T cell receptors
2. Genome plasticity
Sleeping Beauty & Frog Prince
Repaired fish and frog DNA transposons: gene therapy
Binary system:
- IR - transgene - IR
- promoter - transposase
Tandem repeats
Consecutive identical or nearly identical (degenerated) repeat units
Variability in the length of: (1) the repeat unit and (2) the whole repeat
[Figure: satellites within genomic DNA]
Microsatellites: short repeat units of 1-5 base pairs, up to several hundred repetitions
- CA/TG repeats: 0.5% of the genome; no known function
- Trinucleotide repeats: CAG (Gln), GCA (Ala): neurodegenerative diseases; dog transcription factors
(Macro)satellites: repeats of 1 to several hundred kb: centromere and constitutive heterochromatin
- Telomere: 15 kb: TTAGGG hexamers, which telomerase attaches to the ends of chromosomes
- Satellite 2 and 3: GGAAT
- Alpha satellite: 171 bp units
Minisatellites (VNTR*, STR*): shorter than macrosatellites.
Genetic markers (paternity test, descent); related to several diseases, e.g. diabetes
*STR: short tandem repeat; VNTR: variable number of tandem repeats
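Microsatellites can be located with a regular expression that looks for a short unit followed by copies of itself. This is a minimal sketch: the function name, the thresholds and the example sequence are assumptions made for illustration, and degenerated (imperfect) repeats are not detected.

```python
import re

def find_tandem_repeats(seq, min_unit=1, max_unit=5, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex prefer the shortest repeat unit;
    the backreference \\1 demands further copies of that same unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# Hypothetical sequence with a (CA)5 and a (CAG)4 microsatellite
example = "TTGCACACACACAGGTCAGCAGCAGCAGTT"
for start, unit, copies in find_tandem_repeats(example):
    print(start, unit, copies)
```

Comparing the copy number at the same locus between individuals is what makes such repeats usable as genetic markers.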
Other intergenic sequences
1. Unrecognizably degraded transposons, pseudogenes
2. Regulatory regions: promoters, enhancers, silencers
3. Others
A new RNA world
1. The major part of the genome is transcriptionally active
- ncRNAs cover a 50x longer genomic region than genes
2. Antisense regulation
- trans-antisense RNAs (miRNAs): 1 miRNA regulates several genes; 1 gene is regulated by several miRNAs
- cis-antisense RNAs: a large number of genes are overlapped by antisense transcripts
Novel functions of RNAs
- Traditional functions: transmission of information between DNA and proteins, and other contributions to these processes
- New functions:
● Independent carriers of information (?)
● Regulation of the expression of genetic information
DNA chip
1. Preparation of the chip (printing)
2. Collection of tissue samples (control, treated)
3. RNA purification
4. Reverse transcription (fluorescent labeling)
5. Hybridization
6. Detection
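After detection, the control and treated signals for each spot are usually compared as a ratio. The sketch below assumes a log2 ratio and a twofold cutoff, which are common conventions rather than something stated on the slides; the gene names and intensities are hypothetical.

```python
import math

def log2_ratios(control, treated):
    """log2(treated / control) per gene; positive means higher in the treated sample."""
    return {gene: math.log2(treated[gene] / control[gene]) for gene in control}

# Hypothetical fluorescence intensities per spot
control = {"geneA": 100.0, "geneB": 400.0, "geneC": 250.0}
treated = {"geneA": 400.0, "geneB": 100.0, "geneC": 250.0}

ratios = log2_ratios(control, treated)

# Assumed cutoff: at least a twofold change, i.e. |log2 ratio| >= 1
up = sorted(gene for gene, r in ratios.items() if r >= 1)
down = sorted(gene for gene, r in ratios.items() if r <= -1)

print(ratios)
print("up:", up, "down:", down)
```

The log scale treats a fourfold increase and a fourfold decrease symmetrically (+2 and -2).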
Protein chip
Samples: control, treated
1. Protein purification
2. Labeling
3. Immunoreaction
4. Detection
The evolution of the genome
Gene function does not change even across large evolutionary distances
- many homologous genes of mouse and fruit fly are interchangeable
Evolution alters gene expression and not gene function
- in different species, the same genes are turned on at different times in different tissues and are expressed in different amounts
Evolution of genetic regulation
[Figure: species compared: rhesus macaque, orangutan, chimp, human]
Expression of 1,056 genes in the liver: the same gene expression
Expression of 12,000 genes in the brain: 5.6 times higher expression level in human (human vs. chimp)
Expression of transcription factors differs
Exon/domain duplication
Duplication of the "B" gene segment
Before: A domain - B domain - C domain
After: A domain - B domain - B domain - C domain
Exon/domain shuffle
Before: A domain - B domain - C domain and X domain - Y domain
After: A domain - B domain - Y domain
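Both rearrangements can be sketched with genes modeled as lists of domain names. This is a toy illustration: the function names are made up, and the shuffle is modeled here as replacing one domain of the recipient with a domain taken from the donor gene.

```python
def duplicate_domain(gene, name):
    """Insert a second copy of `name` directly after its first occurrence."""
    i = gene.index(name)
    return gene[:i + 1] + [name] + gene[i + 1:]

def shuffle_domain(recipient, donor, replaced, donated):
    """Replace one domain of `recipient` with a domain taken from `donor`."""
    if donated not in donor:
        raise ValueError(f"{donated!r} is not a domain of the donor gene")
    return [donated if domain == replaced else domain for domain in recipient]

gene_abc = ["A", "B", "C"]
gene_xy = ["X", "Y"]

duplicated = duplicate_domain(gene_abc, "B")
shuffled = shuffle_domain(gene_abc, gene_xy, replaced="C", donated="Y")

print(duplicated)  # domain B now occurs twice
print(shuffled)    # domain C replaced by domain Y
```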