INTRODUCTION -...
Introduction
1.1. Etiology of Amebiasis and life cycle of Entamoeba histolytica:
Entamoeba histolytica is the etiological agent of amebiasis, which ranks third in terms of
mortality and morbidity among protozoan diseases (WHO, 1997; Walsh, 1986). WHO
estimates about 40-50 million cases of amebic colitis and amebic liver abscess (ALA) and about
100,000 deaths annually from this disease. These figures may not reflect the actual burden of the
disease, since preliminary diagnosis is still based on microscopy, which cannot
distinguish between closely related species of Entamoeba and also yields false positives
(Walsh et al., 1986; WHO, 1997; Petri W, 2000). Studies in Vietnam (Blessman J, 2003;
Blessman J, 2002) showed a prevalence of 11.2 % and a new infection rate of 4.1 % in the
population, while studies in Dhaka have shown that 2.4 % of the male population is predisposed
to ALA, 10 % or more of children have diarrhoea due to E. histolytica infection and 4 % have ALA
(Haque R, 1997). The global burden of E. histolytica is therefore not precisely
known, and available figures provide only a rough estimate of the prevalence of the disease.
E. histolytica is a dimorphic parasite. Human beings are the primary hosts for infection.
Contaminated food and water sources are reservoirs of infection. Ingestion of dormant E.
histolytica cysts present therein begins the infectious cycle. The cyst passes through the stomach
being protected from its harsh acidic environment by the resistant cyst wall. The cyst excysts in
the ileo-caecal region of the small intestine to release four to eight trophozoites which constitute
the invasive form of the parasite. Amebiasis is not the result of infection by the parasite alone but is
rather an outcome of interactions between the pathogen, the local microbial flora and, importantly,
the host immune response. The pathophysiology of amebiasis essentially involves adherence,
colonization, cytolysis and tissue necrosis leading to the classical amebic dysentery.
Extra-intestinal infections result from hematogenous dissemination of the parasite to different organs of
the body, primarily the liver, where the organism produces necrotic lesions called amebic liver
abscesses. The pathophysiology of intestinal amebiasis is better understood than that
of extra-intestinal infections. Though not yet completely elucidated, the pathophysiology of
intestinal amebiasis is believed to involve the following stages:
Adherence of the trophozoites to the colonic mucin via interaction of the D-galactose /
N-acetyl-D-galactosamine (Gal/GalNAc) specific amebic lectin with host glycoconjugates marks
the initiation of the amebic invasion (Petri, 1987; Chadee et al., 1987). Other proteins that might
aid in adherence include the serine rich E. histolytica protein (SREHP) (Stanley et al., 1990).
Colonization and invasion of the colonic epithelium are initiated by the secretion of thiol
(cysteine) proteinases (EhCPs), which form a multigene family with at least 20 members (Bruchhaus
I, 2003). The cysteine proteinases degrade MUC2, the major component of human colonic
mucin, cause proteolysis of the enteric cell villin and effacement of intestinal microvilli (Keene
W, 1986; Li E, 1995; Lauwaet T, 2003), and activate pre-interleukin-1β, which leads to
inflammation of the colonic epithelium (Zhang Z, 2000). The colonization and adherence of
the parasite also lead to cytolysis of the colonic tissue via contact dependent killing (mediated
by cytolytic peptides called amebapores secreted by the ameba), induction of apoptosis and
an acute inflammatory reaction.
Cytolytic disruption leads to deeper infiltration and horizontal dissemination of the
parasite, resulting in the flask shaped ulcers characteristic of amebiasis. The parasite
may penetrate into the hepatic portal circulation via the capillary bed in the intestinal wall and
use this hematogenous route to disseminate to various organs, primarily the liver; the reason for
this preference is unknown. The typical extra-intestinal infection is exemplified by necrotic lesions in the
hepatic tissue known as amebic liver abscesses.
Extra-intestinal infections are a dead end for the parasite. The parasite ensures
continuation of the life cycle via cyst formation. Under conditions which are poorly understood,
trophozoites convert into dormant cysts that can withstand harsh environmental conditions. The
cysts are shed into the environment along with feces. The life cycle is depicted in Figure 1.
[Figure 1: flow chart. Legible labels: cyst; excystation (small intestine); commensal growth (colon); fecal-oral spread; mucosal invasion; direct extension to skin; cutaneous or perianal amebiasis; brain abscess; lung or pericardial abscess.]
Fig. 1. Flow chart depicting the life cycle of E. histolytica infection.
1.2. Pathogenic and non-pathogenic species of Entamoeba:
Entamoeba species are found across different animal genera and occupy a variety of
niches. The common gut pathogens include E. histolytica in humans, E. invadens in reptiles, E.
polecki in pigs and E. chattoni in non-human primates. E. dispar, a commensal in the human gut,
is genetically the closest to E. histolytica, as molecular studies have established. E. gingivalis
colonizes the oral cavity and is implicated in causing oral amebiasis in humans. E. moshkovskii, which is morphologically
indistinguishable from E. histolytica, was considered to be non-pathogenic and free living. Recent
studies, however, report the occurrence of this species in humans, though direct demonstration of
pathogenicity is still lacking (Ali IK et al., 2003). E. histolytica, E. dispar and E. invadens,
as well as E. hartmanni, each produce quadrinucleate cysts, while E. polecki and E.
chattoni produce uninucleate cysts. E. gingivalis lacks a cyst stage.
1.3. Organization of E. histolytica trophozoites and cysts.
E. histolytica trophozoites are 20-40 µm in diameter with an ameboid shape and are generally
uninucleate, but up to four nuclei may be present. The nuclear membrane is 120 nm thick and has
nuclear pores of 65 nm diameter. The important nuclear proteins known include histones H1, H3
[...] al., 1997; Binder2 et al., 1995; Fodinger, 1993). Chromosomal [...]
Trophozoites are sluggishly motile and possess a well organized
cytoskeleton (Tavares et al., 2005). The important, functionally homologous cytoskeletal proteins
identified in this organism include actin (Meza I, 1983), EF-1α (De Meester, 1991), EhABPH
(Ebert et al., 2000), myosin IB (Voigt et al., 1999) and profilin (Binder1 et al., 1995). Trophozoites
lack typical eukaryotic cytosolic organelles, though functional counterparts are believed to exist
in the form of the vacuoles that abound in the cytosol of this organism. These include the
crypton / mitosome (Ghosh S, 2000; Tovar et al., 1999), which is believed to be a double
membraned, vacuolar, Hsp60 associated, mitochondrion derived structure. A typical ER and
Golgi are absent, although ER specific and Golgi specific markers have been shown to exist. The
cytoplasmic membrane is approximately 10 nm thick and has an abundant surface coat
predominantly composed of glycoproteins and lipophosphoglycans. The composition of the
plasma membrane is believed to protect the trophozoites from the action of their own amebapores
(Andra J et al., 2004). In addition, the plasma membrane has been shown to harbor cholesterol
rich lipid raft regions which are involved in pinocytosis and adhesion (Laughlin et al., 2004). The
proteophosphoglycan of E. histolytica is linked via GPI anchors to the cell membrane. The cell
surface harbors the 260 kDa Gal/GalNAc lectin, which plays an important role in adhesion of the
trophozoites to the intestinal lining.
1.4. Metabolism in E. histolytica:
The parasite is microaerophilic, and carbohydrate fermentation forms the major source of
energy. Enzymes of glycolysis have been identified, and some of them are postulated to have
been acquired by horizontal gene transfer. Enzymes of the TCA cycle and the mitochondrial electron
transport chain are, however, absent. Pyrophosphate is used as a phosphate donor instead of a
nucleoside triphosphate in many of these reactions. E. histolytica does not have lipid
biosynthesis pathways but can synthesize common phospholipids. Since the parasite lacks
enzymes needed for de novo purine, pyrimidine and thymidylate synthesis, nucleotide
metabolism is primarily effected through the salvage pathway. Interestingly, ribonucleotide
reductase is also absent. Reactive oxygen radicals are removed through the use of superoxide
dismutase. Protein metabolism is poorly understood. Cysteine is an essential amino acid for
growth, while proline and glutamate exist in high intracellular concentrations in the
trophozoites. An overview of E. histolytica metabolism is shown in Figure 2.
Figure 2. Metabolic pathways in E. histolytica [adapted from Loftus B, 2005]
1.5. Organization of the Entamoeba genome:
Understanding of the E. histolytica genome has traditionally been hampered by the
lack of tools for genetic dissection in this parasite. The genome is highly A + T rich
(67% in the coding region and 72% in the intergenic region) and it is estimated that each
trophozoite contains about 0.24 pg of DNA (Dvorak, 1995). The organism is believed to
be tetraploid (Willhoeft U, 1999), and the total haploid genome size was calculated at 50
Mbp. Pulsed Field Gel Electrophoresis (PFGE) helped to establish that the genome of E.
histolytica HM-1:IMSS has 31-35 chromosomal bands ranging in size from 300 kb to
2200 kb. The size distribution, however, changes with the isolate and the conditions used
for separation of the chromosomal bands (Bhattacharya A et al., 2000; Willhoeft U and
Tannich E, 1999; Bagchi A, 2001; Clark CG, Cantellano ME and Bhattacharya A, 2000).
The interpretation is further complicated by chromosomal size polymorphism, which
also shapes the fluctuations in genome size of many other protozoan parasites. Molecular
analysis showed that there are 14 linkage groups (Willhoeft U and Tannich E, 1999).
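The A + T percentages quoted above are simple base-composition fractions. As a minimal sketch of how such a figure is obtained (the function name and the toy sequence are hypothetical and are not E. histolytica data), A + T content can be computed as follows:

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)


# Hypothetical toy sequence, used only to show the call.
toy_sequence = "ATTTAAGCATATTA"
print(f"A+T content: {at_content(toy_sequence):.1f}%")
```

Ambiguous bases such as N are ignored here so that gaps in an assembly do not dilute the estimate; that is a design choice of this sketch, not something stated in the text.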
Current understanding of the E. histolytica genome stems from the completion of
the E. histolytica genome project, accomplished under the aegis of a National Institute of
Allergy and Infectious Diseases (NIAID) grant to The Institute for Genomic Research
(TIGR) and a Wellcome Trust grant to the Sanger Institute (Loftus B et al., 2005). The
genome is estimated to be about 22.8 Mb in size, a value comparable to that of
other protozoan parasites. Analysis of 12.5X coverage of the genome yielded 888
non-redundant contigs containing 9938 predicted genes comprising 49 % of the genome.
The average gene size was estimated to be 1.17 kb. Interestingly, in contrast to previous
estimates of 6% intron containing genes (Willhoeft U et al., 2001), the genome sequence
showed that approximately 25% of genes have introns, while 6% contain multiple
introns (Loftus B, 2005). Another interesting feature of the genome is the presence of
episomal DNA molecules of sizes varying from 5 kb to 50 kb (Dhar SK, 1995). Among
these different molecules, the 24.5 kb EhR1 circle, which harbors the ribosomal RNA
genes, is the most abundant, with around 200 copies per haploid genome (Bhattacharya A
et al., 2000). The organization and role of the other episomal molecules remain to be
elucidated. Genome analysis revealed the full complement of tRNA genes in E.
histolytica, and all except four of these were organized as tandem arrays of 1-5 copies.
The mode in which the genome has been sequenced precludes analysis of the physical
organization of these tRNA arrays. Previous work with histidine tRNA gene arrays
(Satish S, 2002), however, has shown that the histidine tRNA array is located on only one
chromosome (band 12) in HM-1:IMSS and is not present exclusively in the sub-telomeric
region, although a few telomeric copies cannot be ruled out. Chromosomal
localization of the other tRNA genes remains to be determined.
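The assembly statistics quoted earlier in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the text (22.8 Mb, 9938 predicted genes, mean gene size 1.17 kb); the function names are hypothetical. The result is a coding fraction of roughly one half, close to the reported 49 %, and about 0.436 genes per kb.

```python
GENOME_SIZE_MB = 22.8    # genome size quoted in the text
PREDICTED_GENES = 9938   # predicted genes quoted in the text
MEAN_GENE_KB = 1.17      # average gene size quoted in the text


def coding_fraction(genes: int, mean_gene_kb: float, genome_mb: float) -> float:
    """Fraction of the genome (0-1) occupied by genes of the given mean size."""
    return genes * mean_gene_kb / (genome_mb * 1000.0)


def genes_per_kb(genes: int, genome_mb: float) -> float:
    """Average number of genes per kilobase of genome."""
    return genes / (genome_mb * 1000.0)


fraction = coding_fraction(PREDICTED_GENES, MEAN_GENE_KB, GENOME_SIZE_MB)
density = genes_per_kb(PREDICTED_GENES, GENOME_SIZE_MB)
print(f"coding fraction: {fraction:.1%}, gene density: {density:.3f} genes/kb")
```

The small gap between the computed fraction and the reported 49 % is to be expected when rounded inputs are multiplied together.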
Multi-copy protein coding gene families also abound in the E. histolytica genome.
These include virulence factor genes such as the Gal/GalNAc lectin intermediate subunit
(30 copies), the cysteine proteinase family (at least 20 different members), the protein
kinase family (271 members across seven superfamilies) and phosphatases (about 100
members) (Loftus B et al., 2005). Expression of another multi-copy family, Ehssp1,
responds differentially to oxidative and heat stress (Satish S, 2003).
The E. histolytica genome is also home to a number of genes that are thought to
have been acquired via lateral gene transfer. About 96 genes, of which 58% code for metabolic
enzymes and 42% code for unknown functions, are believed to have been horizontally
transferred. These genes are believed to have conferred an enhanced capability for
carbohydrate metabolism on the organism, and the Cytophaga-Flavobacterium-Bacteroides
group appears to be the major donor of these genes. Interestingly, members of the genus
Bacteroides are the predominant microbes among the intestinal microbiota (Suau A,
1999). The predominance of Bacteroides in the same environmental niche that E.
histolytica colonizes also supports the notion of horizontal gene transfer between these
species.
The analysis of the E. histolytica genome shows that the genome is similar in
terms of its size, organization and composition to many other protozoan genomes from
different genera. However, it also shows some deviations from the general genome
organization among these classes. A brief comparative analysis of a few protozoa whose
genomes are being or have been sequenced is presented below. The results of these
studies are summarized in Table 1.
1.6. Salient features of protozoan genomes:
1. Most protozoan genomes show a very high A + T content.
2. Organellar genomes are highly reduced or defunct in protozoa.
3. Gene synteny and gene order are, interestingly, conserved across classes despite
sequence divergence.
4. Chromosomes do not condense and usually exhibit chromosomal size
polymorphism.
5. Some protozoan genomes are rich in repetitive DNA while others have a severe
paucity of repeat regions.
6. Protozoa moving between vector and host exhibit variable ploidy levels between
the asexual and sexual stages of the life cycle.
7. Many genes, especially those related to metabolism, have been acquired by
horizontal gene transfer.
8. Most protozoa lack well developed cellular organelles and usually depend on an
atypical glycolysis as the major energy generating pathway.
Organism | Entamoeba histolytica | Plasmodium falciparum | Dictyostelium discoideum | Cryptosporidium hominis | Giardia lamblia
Genome size (Mbp) | 22.8 | 23 | 8.1 | 9.16 | 12
(A+T%) Genic region | 72 | 80.6 | 77.57 | 68.3 | N.A.
Ploidy | 4n | 1n or 2n | 1n | 1n | 4n or 8n
No. of chromosomal bands | 30-35 | [illegible] | 6 | 8 | 5
Chromosome size (Mbp) | 0.3-2.2 | 0.643-3.29 | 4-7 | 0.9-1.4 | 1.6-3.8
% coding region | 49 | 62 | 56 | 69 | N.A.
No. of predicted genes / ORFs | 9938 predicted | 5268 | 2799 | 3994 | 9649 predicted ORFs
Gene density | 0.435 kbp | 4.338 kbp | 2.600 kbp | 2.293 kbp | N.A.
% genes with introns | 25 | 54 | 68 | 5-20 (estimated) | N.A.
Mean intronic length (bp) | 100 | [illegible] | 177 | N.A. | N.A.
Composition of introns (%A+T) | N.A. | 86.5 | 87 | N.A. | N.A.
Mean intergenic region length | N.A. | 1694 | 786 | 716 | N.A.
Composition of intergenic region (%A+T) | N.A. | 86.4 | 86 | 69.7 | N.A.
Location of rRNA genes | 14.0-25.0 kb nuclear episomes | Dispersed copies on different chromosomes (few copies present) | 90 kb nuclear episomes | On chromosomes | On chromosomes; telomeric location; rDNA containing chromosomes undergo frequent rearrangement
Energy generation | Primarily glycolysis; PPi used | Glycolysis: host metabolites; PPi used | Polyphosphate as energy source | Atypical glycolysis | Primarily glycolysis; PPi used
Reference(s) | Loftus B et al. (2005) Nature, vol 433, pp 865-868 | Gardner et al. (2002) Nature, vol 419, pp 498-511 | L. Eichinger et al. (2005) Nature, vol 435, pp 43-57 | Xu P et al. (2004) Nature, vol 431, pp 1107-1112 | http://gmod.mbl.edu/perl/site/giardia?page=intro
Organism | Cryptosporidium parvum | Encephalitozoon cuniculi | Leishmania major | Trypanosoma brucei
Genome size (Mb) | 9.1 | 2.5 | 32.8 | 26
(A+T%) Genic region | 70 | 53 | 40.3 | 55.6
Ploidy | | Probably diploid | Partially aneuploid | Diploid
No. of chromosomal bands | 8 | | 36 |
Chromosome size (Mb) | 0.9-1.5 | 0.217-0.315 | 0.3-2.5 |
% coding region | 75.3 | 90 | 47.9 | 50.5
No. of predicted genes / ORFs | 3807 | 1997 | 8272 | 9068
Gene density | 1 per 2.382 kbp | 1 gene per 1.025 kbp | 1 gene per 0.252 kb | 1 per 0.348 kb
% genes with introns | 5 | Introns are very rare | Introns are very rare | N.A.
Mean intronic length | N.A. | N.A. | N.A. | N.A.
Composition of introns (%A+T) | N.A. | N.A. | N.A. | N.A.
Mean intergenic region length | 566 | 129 bp | 2045 | 1279 bp
(%A+T) intergenic region | 76.1 | 55 | 40.7 | 59
Location of rRNA genes | On chromosomes | On chromosomes | On chromosomes | On chromosomes
Energy generation | Primarily glycolysis | Glycolysis, TCA absent | Glycolysis and pentose phosphate pathway in glycosomes | Glycolysis, pentose phosphate pathway and TCA cycle
Reference | Abrahamsen et al. (2004) Nature, vol 431, pp 1107-1112 | Katinka MD et al. (2001) Nature, vol 414, pp 450-453 | Ivens AC et al. (2005) Science, vol 309, pp 436-442 | Berriman M et al. (2005) Science, vol 309, pp 416-422
Table 1. Salient features of a few protozoan genomes. N.A. - Not available
1.7. Repetitive DNA is an important constituent of eukaryotic genomes:
Repetitive DNA can be classified (Glockner et al., 2001) as
1. Simple repeats (stretches of mono- to trinucleotides, >100 bp long)
2. Complex repeats (usually >500 bp long and extending up to 5 kb)
3. Multi-copy gene families
The above system of repeat classification uses the length of a repetitive
DNA molecule as the sole criterion. Repetitive DNA can also be classified by
location, a more useful criterion, into the following categories:
1. Sub-telomeric satellite DNA
2. Sub-telomeric retroelements
3. Interspersed elements
4. Episomal repetitive DNA
5. Telomeric repeats
6. Centromeric repeats and satellite DNA
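The length-based definition of simple repeats given earlier (tandem stretches of a mono- to trinucleotide unit, more than 100 bp long) lends itself to a short illustration. The following is a minimal sketch using a regular expression; the function name, the threshold parameter and the test sequence are hypothetical, and the approach only finds perfect tandem runs, unlike dedicated repeat-finding tools.

```python
import re

# A 1-3 bp unit followed by at least one more perfect copy of itself.
_TANDEM = re.compile(r"([ACGT]{1,3}?)\1+")


def find_simple_repeats(seq: str, min_len: int = 100):
    """Yield (start, end, unit) for perfect tandem runs longer than min_len bp."""
    for match in _TANDEM.finditer(seq.upper()):
        if match.end() - match.start() > min_len:
            yield match.start(), match.end(), match.group(1)


# Hypothetical sequence: a 120 bp (AT)n run flanked by unrelated bases.
toy_sequence = "GC" + "AT" * 60 + "GGC"
for start, end, unit in find_simple_repeats(toy_sequence):
    print(f"{unit} repeat from {start} to {end} ({end - start} bp)")
```

The lazy quantifier makes the pattern prefer the shortest repeating unit, so a run of A residues is reported with unit A rather than AA or AAA.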
Interspersed repeat elements are one of the most important families of repeats in the
study of genome organization and evolution. Most interspersed repeats are, or have been,
mobile in the host genome and hence are more commonly referred to as transposable
elements. Based upon their mode of propagation in the host genome, transposable
elements are classified as either DNA transposons or retroelements. DNA
transposons have pervaded the prokaryotic kingdom widely but are less prevalent in the
eukaryotic realm. Active copies of these elements use an element encoded transposase to
excise from an existing site and integrate into a new site.
Retroelements use transcription as an intermediate stage for mobilization.
Existing active copies are transcribed and translated using host machinery and then
reverse transcribed by the element encoded proteins to generate a second copy of the
element which then integrates into another site in the genome.
Whether a transposable element survives in the genome or not depends upon two questions:
1. Can it minimize the deleterious effects of mutation that the genome would be
subjected to as a consequence of transposition?
2. Can it contribute functionality to the host genome and help it in its adaptation
to the environment?
Assuming a neutral rate of mutation for the entire genome, both host genes and
transposable elements respond adaptively to factors that may be environmental and / or
genetic. Many mutations can lead to loss of activity. Host genes tend to be
preserved via positive selection if their function is essential to the survival of the genome.
The persistence of a transposable element (essentially a genetic
parasite), on the other hand, depends upon the benefits the host genome derives from it.
Successful persistence of a transposable element is therefore an indication
of its contribution to host function. Alternatively, a transposable element could also
exist as a passive commensal rather than as a genetic parasite. The less frequent
occurrence of DNA transposons in the genomes of eukaryotes thus suggests that DNA
transposons have been excluded because their insertions led to loss of function mutations
in the host genome. Retroelements, on the contrary, have successfully pervaded
eukaryotic genomes. One of their contributions to host function may have been the
telomerase enzyme, which is a specialized reverse transcriptase.
Retroelements can be sub-classified as Long Terminal Repeat (LTR) containing
and non-LTR retroelements. LTR retrotransposons resemble retroviruses in
their structure and intracellular life cycles. Generally they contain two ORFs (gag and pol) flanked by
long direct terminal repeats, often of 200-600 bp. Elements like gypsy from D.
melanogaster and Osvaldo from D. buzzatii also carry env, which codes for an envelope
protein. It is believed that retroviruses originated from 5-10 kb long LTR
retrotransposons by acquiring envelope protein coding genes and were then horizontally
transmitted between cells and species. The gag gene codes for a nucleic acid binding
protein with a characteristic cysteine / histidine motif (CX2CX4HX4C) which is either a
zinc finger or a leucine zipper. The pol gene encodes the aspartic protease, reverse
transcriptase, RNase H and integrase domains.
LTR elements are capable of extracellular existence and horizontal transmission,
though they are usually transmitted vertically in the germ line (Brindley P et al., 2003).
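The CX2CX4HX4C motif mentioned above is a spacing pattern: a cysteine, any two residues, a cysteine, any four residues, a histidine, any four residues and a final cysteine. As a minimal sketch, it can be searched for in a protein sequence with a regular expression; the function name and the peptide below are hypothetical and serve only as an illustration.

```python
import re

# CX2CX4HX4C written as a regular expression over one-letter amino acid codes.
GAG_MOTIF = re.compile(r"C.{2}C.{4}H.{4}C")


def find_gag_motifs(protein: str):
    """Return (start, matched_text) for each non-overlapping CX2CX4HX4C match."""
    return [(m.start(), m.group()) for m in GAG_MOTIF.finditer(protein.upper())]


# Hypothetical peptide containing one copy of the motif.
toy_protein = "MKTAYCFNCGKEGHIARNCRAPRK"
print(find_gag_motifs(toy_protein))
```

A match only shows that the residue spacing is present; whether the region actually binds nucleic acid cannot be inferred from the pattern alone.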
A typical non-LTR retrotransposon is 5-6 kb long, harbors an internal RNA
Polymerase II promoter, often has an A rich tail and codes for one or two open reading
frames. The archetypal element for this class of retrotransposons is the L1 element of
humans (Long Interspersed Nuclear Element or LINE), which constitutes about 17% of the
human genome and exists in an estimated 520,000 copies. The ORF(s)
encode a nucleic acid binding motif, a coiled-coil motif for protein-protein interaction, a
reverse transcriptase (RT), an endonuclease (EN) and, in some families, an RNase H motif. The
nucleic acid binding and coiled-coil domains are located at the N terminus of the first ORF
(ORF1), while the RT and EN domains are located in ORF2. The RT domain is
central to ORF2, but the EN domain can be upstream or downstream of the RT
domain.
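The two L1 figures quoted above (about 17% of the genome, about 520,000 copies) imply an average copy length far below the 5-6 kb of a full length element. The following back-of-the-envelope sketch makes this explicit. The human genome size of 3.2 x 10^9 bp is an assumption introduced here for illustration and is not given in the text; the function name is hypothetical.

```python
HUMAN_GENOME_BP = 3.2e9   # assumption: approximate haploid human genome size
L1_FRACTION = 0.17        # fraction of the genome occupied by L1, from the text
L1_COPIES = 520_000       # estimated L1 copy number, from the text
FULL_LENGTH_BP = 6_000    # upper end of the 5-6 kb range given in the text


def mean_copy_length(genome_bp: float, fraction: float, copies: int) -> float:
    """Average length in bp of one copy, given total genome share and copy number."""
    return genome_bp * fraction / copies


mean_bp = mean_copy_length(HUMAN_GENOME_BP, L1_FRACTION, L1_COPIES)
print(f"mean L1 copy: {mean_bp:.0f} bp, "
      f"{mean_bp / FULL_LENGTH_BP:.0%} of a full length element")
```

Under these assumptions the average copy is roughly 1 kb, which is consistent with most genomic copies being incomplete.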
Based upon the reverse transcriptase and endonuclease domains, non-LTR elements
can be grouped into 11 clades, several of which also carry RNase H domains. Among
them are elements belonging to the CRE clade that are site specific for mini-exon arrays
in trypanosomes (Aksoy et al., 1990), R2 elements from arthropods (Burke, Muller and
Eickbush, 1995), and the L1 clade. The R1 element is similar to the R2 element in the RT
domain but shows a different organization. Both R1 and R2 elements are site specific and
insert in the 28S rDNA loci in arthropods. R1 encodes two ORFs while R2 encodes a
single ORF. The R2 element carries a C-terminal endonuclease domain while R1 carries an
apurinic-apyrimidinic (APE) type of endonuclease at the N terminus of ORF2.
Both LTR and non-LTR retroelements encode the enzymatic
machinery needed for their own mobilization and are hence referred to as autonomous elements.
Eukaryotic genomes are rich in another class of elements that depend on this enzymatic
machinery for their retrotransposition and are commonly referred to as short
interspersed nuclear elements (SINEs). All SINEs known to date are descended
from either 7SL RNA, the RNA component of the signal recognition particle (SRP), or a
small RNA such as tRNA. The human Alu and murine B1 SINEs belong to the former class,
while most other SINEs belong to the latter class.
Alu elements are 300 nt long, polyadenylated and have a bipartite structure
consisting of two monomers connected by an A rich region and differing only by a 31 nt
insertion in the right monomer. The left monomer carries a typical RNA Polymerase III
promoter in the form of box A and box B sequences. Elements are flanked by target site
duplications (Batzer MA and Deininger PL, 2002). A schematic representation of the
various classes of retroelements is shown in Figure 3.
[Figure 3: schematic. Retroviruses: 5' LTR - gag - pol - env - 3' LTR. LTR retrotransposons: 5' LTR - gag - pol - 3' LTR. Non-LTR retrotransposons (LINEs): Pol II promoter - ORF-1 - ORF-2 - poly(A). Non autonomous retroposons (SINEs): Pol III promoter (boxes A and B) - poly(A). Retro-pseudogenes.]
Figure 3. Different classes of retrotransposable elements
1.8. Steps involved in retrotransposition of non-LTR retrotransposons:
Retrotransposition of LINEs essentially involves the following steps:
1. Transcription is initiated from an internal promoter (presumably RNA Pol II) by
the host transcription machinery to produce a mono- or bicistronic transcript encoding
one or two ORFs. The transcript is believed to be polyadenylated. It is not known,
however, whether the transcript is subject to post-transcriptional processing such as splicing
or capping.
2. The mature full length transcript is transported to the cytosol for translation.
3. Host translation machinery translates this transcript to synthesize one or two
proteins, depending on the mono- or bicistronic nature of the element. The ORF1
protein has nucleic acid binding and protein-protein interaction domains and is
necessary for retrotransposition. The ORF2 protein contains a reverse transcriptase and an
endonuclease, the relative locations of which vary between species. The
proteins remain associated with the transcript to form a ribonucleoprotein (RNP)
particle.
4. The RNP complex is transported back to the nucleus. The mechanism by which
this transfer takes place is not yet known.
5. In the nucleus, the element integrates into a new insertion site by
target primed reverse transcription
(TPRT). TPRT involves nicking at the new site and the use of the free 3' OH
thus generated to prime reverse transcription of the transcript, followed by its
copying and integration into the site. Since full length reverse transcription of the
transcript is seldom achieved, the majority of copies of LINEs in the genome
are truncated at the 5' end. The abundance of
truncated copies is further accentuated by host recombination processes leading to
copies with both ends truncated.
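The 5' truncation described in step 5 follows directly from the direction of reverse transcription: copying starts at the 3' end of the transcript, so whatever is copied before the reaction stops always includes the 3' end. The toy model below illustrates this. It is a deliberately simplified sketch: the function name, the sequences and the single "bases copied" parameter are hypothetical, and second strand synthesis and target site duplications are ignored.

```python
def tprt_insert(target: str, nick_pos: int, transcript_rna: str, bases_copied: int) -> str:
    """Toy model of target primed reverse transcription.

    Reverse transcription is primed at the nick and starts at the 3' end of the
    transcript, so only the last `bases_copied` bases of the transcript end up
    in the genome. Sequences are written as the sense strand for readability.
    """
    copied = max(0, min(bases_copied, len(transcript_rna)))
    dna_copy = transcript_rna[len(transcript_rna) - copied:].replace("U", "T")
    return target[:nick_pos] + dna_copy + target[nick_pos:]


# Hypothetical target site and polyadenylated transcript.
target_site = "GGGGCCCC"
transcript = "AUGGCCUUAAAAAA"

full_length = tprt_insert(target_site, 4, transcript, bases_copied=len(transcript))
truncated = tprt_insert(target_site, 4, transcript, bases_copied=8)
print(full_length)  # complete copy inserted at the nick
print(truncated)    # 5' truncated copy that still carries the A rich 3' end
```

In this model every truncated insertion retains the A rich tail and loses sequence only from the 5' side, which mirrors the pattern described for genomic LINE copies.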
The mobilization of SINE elements proceeds by a mechanism similar
to that of LINEs. Though thousands of copies of SINEs usually exist in a genome, data from
human Alu and mouse B1 and B2 elements suggest that only a few master copies are
retrotranspositionally competent and that these copies give rise to other copies that propagate
in the genome by borrowing the enzymatic machinery encoded by the LINEs. This is
supported by the observation that both LINEs and SINEs share a stretch of sequence at
their 3' ends near their poly(A) tails. Evolutionary data from humans show that the
LINEs in mammalian genomes have increased during the past 150 million years of
evolution, a period during which Alu amplification activity has been high.
Figure 4 gives a schematic representation of the steps involved in the mobilization of
LINEs and SINEs.
[Figure 4: schematic. Legible labels: ORF-1; ORF-2; cytoplasm; ribonucleoprotein complex.]
Figure 4. Steps involved in the mobilization of LINEs and SINEs
1.9. Transposons in parasitic protozoa
Protozoan genomes have also been the targets of retrotransposons. The
distribution of mobile genetic elements in protozoan genomes has been reviewed
recently (Bhattacharya S et al., 2002; Wickstead et al., 2003). The trypanosomatid
lineage (T. brucei, T. cruzi and C. fasciculata) contains the largest number of transposable
elements, most of which, surprisingly, belong to the non-LTR class. G. lamblia and
E. histolytica each have three families of LINEs and SINEs. It is noteworthy that, though
protozoan genomes share many characteristics, the transposable elements in these
genomes belong to different lineages, indicating that the invasion of these genomes
occurred at different times in evolution.
Table 2 summarises the major mobile genetic elements found among protozoa.
Organism: T. brucei (INGI, RIME, SLACS); T. cruzi (L1Tc, CZAR, SIRE, VIPER); C. fasciculata (CRE1, CRE2); G. lamblia (GilM & GilT, GilD)
Element | INGI | RIME | SLACS | L1Tc | CZAR | SIRE | VIPER | CRE1 | CRE2 | GilM & GilT | GilD
Copy no. | ~500 | ~500 | 9 | 2,800 | 30-40 | 1,500-3,000 | ~300 | 10 | 6 | ~15 | 30
Size (kb) | 5.2 | 0.512 | 6.678 | 5.5 | 7.237 | 0.428 | 2.539 | 3.5 | 9.6 | ~6.0 | ~3.0
For the remaining rows the column positions could not be recovered; the legible entries are listed in the order in which they appear.
Location: near repeated genes; SL-RNA genes; dispersed; dispersed, frequently near repetitive genes; sub-telomeres; telomeric; four chromosomes (0.96 to 1.6 Mb)
TSD (bp): 12; 12; 49; 22; 29; 29; none
Transcript (kb): 2.5 to 9.0; 0.8 to 8.0; not detected; 5 (some small species); not detected; 0.8 to 7.0; not detected; not detected
5'-UTR (bp): short (~100); 1511; 101; 1504; 707; 844; 55; none; none
3'-UTR (bp): 148; 535; 534; 60; 1200; 3000; short
No. of ORFs: one; one; two; three; two; one; one; one; one; one
RT location: central; ORF2; ORF2; ORF2; central; central. RT motifs: FADD; YLDD; YADD; YLDD; YIDD; YIDD
EN location (motif): N terminus (apurinic endonuclease); ORF1; C terminus (CCHC); ORF1 (CCHH); C terminus (CCHC); REL-ENDO
NB location (motif): ORF1, C terminus (CCHH); ORF3, C terminus (CCHH); ORF2, N terminus (CCHH); ORF1 (CCHH); N terminus (CCHH); C terminus (HHCC); N terminus (CCHH), 2 motifs
Other domains: RNaseH
Element type: non-LTR; non-LTR; non-LTR; non-LTR; LTR; LTR; non-LTR; non-LTR; non-LTR; non-LTR
Potentially autonomous: yes; yes(?); no, may need INGI; no, may need VIPER; may need CRE2; all present day copies inactive
Table 2. Retrotransposable elements in protozoan genomes.
1.10. Aims and Objectives of this study:
1. Identification of repetitive sequence families in the genome of E. histolytica and their
characterization with respect to physical localization and expression status.
2. Comparative analysis of non-LTR retrotransposons in the genome of E. histolytica with
respect to their genomic localization and expression status.