INTRODUCTION -...


1.1. Etiology of Amebiasis and life cycle of Entamoeba histolytica:

Entamoeba histolytica is the etiological agent of amebiasis, which ranks third among protozoan diseases in terms of mortality and morbidity (WHO, 1997; Walsh, 1986). The WHO estimates about 40-50 million cases of amebic colitis and amebic liver abscess (ALA) and about 100,000 deaths annually from this disease. These figures are unlikely to reflect the actual burden of the disease, since preliminary diagnosis is still based on microscopy, which fails to distinguish between closely related species of Entamoeba and moreover reports false positives (Walsh et al., 1986; WHO, 1997; Petri W, 2000). Studies in Vietnam (Blessman J, 2003; Blessman J, 2002) showed a prevalence rate of 11.2% and a new infection rate of 4.1% in the population, while studies in Dhaka have shown that 2.4% of the male population is predisposed to ALA, 10% or more of children have diarrhoea due to E. histolytica infection and 4% have ALA (Haque R, 1997). It is widely accepted that the global burden of E. histolytica is not precisely known and that available figures provide only a rough estimate of the prevalence of the disease.

E. histolytica is a dimorphic parasite. Human beings are the primary hosts for infection.

Contaminated food and water sources are reservoirs of infection. Ingestion of dormant E.

histolytica cysts present therein begins the infectious cycle. The cyst passes through the stomach

being protected from its harsh acidic environment by the resistant cyst wall. The cyst excysts in

the ileo-caecal region of the small intestine to release four to eight trophozoites which constitute

the invasive form of the parasite. Amebiasis is not a result of only infection by the parasite but is

rather an outcome of interactions between the pathogen, the local microbial flora and importantly

the host immune response. The pathophysiology of amebiasis essentially involves adherence,

colonization, cytolysis and tissue necrosis leading to the classical amebic dysentery.
Extra-intestinal infections result from hematogenous dissemination of the parasite to different organs of
the body, primarily the liver, where the organism produces necrotic lesions called amebic liver

abscesses. The pathophysiology of intestinal amebiasis is better understood as compared to that

of extra-intestinal infections. Though not completely elucidated as yet, pathophysiology of

intestinal amebiasis is believed to involve the following stages:

Adherence of the trophozoites to the colonic mucin, via interaction of the D-galactose / N-acetyl-D-galactosamine (Gal/GalNAc) specific amebic lectin with host glycoconjugates, marks the initiation of the amebic invasion (Petri, 1987; Chadee et al., 1987). Other proteins that might aid in adherence include the serine-rich E. histolytica protein (SREHP) (Stanley et al., 1990).

Colonization and invasion of the colonic epithelium are initiated by the secretion of thiol (cysteine) proteinases (EhCPs), which are encoded by a multigene family with at least 20 members (Bruchhaus I, 2003). The cysteine proteinases degrade MUC2, the major component of the human colonic mucin, cause proteolysis of the enteric cell villin and effacement of intestinal microvilli (Keene W, 1986; Li E, 1995; Lauwaet T, 2003), and activate pre-interleukin-1β, which leads to inflammation of the colonic epithelium (Zhang Z, 2000). The colonization and adherence of the parasite also lead to cytolysis of the colonic tissue via contact-dependent killing (mediated by cytolytic peptides called amebapores secreted by the ameba), induction of apoptosis and an acute inflammatory reaction.

Cytolytic disruption leads to deeper infiltration and horizontal dissemination of the

parasite leading to the formation of flask shaped ulcers characteristic of amebiasis. The parasite

may penetrate into the hepatic portal circulation via the capillary bed in the intestinal wall and

use this hematogenous route to disseminate to various organs, primarily the liver, for reasons that are not known. The typical extra-intestinal infection is a necrotic lesion in the hepatic tissue known as an amebic liver abscess.

Extra-intestinal infections are a dead end for the parasite. The parasite ensures

continuation of the life cycle via cyst formation. Under conditions which are poorly understood,

trophozoites convert into dormant cysts that can withstand harsh environmental conditions. The

cysts are shed into the environment along with feces. The life cycle is depicted in Figure 1.

[Figure 1 schematic; legible labels: cyst; excystation (small intestine); commensal growth (colon); fecal-oral spread; cutaneous amebiasis; brain abscess; lung or pericardial abscess]

Fig. 1. Flow chart depicting the life cycle of E. histolytica infection.

1.2. Pathogenic and non-pathogenic species of Entamoeba:

Entamoeba species are found across different animal genera and occupy a variety of niches. The common gut pathogens include E. histolytica in humans, E. invadens in reptiles, E. polecki in pigs and E. chattoni in non-human primates. E. dispar, which molecular studies have established to be genetically closest to E. histolytica, is a commensal in the human gut. E. gingivalis colonizes the oral cavity and is implicated in oral amebiasis in humans. E. moshkovskii, which is morphologically indistinguishable from E. histolytica, was considered to be non-pathogenic and free-living. Recent studies, however, report the occurrence of this species in humans, though direct demonstration of pathogenicity is still lacking (Ali IK et al., 2003). Among Entamoeba species, E. histolytica, E. hartmanni, E. dispar and E. invadens each produce quadrinucleate cysts, while E. polecki and E. chattoni produce uninucleate cysts. E. gingivalis lacks a cyst stage.


1.3. Organization of E. histolytica trophozoites and cysts:

E. histolytica trophozoites are 20-40 µm in diameter with an ameboid shape, and are generally uninucleate, although up to four nuclei may be present. The nuclear membrane is 120 nm thick and has nuclear pores of 65 nm diameter. The important nuclear proteins known include histones H1, H3 [text missing in source] al., 1997; Binder2 et al., 1995; Fodinger, 1993). Chromosomal [text missing in source]

Trophozoites are sluggishly motile and possess a well-organized cytoskeleton (Tavares et al., 2005). The important cytoskeletal proteins identified in this organism, all functional homologues of known proteins, include actin (Meza I, 1983), EF-1α (De Meester, 1991), EhABPH (Ebert et al., 2000), myosin IB (Voigt et al., 1999) and profilin (Binder1 et al., 1995). Trophozoites lack typical eukaryotic cytosolic organelles, though functional counterparts are believed to exist in the form of the vacuoles that abound in the cytosol of this organism. These include the crypton / mitosome (Ghosh S, 2000; Tovar et al., 1999), believed to be a double-membraned, vacuolar, Hsp60-associated, mitochondrion-derived structure. A typical ER and Golgi are absent, although ER-specific and Golgi-specific markers have been shown to exist. The cytoplasmic membrane is approximately 10 nm thick and has an abundant surface coat predominantly composed of glycoproteins and lipophosphoglycans. The composition of the plasma membrane is believed to protect the trophozoites from the action of their own amebapores (Andra J et al., 2004). In addition, the plasma membrane has been shown to harbor cholesterol-rich lipid raft regions, which are involved in pinocytosis and adhesion (Laughlin et al., 2004). The proteophosphoglycan of E. histolytica is linked via GPI anchors to the cell membrane. The cell surface harbors the 260 kDa Gal/GalNAc lectin, which plays an important role in adhesion of the trophozoites to the intestinal lining.

1.4. Metabolism in E. histolytica:

The parasite is microaerophilic, and carbohydrate fermentation forms the major source of

energy. Enzymes of glycolysis have been identified and some of them are postulated to have

been acquired by horizontal gene transfer. Enzymes of the TCA cycle and mitochondrial electron

transport chain are however absent. Pyrophosphate is used as a phosphate donor instead of a

nucleoside triphosphate in many of these reactions. E. histolytica does not have lipid

biosynthesis pathways but can synthesize common phospholipids. Since the parasite lacks the enzymes needed for de novo purine, pyrimidine and thymidylate synthesis, nucleotide metabolism is primarily effected through the salvage pathway. Interestingly, ribonucleotide reductase is also absent. Reactive oxygen radicals are removed through the use of superoxide dismutase. Protein metabolism is poorly understood. Cysteine is an essential amino acid for growth, while proline and glutamate exist in high intracellular concentrations in the trophozoites. An overview of E. histolytica metabolism is shown in Figure 2.

Figure 2. Metabolic pathways in E. histolytica [adapted from Loftus B, 2005]

1.5. Organization of the Entamoeba genome:

Understanding the genome of E. histolytica has traditionally been hampered by the lack of tools for genetic dissection in this parasite. The genome is highly A + T rich (67% in the coding region and 72% in the intergenic region) and it is estimated that each trophozoite contains about 0.24 pg of DNA (Dvorak, 1995). The organism is believed to be tetraploid (Willhoeft U, 1999), and the total haploid genome size was calculated at 50 Mbp. Pulsed Field Gel Electrophoresis (PFGE) helped to establish that the genome of E. histolytica HM-1:IMSS has 31-35 chromosomal bands ranging in size from 300 kb to 2200 kb. The size distribution, however, changes with the isolate and the conditions used for separation of the chromosomal bands (Bhattacharya A et al., 2000; Willhoeft U and Tannich E, 1999; Bagchi A, 2001; Clark CG, Cantellano ME and Bhattacharya A, 2000). The interpretation is further complicated by chromosomal size polymorphism, which also underlies the fluctuations in genome size seen in many other protozoan parasites. Molecular analysis showed that there are 14 linkage groups (Willhoeft U and Tannich E, 1999).
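
As a rough cross-check of these figures, the DNA content per cell can be converted into base pairs. The sketch below assumes the commonly used conversion of about 978 Mbp per picogram of double-stranded DNA; the constant and function names are illustrative and are not taken from the cited work. It also treats all 0.24 pg as nuclear DNA shared equally among four genome copies, which ignores episomal DNA and multinucleate cells.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to roughly 978 Mbp.
MBP_PER_PG = 978.0


def haploid_genome_size_mbp(dna_pg_per_cell: float, ploidy: int) -> float:
    """Estimate haploid genome size (Mbp) from DNA content per cell and ploidy."""
    if ploidy <= 0:
        raise ValueError("ploidy must be a positive integer")
    return dna_pg_per_cell * MBP_PER_PG / ploidy


# Values quoted in the text: 0.24 pg per trophozoite, tetraploid.
estimate = haploid_genome_size_mbp(0.24, 4)
print(f"Estimated haploid genome size: {estimate:.1f} Mbp")
```

Under these assumptions the estimate comes out at just under 60 Mbp, which is of the same order as the 50 Mbp figure quoted above.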

Current understanding of the E. histolytica genome stems from the completion of the E. histolytica genome project, accomplished under the aegis of a National Institute of Allergy and Infectious Diseases (NIAID) grant to The Institute for Genomic Research (TIGR) and a Wellcome Trust grant to the Sanger Institute (Loftus B et al., 2005). The genome is estimated to be about 22.8 Mb in size, a value comparable to that of other protozoan parasites. Analysis of 12.5X coverage of the genome yielded 888 non-redundant contigs containing 9938 predicted genes comprising 49% of the genome. Average gene size was estimated to be 1.17 kb. Interestingly, in contrast to previous estimates of 6% intron-containing genes (Willhoeft U et al., 2001), the genome sequence showed that approximately 25% of genes have introns while 6% of genes contain multiple introns (Loftus B, 2005). Another interesting feature of the genome is the presence of episomal DNA molecules of sizes varying from 5 kb to 50 kb (Dhar SK, 1995). Among these different molecules, the 24.5 kbp EhR1 circle, which harbors the ribosomal RNA genes, is the most abundant, with around 200 copies per haploid genome (Bhattacharya A et al., 2000). The organization and role of the other episomal molecules remain to be elucidated. Genome analysis revealed the full complement of tRNA genes in E. histolytica, and all except four of these were organized as tandem arrays of 1-5 copies. The mode in which the genome has been sequenced precludes analysis of the physical organization of these tRNA arrays. Previous work with histidine tRNA gene arrays (Satish S, 2002), however, has shown that the histidine tRNA array is located on only one chromosome (band 12) in HM-1:IMSS and is not present exclusively in the sub-telomeric region, although a few telomeric copies cannot be ruled out. Chromosomal localization of the other tRNA genes remains to be determined.
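
The assembly statistics quoted above for the genome sequence can be checked against one another with simple arithmetic. The following sketch uses only the numbers given in the text (22.8 Mb, 9938 genes, 1.17 kb average gene size, 25% and 6% intron frequencies); the variable names are illustrative.

```python
GENOME_SIZE_KB = 22.8 * 1000      # 22.8 Mb assembly, expressed in kb
PREDICTED_GENES = 9938
AVERAGE_GENE_KB = 1.17

# Fraction of the genome occupied by predicted genes.
genic_fraction = PREDICTED_GENES * AVERAGE_GENE_KB / GENOME_SIZE_KB

# Approximate gene counts implied by the quoted intron frequencies.
genes_with_introns = round(0.25 * PREDICTED_GENES)
genes_with_multiple_introns = round(0.06 * PREDICTED_GENES)

print(f"Genic fraction: {genic_fraction:.1%}")
print(f"Genes with introns: ~{genes_with_introns}")
print(f"Genes with multiple introns: ~{genes_with_multiple_introns}")
```

The genic fraction computed this way is about 51%, close to the 49% reported; the small difference is presumably due to rounding of the average gene size.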

Multi-copy protein-coding gene families also abound in the E. histolytica genome. These include virulence factor genes such as the Gal/GalNAc lectin intermediate subunit (30 copies), the cysteine proteinase family (at least 20 different members), the protein kinase family (271 members across seven superfamilies) and phosphatases (about 100 members) (Loftus B et al., 2005). Gene expression of another multi-copy family, Ehssp1, responds differentially to oxidative and heat stress (Satish S, 2003).

The E. histolytica genome is also home to a number of genes that are thought to

be inherited via lateral gene transfer. About 96 genes, of which 58% code for metabolic

enzymes and 42% code for unknown functions, are believed to be horizontally

transferred. These genes are believed to have conferred an enhanced capability for carbohydrate metabolism on the organism, and the Cytophaga-Flavobacterium-Bacteroides group appears to be the major donor of these genes. Interestingly, members of the genus Bacteroides are the predominant microbes among the intestinal microbiota (Suau A, 1999). The predominance of Bacteroides in the same environmental niche that E. histolytica colonizes also supports the notion of horizontal gene transfer between these species.

The analysis of the E. histolytica genome shows that the genome is similar in

terms of its size, organization and composition to many other protozoan genomes from

different genera. However, it also shows some deviations from the general genome

organization among these classes. A brief comparative analysis of a few protozoa whose

genomes are being or have been sequenced is presented below. The results of these

studies are summarized in Table 1.


1.6. Salient features of protozoan genomes :

1. Most protozoan genomes show a very high A + T content.

2. Organellar genomes are highly reduced or defunct in protozoa.

3. Gene synteny and gene order are conserved across classes despite

sequence divergence.

4. Chromosomes do not condense and usually exhibit chromosomal size

polymorphism.

5. Some protozoan genomes are rich in repetitive DNA while others have a severe

paucity of repeat regions.

6. Protozoa moving between vector and host exhibit variable ploidy levels between

the asexual and sexual stages of the life cycle.

7. Many genes especially those related to metabolism have been inherited by

horizontal gene transfer.

8. Most protozoa lack well-developed cellular organelles, and usually depend on an

atypical glycolysis as a major energy generating pathway.
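
The A + T content referred to in the first point is simply the percentage of A and T bases in a sequence. A minimal sketch follows; the function name and the toy sequence are hypothetical and are not taken from any of the genomes discussed.

```python
def at_content(sequence: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for base in counted if base in "AT")
    return 100.0 * at / len(counted)


toy_sequence = "ATGAAATTTAAACGTATTTAA"  # hypothetical example, not a real gene
print(f"A+T content: {at_content(toy_sequence):.1f}%")
```

Ambiguous characters such as N are skipped here so that gaps in an assembly do not distort the percentage; other conventions are possible.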

Organism | Entamoeba histolytica | Plasmodium falciparum | Dictyostelium discoideum | Cryptosporidium hominis | Giardia lamblia
Genome size (Mbp) | 22.8 | 23 | 8.1 | 9.16 | 12
A+T (%), genic region | 72 | 80.6 | 77.57 | 68.3 | N.A.
Ploidy | 4n | 1n or 2n | 1n | 1n | 4n or 8n
No. of chromosomal bands | 30-35 | [illegible] | 6 | 8 | 5
Chromosome size (Mbp) | 0.3-2.2 | 0.643-3.29 | 4-7 | 0.9-1.4 | 1.6-3.8
% coding region | 49 | 62 | 56 | 69 | N.A.
No. of predicted genes / ORFs | 9938 predicted | 5268 | 2799 | 3994 | 9649 predicted ORFs
Gene density | 0.435 kbp | 4.338 kbp | 2.600 kbp | 2.293 kbp | N.A.
% genes with introns | 25 | 54 | 68 | 5-20 (estimated) | N.A.
Mean intronic length (bp) | 100 | [illegible] | 177 | N.A. | N.A.
Composition of introns (%A+T) | N.A. | 86.5 | 87 | N.A. | N.A.
Mean intergenic region length | N.A. | 1694 | 786 | 716 | N.A.
Composition of intergenic region (%A+T) | N.A. | 86.4 | 86 | 69.7 | N.A.
Location of rRNA genes | 14.0-25.0 kb nuclear episomes | Dispersed copies on different chromosomes (few copies present) | 90 kb nuclear episomes | On chromosomes | On chromosomes; telomeric location; rDNA-containing chromosomes undergo frequent rearrangement
Energy generation | Primarily glycolysis; PPi used | Glycolysis: host metabolites; PPi used | Polyphosphate as energy source | Atypical glycolysis | Primarily glycolysis; PPi used
Reference(s) | Loftus B et al. (2005) Nature, vol 433, pp 865-868 | Gardner et al. (2002) Nature, vol 419, pp 498-511 | L. Eichinger et al. (2005) Nature, vol 435, pp 43-57 | Xu P et al. (2004) Nature, vol 431, pp 1107-1112 | http://gmod.mbl.edu/perl/site/giardia?page=intro

Organism | Cryptosporidium parvum | Encephalitozoon cuniculi | Leishmania major | Trypanosoma brucei
Genome size (Mb) | 9.1 | 2.5 | 32.8 | 26
A+T (%), genic region | 70 | 53 | 40.3 | 55.6
Ploidy | - | Probably diploid | Partially aneuploid | Diploid
No. of chromosomal bands | 8 | 11 | 36 | 11
Chromosome size (Mb) | 0.9-1.5 | 0.217-0.315 | 0.3-2.5 | -
% coding region | 75.3 | 90 | 47.9 | 50.5
No. of predicted genes / ORFs | 3807 | 1997 | 8272 | 9068
Gene density | 1 per 2.382 kbp | 1 gene per 1.025 kbp | 1 gene per 0.252 kb | 1 per 0.348 kb
% genes with introns | 5 | Introns are very rare | Introns are very rare | N.A.
Mean intronic length | N.A. | N.A. | N.A. | N.A.
Composition of introns (%A+T) | N.A. | N.A. | N.A. | N.A.
Mean intergenic region length (bp) | 566 | 129 | 2045 | 1279
Composition of intergenic region (%A+T) | 76.1 | 55 | 40.7 | 59
Location of rRNA genes | On chromosomes | On chromosomes | On chromosomes | On chromosomes
Energy generation | Primarily glycolysis | Glycolysis; TCA absent | Glycolysis and pentose phosphate pathway in glycosomes | Glycolysis, pentose phosphate pathway and TCA cycle
Reference | Abrahamsen et al. (2004) Nature, vol 431, pp 1107-1112 | Katinka MD et al. (2001) Nature, vol 414, pp 450-453 | Ivens AC et al. (2005) Science, vol 309, pp 436-442 | Berriman M et al. (2005) Science, vol 309, pp 416-422

Table 1. Salient features of a few protozoan genomes. N.A. - Not available
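
The gene density row of Table 1 can be recomputed from the genome sizes and gene counts listed in the same table. The sketch below does this for four of the organisms, using values copied from the table; the dictionary and function names are illustrative. Comparing the results with the table suggests that some entries (for example 0.435 for E. histolytica) are expressed as genes per kbp while others are expressed as kbp per gene, though this reading is an inference and not something the table states.

```python
# (genome size in Mbp, number of predicted genes), as listed in Table 1
TABLE_1 = {
    "E. histolytica": (22.8, 9938),
    "P. falciparum": (23.0, 5268),
    "C. hominis": (9.16, 3994),
    "T. brucei": (26.0, 9068),
}


def kbp_per_gene(organism: str) -> float:
    """Average genome length (kbp) available per predicted gene."""
    size_mbp, genes = TABLE_1[organism]
    return size_mbp * 1000 / genes


def genes_per_kbp(organism: str) -> float:
    """Reciprocal measure: predicted genes per kbp of genome."""
    return 1.0 / kbp_per_gene(organism)


for name in TABLE_1:
    print(f"{name}: {kbp_per_gene(name):.3f} kbp per gene, "
          f"{genes_per_kbp(name):.3f} genes per kbp")
```

Reporting both measures side by side makes it easier to spot which convention a published value follows.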


1.7. Repetitive DNA is an important constituent of eukaryotic genomes:

Repetitive DNA can be classified (Glockner et al., 2001) as

1. Simple repeats (stretches of mono- to trinucleotides, >100 bp long)

2. Complex repeats (length usually >500 bp and extending up to 5 kb)

3. Multi-copy gene families

The above system of repeat classification takes into account the length of a repetitive

DNA molecule as the sole criterion. Repetitive DNA can also be classified using their

location as a more useful criterion into the following categories:

1. Sub-telomeric satellite DNA

2. Sub-telomeric Retroelements

3. Interspersed elements

4. Episomal repetitive DNA

5. Telomeric repeats

6. Centromeric repeats and satellite DNA

Interspersed repeat elements are one of the most important families of repeats in the

study of genome organization and evolution. Most interspersed repeats are, or have been,

mobile in the host genome and hence are referred to more commonly as transposable

elements. Based upon their mode of propagation in the host genome transposable

elements are classified as either DNA Transposons or as Retroelements. DNA

Transposons have pervaded the prokaryotic kingdom widely but are less prevalent in the

eukaryotic realm. Active copies of these elements use an element encoded transposase to

excise from an existing site and integrate into a new site.

Retroelements use an RNA transcript as the intermediate for mobilization. Existing active copies are transcribed and translated using host machinery, and the transcript is then reverse transcribed by element-encoded proteins to generate a second copy of the element, which integrates into another site in the genome.
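
The difference between the two modes of propagation can be illustrated with a toy model. The sketch below is an illustration only: the genome is reduced to a set of occupied positions, and the function names, genome length and starting positions are all hypothetical. A cut-and-paste event (DNA transposon) moves an existing copy, whereas a copy-and-paste event (retroelement) adds a new one, so only the latter increases copy number in this simplified picture.

```python
import random


def cut_and_paste(sites: set, genome_length: int, rng: random.Random) -> set:
    """DNA transposon: excise one copy and reinsert it at a free position."""
    updated = set(sites)
    donor = rng.choice(sorted(updated))
    free = [pos for pos in range(genome_length) if pos not in updated]
    updated.remove(donor)
    updated.add(rng.choice(free))
    return updated


def copy_and_paste(sites: set, genome_length: int, rng: random.Random) -> set:
    """Retroelement: the donor copy stays in place and a new copy is inserted."""
    updated = set(sites)
    free = [pos for pos in range(genome_length) if pos not in updated]
    updated.add(rng.choice(free))
    return updated


rng = random.Random(0)          # fixed seed so the run is reproducible
start = {10, 200, 350}          # hypothetical insertion sites
transposon_sites, retro_sites = set(start), set(start)
for _ in range(10):
    transposon_sites = cut_and_paste(transposon_sites, 1000, rng)
    retro_sites = copy_and_paste(retro_sites, 1000, rng)

print(len(transposon_sites), len(retro_sites))
```

After ten events the cut-and-paste element still has three copies, while the copy-and-paste element has thirteen. Real DNA transposons can gain copies in other ways (for example through repair of the excision site), which this toy model deliberately leaves out.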

Whether a transposable element survives in the genome depends upon two questions:

1. Can it minimize the deleterious effects of the mutations that the genome is subjected to as a consequence of transposition?

2. Can it contribute functionality to the host genome and help it adapt to the environment?

Assuming a neutral rate of mutation for the entire genome, both host genes and

transposable elements respond adaptively to factors that may be environmental and / or

genetic. Many mutations can lead to potential loss of activity. The host genes tend to be

preserved via positive selection if their function is essential to the survival of the genome.

On the other hand the persistence of a transposable element (essentially a genetic

parasite) depends upon the benefits the host genome derives from its persistence.

This implies that the successful persistence of a transposable element is an indication
of its contribution to host function. Alternatively, a transposable element could also

exist as a passive commensal rather than as a genetic parasite. The less frequent

occurrence of DNA transposons in genomes of eukaryotes thus suggests that DNA

transposons have been excluded because their insertions led to loss of function mutation

in the host genome. Retro-elements on the contrary have successfully pervaded

eukaryotic genomes. One of their contributions to the host function may have been the

telomerase enzyme which is a specialized reverse transcriptase.

Retroelements can be sub-classified as Long Terminal Repeat (LTR) containing and non-LTR retroelements. LTR retrotransposons resemble retroviruses in their structure and intracellular life cycles. Generally they contain two ORFs (gag and pol) flanked by long direct terminal repeats, often of 200-600 bp. Elements like gypsy from D. melanogaster and Osvaldo from D. buzzatii also encode env, which codes for an envelope protein. It is believed that retroviruses originated from 5-10 kb long LTR retrotransposons by acquiring envelope protein coding genes, and were then transmitted horizontally between cells and species. The gag gene codes for a nucleic acid binding protein with a characteristic cysteine / histidine motif (CX2CX4HX4C), which is either a zinc finger or a leucine zipper. The pol gene encodes the aspartic protease, reverse transcriptase, RNase H and integrase domains, while the env gene encodes an envelope protein. LTR elements are capable of extracellular existence and horizontal transmission, though they are usually transmitted vertically in the germ line (Brindley P et al., 2003).
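
A motif written as CX2CX4HX4C, where X stands for any residue, translates directly into a regular expression. The sketch below is a minimal illustration; the function name and the toy peptide are hypothetical, and the pattern makes no claim about which real proteins carry the motif.

```python
import re

# C, any 2 residues, C, any 4 residues, H, any 4 residues, C.
# The lookahead allows overlapping occurrences to be reported.
CCHC_MOTIF = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_cchc_motifs(protein: str) -> list:
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in CCHC_MOTIF.finditer(protein.upper())]


toy_peptide = "MKRCAACGGGGHKKKKCLLE"  # hypothetical sequence containing one motif
print(find_cchc_motifs(toy_peptide))
```

A pattern search of this kind only flags candidate sites; whether a candidate actually forms a zinc finger has to be established by other means.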

A typical non-LTR retrotransposon is 5-6 kb long, harbors an internal RNA Polymerase II promoter, often has an A-rich tail and contains one or two open reading frames. The archetypal element of this class of retrotransposons is the L1 element of humans (Long Interspersed Nuclear Element or LINE), which constitutes about 17% of the human genome and exists in an estimated 520,000 copies. The ORF(s) encode a nucleic acid binding motif, a coiled-coil motif for protein-protein interaction, a reverse transcriptase (RT), an endonuclease (EN) and, in some families, an RNase H motif. The nucleic acid binding and coiled-coil domains are located at the N-terminus of the first ORF (ORF1), while the RT and EN domains are located in ORF2. The RT domain is central to ORF2, but the EN domain can be upstream or downstream of the RT domain.

Based upon the reverse transcriptase and endonuclease domains, non-LTR elements can be grouped into 11 clades, several of which also carry RNase H domains. Among them are elements belonging to the CRE clade, which are site-specific for mini-exon arrays in trypanosomes (Aksoy et al., 1990), R2 elements from arthropods (Burke, Muller and Eickbush, 1995), and the L1 clade. The R1 element is similar to the R2 element in the RT domain but shows a different organization. Both R1 and R2 elements are site-specific and insert in the 28S rDNA loci in arthropods. R1 encodes two ORFs while R2 encodes a single ORF. The R2 element carries a C-terminal endonuclease domain, while R1 carries an apurinic-apyrimidinic (APE) type of endonuclease at the N-terminus of ORF2.

Both LTR and non-LTR retroelements encode the enzymatic machinery needed for their own mobilization and are hence referred to as autonomous elements. Eukaryotic genomes are also rich in another class of elements that depend on this enzymatic machinery for their retrotransposition and are commonly referred to as Short Interspersed Nuclear Elements (SINEs). All SINEs known to date are derived from either 7SL RNA, the RNA component of the Signal Recognition Particle (SRP), or a small RNA such as tRNA. The human Alu and murine B1 SINEs belong to the former class, while most other SINEs belong to the latter class.

Alu elements are about 300 nt long, polyadenylated and have a bipartite structure consisting of two monomers connected by an A-rich region and differing only by a 31 nt insertion in the right monomer. The left monomer carries a typical RNA Polymerase III promoter in the form of box A and box B sequences. Elements are flanked by target site duplications (Batzer MA and Deininger PL, 2002). A schematic representation of the various classes of retroelements is shown in Figure 3.

Retroviruses: 5' LTR - gag - pol - env - 3' LTR

LTR retrotransposons: 5' LTR - gag - pol - 3' LTR

Non-LTR retrotransposons (LINEs): Pol II promoter - ORF-1 - ORF-2 - AAAAAAAA

Non-autonomous retroposons (SINEs): Pol III promoter (boxes A and B) - AAAAAAAA

Retro-pseudogenes: exons (only "Exon 3" is legible in the source)

Figure 3. Different classes of retrotransposable elements

1.8. Steps involved in retrotransposition of non-LTR retrotransposons:

Retrotransposition of LINEs essentially involves the following steps:

1. Transcription is initiated from an internal promoter (presumably RNA Pol II) by the host transcription machinery to produce a mono- or bicistronic transcript encoding one or two ORFs. The transcript is believed to be polyadenylated. It is not known, however, whether the transcript is subject to post-transcriptional processing such as splicing or capping.

2. The mature full-length transcript is transported to the cytosol for translation.

3. Host translation machinery translates this transcript to synthesize one or two proteins, depending on the mono- or bicistronic nature of the element. The ORF1 protein has nucleic acid binding and protein-protein interaction domains and is necessary for retrotransposition. The ORF2 protein contains a reverse transcriptase and an endonuclease, the relative locations of which vary between species. The proteins remain associated with the transcript to form a ribonucleoprotein (RNP) particle.

4. The RNP complex is transported back to the nucleus. The mechanism by which this transfer takes place is not yet known.

5. In the nucleus, the element integrates into a new insertion site by Target Primed Reverse Transcription (TPRT). TPRT involves nicking of the DNA at the new site and use of the free 3' OH thus generated to prime reverse transcription of the transcript, followed by its copying and integration into the site. The majority of the copies of these elements are 5' truncated, since full-length reverse transcription of the transcript is seldom achieved. The abundance of truncated copies is further accentuated by host recombination processes, leading to copies with both ends truncated.
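
The origin of the 5' truncated copies described in step 5 can be illustrated with a toy simulation. The sketch below is a simplification under stated assumptions: reverse transcription starts at the 3' end of the transcript and has a fixed, hypothetical probability of aborting after each nucleotide. The function name, the element sequence and the probability value are all made up for illustration.

```python
import random


def tprt_insert(transcript: str, abort_prob: float, rng: random.Random) -> str:
    """Return the portion of the transcript that gets copied into the genome.

    Copying starts at the 3' end and moves toward the 5' end, so an early
    abort yields a copy that keeps its 3' end but lacks its 5' end.
    """
    copied = 0
    for _ in range(len(transcript)):
        copied += 1
        if rng.random() < abort_prob:
            break
    return transcript[len(transcript) - copied:]


rng = random.Random(1)                        # fixed seed for reproducibility
element = "GGCC" + "ACGT" * 10 + "AAAAAAAA"   # hypothetical element with an A-rich 3' tail
copies = [tprt_insert(element, 0.05, rng) for _ in range(200)]
full_length = sum(1 for c in copies if c == element)
print(f"{full_length} of {len(copies)} simulated copies are full length")
```

Every simulated copy ends with the same 3' sequence as the parent element, and only a minority reach full length, which mirrors the pattern described for LINEs in the text.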

The mobilization of SINEs proceeds by a mechanism similar to that of LINEs. Though thousands of copies of SINEs usually exist in a genome, data from human Alu and mouse B1 and B2 elements suggest that only a few master copies are retrotranspositionally competent, and that these copies give rise to other copies that propagate in the genome by borrowing the enzymatic machinery encoded by the LINEs. This is supported by the observation that both LINEs and SINEs share a stretch of sequence at their 3' ends near their poly(A) tails. Evolutionary data from humans show that LINEs in the mammalian genome have increased in number during the past 150 million years of evolution, a period during which Alu amplification activity has been high. Figure 4 gives a schematic representation of the steps involved in the mobilization of LINEs and SINEs.

[Figure 4 schematic; legible labels: ORF-1, ORF-2, cytoplasm, ribonucleoprotein complex]

Figure 4. Steps involved in the mobilization of LINEs and SINEs


1.9. Transposons in parasitic protozoa:

Protozoan genomes have also been the targets of retrotransposons. The distribution of mobile genetic elements in protozoan genomes has been reviewed in the recent past (Bhattacharya S et al., 2002; Wickstead et al., 2003). The trypanosomatid lineage (T. brucei, T. cruzi and C. fasciculata) contains the largest number of transposable elements, most of which, surprisingly, belong to the non-LTR class. G. lamblia and E. histolytica each have three families of LINEs and SINEs. It is noteworthy that, though protozoan genomes share many characteristics, the transposable elements in these genomes belong to different lineages, indicating that the invasion of these genomes occurred at different times in evolution.

Table 2 summarises the major mobile genetic elements found among protozoa.


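Organism | T. brucei | T. brucei | T. brucei | T. cruzi | T. cruzi | T. cruzi | T. cruzi | C. fasciculata | C. fasciculata | G. lamblia | G. lamblia
Element | INGI | RIME | SLACS | L1Tc | CZAR | SIRE | VIPER | CRE1 | CRE2 | GilM & GilT | GilD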

Near Near Dispersed; Four SL- SL- chromoso SL- SL- Near

Location repeated repeated RNA Dispersed RNA frequently RNA RNA Sub- Repetitive mes Telomeric genes genes near (0.96 to genes genes telomeres genes genes genes 1.6Mb)

Copy no. | ~500 | ~500 | 9 | 2,800 | 30-40 | 1,500-3,000 | ~300 | 10 | 6 | ~15 | 30
Size (kb) | 5.2 | 0.512 | 6.678 | 5.5 | 7.237 | 0.428 | 2.539 | 3.5 | 9.6 | ~6.0 | ~3.0
TSD (bp)* | 12 12 49 22 29 29 None

Transcri 5 0.8 pt 2.5 to 0.8 to Not ( some Not to Not Not

9.0 8.0 detected small detected detected detected (Kb) species) 7.0

S' - UTR short 1511 101 1504 707 844 55 none I(bp) - 100) none

3' - UTR 148 535 534 60 bp 1200 3000 short I(bp) No. of OR One One Two Three Two One One One One One

Domains -, H~ !r!i.1 (9 in ORFs

RT * Location, central ORF2 ORF2 ORF2 (YIDD) central central (motif) ( FADD) (YLDD) (YADD) (YLDD) (YIDD)

EN * N terminus ORF1 6: ~;;;~ C- terminus Location, (Apurinic (Apurinic ORF1 ~~,(-; \

C terminus (CCHC) (motif) Endonucle Endonucle (CCHH) '1 -.... '. (CCHC) REL-ENDO

11;;/ . \ ase) ase) a. l,- .\

\' C': ' .... , ..... GI Y i( :) \ ",\ ORF1 \~ /C:" NB * ORF1 (CCHH) \ ~~ ~y3:.~ N- terminus N-Location, C terminus C- ORF3 ORF2 ~ * ,;/ (CCHH) terminus N terminus (motif) (CCHH) terminus (CCHH) N terminus ~::::.:::.:::;:::->' C terminus (CCHH) (CCHH)

(CCHH) (HHCC) (CCHH) 2 motifs

Other RNaseH Elementt Non-L TR Non-L TR Non-L TR on-L TR LTR LTR Non-L TR Non-L TR Non-L TR Non-L TR Potentia" No; may No; Yes(?) All present y Yes need Yes Yes Yes may

Yes May need Yes Yes day copies autonom INGI need CRE2 inactive ous VIPER

Table 2. Retrotransposable elements in protozoan genomes.


1.10. Aims and Objectives of this study:

1. Identification of repetitive sequence families in the genome of E. histolytica and their characterization with respect to physical localization and expression status.

2. Comparative analysis of non-LTR retrotransposons in the genome of E. histolytica with respect to their genomic localization and expression status.
