
© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Title Page

Computational Analysis of RNA Editing Sites in Plant Mitochondrial Genomes Reveals

Similar Information Content and a Sporadic Distribution of Editing Sites.

R. Michael Mulligan1*, Kenneth LC Chang2** and Chia Ching Chou2**

1Department of Developmental and Cell Biology

2Department of Information and Computer Science

University of California

Irvine, CA 92697-2300

Submission for a Research Article

*To whom correspondence should be addressed:

R. Michael Mulligan

Department of Developmental and Cell Biology

University of California

Irvine, CA 92697-2300

Voice: 949-824-8433

Fax: 949-824-4709

Email: [email protected]

Running Title: Computational Analysis of RNA Editing Sites

Key words: RNA Editing, evolution, plant mitochondria

Date Received:

Date Accepted:

**These authors contributed equally to the manuscript.

MBE Advance Access published June 24, 2007

ABSTRACT

A computational analysis of RNA editing sites was performed on protein-coding

sequences of plant mitochondrial genomes from Arabidopsis thaliana, Beta vulgaris,

Brassica napus, and Oryza sativa. The distribution of nucleotides around edited and

unedited cytidines was compared in 41 nucleotide segments and included 1481 edited

cytidines and 21,390 unedited cytidines in the four genomes. The distribution of

nucleotides was examined in 1, 2, and 3 nucleotide windows by comparison of nucleotide

frequency ratios and relative entropy. The relative entropy analyses indicate that

information is encoded in the nucleotide sequences in the 5 prime flank (-18 to -14, -13 to

-10, -6 to -4, -2/-1) and the immediate 3 prime flanking nucleotide (+1), and these regions

may be important in editing site recognition. The relative entropy was large when two or

three nucleotide windows were analyzed, suggesting that several contiguous nucleotides

may be involved in editing site recognition. RNA editing sites were frequently preceded

by two pyrimidines or AU and followed by a guanosine (HYCG) in the monocot and

dicot mitochondrial genomes, and rarely preceded by two purines. Analysis of

chloroplast editing sites from a dicot, Nicotiana tabacum, and a monocot, Zea mays,

revealed a similar distribution of nucleotides around editing sites (HYCA). The

similarity of this motif around editing sites in monocots and dicots in both mitochondria

and chloroplasts suggests that a mechanistic basis for this motif exists that is common in

these different organelle and phylogenetic systems. The preferred sequence distribution

around RNA editing sites may have an important impact on the acquisition of editing

sites in evolution because the immediate sequence context of a cytidine residue may

render a cytidine editable or uneditable, and consequently determine whether a T to C

mutation at a specific position may be corrected by RNA editing. The distribution of

editing sites in many protein-coding sequences is shown to be non-random with editing

sites clustered in groups separated by regions with no editing sites. The sporadic

distribution of editing sites could result from a mechanism of editing site loss by gene

conversion utilizing edited sequence information, possibly through an edited cDNA

intermediate.

Key Words: RNA Editing, Relative Entropy, Gene Conversion, Copy Correction, Non-

random Distribution, Evolution of Editing, Editing Site Recognition, Retroconversion,

Gene Transfer

Introduction

RNA editing is a post-transcriptional process that changes the nucleotide sequence of

RNAs. C-to-U editing occurs in the organelles of vascular plants and changes the coding

information in mRNAs. In higher plants, specific cytidine residues are converted to

uridine residues in chloroplast and in mitochondrial transcripts, and this process

frequently re-specifies the codon to direct the incorporation of a non-synonymous amino

acid residue (Covello and Gray 1989; Gualberto et al. 1989; Hiesel et al. 1989). The

amino acid specified by the edited codon is typically the evolutionarily conserved amino

acid at that position, and the unedited codon would code for a radical amino acid

substitution.

Several higher plant chloroplast genomes have been sequenced and analysed for

editing, and generally have about 30 C-to-U editing sites (Maier et al. 1995; Sugiura

1995; Schmitz-Linneweber et al. 2002; Kugita et al. 2003a; Kugita et al. 2003b; Tillich et

al. 2005). The complete Arabidopsis thaliana, Brassica napus, Beta vulgaris and Oryza

sativa mitochondrial genomes have been sequenced and analysed for RNA editing, and

these genomes encode 441, 427, 357, and 491 C-to-U editing sites, respectively (Giege

and Brennicke 1999; Kubo et al. 2000; Notsu et al. 2002; Handa 2003; Mower 2005).

Thus, the number of nucleotide changes directed by RNA editing is much greater in

mitochondria than in chloroplasts, although the editing process is generally thought to be

similar in these organelles (Maier et al. 1996; Mulligan 2004).

The plant organellar editing complexes must specifically recognize ~30 editing

sites in chloroplasts and about 400 editing sites in plant mitochondria. Analysis of three

editing sites in transgenic tobacco chloroplasts by 5’ and 3’ deletion led to the broad

conclusion that recognition elements exist largely in the 5’ flanking region with some

sequence requirements in the 3’ region (Chaudhuri, Carrer and Maliga 1995; Bock,

Hermann and Kossel 1996; Chaudhuri and Maliga 1996; Bock, Hermann and Fuchs

1997; Reed and Hanson 1997; Hermann and Bock 1999; Reed, Peeters and Hanson 2001;

Chateigner-Boutin and Hanson 2002; Chateigner-Boutin and Hanson 2003). A detailed

analysis of the petB and psbE editing sites in Nicotiana tabacum chloroplasts has

identified the -20 to +10 region as important for editing site conversion (Miyamoto,

Obokata and Sugiura 2002), and mutations at nucleotides -11 to -1, +2 to +4, and +8/9

were deleterious to in vitro editing of psbE RNAs (Hayes and Hanson 2007). RNA

editing site recognition in chloroplasts appears to occur through trans-acting factors that

recognize several editing sites with similar cis elements (Chateigner-Boutin and Hanson

2002; Chateigner-Boutin and Hanson 2003). The groups of editing sites are referred to as

editing site clusters and share common sequence motifs that are frequently composed of

three or four nucleotides. Recently, the pentatricopeptide repeat proteins have been recognized

as a large class of organellar RNA binding proteins that are required for RNA editing and

other RNA processing reactions (Kotera, Tasaka and Shikanai 2005; Schmitz-

Linneweber et al. 2006).

Computational analysis of sequences around editing sites has been performed by

examination of the distribution of single nucleotides in close proximity to RNA editing

sites in plant mitochondrial genomes, with comparison to a small subset of unedited

cytidines (Giege and Brennicke 1999; Cummings and Myers 2004). An analysis of the

Arabidopsis mitochondrial genome compared nucleotide frequencies from -17 to +7 in

sequences around all known edited cytidines and 30 randomly selected unedited

cytidines. This study reported a high incidence of pyrimidines at positions -2 and -1, a

low incidence of guanines at position -1, and other unexpected nucleotide frequencies at

-5 and -17 (Giege and Brennicke 1999). A second computational analysis of plant

mitochondrial editing sites analysed editing sites from the Oryza, Arabidopsis, and

Brassica mitochondrial genomes and compared them to a subset of randomly selected

non-edited cytidines with the same codon position frequencies (Cummings and Myers

2004). This study detected the pyrimidine bias that exists at position -1, and reported a

correlation of the free energy of folding of the 41 nucleotide RNA segments centered on

an edited or unedited cytidine.

In this study we present a comprehensive analysis of edited and unedited cytidines

in the protein-coding sequences of four mitochondrial genomes. In order to evaluate

possible higher order distribution of nucleotides, our analyses have included analysis of

the distribution of single, di- and tri-nucleotides around edited and unedited cytidines in

the Arabidopsis, Beta, Brassica and Oryza mitochondrial genomes. The relative entropy

profiles of the nucleotide sequences flanking edited and unedited cytidines are very similar in

these genomes, suggesting that the same regions are utilized in editing site recognition in

mitochondria of monocots and dicots. Analysis of information content suggests that

several groups of two or three contiguous nucleotides may be utilized in editing site

recognition. Comparison of the RNA sequences immediately adjacent to chloroplast and

mitochondrial editing sites with those adjacent to unedited cytidines suggests that a similar

sequence, YYCR, is enriched around editing sites in both organelle systems in monocots and

dicots, and the immediate sequence context of a cytidine residue may be a critical factor in

whether a cytidine is editable. In addition, the distribution of editing sites within

individual coding sequences was analysed and editing sites are frequently non-randomly

distributed. Evolutionary mechanisms that may result in a sporadic distribution of RNA

editing sites are discussed.

Materials and Methods

DNA Sequence Data

A comprehensive analysis of RNA editing sites in mitochondrial genomes has been

reported for Arabidopsis thaliana, Brassica napus, Beta vulgaris, and Oryza sativa

(Giege and Brennicke 1999; Kubo et al. 2000; Notsu et al. 2002; Handa 2003; Mower

and Palmer 2006). DNA sequences and editing site locations were obtained from

GenBank accession numbers NC001284, AP006444, BA000009 with DQ381444-

DQ381465, and BA000029, respectively. GenBank genome entries were converted into a

series of FASTA-formatted text for all known protein-coding sequences, and were

annotated with edited cytidines represented as an upper case C. Thus, editing sites are

represented as the unedited nucleotide and are considered to be cytidines in these

analyses. These files are available in the supplemental information. Protein coding

sequences were limited to entries that were larger than 100 nucleotides, and included only

protein coding sequences, with no intron or untranslated regions. In addition, small

ORFs, uncharacterized ORFs, and small exons were eliminated from the database.

Computational Analyses

Computer programs were written and compiled with Dr Java (version 1.4). The

nucleotide distribution around all edited and all unedited cytidines in the database was

analyzed in a sliding window of one, two, or three nucleotides. Each FASTA entry in the

genome file was scanned for an edited C or an unedited c, and every time a cytidine was

encountered, a sequence was written to an array of edited or unedited sequences. Thus,

the sequences flanking all edited or unedited cytidines in the database are aligned in a

matrix. The size of the region to be analyzed was specified as an input to the program,

and was typically the 20 or 50 nucleotides flanking a cytidine (e.g. a 41 or 101-nucleotide

sequence was written to the matrix). Cytidines that were encountered in a FASTA entry

that had fewer than the specified number of nucleotides in either the 5’ or 3’ direction were ignored; thus

the first and last 20 (or 50) nucleotides of the coding sequences were eliminated from

analysis.
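
The scanning step described above can be sketched as follows. This is a minimal illustration in Python, not the Java programs used in this study; the function name flanking_windows and its flank parameter are hypothetical, and the input is assumed to follow the annotation convention described above (an upper case C for an edited cytidine, lower case for every other nucleotide).

```python
def flanking_windows(seq, flank=20):
    """Collect fixed-width windows centered on each cytidine.

    Assumes 'C' marks an edited cytidine and 'c' an unedited one;
    every other nucleotide is lower case.
    """
    edited, unedited = [], []
    for i, base in enumerate(seq):
        if base not in ("C", "c"):
            continue
        # Ignore cytidines with fewer than `flank` nucleotides on either side.
        if i < flank or i + flank >= len(seq):
            continue
        # Editing sites are treated as ordinary cytidines inside the window.
        window = seq[i - flank : i + flank + 1].lower()
        (edited if base == "C" else unedited).append(window)
    return edited, unedited
```

With flank=20 each stored window is 41 nucleotides long and the central cytidine sits at index 20, so column j of every window corresponds to position j - 20 relative to the cytidine.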

The arrays represent the alignment of all RNA sequence surrounding edited or

unedited cytidines, and were analyzed for the distribution of nucleotide sequences by

scanning one, two, or three nucleotide windows and calculating the number of times each

sequence was encountered in a specific position relative to a cytidine. As an example of

the output, Table 1 shows the distribution of dinucleotides around Arabidopsis

mitochondrial editing sites and unedited cytidines in the -2/-1 window. The frequency

that each dinucleotide is encountered adjacent to an edited or unedited cytidine (P, Q) is

the number of times that a dinucleotide is observed divided by the total number of edited

or unedited cytidines. The ratio of the frequencies that each dinucleotide is around an

edited and unedited cytidine is defined as the selectivity ratio (P/Q). Thus, a sequence

with a selectivity ratio of 1 has the same relative frequency around edited and unedited

cytidines, while a sequence with a selectivity ratio greater than 1 is more frequently

present around an editing site. Relative entropy was calculated as the Kullback-Leibler

distance by the equation $d = \sum_{k=1}^{4^{n}} P_k \log (P_k / Q_k)$, where the sum runs over the $4^{n}$ possible

sequences $k$ in a window of $n$ = 1, 2, or 3 nucleotides.
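
The frequency, selectivity ratio, and relative entropy calculations can be sketched in a few lines of Python. This is an illustrative sketch rather than the original programs: the function names are hypothetical, the logarithm base (2, giving bits) is an assumption because the base is not stated above, and sequences that never occur next to an unedited cytidine (Q = 0) are simply skipped.

```python
import math
from collections import Counter

def window_frequencies(windows, start, n):
    """Frequency of each n-mer occupying columns start..start+n-1."""
    counts = Counter(w[start : start + n] for w in windows)
    total = len(windows)
    return {kmer: c / total for kmer, c in counts.items()}

def selectivity_ratios(p, q):
    """P/Q for every n-mer observed next to unedited cytidines."""
    return {k: p.get(k, 0.0) / q[k] for k in q}

def relative_entropy(p, q):
    """Kullback-Leibler distance: sum over k of P_k * log2(P_k / Q_k).

    Terms with P_k = 0 contribute nothing. Terms with Q_k = 0 are
    skipped, which is a simplifying assumption of this sketch.
    """
    return sum(
        pk * math.log2(pk / q[k])
        for k, pk in p.items()
        if pk > 0 and q.get(k, 0) > 0
    )
```

Here p and q would be built by calling window_frequencies on the aligned edited and unedited windows at the same column, for example the two columns that hold the -2/-1 dinucleotide.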

Random Editing Site Assignment

Random editing site assignment was used to compare the results of the mitochondrial

database with a random distribution of editing sites. The random editing site assignment

program scanned each FASTA formatted entry in the database and determined the

number and codon position of each of the editing sites. The program then randomly

selected a cytidine in the same codon position to be assigned as an editing site. Thus, the

random editing site assignment program maintained the number and codon position of

editing sites in a coding sequence. Statistics such as mean, standard deviation, variance,

and confidence intervals were determined from 1000 genome files with randomly

assigned editing sites.
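
One way to express the random assignment is sketched below in Python. It is an illustration under stated assumptions, not the program used here: the function name randomize_editing_sites is hypothetical, the coding sequence is assumed to begin at the first codon position, and the randomly chosen sites are drawn without replacement from all cytidines (edited or unedited) at the same codon position.

```python
import random

def randomize_editing_sites(seq, rng=random):
    """Reassign editing sites at random within one coding sequence.

    The number of editing sites at each codon position is preserved.
    Input and output use 'C' for edited and 'c' for unedited cytidines.
    """
    bases = list(seq.lower())
    for frame in range(3):
        # Every cytidine, edited or not, at this codon position.
        candidates = [i for i in range(frame, len(seq), 3) if seq[i] in ("C", "c")]
        n_edited = sum(1 for i in candidates if seq[i] == "C")
        # Pick the same number of cytidines to mark as edited.
        for i in rng.sample(candidates, n_edited):
            bases[i] = "C"
    return "".join(bases)
```

Repeating this for every FASTA entry, and then repeating the whole procedure 1000 times, would give the set of randomized genome files from which the summary statistics are taken.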

Results

Nucleotide Distribution Around Edited and Unedited Cytidines

The distribution of nucleotides around edited and unedited cytidines was analyzed by

calculation of relative entropy to determine where information content existed within

these sequences. Figure 1 shows the relative entropy of edited and unedited cytidines for

Arabidopsis and Oryza mitochondrial coding sequences. The analysis examined the

40 or 100 nucleotides flanking edited and unedited cytidines (20 or 50 on each

side). The 5% confidence interval for the relative entropy of each mitochondrial

genome was determined by 1000 iterations of random assignment of RNA editing sites

and calculation of the mean and standard deviation of the relative entropy values. The

Beta vulgaris and Brassica napus mitochondrial genomes were also analyzed, but are

provided in the supplemental material to improve figure clarity.

Figure 1A shows the relative entropy for the analysis of a one nucleotide window

over the entire 101 nucleotide segment. The relative entropy is extremely high in the

immediate vicinity of the editing site (nucleotides -2, -1, +1) and several peaks are

observed that exceed the 5% confidence interval in the -20 to +8 nucleotide region.

Figure 1B shows an expanded view of the -20 to +8 nucleotide region, and the relative

entropies of the Arabidopsis and Oryza mitochondrial genomes are very similar in this

region. The relative entropy of the two nucleotides immediately upstream of an editing

site is very large suggesting great importance of these nucleotides in editing site

recognition. In addition, the coincidence and magnitude of peaks in the relative entropy

profiles are very similar, suggesting similar regions are involved in editing site

recognition. Thus, the information content is very similar around RNA editing sites in

the dicot (Arabidopsis) and the monocot (Oryza) genomes. These taxa are thought to

have diverged about 150 MY ago (Chaw et al. 2004), and these results suggest that

similar editing site recognition mechanisms are utilized in these mitochondrial systems.

Analysis of the relative entropy around editing sites in two and three nucleotide

windows resulted in some important differences. For example, nucleotide position -5

shows a peak in relative entropy over the adjacent nucleotides when analyzed as a single

nucleotide (Fig. 1B). Uridines are enriched at the -5 position and the selectivity ratio is

very high (Fig. 2); however, the selectivity ratio of C at position -5 is not remarkable, nor

is the entropy or distribution of mononucleotides at the -6 or -4 positions. When

dinucleotides are analyzed, the entropy analysis shows a broad peak that includes

dinucleotides at -6/-5 and -5/-4 (Fig. 1C), and CU and CC are enriched at -6/-5 and UA

and CG are enriched at the -5/-4 position (Fig. 2). Finally, when trinucleotides are

analyzed, a large peak in the entropy profile is evident at trinucleotide -6/-5/-4 (Fig. 1D),

and CUA and CCG are enriched at these positions (Fig. 2). The trinucleotide CCG has a

greater selectivity ratio than CUA, which includes the highly enriched U at position -5.

Thus, analysis of multiple adjacent nucleotides reveals that combinations of nucleotides

are enriched around the RNA editing sites that are not evident when single nucleotides

are analyzed. These results suggest that multiple contiguous nucleotides are recognized

by the editing apparatus, and that distinct combinations of nucleotides in regions with

high relative entropy may exist in the cis element of RNA editing sites.

Similar changes in the relative entropy profile are evident in the -12 to -10 region

and the -18 to -15 region. In summary, the relative entropy of nucleotide sequences

around RNA editing sites suggests that the greatest information is present immediately

upstream of editing sites (-2/-1), and additional information is present in the -18 to -14,

-13 to -10, -6 to -4, and +1/+2 regions.

The distribution of nucleotides is similar around plant mitochondrial and

chloroplast editing sites. Table 1 shows the distribution of dinucleotides in the -2/-1

window around plant mitochondrial editing sites. Panel A shows the number of times

each dinucleotide occupies the -2/-1 window upstream of an edited or an unedited

cytidine. P and Q, the frequencies with which a dinucleotide is upstream of an edited or unedited

cytidine, are calculated as the number of times that a specific dinucleotide is present

divided by the total number of edited or unedited cytidines. The ratio of these

frequencies (P/Q) is the selectivity ratio that expresses the relative frequency of a specific

dinucleotide around an edited or unedited cytidine.

The selectivity ratios for editing sites in the Arabidopsis, Beta, and Oryza

genomes are compared in columns 6, 7, and 8, respectively (Table 1). The selectivity

ratios are very similar for editing sites in the three genomes with about half of the

dinucleotides rarely observed upstream of an editing site (UA, UG, CA, GG, CG, AA,

GA, AG). These dinucleotides have very low selectivity ratios, and include all eight

dinucleotides with a purine in the -1 position. The dinucleotides with consistently high

selectivity ratios (UU, CU, UC, AU) are pyrimidine-pyrimidine or AU combinations.

Figure 3 compares the selectivity ratios for Arabidopsis with the Beta or Oryza

editing sites in the -2/-1 and +1/+2 windows (Fig. 3A and B respectively). Each point

represents the selectivity ratios for one of the sixteen dinucleotides. About half of the

dinucleotides exhibit low selectivity ratios in all three species, and as a result are

clustered near the origin. In addition, dinucleotides with high selectivity ratios in

Arabidopsis generally have high selectivity ratios in Beta and Oryza, and linear

regression of the selectivity ratios shows lines with slopes of nearly one and intercepts

very close to zero. The coefficient of determination (r2) between Arabidopsis and Oryza

selectivity ratios is 0.90 and indicates a very strong degree of correlation.
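
The regression comparison amounts to an ordinary least-squares fit of one genome's sixteen selectivity ratios against another's. A minimal Python sketch is given below; the function name linear_fit is hypothetical, and no values from Table 1 are reproduced here.

```python
def linear_fit(x, y):
    """Least-squares slope, intercept, and r^2 for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, r^2 is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

Passing the Arabidopsis ratios as x and the Oryza ratios as y, in the same dinucleotide order, would return the slope, intercept, and coefficient of determination for that pair of genomes.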

Figure 3B compares the selectivity ratios for Arabidopsis with the Beta or Oryza

editing sites in the +1/+2 window. The selectivity ratios also exhibit a strong correlation

in the +1/+2 window with a coefficient of determination of 0.68 between Arabidopsis and

Oryza editing sites. In contrast to the -2/-1 window, none of the selectivity ratios are very

small, indicating that dinucleotides downstream of the editing site are not discriminated

against as highly as in the upstream position. The dinucleotides with high selectivity

ratios in the +1/+2 position are GG and GU.

The distribution of nucleotides around RNA editing sites in chloroplasts is very

similar to plant mitochondria. Table 1B and Figure 4 show selectivity ratios for the

distribution of dinucleotides in the -2/-1 window for Arabidopsis mitochondria compared

to editing sites in tobacco and maize chloroplast genomes. Eight of the dinucleotides are

clustered near the origin and are rarely observed upstream of editing sites in mitochondria

or in chloroplasts (AG, GA, AA, CG, GG, CA, UG, UA), and several dinucleotides are

frequently detected upstream of editing sites (UU, AU, CC) in both organelles.

Regression analysis of these values gives a coefficient of determination (r2) of 0.65

between the Arabidopsis mitochondrial and tobacco chloroplast editing sites and 0.73 between the

Arabidopsis mitochondrial and Zea chloroplast editing sites, and indicates a moderate to

strong correlation. The similarity of nucleotide distribution around editing sites in dicots

and monocots and in both chloroplasts and mitochondria suggests that common features

are necessary for editing site conversion in these diverse taxa and organelle systems.

Effect of Codon Position of an Editing Site

The distribution of editing sites in plant mitochondrial genomes is typically about

35%:55%:10% in the first, second and third positions of the codon (Giege and

Brennicke 1999); thus the second codon position is overrepresented and the third codon

position is underrepresented in edited cytidines, and the sequence context of editing sites

may be influenced by codon position.

In order to directly assess the effect of codon position, relative entropy was

separately analyzed in a one nucleotide window for editing sites in the first, second, or

third position, and compared to unedited cytidines from the same codon position (Fig. 5).

If entropy values were strongly influenced by codon position, then the peaks and troughs

in the entropy profile would be expected to be displaced by one nucleotide. However, the

entropy profile in the 5’ flanking region is quite similar, especially for the first and

second positions that exhibit peaks at -1, -5, and -8/-9, and intervening troughs. In the

-10 to -20 region, many peaks coincide with a few differences; however, a strong single

nucleotide displacement of the profile is not evident. The analysis of a small number of

editing sites from the third codon position resulted in much larger fluctuations in the

entropy values, but showed similar trends. This analysis suggests that although there is

some influence of editing site position in the entropy value, information is similarly

embedded in the 5’ flanking region of editing sites irrespective of position in the codon.
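
The codon-position comparison requires that edited and unedited cytidines first be grouped by their position in the codon. A minimal Python sketch of that grouping is shown below; the function name is hypothetical, the coding sequence is assumed to start at the first codon position, and edited cytidines are assumed to be written as an upper case C.

```python
def windows_by_codon_position(seq, flank=20):
    """Group flanking windows by codon position (1, 2, 3) and editing status."""
    groups = {
        (pos, status): []
        for pos in (1, 2, 3)
        for status in ("edited", "unedited")
    }
    for i, base in enumerate(seq):
        # Keep only cytidines with a full flank on both sides.
        if base in ("C", "c") and flank <= i < len(seq) - flank:
            pos = i % 3 + 1
            status = "edited" if base == "C" else "unedited"
            groups[(pos, status)].append(seq[i - flank : i + flank + 1].lower())
    return groups
```

Relative entropy can then be computed separately for each codon position by comparing the edited and unedited windows that share the same position.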

Some codon position effects are evident, especially in the nucleotides

immediately downstream of an editing site. In both the Oryza and Arabidopsis genomes,

the downstream region exhibited a peak at the +1 nucleotide for editing sites in the

second position, and at the +2 nucleotide for editing sites in the first position. This

position represents the first downstream wobble position, and synonymous mutations

may allow optimization of the editing site for efficient editing, and would result in

increased entropy at these positions.

Editing Sites are Sporadically Distributed in Some Genes

Some coding sequences exhibited an unusual distribution of editing sites that appeared to

be clustered in groups and separated by gaps that lacked editing sites. In order to

systematically examine the distribution of RNA editing sites within individual coding

sequences, the interval between editing sites was calculated for all coding sequences

greater than 500 nucleotides that included at least three editing sites.

The variance of the intervals between editing sites for an individual coding

sequence was determined as a measurement of the distribution of editing site intervals

relative to the mean interval size, and was compared to the variances of 1000 randomly

assigned coding sequences. Table 2 shows p values for the analysis of coding sequences

in the three genomes, and 31%, 45%, and 35% of the coding sequences analyzed from

each genome exhibited a non-random distribution of editing sites with p values less than

0.05. A random distribution of editing sites would be expected to yield a p value below 0.05

for only 5% of the coding sequences, that is, for about one of the

~20 coding sequences analyzed from each genome. These results demonstrate that an

unexpectedly large fraction of plant mitochondrial coding sequences exhibit a non-

random distribution of editing sites. The distribution of editing sites for several coding

sequences with small p values is graphically presented in Figure 6.
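
The interval-variance comparison can be sketched as a simple randomization test. The Python below is a simplified illustration and not the procedure used to produce Table 2: the function names are hypothetical, the p value is taken as the fraction of randomized assignments whose interval variance is at least as large as the observed one (an assumption about how the comparison is scored), and codon position is not preserved in the shuffling.

```python
import random
from statistics import pvariance

def interval_variance(positions):
    """Variance of the distances between consecutive editing sites."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)

def clustering_p_value(site_positions, cytidine_positions, n_iter=1000, rng=random):
    """Fraction of random assignments at least as clustered as the observed sites."""
    observed = interval_variance(site_positions)
    k = len(site_positions)
    hits = 0
    for _ in range(n_iter):
        # Draw the same number of sites from all cytidines in the sequence.
        shuffled = rng.sample(cytidine_positions, k)
        if interval_variance(shuffled) >= observed:
            hits += 1
    return hits / n_iter
```

A coding sequence whose editing sites fall in tight groups separated by long gaps has a large interval variance, so few randomized assignments match it and the resulting p value is small.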

Discussion

Editing Site Sequence Context

Analysis of the information content around RNA editing sites in plant mitochondrial

transcripts suggests that groups of nucleotides in specific regions are important in editing

site recognition (Figure 7). The relative entropy immediately upstream and downstream

of an editing site is large and suggests that these regions are critical for editing site

recognition. Based on these results, it would appear that the simple motif “HYCGK”

represents a sequence that is likely to be edited in plant mitochondria. These

observations extend earlier studies based on single nucleotide analyses that concluded

that editing sites are frequently preceded by pyrimidines and rarely preceded by a

guanine (Maier et al. 1996; Giege and Brennicke 1999; Cummings and Myers 2004).

The distribution of nucleotides in the immediate 5’ flanking region of editing sites

in monocot and dicot mitochondria was shown to be remarkably similar (Table 1A) and

the selectivity ratios exhibit a strong correlation between monocots and dicots (Figure

3A). The distribution of dinucleotides in the -2/-1 window in chloroplast editing sites of

a dicot (Nicotiana tabacum) and a monocot (Zea mays) is very similar to the distribution

of dinucleotides observed in plant mitochondria (Table 1B, Figure 4). Thus, a similar

distribution of nucleotides immediately upstream of RNA editing sites in monocots and

dicots and in both mitochondria and chloroplasts suggests that similar molecular systems

are involved and that a preferred editing site sequence context is shared among these

organisms and organelles. However, some differences are noted between the

mitochondrial and chloroplast systems: in chloroplast editing sites, the dinucleotide CU

is not prevalent in the -2/-1 window and the +1 nucleotide is more typically an A. These

may represent differences that distinguish the editing machinery in these two systems.

Monocots and dicots diverged 150 MY ago (Chaw et al. 2004), and chloroplast

trans-acting specificity factors are proposed to change rapidly in evolution (Schmitz-

Linneweber et al. 2001). In principle, a trans-acting RNA binding factor would be

expected to be able to recognize virtually any sequence. Thus, the similarity of editing site

context that is maintained across diverse taxa and different organelles may reflect

common features in the mechanisms of editing. The sequence similarity around RNA

editing sites as well as the strong selection for and against nucleotides immediately

adjacent to editing sites in these disparate systems suggests that some cytidines may exist

in an “editable” context while other cytidines may exist in an “uneditable” context. Thus,

the immediate sequence context of a nucleotide may have an important impact on

whether a T to C mutation could be edited and has important consequences on how RNA

editing sites are acquired in evolution (Covello and Gray 1993).

Analysis of relative entropy suggests that information around RNA editing sites

exists in the -18 to -14, -13 to -10, -6 to -4, -2 to -1, and +1 to +2 regions. These regions

would be expected to be recognized by the editing machinery, and Figure 7 shows the

combinations of nucleotides most frequently observed in these positions based on

selectivity ratios. Analysis of the relative entropy in larger nucleotide windows exhibited

large increases over the relative entropy of randomly assigned editing sites, and several

contiguous nucleotides may be important in editing site recognition rather than

individual nucleotides at specific positions. These results are consistent with the editing

site cluster model that proposes that groups of editing sites are recognized by the same

trans-acting factor (Chateigner-Boutin and Hanson 2002).

The groups of trinucleotides indicated in the -18 to -14 region overlap, and

suggest that a larger series of nucleotides may be important in editing site recognition.

For example, YUC, UCC, and CCU are frequently encountered at positions -18/-16,

-17/-15, and -16/-14, respectively (Figure 7). The pentanucleotide YUCCU is present at

nucleotides -18 to -14 in eleven of the 376 rice editing sites analyzed,

suggesting that this may represent a portion of an editing site recognition sequence.

Other 4 and 5 nucleotide combinations were noted in the Arabidopsis genome such as

YUACA (-18/-14) that may be important editing site recognition motifs.
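
Counting how often a degenerate motif such as YUCCU occupies a fixed position can be sketched as follows. This Python is illustrative only: the function names are hypothetical, only the IUPAC codes needed for the motifs mentioned in this paper are included, and each window is assumed to be an aligned sequence with the cytidine at index flank.

```python
# Subset of IUPAC nucleotide codes, written in the RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "CU", "R": "AG", "H": "ACU", "K": "GU", "N": "ACGU",
}

def matches(motif, segment):
    """True if every base in segment is allowed by the motif code at that position."""
    return len(motif) == len(segment) and all(
        base in IUPAC[code] for code, base in zip(motif, segment)
    )

def count_motif_at(windows, motif, offset, flank=20):
    """Count windows with the motif starting at `offset` relative to the cytidine."""
    start = flank + offset
    return sum(
        # Normalize case and convert DNA t to RNA u before matching.
        matches(motif, w[start : start + len(motif)].upper().replace("T", "U"))
        for w in windows
    )
```

For example, count_motif_at(windows, "YUCCU", -18) counts the aligned windows that carry the pentanucleotide at positions -18 to -14.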

Editing Site Recognition

RNA editing site recognition and conversion has been analyzed with an in vitro editing

system and by electroporation of intact mitochondria. Deletion of 5’ and 3’ sequences

suggests that nucleotide sequences from -20 to +10 are required for editing site

conversion (Takenaka, Neuwirt and Brennicke 2004; Neuwirt et al. 2005). These studies

examined an atp9 editing site and showed that five pentanucleotide regions

between -25 and -1 were important or critical for editing site conversion, while

sequences in the -35 to -25 and +1 and greater were much less important. In addition, the

+1 nucleotide was shown to be extremely important for editing site conversion, and

nucleotide deletions or insertions at -2 also had a strong effect. These results suggest that spacing

between cis-elements may be important, a conclusion supported by the entropy

analyses in this study.

The cis-acting sequences of the two editing sites in wheat cox2 are proposed to be

present within -16 to +6 nucleotides of the editing site (Farre et al. 2001; Choury et al.

2004). Single nucleotide mutagenesis within the 23 nucleotide region of editing site C77

of the cox2 transcript demonstrated that residues at -11, -10, -9, -6, -2, and -1 were

critical for effective editing site conversion, while editing site C259 showed a similar

trend with critical residues at -12, -11, +1, +3, and +4. These positions correspond well

with regions identified as potentially important in editing site recognition in this study.

While the analysis of individual editing sites provides detailed information about each

site, this study has analyzed all editing sites of entire genomes and

provides statistical information about the features of a “typical” editing site.

Editing Sites Distribution

A statistical analysis of the intervals between RNA editing sites demonstrated that coding

sequences frequently include groups of editing sites that are separated by gaps with no

editing sites. The mechanism of editing site acquisition is proposed to involve a T to C

mutation in the genome that can be corrected by C-to-U editing (Covello and Gray 1993;

Schmitz-Linneweber et al. 2001; Schmitz-Linneweber et al. 2002; Tillich et al. 2005).

Conversely, the simplest mechanism of editing site loss would be a C to T mutation at an

edited C, such that the edited cytidine was lost from the genome. These mechanisms

would be predicted to occur randomly within a gene; however, the distribution of editing

sites is frequently non-random with large gaps between clustered RNA editing sites.
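The comparison of observed and random interval variance (see the footnote to Table 2) can be sketched as a simple permutation test. This is an illustration under stated assumptions rather than the procedure actually used: candidate positions are supplied by the caller (for example, all cytidines in a coding sequence), only the intervals between adjacent sites are scored, and the p value is taken as the fraction of random assignments whose variance is at least as large as the observed variance. All names are hypothetical.

```python
import random
from statistics import pvariance


def interval_variance(sites):
    """Variance of the distances between adjacent editing sites."""
    ordered = sorted(sites)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)


def clustering_p_value(sites, candidates, trials=1000, seed=0):
    """Estimate how often random site assignment is as clustered as observed.

    sites      : observed editing site positions (three or more)
    candidates : list of positions eligible for random assignment
    trials     : number of random assignments (1000 in Table 2)
    The one-sided comparison (simulated >= observed) is an assumption.
    """
    rng = random.Random(seed)
    observed = interval_variance(sites)
    as_extreme = 0
    for _ in range(trials):
        simulated = rng.sample(candidates, len(sites))
        if interval_variance(simulated) >= observed:
            as_extreme += 1
    return as_extreme / trials


# Hypothetical gene of 100 positions with one tight cluster and one distant site.
print(clustering_p_value([0, 1, 2, 99], list(range(100))))
```

Evenly spaced sites give an interval variance of zero and therefore a p value of 1 under this convention, while sites grouped into clusters separated by long gaps give a large variance and a small p value.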

The molecular process that results in the sporadic distribution of editing sites in

these genes is open for speculation. The distribution of editing sites in the matR coding

sequence has been previously noted to correspond to regions that encode the reverse

transcriptase and maturase domains of this protein (Thomson et al. 1994; Begu et al.

1998), and the distribution of editing sites could be related to maintenance of these

functions. Alternatively, the targeting of one region of a transcript by the editing

apparatus might facilitate editing of additional T to C mutations in that region, and

consequently the acquisition of editing sites might tend to occur in groups.

Figure 8 illustrates a possible mechanism for the generation of the sporadic

distribution of editing sites. Loss of RNA editing sites from relatively large regions of a

coding sequence could occur through retroconversion that would remove adjacent editing

sites by replacement with the edited sequence information. This process would

presumably require conversion of edited mRNA to cDNA through reverse transcription,

and there is limited evidence for reverse transcriptase activity in plant mitochondria

(Wahleithner, MacFarlane and Wolstenholme 1990; Begu et al. 1998; Farre and Araya

1999). Recombination or gene conversion could integrate the edited information into the

genome, and this process is thought to happen readily in plant mitochondria (Knoop

2004).

Several individual examples of editing site loss within plant

mitochondrial transcripts may have occurred through

retroconversion. Editing in cox3 and rps13 transcripts was completely or nearly

eliminated in the Iridaceae and Amaryllidaceae, yet these transcripts include numerous

editing sites in related dicots (Lopez, Picardi and Quagliariello 2007). A similar example

involved the loss of editing sites from cox1 in several gymnosperm taxa, yet cox1 is

heavily edited with 25 to 34 editing sites in related species (Lu, Szmidt and Wang 1998).

Two compelling examples involve the loss of introns and the adjacent editing sites in the

Caryophyllales and the Asterales. The nad4 gene lost an intron in the Caryophyllaceae,

and editing sites were eliminated near the newly created exon-exon boundary (Itchoda et

al. 2002), while nad4 transcripts in Lactuca lost two introns as well as the RNA editing

sites in that region of the gene (Geiss, Abbas and Makaroff 1994). The simultaneous loss

of both introns and the adjacent editing sites strongly suggests that the process involved

recombination with a spliced and edited intermediate.

Numerous examples of gene transfer from the mitochondrial to the nuclear

genome demonstrate that the nuclear forms of these transferred genes have lost

mitochondrial introns and editing sites (Nugent and Palmer 1991; Kadowaki et al. 1996).

Mitochondrial gene transfer to the nucleus would require RNA-mediated transfer through

a cDNA intermediate, or wholesale loss of editing sites and introns in the mitochondrial

genome prior to DNA-mediated gene transfer (Henze and Martin 2001). Thus, RNA

editing represents an obstacle to gene transfer to the nucleus, and may contribute to the

retention of genes in plant mitochondrial genomes. Loss of editing sites in the

mitochondrial genome would facilitate DNA-mediated gene transfer to the nucleus

(Henze and Martin 2001). Both DNA-mediated and RNA-mediated gene transfer would

be facilitated by mechanisms related to retroconversion, either by removal of editing sites

to facilitate DNA-mediated gene transfer, or as a mechanism to produce a cDNA for

integration in the nuclear genome.

While individual examples of editing site and intron loss suggest that

retroconversion has occurred in specific taxonomic groups, the statistical analysis of the

distribution of editing sites in mitochondrial genomes demonstrates that numerous genes

exhibit a sporadic distribution of editing sites. Taken together, these observations suggest

that loss of editing sites has occurred periodically and may have important consequences

in the evolution of plant mitochondrial genomes.

Supplemental Information

Text files of DNA sequence information for the coding sequences of the Arabidopsis,

Beta, and Oryza mitochondrial genomes are provided. The files are in fasta format and

edited cytidines are represented as an upper case “C”. Figure 1E-H shows relative

entropy analyses for editing sites in the Brassica and Beta genomes. A detailed table of

trinucleotides and selectivity ratios around RNA editing sites is provided as a supplement

to Figure 7.
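A reader working with the supplemental sequence files might locate editing sites along the following lines. This is a minimal sketch that assumes unedited nucleotides are written in lower case, so that an upper case “C” is unambiguous. The record name and sequence in the example are hypothetical and do not come from the supplemental files.

```python
def read_fasta(text):
    """Parse FASTA text into {record_name: sequence}, preserving letter case."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def editing_sites(sequence):
    """Return 0-based indices of upper case C, taken to mark edited cytidines."""
    return [i for i, base in enumerate(sequence) if base == "C"]


# Hypothetical record; the real files hold complete coding sequences.
example = ">nad9 hypothetical record\natgCcat\nCg\n"
for gene, seq in read_fasta(example).items():
    print(gene, editing_sites(seq))
```

Case is preserved during parsing because the editing annotation is carried entirely by letter case; converting the sequence to upper case would erase it.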

Acknowledgements

The authors are grateful to Dr. Brandon Gaut for assistance with statistical analyses,

experimental design, and thoughtful discussion. Kenneth LC Chang and Chia Ching

Chou each contributed equally to this work. Ms. Nam Nguyen provided excellent

technical assistance.

References

Begu D, Mercado A, Farre JC, Moenne A, Holuigue L, Araya A, Jordana X. 1998. Editing status of mat-r transcripts in mitochondria from two plant species: C-to-U changes occur in putative functional RT and maturase domains. Curr Genet 33:420-428.

Bock R, Hermann M, Fuchs M. 1997. Identification of critical nucleotide positions for plastid RNA editing site recognition. Rna 3:1194-1200.

Bock R, Hermann M, Kossel H. 1996. In vivo dissection of cis-acting determinants for plastid RNA editing. Embo J 15:5052-5059.

Chateigner-Boutin AL, Hanson MR. 2002. Cross-competition in transgenic chloroplasts expressing single editing sites reveals shared cis elements. Mol Cell Biol 22:8448-8456.

Chateigner-Boutin AL, Hanson MR. 2003. Developmental co-variation of RNA editing extent of plastid editing sites exhibiting similar cis-elements. Nucleic Acids Res 31:2586-2594.

Chaudhuri S, Carrer H, Maliga P. 1995. Site-specific factor involved in the editing of the psbL mRNA in tobacco plastids. Embo J 14:2951-2957.

Chaudhuri S, Maliga P. 1996. Sequences directing C to U editing of the plastid psbL mRNA are located within a 22 nucleotide segment spanning the editing site. Embo J 15:5958-5964.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol 58:424-441.

Choury D, Farre JC, Jordana X, Araya A. 2004. Different patterns in the recognition of editing sites in plant mitochondria. Nucleic Acids Res 32:6397-6406.

Covello PS, Gray MW. 1989. RNA editing in plant mitochondria. Nature 341:662-666.

Covello PS, Gray MW. 1993. On the evolution of RNA editing. Trends Genet 9:265-268.

Cummings MP, Myers DS. 2004. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinformatics 5:132.

Farre JC, Araya A. 1999. The mat-r open reading frame is transcribed from a non-canonical promoter and contains an internal promoter to co-transcribe exons nad1e and nad5III in wheat mitochondria. Plant Mol Biol 40:959-967.

Farre JC, Leon G, Jordana X, Araya A. 2001. cis Recognition elements in plant mitochondrion RNA editing. Mol Cell Biol 21:6731-6737.

Geiss KT, Abbas GM, Makaroff CA. 1994. Intron loss from the NADH dehydrogenase subunit 4 gene of lettuce mitochondrial DNA: evidence for homologous recombination of a cDNA intermediate. Mol Gen Genet 243:97-105.

Giege P, Brennicke A. 1999. RNA editing in Arabidopsis mitochondria effects 441 C to U changes in ORFs. Proc Natl Acad Sci U S A 96:15324-15329.

Gualberto JM, Lamattina L, Bonnard G, Weil JH, Grienenberger JM. 1989. RNA editing in wheat mitochondria results in the conservation of protein sequences. Nature 341:660-662.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res 31:5907-5916.

Hayes ML, Hanson MR. 2007. Identification of a sequence motif critical for editing of a tobacco chloroplast transcript. Rna 13:281-288.

Henze K, Martin W. 2001. How do mitochondrial genes get into the nucleus? Trends Genet 17:383-387.

Hermann M, Bock R. 1999. Transfer of plastid RNA-editing activity to novel sites suggests a critical role for spacing in editing-site recognition. Proc Natl Acad Sci U S A 96:4856-4861.

Hiesel R, Wissinger B, Schuster W, Brennicke A. 1989. RNA editing in plant mitochondria. Science 246:1632-1634.

Itchoda N, Nishizawa S, Nagano H, Kubo T, Mikami T. 2002. The sugar beet mitochondrial nad4 gene: an intron loss and its phylogenetic implication in the Caryophyllales. Theor Appl Genet 104:209-213.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. Embo J 15:6652-6661.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet 46:123-139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature 433:326-330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res 28:2571-2576.

Kugita M, Kaneko A, Yamamoto Y, Takeya Y, Matsumoto T, Yoshinaga K. 2003a. The complete nucleotide sequence of the hornwort (Anthoceros formosae) chloroplast genome: insight into the earliest land plants. Nucleic Acids Res 31:716-721.

Kugita M, Yamamoto Y, Fujikawa T, Matsumoto T, Yoshinaga K. 2003b. RNA editing in hornwort chloroplasts makes more than half the genes functional. Nucleic Acids Res 31:2417-2423.

Lopez L, Picardi E, Quagliariello C. 2007. RNA editing has been lost in the mitochondrial cox3 and rps13 mRNAs in Asparagales. Biochimie 89:159-167.

Lu MZ, Szmidt AE, Wang XR. 1998. RNA editing in gymnosperms and its impact on the evolution of the mitochondrial coxI gene. Plant Mol Biol 37:225-234.

Maier RM, Neckermann K, Igloi GL, Kossel H. 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251:614-628.

Maier RM, Zeltz P, Kossel H, Bonnard G, Gualberto JM, Grienenberger JM. 1996. RNA editing in plant mitochondria and chloroplasts. Plant Mol Biol 32:343-365.

Miyamoto T, Obokata J, Sugiura M. 2002. Recognition of RNA editing sites is directed by unique proteins in chloroplasts: biochemical identification of cis-acting elements and trans-acting factors involved in RNA editing in tobacco and pea chloroplasts. Mol Cell Biol 22:6726-6734.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics 6:96.

Mower JP, Palmer JD. 2006. Patterns of partial RNA editing in mitochondrial genes of Beta vulgaris. Mol Genet Genomics 276:285-293.

Mulligan RM. 2004. RNA Editing in Plant Organelles. Pp. 239-260 in H. Daniel, and C. Chase, eds. Molecular Biology and Biotechnology of Plant Organelles. Springer, Dordrecht, NL.

Neuwirt J, Takenaka M, van der Merwe JA, Brennicke A. 2005. An in vitro RNA editing system from cauliflower mitochondria: editing site recognition parameters can vary in different plant species. Rna 11:1563-1570.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics 268:434-445.

Nugent JM, Palmer JD. 1991. RNA-mediated transfer of the gene coxII from the mitochondrion to the nucleus during flowering plant evolution. Cell 66:473-481.

Reed ML, Hanson MR. 1997. A heterologous maize rpoB editing site is recognized by transgenic tobacco chloroplasts. Mol Cell Biol 17:6948-6952.

Reed ML, Peeters NM, Hanson MR. 2001. A single alteration 20 nt 5' to an editing target inhibits chloroplast RNA editing in vivo. Nucleic Acids Res 29:1507-1513.

Schmitz-Linneweber C, Regel R, Du TG, Hupfer H, Herrmann RG, Maier RM. 2002. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation. Mol Biol Evol 19:1602-1612.

Schmitz-Linneweber C, Tillich M, Herrmann RG, Maier RM. 2001. Heterologous, splicing-dependent RNA editing in chloroplasts: allotetraploidy provides trans-factors. Embo J 20:4874-4883.

Schmitz-Linneweber C, Williams-Carrier RE, Williams-Voelker PM, Kroeger TS, Vichas A, Barkan A. 2006. A pentatricopeptide repeat protein facilitates the trans-splicing of the maize chloroplast rps12 pre-mRNA. Plant Cell 18:2650-2663.

Sugiura M. 1995. The chloroplast genome. Essays Biochem 30:49-57.

Takenaka M, Neuwirt J, Brennicke A. 2004. Complex cis-elements determine an RNA editing site in pea mitochondria. Nucleic Acids Res 32:4137-4144.

Thomson MC, Macfarlane JL, Beagley CT, Wolstenholme DR. 1994. RNA editing of mat-r transcripts in maize and soybean increases similarity of the encoded protein to fungal and bryophyte group II intron maturases: evidence that mat-r encodes a functional protein. Nucleic Acids Res 22:5745-5752.

Tillich M, Funk HT, Schmitz-Linneweber C, Poltnigg P, Sabater B, Martin M, Maier RM. 2005. Editing of plastid RNA in Arabidopsis thaliana ecotypes. Plant J 43:708-715.

Wahleithner JA, MacFarlane JL, Wolstenholme DR. 1990. A sequence encoding a maturase-related protein in a group II intron of a plant mitochondrial nad1 gene. Proc Natl Acad Sci U S A 87:548-552.

Table 1. Distribution of dinucleotides in the -2/-1 window in mitochondrial (A) and chloroplast (B) editing sites.

A.
               Arabidopsis                                   Beta^a   Oryza^a
dinucleotide   # edited   # unedited   P       Q       P/Q   P/Q      P/Q
UU             117        754          0.290   0.126   2.29  2.13     2.26
CU             54         400          0.134   0.067   1.99  2.40     2.35
UC             60         457          0.149   0.077   1.94  2.21     2.38
AU             61         489          0.151   0.082   1.84  1.60     1.50
AC             28         325          0.069   0.054   1.27  1.07     0.57
CC             23         295          0.057   0.049   1.15  1.46     1.37
GU             20         304          0.050   0.051   0.97  0.86     1.23
GC             15         276          0.037   0.046   0.80  0.75     0.47
UG             8          386          0.020   0.065   0.31  0.18     0.28
UA             8          413          0.020   0.069   0.29  0.45     0.30
CG             2          229          0.005   0.038   0.13  0.16     0.07
AA             3          381          0.007   0.064   0.12  0.04     0.04
CA             2          280          0.005   0.047   0.11  0.06     0.22
GA             1          285          0.002   0.048   0.05  0.07     0.06
GG             1          302          0.002   0.051   0.05  0.00     0.16
AG             1          389          0.002   0.065   0.04  0.00     0.00
Sum            404        5965

B.
               Mitochondria   Chloroplast
               At mt          Nt ct    Zm ct
dinucleotide   P/Q            P/Q      P/Q
UU             2.29           3.10     2.64
CU             1.99           0.64     1.52
UC             1.94           1.96     1.14
AU             1.84           2.26     2.23
AC             1.27           0.66     0.79
CC             1.15           2.50     1.84
GU             0.97           0        2.02
GC             0.80           0        0
UG             0.31           0        0
UA             0.29           0        0.52
CG             0.13           0        0
AA             0.12           0        0
CA             0.11           0        0
GA             0.05           0        0
GG             0.05           0        0
AG             0.04           0        0

a The number of edited and unedited cytidines analyzed in the Arabidopsis, Beta, and Oryza genomes is 404/5965; 332/5161; and 376/4683, respectively.
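The P, Q, and P/Q columns of Table 1A can be reproduced from the counts, reading P as the fraction of edited cytidines preceded by a given dinucleotide and Q as the corresponding fraction of unedited cytidines. That reading is inferred from the table values, and the function name below is hypothetical. The counts are the Arabidopsis values from Table 1A.

```python
def selectivity_ratio(edited, unedited, total_edited, total_unedited):
    """Return (P, Q, P/Q) for one dinucleotide.

    P = share of edited cytidines preceded by the dinucleotide
    Q = share of unedited cytidines preceded by the dinucleotide
    """
    p = edited / total_edited
    q = unedited / total_unedited
    return p, q, p / q


# Arabidopsis counts from Table 1A: (# edited, # unedited) per dinucleotide.
counts = {"UU": (117, 754), "CU": (54, 400), "AG": (1, 389)}
TOTAL_EDITED, TOTAL_UNEDITED = 404, 5965

for dinucleotide, (edited, unedited) in counts.items():
    p, q, ratio = selectivity_ratio(edited, unedited, TOTAL_EDITED, TOTAL_UNEDITED)
    print(f"{dinucleotide}: P={p:.3f} Q={q:.3f} P/Q={ratio:.2f}")
```

For UU this gives P = 117/404 = 0.290 and Q = 754/5965 = 0.126, so P/Q = 2.29, in agreement with the table.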

Table 2. Distribution of RNA editing sites in Mitochondrial Genes^b.

Gene                      p value   Gene               p value   Gene             p value
Atmt matR                 0.002     Bvmt nad6          0.002     Osmt cox1        0.001
Atmt atp1                 0.006     Bvmt matR          0.003     Osmt nad6        0.001
Atmt rps4                 0.015     Bvmt ccmFn         0.010     Osmt atp6        0.008
Atmt nad5 Ex2             0.017     Bvmt ccmFc Ex1     0.014     Osmt rps2        0.011
Atmt ccmFc Ex1 (ccb452)   0.019     Bvmt nad5 Ex2      0.027     Osmt rps4        0.032
Atmt nad6                 0.037     Bvmt ccmB          0.033     Osmt ccmFn       0.040
Atmt ccmB (ccb206)        0.039     Bvmt atp1          0.039
                                    Bvmt rpl5          0.044
                                    Bvmt atp4          0.045
Atmt rpl5                 0.083     Bvmt rps4          0.063     Osmt ccmFc Ex1   0.196
Atmt rps3 Ex2             0.165     Bvmt ccmFc Ex2     0.122     Osmt nad9        0.222
Atmt ccmC (ccb256)        0.181     Bvmt nad4 Ex2      0.297     Osmt atp4        0.236
Atmt tatC                 0.188     Bvmt cox3          0.365     Osmt tatC        0.327
Atmt rpl16                0.256     Bv nad9            0.455     Osmt rpl16       0.454
Atmt ccmFc Ex2 (ccb452)   0.298     Bvmt ccmC          0.537     Osmt nad2 Ex4    0.527
Atmt ccmFn (ccb382)       0.318     Bvmt atp6          0.616     Osmt ccmC        0.569
Atmt atp4                 0.365     Bvmt nad2 Ex4      0.713     Osmt ccmB        0.607
Atmt cox2 Ex1             0.460     Bvmt rps3          0.716     Osmt cob         0.607
Atmt nad2 Ex4             0.559     Bvmt cob           0.778     Osmt rps1        0.674
Atmt ccb203 (ccmFN2)      0.562     Bvmt mttB (tatC)   0.833     Osmt ccmFc Ex2   0.952
Atmt cox3                 0.597
Atmt cob                  0.818
Atmt nad4 Ex2             0.864
Atmt nad9                 0.947

b Protein coding sequences or exons greater than 500 nucleotides and with three or more editing sites were evaluated for editing site distribution. The observed variance for the intervals between editing sites was compared to the variance of 1000 trials of random editing site assignment.

Figure Legends.

Figure 1. Nucleotide sequences around RNA editing sites in monocot and dicot

mitochondria have similar entropy profiles. Relative entropy for the distribution of

nucleotides is plotted for 50 nucleotides flanking RNA editing sites (panel A) or -20 to

+8 nucleotide window in 1, 2, or 3 nucleotide windows (panels B, C, D) for Arabidopsis

and Oryza mitochondrial genomes. Random editing site assignment was used to produce

randomly edited mitochondrial genome files, and relative entropy analysis of 1000

random assignments was used to determine a mean relative entropy value and a 5%

confidence interval. The Brassica napus and Beta vulgaris mitochondrial genomes were

also analyzed, and these results are provided in the supplemental information.
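For readers unfamiliar with the measure, relative entropy at a single position can be sketched as below. The sketch assumes the Kullback-Leibler form with base-2 logarithms and a caller-supplied background distribution. The formula used for Figure 1 is not restated in this section, so these choices are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(windows, column):
    """Nucleotide frequencies at one column of equal-length sequence windows."""
    counts = Counter(window[column] for window in windows)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}


def relative_entropy(observed, background, alphabet="ACGU"):
    """Kullback-Leibler divergence in bits between two nucleotide distributions.

    Nucleotides absent from `observed` contribute zero; every nucleotide in
    `alphabet` must have a non-zero background frequency.
    """
    total = 0.0
    for base in alphabet:
        p = observed.get(base, 0.0)
        if p > 0:
            total += p * math.log2(p / background[base])
    return total


# Hypothetical windows in which the first column is always U.
windows = ["UAC", "UGC", "UCC", "UUC"]
uniform = {base: 0.25 for base in "ACGU"}
print(relative_entropy(column_frequencies(windows, 0), uniform))
```

A position whose nucleotide distribution equals the background scores zero, and an invariant position scores 2 bits against a uniform background. Repeating the calculation on randomly assigned sites, as described in the legend, would supply the mean and confidence interval for comparison.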

Figure 2. Selectivity ratios for Arabidopsis editing sites suggest that multiple

contiguous nucleotides are important in editing site recognition in the -6 to -4

region. Selectivity ratios for mono-, di- and tri-nucleotides in the -6 to -4 region are

shown in the top, middle and bottom of the figure. The selectivity ratio for uridine at -5

is very high; however, the distribution of C at -5 and other mononucleotides at -4 and -6

are not notable. Selectivity ratios for dinucleotides at -6/-5 show that CU and CC are

enriched and at -5/-4 UA and CG are enriched around editing sites. The trinucleotide

CCG has a greater selectivity ratio than CUA that includes the highly enriched U at

position -5.

Figure 3. Selectivity ratios of dinucleotides near RNA editing sites are similar in

Arabidopsis, Beta, and Oryza mitochondrial genomes. A. The selectivity ratios (P/Q)

for dinucleotides in the -2/-1 window upstream of edited and unedited cytidines are

plotted as the values observed in Arabidopsis against the Oryza or Beta

values. Thus, each point represents the selectivity ratio for a specific dinucleotide, and a

large number of values are clustered near the origin. Regression analysis of the Oryza

(Os) and the Arabidopsis (At) selectivity ratios gives an equation of y = 1.003x - 0.03

with a coefficient of determination of r^2 = 0.90. Regression analysis of the Beta (Bv) and

the Arabidopsis (At) selectivity ratios gives an equation of y = 1.04x - 0.02 with a

coefficient of determination of r^2 = 0.96. B. The selectivity ratios (P/Q) for dinucleotides

in the +1/+2 window are plotted as the values observed in Arabidopsis

versus the Oryza or Beta values. Regression analysis of the Oryza (Os) and the Arabidopsis (At)

selectivity ratios gives an equation of y = 1.03x - 0.03 with a coefficient of determination

of r^2 = 0.69. Regression analysis of the Beta (Bv) and the Arabidopsis (At) selectivity

ratios gives an equation of y = 1.27x - 0.24 with a coefficient of determination of r^2 =

0.84.
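The regressions described in this legend can be approximated with an ordinary least-squares fit. The sketch below assumes that the Arabidopsis ratios are the x values and the Oryza ratios the y values, and it uses the rounded -2/-1 selectivity ratios from Table 1A. Because of that rounding, and because the fitting procedure behind the figure is not stated here, the output need not match the reported coefficients exactly. The function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y).

    Returns (slope, intercept, r_squared), where r_squared is the
    coefficient of determination of the fitted line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Rounded P/Q values from Table 1A, in the order UU, CU, UC, ..., AG.
arabidopsis = [2.29, 1.99, 1.94, 1.84, 1.27, 1.15, 0.97, 0.80,
               0.31, 0.29, 0.13, 0.12, 0.11, 0.05, 0.05, 0.04]
oryza = [2.26, 2.35, 2.38, 1.50, 0.57, 1.37, 1.23, 0.47,
         0.28, 0.30, 0.07, 0.04, 0.22, 0.06, 0.16, 0.00]

slope, intercept, r_squared = linear_fit(arabidopsis, oryza)
print(f"y = {slope:.3f}x + {intercept:.3f}, r^2 = {r_squared:.2f}")
```

A slope near 1 with a small intercept indicates that a dinucleotide enriched near Arabidopsis editing sites is enriched to a similar degree near Oryza editing sites.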

Figure 4. Selectivity ratios for dinucleotides upstream of RNA editing sites are

similar in mitochondrial and chloroplast genomes. The selectivity ratios (P/Q) for

dinucleotides in the -2/-1 window upstream of edited and unedited cytidines are plotted

as the P/Q value observed in the Arabidopsis genome versus the Nicotiana (Nt) or Zea

mays (Zm) genome on the Y axis. Thus, each point represents the selectivity ratio for a

specific dinucleotide, and a large number of values are clustered near the origin.

Regression analysis of the Nicotiana chloroplast (Nt ct) and the Arabidopsis

mitochondrial (At mt) selectivity ratios gives an equation of y = 1.07x - 0.19 with a

coefficient of determination of r^2 = 0.65. Regression analysis of the Zea mays chloroplast

(Zm ct) and the Arabidopsis mitochondrial (At mt) selectivity ratios gives an equation of

y = 1.00x - 0.03 with a coefficient of determination of r^2 = 0.73.

Figure 5. The Effect of Codon Position on Relative Entropy. Relative entropy was

determined in a one nucleotide window for editing sites in the first, second or third codon

position (CPA1, 2, and 3) in the Arabidopsis (A) and Oryza genomes (B). The

number of editing sites analyzed in the first, second, and third codon position was 103,

163, and 26 in the Arabidopsis genome and 122, 171, and 32 in the Oryza genome,

respectively.

Figure 6. Editing Sites are Sporadically Distributed in Plant Mitochondrial Genes.

The distribution of editing sites in coding sequences that exhibit non-random distribution

of RNA editing sites is illustrated on a line graph. The positions of RNA editing sites are

shown as vertical lines on a line representing the length of the coding sequence. The

average size of the largest gap for the genes that exhibited p values less than 0.05 was

533, 545, and 559 nucleotides in the Arabidopsis, Beta, and Oryza genomes, respectively.

Figure 7. Model of RNA Editing Site Recognition. A model for the interaction of the

editing apparatus with an editing substrate is shown. The edited cytidine is shown as a

bolded C, and regions where relative entropy is high are shown as an upper case N.

Groups of nucleotides that are frequently present in these positions are shown under the

RNA sequence. The groups of di- and tri-nucleotides noted in this figure show high

selectivity ratios that exceed the 5% confidence interval of the mean and standard

deviation of the selectivity ratios determined from randomly assigned editing sites.

Trinucleotides marked with a single asterisk were only significant in the monocot, Oryza,

and trinucleotides marked with two asterisks were significant in Arabidopsis.

Nucleotides with no asterisk were significant in both taxa.

Figure 8. Gene Conversion Model of Editing Site Loss Resulting in the Clustered

Distribution of RNA editing sites. Gene conversion events in the mitochondrial

genome that utilized cDNA sequence derived from edited mRNA would convert the

cytidines at editing sites to thymidines that would not require editing. This process

would eliminate editing sites within the region that experienced gene conversion, and

create stretches of coding sequence with no editing sites and would leave clusters of

editing sites within regions that had not experienced gene conversion.

Figure 1A-D

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8