  • ARTICLES

    Architecture and secondary structure of an entire HIV-1 RNA genome

    Joseph M. Watts¹, Kristen K. Dang², Robert J. Gorelick⁵, Christopher W. Leonard¹, Julian W. Bess Jr⁵, Ronald Swanstrom³, Christina L. Burch⁴ & Kevin M. Weeks¹

    Single-stranded RNA viruses encompass broad classes of infectious agents and cause the common cold, cancer, AIDS and other serious health threats. Viral replication is regulated at many levels, including the use of conserved genomic RNA structures. Most potential regulatory elements in viral RNA genomes are uncharacterized. Here we report the structure of an entire HIV-1 genome at single nucleotide resolution using SHAPE, a high-throughput RNA analysis technology. The genome encodes protein structure at two levels. In addition to the correspondence between RNA and protein primary sequences, a correlation exists between high levels of RNA structure and sequences that encode inter-domain loops in HIV proteins. This correlation suggests that RNA structure modulates ribosome elongation to promote native protein folding. Some simple genome elements previously shown to be important, including the ribosomal gag-pol frameshift stem-loop, are components of larger RNA motifs. We also identify organizational principles for unstructured RNA regions, including splice site acceptors and hypervariable regions. These results emphasize that the HIV-1 genome and, potentially, many coding RNAs are punctuated by previously unrecognized regulatory motifs and that extensive RNA structure constitutes an important component of the genetic code.

    Genomes of all single-stranded RNA viruses contain internal structures fundamental to viral replication and host defence evasion. Important viral RNA structures include internal ribosome entry sites, packaging signals, pseudoknots, transfer RNA mimics, ribosomal frameshift motifs, and cis-regulatory elements¹,². In the human immunodeficiency virus (HIV), RNA structures activate transcription, initiate reverse transcription, facilitate genomic dimerization, direct HIV packaging, manipulate reading frames, regulate RNA nuclear export, signal polyadenylation, and interact with viral and host proteins²⁻⁶. These RNA regulatory motifs have been identified by focusing on the 5′ and 3′ untranslated regions plus a few internal sequences. Most potential regulatory structures within viral RNA genomes, including in ~85% of the HIV-1 genome, are uncharacterized. This raises the possibility that new categories of RNA structure-mediated regulation remain to be identified.

    The HIV-1 genome is primarily a coding RNA and contains nine open reading frames that produce 15 proteins²,³. The Gag polyprotein precursor is proteolytically processed to generate the matrix (MA), capsid (CA), nucleocapsid (NC) and p6 proteins. The Gag-Pol polyprotein contains protease (PR), reverse transcriptase (RT) and integrase (IN). The env gene encodes a 30-amino-acid signal peptide (SP), gp120 and gp41. Other sequences encode auxiliary proteins (Fig. 1a, grey boxes). Inside virions, HIV genomic RNA is found as a non-covalent dimer, is 5′ capped and 3′ polyadenylated, and is annealed to a host tRNA(Lys3) molecule². Viral proteins, especially nucleocapsid, chaperone the folding of HIV RNA⁷.

    Whole-genome structure analysis

    To develop an accurate view of RNA structure in the full-length genome, we analysed authentic genomic RNA extracted from HIV-1 virions. Our gentle purification maintained both previously characterized secondary structures and the few known RNA tertiary structures. For example, the host tRNA(Lys3) was bound to the genome² and a pseudoknot in the 5′ untranslated region (UTR)⁶,⁸ remained stably formed. The RNA was sufficiently intact to act as a template for primer extension reactions spanning the entire genome (Supplementary Table 1 and Methods).

    High-throughput selective 2′-hydroxyl acylation analysed by primer extension (SHAPE)⁶,⁹⁻¹¹ was used to chemically interrogate local nucleotide flexibility at 99.4% of the 9,173 nucleotides in the NL4-3 HIV-1 RNA genome. 1-methyl-7-nitroisatoic anhydride (1M7) preferentially acylates conformationally flexible nucleotides at the ribose 2′-OH position⁹,¹⁰. The resulting 2′-O-adducts are detected as stops to primer extension using fluorescently labelled primers and capillary electrophoresis⁶,¹⁰ (Fig. 3a) and are quantified by whole-trace Gaussian integration¹¹ (Fig. 3b). SHAPE measurements are reproducible between independent biological replicates (R² = 0.75; Supplementary Fig. 1). SHAPE reactivities are highly sensitive to local nucleotide flexibility and disorder, but are insensitive to solvent accessibility⁹,¹² (Supplementary Fig. 2).
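The two summary numbers in this paragraph, genome coverage and replicate reproducibility, can be illustrated with a minimal Python sketch. It assumes that R² is the squared Pearson correlation over positions measured in both replicates, which the text does not specify, and that the 53 nucleotides marked as not analysed in a later figure legend are the only unmeasured positions. The function names and the toy replicate values are hypothetical and are not data from the study.

```python
def pearson_r2(a, b):
    """Squared Pearson correlation over positions measured in both replicates.

    Positions with None in either replicate are skipped (assumed convention).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared positions")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("constant reactivities have no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def coverage(total_nt, not_analysed_nt):
    """Fraction of nucleotides with a measurement."""
    return (total_nt - not_analysed_nt) / total_nt


# Toy replicate profiles (hypothetical values, None = no measurement).
replicate_1 = [0.10, 0.80, None, 0.35, 1.20, 0.05]
replicate_2 = [0.15, 0.70, 0.40, None, 1.00, 0.10]

r2 = pearson_r2(replicate_1, replicate_2)
genome_coverage = coverage(9173, 53)  # 9,173 nt genome, 53 nt not analysed
print(f"toy R^2 = {r2:.2f}; coverage = {genome_coverage:.1%}")
```

Under these assumptions, 9,120 of 9,173 nucleotides are covered, which is consistent with the 99.4% stated above.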

    SHAPE reactivities therefore provide direct model-free information about the overall level of structure, or architecture, for any RNA. The median SHAPE reactivity varies markedly across the HIV-1 genome (Fig. 1b, dark blue line). Regions with median reactivities below 0.25 indicate domains with substantial base-paired secondary RNA structure, whereas median SHAPE reactivities of 0.5 and greater indicate regions of largely unstructured nucleotides.
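The thresholds above can be sketched in Python as a windowed median followed by a simple classification. The 75-nucleotide window is taken from the Figure 1 legend. The handling of genome ends (clipped windows), the treatment of missing values, the label "intermediate" for medians between the two thresholds, and the toy profile are all assumptions made for illustration, not details given in the text.

```python
from statistics import median


def windowed_median(reactivities, window=75):
    """Median over a centred window at each position.

    Windows are clipped at the sequence ends and None values are
    ignored; both choices are assumptions of this sketch.
    """
    half = window // 2
    n = len(reactivities)
    medians = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        values = [r for r in reactivities[lo:hi] if r is not None]
        medians.append(median(values) if values else None)
    return medians


def classify(median_reactivity, structured_below=0.25, unstructured_from=0.5):
    """Apply the two thresholds quoted in the text to a windowed median."""
    if median_reactivity is None:
        return "no data"
    if median_reactivity < structured_below:
        return "structured"
    if median_reactivity >= unstructured_from:
        return "unstructured"
    return "intermediate"  # hypothetical label; the text does not name this range


# Toy profile: 100 unreactive positions followed by 100 reactive positions.
profile = [0.05] * 100 + [0.9] * 100
medians = windowed_median(profile, window=75)
labels = [classify(m) for m in medians]
print(labels[10], labels[190])
```

A median is used here rather than a mean because the text reports medians; a median is also less affected by a few very reactive nucleotides inside an otherwise paired region.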

    We also assessed HIV-1 genome structure by examining evolutionary information contained in nucleotide and amino acid variation to assign a pairing probability at each nucleotide¹³,¹⁴. This algorithm does not use chemical reactivity or thermodynamic information, and thus infers RNA structure using information that is orthogonal to SHAPE.

    We identify at least 10 ‘structured’ regions that exhibit both low SHAPE reactivity and high pairing probability (Fig. 1b, compare dark

    ¹Department of Chemistry, ²Department of Biomedical Engineering, ³Lineberger Cancer Center, ⁴Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA. ⁵AIDS and Cancer Virus Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702-1201, USA.

    Vol 460 | 6 August 2009 | doi:10.1038/nature08237

    ©2009 Macmillan Publishers Limited. All rights reserved


  • [Figure 1 graphic. Panel a: genome map showing gag, pol, vif, vpr, vpu, tat, rev, env and nef across reading frames 1 to 3, with gene start and stop locations, U3, R and U5. Panel b: SHAPE reactivity and pairing probability plotted along the genome, with the 5′ UTR, frameshift element, RRE, 3′ UTR, CPPT, PPT and V1/V2, V3, V4 and V5 marked. Panel c: inter-protein linkers and protein-domain junctions. Panel d (partial): domain structures for MA, CA, NC, p6, PR, RT, RNase, IN, SP, gp120 and gp41, with SD1, SA5 and the SP-stem marked.]

    Figure 1 | Organization, extent of RNA structure, and relationship to protein structure for an HIV-1 genome. a, HIV-1 genome organization. Protein coding regions are shown as grey boxes; polyprotein-domain junctions are depicted as solid vertical lines. Gene start and end sites are numbered according to NL4-3. CA, capsid; IN, integrase; MA, matrix; NC, nucleocapsid; PR, protease; RT, reverse transcriptase; SP, signal peptide. b, Comparison of median SHAPE reactivities (dark blue line) and evolutionary pairing probabilities (cyan line). Medians are calculated using a 75-nucleotide window. The global median (0.34) is depicted as a red line. Pairing probability is not reported for regions encoding overlapping reading frames. PPT, polypurine tract; CPPT, central PPT. c, Inter-protein linkers in polyprotein precursors and the unstructured peptide loops that link protein domains are indicated with green and yellow bars, respectively. The single inter-protein linker that is not encoded by a region of highly structured RNA (at the RNase H–integrase junction) is shown with an open green bar. d, Comparison of domain structures for the large HIV proteins with the structure of the encoding RNA. Polyprotein linkers are green; inter-domain loops are yellow; folded protein domains are blue, red, light magenta, purple and grey (Supplementary Table 2).

  • blue and cyan traces). This group includes the 5′ UTR and the Rev responsive element (RRE), which are known HIV regulatory elements (Fig. 1b). However, most of these highly structured and evolutionarily conserved elements have not been characterized previously. These regions include the protease–reverse transcriptase junction, domains in the reverse transcriptase, integrase, and Vif open reading frames, an element 3′ of the Env signal peptide, and the nef 3′-UTR region.
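The joint criterion used for these ‘structured’ regions, low SHAPE reactivity together with high pairing probability, can be sketched in Python as a per-nucleotide flag followed by a search for contiguous runs. This is an illustration only and not the authors' procedure. The 0.25 SHAPE cutoff is the value quoted in the text for windowed medians, whereas the pairing-probability cutoff of 0.5, the minimum run length, the function names and the toy values are hypothetical. Positions without a pairing probability (for example, overlapping reading frames, per the Figure 1 legend) are never flagged.

```python
def structured_flags(shape_medians, pairing_probs,
                     shape_cutoff=0.25, pairing_cutoff=0.5):
    """Flag positions with low SHAPE median and high pairing probability.

    pairing_cutoff is a hypothetical value; the text gives no number.
    None in either track means the position is not flagged.
    """
    flags = []
    for s, p in zip(shape_medians, pairing_probs):
        flags.append(
            s is not None and p is not None
            and s < shape_cutoff and p >= pairing_cutoff
        )
    return flags


def find_regions(flags, min_length=1):
    """Return 1-based inclusive (start, end) pairs for runs of True."""
    regions = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(flags) - start >= min_length:
        regions.append((start + 1, len(flags)))
    return regions


# Toy tracks (hypothetical values).
shape = [0.1] * 5 + [0.6] * 3 + [0.2] * 4
pairing = [0.8] * 5 + [0.9] * 3 + [0.7, None, 0.7, 0.7]

regions = find_regions(structured_flags(shape, pairing), min_length=2)
print(regions)
```

Requiring both signals reflects the point made earlier in the text that the evolutionary pairing probability is orthogonal to SHAPE, so agreement between the two is stronger evidence than either alone.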

    We also identify at least seven ‘unstructured’ regions, extending over 200–600 nucleotides, with high SHAPE reactivities and low

    [Figure graphic: diagram drawn with per-nucleotide symbols, not recoverable from the extracted text. Legible labels: protein-coding regions with start and end markers for gag (MA, CA, p2, NC, p1, p6), the TF peptide, pol (PR, RT, RNase, IN), vif and vpr; region markers A to X; 3′ end; a SHAPE reactivity scale; "Not analysed (53 nts)".]

    O

    O

    O

    O

    O

    O

    OOO

    O

    OO

    O O OO

    OO

    O O O OO

    OO

    O

    OO

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O O O

    O O

    O O

    O OOOOO

    OO

    OO

    OOO

    OO

    OO

    OO

    OOOO

    O

    OOOOOOOOOOOOOOOOOOOOOOOOOO

    O

    O

    O

    O

    O

    OOO

    O

    O

    O

    O

    O

    OO

    OO

    O O OO

    O O

    O O

    O O

    O OO

    OO

    O

    O

    OO

    OO O O O

    OO

    O

    O

    OO

    OO

    O O O O O O O O O O O O O

    O OO

    OO

    O OO

    O

    O

    O O O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO O O

    O

    O

    O

    O O O O O

    O O

    O O

    O OO

    OO

    O

    O

    O

    OO

    OO O

    O O OO

    OO

    O

    O

    OO

    OO

    O O O O O O O O

    O O

    O O

    O O

    O O

    O OOOOOO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    OO O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    O OO O O O O

    O OO

    O

    OO O

    O O

    O O

    O OO

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOOOO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O OO O

    O O O O O O O O O O OO

    OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    OO

    O O OO

    O

    OO

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OOOOO

    O

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO O

    O O OO O O O O O O

    OO

    OO

    OO

    OO

    OO

    OOO

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OOOOOOO

    OO

    OO

    OOO

    O

    O

    OO

    OO

    OOO

    O

    O

    O

    O

    O

    O OO

    OO

    OOOO

    O

    OO

    O

    O

    OO

    O

    OO

    OOO

    OOO

    OO

    O

    O

    O

    O

    O

    O O O O O O O O O O O O O O O O O O O O O O

    O O

    O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    O O

    O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O

    O

    O

    O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O

    O

    OO

    O O

    O OOOOOOOOOOO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    OOOOO

    OO

    O

    O

    OO

    O O O OO

    OO O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OOO

    OO

    OO

    OO

    OOOO

    OO

    O

    O

    O

    OO

    OO O O O

    OO

    OO

    O

    O

    OO

    O

    O

    O O O O O O O O OO O

    O OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    OO

    OO

    O

    O

    O

    O

    O

    OO

    O OO

    O

    O

    OO

    O

    O

    O

    OO

    OO

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    O O OO

    OO

    OO

    OO

    OO

    OO

    OO O

    O

    O

    OO

    OO

    O

    O

    O

    OO

    OO

    OOOOOO

    OO

    OO

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O OO

    OO

    O

    OO

    OO

    O

    O O O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O O OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O OO

    O

    OOOOO

    OOOOO

    OOO

    O

    O

    O

    O

    O

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOOO

    O O O O O O OO O

    OO

    OO

    OO

    OO

    OOOOOOOOOO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O OO

    O

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    OOOO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O O OO

    O

    O

    OO

    OOO

    O

    30

    70

    150

    190

    220

    370410

    450

    620

    670 710

    2705′ TAR

    5′

    PBSDIS

    PSI 760820

    890

    9701020

    1080

    940

    1150

    1190

    1230

    1370

    1480

    1520

    1570

    1580

    1610

    1630

    1650

    1590

    1660

    1700

    1720

    1740

    1770

    gag-polframeshift

    18401900

    1950

    2000

    2020

    2110

    2050

    2190

    2250

    2285 2410

    2460

    2540

    2640

    2610

    2590

    2680

    2740

    2770

    2800 2830

    2840

    2920

    3000

    3100

    3200

    32503310

    33503380

    3410

    3460

    3570

    37303760

    3820

    3920

    3940

    4000

    4170

    4240

    CPPT

    4530

    4550

    4580

    4630

    46904760

    4810

    48504890

    4930

    4975

    5030

    5120

    O O O O

    O

    O

    OO O O

    O

    OO

    O

    O

    OO

    O

    O

    4330

    4430

    BC

    DEFG

    H IJ

    KL

    MN

    O

    P Q

    RS

    T UV

    W

    X

    18701270

    1300

    345

    860

    140

    500

    650

    1935

    40504080

    3945

    4470

    4220

    4260

    4965

    2850

    O

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    tRNALys3

    OO

    OO

    OO

    OOO

    OO

    OO

    OO

    5′3′5′

    Longestcontinuous

    helix

    vif endtat exon1 startvpr end

    nef end

    YZ

    Aa

    tat/rev exon 1 endrev exon 1 start

    vpu start

    BbCcDd

    env signal peptide start

    env signal peptide endenv (gp120) start

    env (gp41) startenv (gp120) end

    EeFfGgHhIi

    tat/rev exon 2 startrev endenv end

    tat end

    nef start

    Jj

    KkLl

    MmNnOo

    Base-paired

    Single-stranded

    SH

    AP

    E r

    eact

    ivity

    0.0

    0.5

    1.0

    O O O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOOOOOOOOOOOOOOOOOOOOOOOOOOOO

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    O O OO O

    O OO O

    O OO O

    O OO

    OOOO

    OO

    OO

    OOOO

    OOO

    O

    O

    OO

    OO O O

    OOO

    OO

    OO

    O O O

    O O

    O O

    O O

    O

    O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOO

    OO

    O

    O

    O

    O

    O

    OO

    OO O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O

    OO O

    O O

    O O

    O OO

    OO O

    O O

    O OO

    OO O

    O OO

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOOOOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOOOO

    OO

    OO

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    OO

    OO

    O OO

    O

    O OO O O

    O

    O

    O

    OO

    OO

    O OO O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    OO

    O

    OO O O

    O

    OO O O O O

    O

    O

    O

    O

    O

    OO

    O O OO

    O

    OO

    OO

    O

    O

    OO

    OO

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    O

    O

    OO

    OO

    OO

    O

    O

    O

    O OO

    OO

    OO

    O

    OO

    O OO OO OO

    O

    O

    O

    O

    O

    O

    OO

    O OO O

    OO

    O

    O

    OO

    OOOOO

    OOOO

    O

    O

    O

    O O O O O O O O O

    O O

    O O

    O O

    O OOO

    OO

    OO

    O

    O

    O

    OO

    OO

    OO O O O O

    OO

    OO

    O

    O

    O

    O

    OO

    OO

    OO

    O O O O O O O O

    O O

    O O

    O O

    O OO

    OO O

    O O

    O O

    O O

    O OO

    OO

    O

    O

    O

    O

    O

    O

    OOOO

    OO

    OO

    OO O O

    OO

    OO

    OO

    OO

    O

    OO

    OO O O

    OO

    O

    O

    O OO

    OO

    O

    OO

    OO

    O

    O

    O O O O O O O O O O O

    O O

    O O

    O O

    O OOO

    OO

    OO

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    O

    O

    OOOO

    OO

    O

    OO

    OO O O O

    OO

    OO O

    OO

    OO O O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO

    O

    O

    OO

    O O O

    O O

    O O

    O OO

    O

    O

    OO O

    O O

    O OOO

    OO

    O

    O

    O

    OO

    OO

    O O O

    O O

    O OO

    O

    OO O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO

    OO

    OO O O O

    OOO

    OO

    O

    O

    O

    OO

    OO

    OO

    O

    O

    O

    OO

    OO

    OO

    O

    OO

    OO

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    OO

    O O OO

    OO

    O OO O O O

    OO

    OO

    O

    O

    O

    O

    O

    OO

    O O O OO

    O

    OO

    OOOO

    O

    O

    OO

    OO

    OO

    OOOOOO

    OO

    O

    O

    OO

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OOOO

    OO

    OO

    OO

    OO

    OOO

    O

    O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O

    O O

    O O

    O O

    O OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OOOOOOO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    OO O O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO O

    O

    O O O O OO

    O

    O

    O

    O

    O

    OO O

    O

    OO

    OO

    O

    O

    O

    O

    O

    O O OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO O O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    OO O O

    O

    O

    O

    O

    O O

    O OOO

    O O O

    O O

    O O

    O OOO

    O

    O

    O

    O

    OOO

    OO

    O

    O

    OO

    OO

    O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O O

    O O

    O O

    O OO

    OO O

    OO O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OOOO

    O

    O

    O O OO

    OO

    O

    O

    OO

    OOOO

    O

    OO

    OO

    OO

    OOO

    OO

    OOO

    OOOO

    OOOOO

    OO

    OOO

    O

    OO

    O O O

    O O

    O O

    O O

    O OOOOO

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    OOO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO O O

    OO O

    O O

    OO O O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O OO

    OOO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O O OO

    O

    OO

    OOOOO

    O

    O

    O O O O

    O O

    O OOOOO

    OO

    OO

    O

    OO

    OO

    O OO O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOO

    O

    O

    OO

    OO

    O O

    O O

    O O

    O OOO

    O O O

    O O

    O O

    O O

    O O

    O OOO

    OO

    OO

    O

    O

    O

    O

    OO

    OO

    OO O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O

    OO O

    O O

    O O

    O O

    O OO

    OOOOOOOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    OO

    O

    O

    O

    O

    O

    OOO

    OO

    OO

    OO

    OOOO

    OOOO

    OOOO

    O O O

    O O O O O O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOOO

    OO

    O

    O

    OO

    O

    O

    O

    O

    OO O

    OO O

    O O

    O O

    O O

    O O

    O OO

    OO

    O

    O

    O

    OOO

    OO O O

    O O

    O O

    O OO

    OO

    O

    O

    O

    OO

    OO O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O O

    O O

    O O

    O O

    OO O

    O

    O

    OO

    O

    OO

    O

    OO

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    O OO O O

    O O O O O O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O O OOO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O O OO

    O

    O

    O

    O

    O

    O OO O O O

    O

    O

    O

    O

    O

    O

    OOO

    OO

    OO

    OOOOO

    OOOOOOOOOOOOOO

    O

    OO O

    O OO

    OO O O

    O

    O

    O OO O

    O O OO O O O O O O O O O O O O O O O O O O

    O

    O

    O

    OO O O

    OO

    OO

    OOOOO

    OO

    OO

    OO

    OOOOOOOOOOOOOOOOOOOOOO

    OOOO

    OO O

    O OO O O

    O O O O O O O O O O O OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O OO

    OO

    OOOOOO

    OOOOOOOOOOOOOOOOOO

    O O O O O O O O O O O O O O O

    O O

    O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOOOOO

    OO

    OO

    OO

    OOO

    OO

    OO O O O O

    OO

    OO

    O OO

    OO

    OO O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO

    OO

    O O

    O O

    O

    O

    O

    O O

    O O

    O O

    O O

    O OOO

    OO

    O

    O

    OO

    OO

    O O O O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OOO

    OOOOOO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O OO

    OO

    O O O OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO O O

    OO

    O

    OO

    OO

    OO

    OO

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOOOO

    OO

    O

    O

    OO

    OO O O O

    OO

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO O O O O O O O O O O

    OO

    OO

    OO

    OO

    O OO

    OO

    OO O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OOO

    OOOOOOOOOOOO

    OO

    OO

    O

    O

    O

    O

    OO

    O

    O

    O

    O O OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    O O O OO

    OO

    OOO

    O

    O

    OO

    OO

    OOOOO

    OO

    OOOOOO

    OO

    O

    O

    OO

    O

    OO

    OO

    OO

    O

    O

    OO

    OO

    OOO

    O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O

    O O

    O O

    O OO

    O

    OO O

    O OO

    OO O

    O O

    O O

    O O

    O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O

    O O

    O O

    O O

    O OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    OO

    O

    O

    O

    O

    OO

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O

    O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    OO

    O

    O

    OO

    OO O O

    O O

    O O

    O OOOO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOOO

    O

    OO

    O OO

    OO

    OO

    OO

    O

    OO

    OO

    OO

    OO O

    O OO O O O O O

    OO

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O O

    O

    O

    O

    O

    O

    O

    O

    OO

    O OO

    O

    OO

    O

    O

    OOO

    OO

    OOO

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OOO

    O

    O

    O

    O

    O

    O

    OO

    OO

    O

    OO

    OOO

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO O

    OO

    OO

    OO

    O

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O OO

    O

    OO

    O

    OO

    OOO

    O

    O

    O

    O

    O

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O O

    O

    OOO

    OOOO

    O

    O

    O

    OO

    O

    OO

    O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O OO O

    OO O

    O O

    O O

    O O

    O OO

    O

    OO

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOO

    O

    O

    O

    OO

    OO O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O

    O

    O

    OO O

    O O

    O O

    O O

    O O

    O O

    O OOOOOOOOO O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OO

    O

    OO

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    O

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O OOO

    O

    O

    OO

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO O

    O O O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    OO O O O

    OO

    OO O

    O O O O

    O

    O

    O

    O

    O

    OO O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O OO

    O

    OOOO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    OO

    O

    OO

    OO

    OO

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O O O

    O O

    O O

    O OO

    OO

    O OO

    O

    O

    O O O O O O O O O O O O

    O OO

    OO O

    O

    O

    O O O

    O O

    O O

    O OOO

    O O

    O O

    O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OOOOOOO

    OO

    OO

    O

    O

    O

    OO

    OO

    OO

    O O O O OO

    OO

    OO O

    O

    OO

    O O OO

    OO

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O OO

    O

    O

    O

    O

    OO

    OO O O O O

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O OO

    OOO

    OO

    OO

    OOOOOO

    OO

    OO

    OOOO

    O O O O O OOOOO

    O O O O O O O O O O O O O O O O O O O O O O (A)n

    7450

    7480

    7510

    7550

    75907250

    7270

    7390

    7420

    7980

    7850

    7800

    7750

    7700

    6830O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O

    RRE

    I

    V

    IV

    III

    IIc

    IIb

    IIaRevB.S.

    3′ TAR

    3′ poly(A)

    3′5700

    5750

    5780

    5810

    5850

    5940

    5980

    5990

    6010 6060

    O6390

    6450

    6560

    6740

    6850

    6900

    6910

    6950

    7060

    7090

    7140

    7200

    8090

    8140

    8220

    8250

    8300

    8350

    8375

    8395

    8450

    8470

    8510

    8590

    8640

    8730

    8770

    8870

    8890

    8920

    PPT

    9000

    9050

    9091

    9142

    Ee

    FfGg

    HhIi

    Jj

    Kk

    Ll

    MmNn

    Oo

    O O O O

    O OOO

    O O

    O O

    O O

    O O

    O O

    O O

    O OOOOOO

    O

    O

    O

    O

    O

    O

    O

    OO

    OOOOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OOOOO

    OO

    O

    OO

    OO O O O

    OO O

    O O O OO

    O

    O O

    O OO

    O O

    O O

    O O

    O O

    O O

    O

    O

    O O

    O O

    O O

    O O

    O O

    O O

    O O

    O OOO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    OOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OOOO

    O

    O

    OO

    O O O OO

    OO

    O

    OO

    OO

    OO

    OO

    OO

    O

    O

    OO

    OO

    OO

    OO

    O O

    O O OO

    O

    O

    OO

    O O O O O O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O O OO

    O

    OO

    OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    OO

    O

    O OO

    O

    O

    O

    O

    O

    OO

    OO

    OOOOO

    OO

    OO

    OO

    OO

    O

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    OO

    O OO

    OO

    OO

    OO

    O

    O

    O

    O

    O

    OO

    O

    OO O O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO O

    OO

    OO

    O OO O O O O

    OO

    OO

    OO

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    OO

    OO

    OO

    OO O

    O O O O O O OO

    OO

    O

    O

    O

    O

    O

    O O

    O

    O

    O

    O

    O

    O

    O

    O O OO

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    O OO

    O

    O

    O

    O

    OO

    O

    O

    O

    O

    O

    OO

    O

    O

    O

    O OO

    OO

    OO

    OOOOO

    O

    O

    OO

    OO

    OO

    OOOOOOOOO

    OO

    OO

    OO

    O

    O

    O

    OOOO

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    O

    OO

    O

    OO

    O

    O

    O

    O

    O

    O

    O

    OOOO

    OO

    OO

    O O

    O

    OO O

    OOOO

    OO

    OO

    OO

    OO

    OO

    OO

    O

    O O

    5200

    5220

    5240

    52805320

    5360

    5410

    5460

    5550

    56205670Y

    Z

    Aa

    Bb

    Cc

    Dd

    OOOO O

    6160

    6230

    6310

    6360

    6405

    6600

    6640

    6670

    6785

    7030

    7230

    SP-stem

    a

    b

    Figure 2 | Structure of the HIV-1 NL4-3 genome. The 5′ (a) and 3′ (b) genome halves are shown. Nucleotides are coloured by their absolute SHAPE reactivities (see scale in a). Every nucleotide is shown explicitly as a sphere; base pairing is indicated by adjacent parallel orientation of the spheres. Protein domains are identified by letters; TF, transframe peptide; nts, nucleotides. Important structural landmarks are labelled explicitly. Full nucleotide identities and pairings are provided in the Supplementary Information (Supplementary Fig. 7). Intermolecular base pairs involving the tRNALys3 primer and the genomic dimer are shown in grey. Inset shows a box plot illustrating SHAPE reactivities for single-stranded versus paired nucleotides in the model. Median reactivities are indicated by bold horizontal lines; the large box spans the central 50% of the reactivity data.

    NATURE | Vol 460 | 6 August 2009 ARTICLES


  • pairing probabilities. These include the RNase H coding domain, variable domains (Vx) in gp120, and the polypurine tract (Fig. 1b). On a smaller scale, the consensus sequences for the highly used splice site acceptors are also unstructured (Supplementary Fig. 3). There are four regions of apparent disagreement in the level of RNA structure, having high pairing probabilities and high SHAPE reactivities (one each in the reverse transcriptase, RNase, integrase and gp41 coding regions). This small group may reflect sequence conservation that is not accounted for by the evolutionary model13, or may form critical structures at an alternative stage of the viral replication cycle.

    RNA structure encodes protein structure

    We first evaluated whether global RNA genome structure is linked to protein structure. HIV-1 produces three major classes of messenger RNA. The 9-kilobase (kb) class encodes Gag and Gag-Pol and is identical to the packaged genomic RNA analysed here except that, as an mRNA, it is not dimerized at its 5′ end2. There are very few differences in the SHAPE reactivity of dimeric and monomeric RNAs at the 5′ end of the genome6. Thus, genome structures outside of the dimerization region will correspond closely to those in the mRNA that encodes Gag and Gag-Pol. The most abundant 4-kb env mRNA is generated by splicing nucleotide 288 (SD1, the major splice donor) to nucleotide 5522 (termed the SA5 site)15. SA5 is followed by an unstructured genome region (Fig. 1a, b). Thus, RNA structures identified in the env coding region probably exist in the spliced mRNA that encodes Env. Structures for the 1.8-kb class of mRNAs, which generate Tat and Rev, cannot be predicted using the genomic RNA because discontinuous segments are joined in the final mRNA.
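The reasoning about which genomic structures can carry over to a spliced mRNA can be sketched in code. The following is a minimal illustration, not the authors' method: it assumes 1-based inclusive coordinates, treats nucleotide 288 as the last nucleotide of the first exon and nucleotide 5522 as the first nucleotide of the second, and uses a placeholder 3′ end of 9000 that is not the true genome length. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) in genomic coordinates


def within_single_exon(region: Interval, exons: List[Interval]) -> bool:
    """True if the region lies entirely inside one exon, so a local structure
    modelled on the genomic RNA stays contiguous in the spliced mRNA."""
    start, end = region
    return any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)


def to_mrna_coordinate(pos: int, exons: List[Interval]) -> int:
    """Map a genomic position to its 1-based position in the spliced mRNA."""
    offset = 0  # number of mRNA nucleotides contributed by earlier exons
    for ex_start, ex_end in exons:
        if ex_start <= pos <= ex_end:
            return offset + (pos - ex_start + 1)
        offset += ex_end - ex_start + 1
    raise ValueError(f"position {pos} is spliced out of this mRNA")


# Assumed exon layout for the env mRNA: SD1 (288) joined to SA5 (5522).
# The 3' end (9000) is an illustrative placeholder only.
ENV_EXONS: List[Interval] = [(1, 288), (5522, 9000)]

if __name__ == "__main__":
    print(within_single_exon((6000, 6100), ENV_EXONS))  # region inside the second exon
    print(within_single_exon((250, 5600), ENV_EXONS))   # region spanning the splice junction
    print(to_mrna_coordinate(5522, ENV_EXONS))
```

A region that spans a splice junction, as in the 1.8-kb class, fails the single-exon check, which mirrors the statement that such structures cannot be read off the genomic model.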

    The Gag, Gag-Pol and Env polyprotein precursors are synthesized roughly as beads on a string, and the constituent proteins are liberated by proteolytic cleavage2,3 (Fig. 1a, d). Eight inter-protein peptides link the HIV proteins (Fig. 1c, green bars). The RNA sequences that encode these spacer peptide linkers in Gag (at the matrix–capsid, capsid–nucleocapsid and nucleocapsid–p6 junctions), Pol (protease–reverse transcriptase and reverse transcriptase–RNase H junctions) and Env (signal peptide–gp120 and gp120–gp41 junctions) all (except the RNase–integrase junction) have SHAPE reactivities that are much lower than the median (Fig. 1b). RNA sequences that encode these inter-protein peptide linkers are more highly structured than 95.2% of randomly selected regions in the genome (Supplementary Fig. 4a).
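A comparison against randomly selected regions of this kind can be sketched as a simple resampling test. This is a hedged illustration only: the exact procedure is described in Supplementary Fig. 4 and is not reproduced here, so the scoring statistic (mean of per-region median reactivity), the function names and the synthetic profile below are all assumptions.

```python
import random
from statistics import median
from typing import List, Tuple


def region_median(reactivity: List[float], start: int, length: int) -> float:
    """Median SHAPE reactivity over reactivity[start:start + length] (0-based)."""
    return median(reactivity[start:start + length])


def percent_random_less_structured(
    reactivity: List[float],
    regions: List[Tuple[int, int]],  # (start, length) pairs, 0-based
    n_trials: int = 2000,
    seed: int = 0,
) -> float:
    """Percentage of random, length-matched region sets whose mean median
    reactivity is higher (that is, less structured) than the observed set."""
    rng = random.Random(seed)
    observed = sum(region_median(reactivity, s, n) for s, n in regions) / len(regions)
    higher = 0
    for _ in range(n_trials):
        total = 0.0
        for _, n in regions:
            # Draw a random window of the same length anywhere in the profile.
            start = rng.randrange(0, len(reactivity) - n + 1)
            total += region_median(reactivity, start, n)
        if total / len(regions) > observed:
            higher += 1
    return 100.0 * higher / n_trials


if __name__ == "__main__":
    # Synthetic profile: uniformly reactive except one low-reactivity block.
    profile = [1.0] * 1000
    for i in range(100, 130):
        profile[i] = 0.0
    print(percent_random_less_structured(profile, [(100, 30)]))
```

Low SHAPE reactivity is used here as the proxy for high structure, in line with the text; a real analysis would substitute the measured reactivity profile and the annotated linker coordinates.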

    Domains in the individual HIV-1 proteins (capsid, reverse transcriptase and integrase) are also linked by unstructured peptide elements, and each domain junction is encoded by an RNA region of low SHAPE reactivity (compare yellow bars in Fig. 1c with dark blue trace in Fig. 1b). Protein loops encoded by RNA regions with low SHAPE reactivity include the cyclophilin loop and the linker between the amino- and carboxy-terminal domains in capsid, both loops that link independently folded domains in reverse transcriptase, and the eight and nine amino acid loops linking the three domains in integrase (Fig. 1d, in yellow). These protein-domain junctions are more highly structured than 88.9% of randomly selected equivalent-length regions in the genome (Supplementary Fig. 4b).

    In contrast to the other large HIV proteins, domains in gp120 (termed inner, outer and bridging sheet) are not structurally autonomous. The C-terminal 35 residues of gp120 weave from the outer to the inner domain, and the bridging sheet is composed of residues that are 315 positions distant (ref. 16). Junctions between domains in gp120 are also not encoded by highly structured RNA, suggesting that gp120 folding is not linked to RNA structure in the same way as for other HIV proteins because its constituent domains are not structurally independent.

    The recurring pattern of structure, conspicuously located near or after autonomously folding protein coding domains, is consistent with a model in which HIV protein structure is encoded in its RNA at two distinct levels. The first is the linear relationship between RNA and protein primary sequences. In the second level, higher-order RNA structure directly encodes protein tertiary structure, because unstructured protein loops are derived from highly structured RNA elements. Many proteins appear to fold during translation (ref. 17), highly structured RNA slows the ribosome and causes pausing during translation (refs 18, 19), and changes in the extent of local RNA structure modulate protein activity (ref. 20). Together, these observations suggest that attenuation of ribosome elongation by highly structured RNA at protein-domain junctions facilitates native folding of HIV proteins by allowing time for domains to fold independently during translation.

    This model makes the clear prediction that ribosome pause sites should occur preferentially in the highly structured regions of an HIV-1 RNA that encode protein junctions. We tested this idea using a toeprinting experiment, in which ribosome processivity is inhibited by cycloheximide and sites preferentially occupied by the ribosome are detected as stops to primer extension in an in vitro translation reaction (ref. 21). Ribosome pause sites are statistically overrepresented at the matrix–capsid and capsid–nucleocapsid junctions in Gag and at the sequences encoding the cyclophilin loop in capsid (Supplementary Fig. 5). Conversely, ribosome pause sites are underrepresented in flanking, but unstructured, regions of the HIV RNA (P = 0.018). These experiments thus strongly support the model that mRNA structure over a region spanning 60–100 nucleotides specifically modulates ribosome processivity at protein-domain junctions.

    RNA secondary structure model for HIV-1

    [Figure 3 image not reproduced. Panel a: fluorescence intensity versus elution time (axis inverted) for the 1M7 (+) and (−) reactions. Panel b: SHAPE reactivity versus nucleotide position (about 5835–5990, spanning the SP and gp120 coding regions). Panel c: SP-stem secondary structure. Panel d: position of SP-stem during translation.]

    Figure 3 | SHAPE analysis of the signal peptide–gp120 region. a, Processed capillary electrophoresis trace showing intensity versus position for the (+) and (−) reagent reactions. b, Histogram of integrated and normalized SHAPE reactivities as a function of nucleotide position. The SHAPE reactivity scale shown here is used consistently throughout this work. c, RNA secondary structure model for the signal peptide pause site stem. d, Location of the signal-peptide stem relative to the eukaryotic ribosome at the pause site. Base pairs disrupted when the ribosome is at the pause site are boxed.

    Comprehensive SHAPE reactivity information can also be used to determine a nucleotide-resolution secondary structure model for the entire NL4-3 HIV-1 genome (Fig. 2). SHAPE reactivities are converted to free-energy change terms and used to constrain a thermodynamic folding algorithm (refs 22, 23). The final result is a thermodynamically favoured structural model highly reflective of the experimental SHAPE data, at single nucleotide resolution. For example, most nucleotides assigned to single-stranded regions are reactive towards SHAPE (Fig. 2, red, orange and green nucleotides), whereas base-paired nucleotides are predominantly unreactive (Fig. 2, black nucleotides and inset). For a full discussion of SHAPE-directed RNA folding and the fundamental correctness of this model, see the Methods.
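The conversion of SHAPE reactivities into free-energy change terms can be sketched as follows. The logarithmic form and the slope and intercept values (2.6 and −0.8 kcal mol−1) are our reading of the approach in ref. 23 and are not stated in this section, so they should be treated as assumptions; the function name and the handling of missing or negative reactivities are hypothetical choices made for this example.

```python
import math

# Assumed parameters (kcal/mol), taken from our reading of ref. 23.
M_SLOPE = 2.6
B_INTERCEPT = -0.8


def pseudo_free_energy(reactivity, m=M_SLOPE, b=B_INTERCEPT):
    """Per-nucleotide pseudo-free-energy term: m * ln(reactivity + 1) + b.

    None marks a nucleotide without data and contributes nothing;
    negative reactivities are clamped to zero (both are assumptions).
    """
    if reactivity is None:
        return 0.0
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


# Hypothetical normalized reactivities for five nucleotides.
profile = [0.0, 0.1, 0.4, 1.0, None]
terms = [round(pseudo_free_energy(r), 2) for r in profile]
print(terms)
```

Under these assumptions an unreactive nucleotide receives a favourable (negative) term when it is placed in a base pair, whereas a highly reactive nucleotide is penalized, which biases the folding algorithm towards structures consistent with the SHAPE data.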

    The HIV-1 genome is less structured than ribosomal RNA but, similarly, contains independent RNA folding domains that extend from the overall genomic backbone. These domains include both small stem-loops and roughly 21 large and complexly folded structures (Fig. 2). Although many genome regions are highly structured, only seven helices span a complete turn of an 11-base pair (bp) RNA duplex. The largest paired region, devoid of bulges, is the structured RNA element that bridges the coding junction between the reverse transcriptase and RNase H folding domains (Fig. 1). This helix is 19-bp long, contains a non-canonical G-A base pair (Fig. 2a, nucleotides 2015–2033/2103–2121), and is thus shorter than the 30-bp length competent to induce the interferon response (ref. 24).

    The HIV-1 genome structural model provides a robust starting point for identifying previously unrecognized functional elements and long-range RNA interactions. SHAPE reactivities describe a well-formed stem 3′ to the signal-peptide coding region in the Env protein (Fig. 3c). This stem (the signal-peptide stem) is evolutionarily conserved (Fig. 1b), consistent with an important biological role. The signal recognition particle (SRP) binds the nascent Env signal peptide and translocates the cytoplasmic ribosome elongation complex to the rough endoplasmic reticulum, where translation of gp120 and gp41 continues (ref. 25).

    RNA-induced translational pausing occurs as the ribosome unwinds highly structured RNA, typically located 6–7 nucleotides downstream of the A-site (ref. 18). The signal-peptide stem will be exactly in this conformation when the final tRNA(Ala) from the signal peptide and the first tRNA(Thr) of gp120 are in the P- and A-sites (Fig. 3d, boxed nucleotides). We infer that ribosomal attenuation or pausing at the signal-peptide stem provides more time for SRP recruitment and subsequent translocation of the elongation complex to the endoplasmic reticulum.

    The SHAPE-constrained secondary structure is also informative for previously identified regulatory motifs. In HIV-1, pro and pol gene products are translated when the ribosome undergoes a −1 register shift from the gag to the pol reading frames. Frameshifting occurs at a slippery sequence (UUUUUUA) and is enhanced by a downstream RNA structure. These elements are typically drawn as a single-stranded slippery sequence and a 12-bp stem-loop (ref. 26). Direct analysis of intact genomic RNA shows that the gag-pol frameshift signal is one component (identified here as P3) of a three-helix structure (Fig. 2 and Supplementary Fig. 6a). The slippery sequence pairs to form one of the three helices (P2). These two helices are stabilized by an anchoring helix (P1) that creates this discrete structural element (Supplementary Fig. 6a). This three-helix junction structure is conserved among HIV-1 group M sequences (Supplementary Fig. 6b).

    Most RNA viruses require a complex pseudoknotted structure to induce ribosomal frameshifting (ref. 27). The three-helix junction may function, in part, to slow translation before the ribosome encounters P3, facilitating the prerequisite pause necessary for frameshifting. The three-helix junction model may also explain why changing the slippery site to sequences that allow alternative tRNA pairing and enhance frameshifting in other RNA viruses eliminates frameshifting in HIV-1 (ref. 28). In the SHAPE-directed model, changes to the slippery sequence compromise base pairing in the conserved P2 helix (Supplementary Fig. 6).

    Unstructured motifs and insulator helices

    Analysis of the HIV-1 genome structure supports a role for RNA structures in sequestering unstructured regions. Five variable domains (V1–V5; see Fig. 1a, b) in the Env surface protein, gp120, account for much of the genetic diversity in HIV-1 (ref. 14) and are a critical component of the viral host evasion strategy. Four of these domains are hypervariable (hV1, hV2, hV4 and hV5) and exhibit large amino acid insertions and deletions between viral isolates (ref. 14).

    Sequences encoding hypervariable domains are internally unstructured and are bordered by evolutionarily conserved and stable RNA structures (Fig. 4a, b). For example, hypervariable region hV1 is encoded by RNA sequences with high SHAPE reactivities and is flanked by two stable helices (with free energies of −10.9 and −18.4 kcal mol−1, Fig. 4c). Similar patterns are evident in the other hypervariable regions (Fig. 4c). Some hypervariable regions, especially hV4, contain internal helices with non-trivial free energies; however, these helices are not evolutionarily conserved (Fig. 4b) and are much less stable than the flanking helices that have stabilities in the 10–20 kcal mol−1 range (Fig. 4c). The flanking helices are also highly stable relative to the distribution of duplex stabilities over the entire genome (Fig. 4d).

    [Figure 4 image not reproduced. Panels a–c cover nucleotide positions of about 6150–7175 and show the genome alignment, pairing probability (0 to 1.0) and structure, with per-helix free energies, for hV1, hV2, hV4 and hV5. Panel d: helix energies (kcal mol−1, 0 to −40) with the median helix stability marked and labels for 5′ TAR, 3′ TAR, frameshift, RRE stem I, SP-stem, P1 and the longest continuous helix.]

    Figure 4 | RNA structure in Env hypervariable regions. a, Schematic sequence alignment for group M reference sequences (ref. 14) at the Env hypervariable regions (hV1, hV2, hV4 and hV5). Nucleotides are represented as vertical bars; light grey and black indicate low versus universal conservation, respectively. b, Evolutionary pairing probabilities. Breaks indicate extensive nucleotide insertions and deletions among the group M consensus sequences. c, RNA structures at the hypervariable coding regions hV1, hV2, hV4 and hV5. Calculated free energies are shown for each helix (in kcal mol−1); energies for anchoring helices proposed to function as structural insulators are emphasized in bold. d, Distribution of helix stabilities in the HIV genome shown in a box plot representation. Whiskers illustrate 1.5-times the interquartile range, and circles emphasize helices of exceptionally high stability. Free-energy changes for proposed insulating helices are in bold; other significant helices are labelled.
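The idea of judging a helix against the genome-wide distribution of helix stabilities, with whiskers at 1.5 times the interquartile range, can be sketched as below. The energy values are hypothetical toy numbers rather than the paper's data set, the function name is invented for this example, and the quartile convention (the "inclusive" method) is an assumption, because the text does not say how quartiles were computed.

```python
from statistics import quantiles


def exceptionally_stable(energies, whisker=1.5):
    """Return helix free energies (kcal/mol) lying below Q1 - whisker * IQR.

    More negative means more stable, so only the lower fence is used.
    """
    q1, _median, q3 = quantiles(energies, n=4, method="inclusive")
    lower_fence = q1 - whisker * (q3 - q1)
    return sorted(e for e in energies if e < lower_fence)


# Hypothetical helix energies: many weak helices and one very stable one.
toy_energies = [-1.1, -1.7, -1.8, -2.3, -2.8, -4.2, -4.3, -5.0, -5.1, -6.4, -20.2]
outliers = exceptionally_stable(toy_energies)
print(outliers)
```

In a box plot drawn with this convention, the helices returned here would appear as individual circles beyond the lower whisker.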

    Collectively, these data suggest that RNA sequences encoding length polymorphisms in env are segregated from the rest of the genome by stable helices that function as structural insulators. The observed organization of hypervariable regions is thus well suited, first, to accommodate extensive substitutions, insertions or deletions, and second, to prevent these regions from forming non-functional base-pairing interactions with adjacent regulatory motifs, which include the 3′ splice site acceptors and the RRE.

    Perspective

    Structural analysis of a complete HIV-1 genome reveals that this RNA is punctuated by previously unrecognized, but readily identifiable and evolutionarily conserved, RNA structures. Most genome regions with low SHAPE reactivities are associated with a regulatory function (Fig. 1). SHAPE may be generally useful for identifying new regulatory elements in large RNAs. All of these elements represent hypotheses and starting points that we hope will stimulate further detailed examination. Our discovery that the peptide loops that link independently folded protein domains are encoded by highly structured RNA indicates that these and probably other mRNAs encode protein structure at a second level beyond specifying the amino acid sequence. In this view, higher-order RNA structure directly encodes protein structure, especially at domain junctions. The extraordinary density of information encoded in the structure of large RNA molecules (Figs 1, 2 and 4d) represents another level of the genetic code, one which we understand the least at present. This work makes clear that there is much to be discovered by broad structural analyses of RNA genomes and intact mRNAs.

    METHODS SUMMARY

    Full length HIV-1 genomic RNA was gently purified from NL4-3 virions (GenBank accession AF324493). The RNA was equilibrated in a native buffer (50 mM HEPES (pH 8.0), 200 mM potassium acetate (pH 8.0), 3 mM MgCl2) at 37 °C for 15 min and treated with 1M7 (ref. 10). Sites of 2′-hydroxyl modification were identified over read lengths spanning several hundred nucleotides using 31 primer extension reactions resolved by fluorescence-detected capillary electrophoresis (refs 6, 11). Pairing probabilities were determined using RNA-Decoder (ref. 13) and secondary structure models were developed by incorporating SHAPE reactivities as a pseudo-free-energy change term, in conjunction with nearest-neighbour parameters, in an accurate thermodynamics-based prediction algorithm (refs 22, 23).

    Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

    Received 11 May; accepted 22 June 2009.

    1. Cann, A. J. Principles of Molecular Virology Ch. 2–5 (Elsevier, 2005).
    2. Coffin, J. M., Hughes, S. H. & Varmus, H. E. Retroviruses (Cold Spring Harbor Laboratory Press, 1997).
    3. Frankel, A. D. & Young, J. A. HIV-1: fifteen proteins and an RNA. Annu. Rev. Biochem. 67, 1–25 (1998).
    4. Damgaard, C. K., Andersen, E. S., Knudsen, B., Gorodkin, J. & Kjems, J. RNA interactions in the 5′ region of the HIV-1 genome. J. Mol. Biol. 336, 369–379 (2004).
    5. Goff, S. P. Host factors exploited by retroviruses. Nature Rev. Microbiol. 5, 253–263 (2007).
    6. Wilkinson, K. A. et al. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol. 6, e96 (2008).
    7. Levin, J. G., Guo, J., Rouzina, I. & Musier-Forsyth, K. Nucleic acid chaperone activity of HIV-1 nucleocapsid protein: critical role in reverse transcription and molecular mechanism. Prog. Nucleic Acid Res. Mol. Biol. 80, 217–286 (2005).
    8. Paillart, J. C., Skripkin, E., Ehresmann, B., Ehresmann, C. & Marquet, R. In vitro evidence for a long range pseudoknot in the 5′-untranslated and matrix coding regions of HIV-1 genomic RNA. J. Biol. Chem. 277, 5995–6004 (2002).
    9. Merino, E. J., Wilkinson, K. A., Coughlan, J. L. & Weeks, K. M. RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 127, 4223–4231 (2005).
    10. Mortimer, S. A. & Weeks, K. M. A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J. Am. Chem. Soc. 129, 4144–4145 (2007).
    11. Vasa, S. M., Guex, N., Wilkinson, K. A., Weeks, K. M. & Giddings, M. C. ShapeFinder: a software system for high-throughput quantitative analysis of nucleic acid reactivity information resolved by capillary electrophoresis. RNA 14, 1979–1990 (2008).
    12. Gherghe, C. M., Shajani, Z., Wilkinson, K. A., Varani, G. & Weeks, K. M. Strong correlation between SHAPE chemistry and the generalized NMR order parameter (S²) in RNA. J. Am. Chem. Soc. 130, 12244–12245 (2008).
    13. Pedersen, J. S., Meyer, I. M., Forsberg, R., Simmonds, P. & Hein, J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 32, 4925–4936 (2004).
    14. Leitner, T. et al. HIV Sequence Compendium (Theoretical Biology and Biophysics Group, 2005).
    15. Purcell, D. F. & Martin, M. A. Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J. Virol. 67, 6365–6378 (1993).
    16. Kwong, P. D. et al. Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature 393, 648–659 (1998).
    17. Komar, A. A. A pause for thought along the co-translational folding pathway. Trends Biochem. Sci. 34, 16–24 (2009).
    18. Farabaugh, P. J. Programmed translational frameshifting. Microbiol. Rev. 60, 103–134 (1996).
    19. Wen, J. D. et al. Following translation by single ribosomes one codon at a time. Nature 452, 598–603 (2008).
    20. Nackley, A. G. et al. Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314, 1930–1933 (2006).
    21. Hartz, D., McPheeters, D. S., Traut, R. & Gold, L. Extension inhibition analysis of translation initiation complexes. Methods Enzymol. 164, 419–425 (1988).
    22. Mathews, D. H. et al. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA 101, 7287–7292 (2004).
    23. Deigan, K. E., Li, T. W., Mathews, D. H. & Weeks, K. M. Accurate SHAPE-directed RNA structure prediction. Proc. Natl Acad. Sci. USA 106, 97–102 (2009).
    24. Kim, D. H. et al. Synthetic dsRNA Dicer substrates enhance RNAi potency and efficacy. Nature Biotechnol. 23, 222–226 (2005).
    25. Stein, B. S. & Engleman, E. G. Intracellular processing of the gp160 HIV-1 envelope precursor. Endoproteolytic cleavage occurs in a cis or medial compartment of the Golgi complex. J. Biol. Chem. 265, 2640–2649 (1990).
    26. Wilson, W. et al. HIV expression strategies: ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems. Cell 55, 1159–1169 (1988).
    27. Giedroc, D. P., Theimer, C. A. & Nixon, P. L. Structure, stability and function of RNA pseudoknots involved in stimulating ribosomal frameshifting. J. Mol. Biol. 298, 167–185 (2000).
    28. Biswas, P., Jiang, X., Pacchia, A. L., Dougherty, J. P. & Peltz, S. W. The human immunodeficiency virus type 1 ribosomal frameshifting site is an invariant sequence determinant and an important target for antiviral therapy. J. Virol. 78, 2082–2087 (2004).

    Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

    Acknowledgements This project was supported by the US National Institutes of Health (AI068462 to K.M.W.) and by the National Cancer Institute, under contracts N01-CO-12400 and HHSN261200800001E (to R.J.G. and J.W.B.). J.M.W. was supported as a Fellow of the UNC Lineberger Cancer Center and a National Institutes of Health (NIH) Kirschstein Postdoctoral Fellowship. R.S. and K.K.D. were supported by NIH grants AI44667 and T32 AI07419, respectively. We are indebted to D. Mathews and J. Low for assistance with the RNA structure program and genome secondary structure analysis, respectively. The content of this publication does not necessarily reflect the views or policies of the US Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations indicate endorsement by the US Government.

    Author Contributions J.M.W., R.J.G. and K.M.W. conceived of and designed the HIV-1 genome structure analysis project. J.M.W. and K.M.W. analysed and interpreted the HIV SHAPE structure information. K.K.D., R.S. and C.L.B. designed and performed the bioinformatic pairing probability analysis. J.M.W., R.J.G. and C.W.L. performed the experiments. J.M.W., C.L.B. and K.M.W. performed the statistical analyses. J.W.B. produced and purified HIV-1 virions. J.M.W. and K.M.W. wrote the manuscript with contributions from all authors.

    Author Information Reprints and permissions information is available at www.nature.com/reprints. Correspondence and requests for materials should be addressed to K.M.W. ([email protected]).


  • METHODS

    Virus production. HIV-1 strain NL4-3 (group M, subtype B) was used to infect a non-Hodgkin’s T cell lymphoma cell line (a modified version of the SupT1 cell line, which was a gift from J. Hoxie) (ref. 29). The virus-containing inoculum for infecting SupT1 cells was generated by CaPO4/DNA coprecipitation (ref. 30) and subsequent transfection of pNL43 (NIH AIDS Research and Reference Reagent Program; GenBank accession AF324493) into 293T cells (ref. 31). HIV-1 virions were purified as described (ref. 32) except cells were removed using a Millipore Opticap XL-5.0 micron filter. The total protein and CAp24 yields were 20.7 mg and 2.3 mg, on the basis of total protein (BioRad DC protein assay) and HPLC with subsequent amino acid analysis assays, respectively.

    Virions were purified from cellular debris by subtilisin treatment and centrifugation through a sucrose cushion. Concentrated virions (in 19 ml, corresponding to 19 l of infected cell-culture supernatant) were digested with subtilisin (1 mg ml−1, in 20 mM Tris (pH 8.0), 1 mM CaCl2, 37 °C, 18 h; stopped by the addition of 5 µg ml−1 phenylmethylsulphonyl fluoride (ref. 33)). The resulting solution contained digested cellular proteins and viral particles free of surface proteins. The sample was centrifuged through a cushion of 20% (w/v) sucrose in PBS (Beckman SW41 rotor, 235,000g, 2 h, 4 °C); supernatant was carefully removed, and residual sucrose in the pellet was removed by overlaying PBS and repeating the centrifugation step (1 h at 4 °C).

    RNA extraction. The key features of this protocol are that genomic RNA was gently extracted from purified virions in the presence of buffers containing monovalent and divalent ions, consistent with maintaining RNA secondary and tertiary structure. The HIV genomic RNA was not denatured by heat, chemical denaturants, magnesium chelation, or removal of monovalent cations during this process. Subtilisin-treated virions were suspended in virion lysis buffer (VLB; 50 mM HEPES (pH 8.0), 200 mM NaCl, and 3 mM MgCl2) and lysed with 1% (w/v) SDS and 100 µg ml−1 proteinase K (~22 °C, 30 min). The digest was extracted three times with phenol/chloroform/isoamyl alcohol (25:24:1, pre-equilibrated with VLB), followed by two extractions with pure chloroform.

    Quantitative reverse-transcriptase PCR was used to quantify viral RNA yield