  • MI64CH18-Hatfull ARI 17 August 2010 15:32

    Mycobacteriophages: Genes and Genomes
    Graham F. Hatfull
    Pittsburgh Bacteriophage Institute, Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260; email: [email protected]

    Annu. Rev. Microbiol. 2010. 64:331-56

    First published online as a Review in Advance on June 7, 2010

    The Annual Review of Microbiology is online at micro.annualreviews.org

    This article's doi: 10.1146/annurev.micro.112408.134233

    Copyright © 2010 by Annual Reviews. All rights reserved

    0066-4227/10/1013-0331$20.00

    Key Words

    bacteriophage, genome evolution, genomics, tuberculosis, mycobacteria

    Abstract

    Viruses are powerful tools for investigating and manipulating their hosts, but the enormous size and amazing genetic diversity of the bacteriophage population have emerged as something of a surprise. In light of the evident importance of mycobacteria to human health (especially Mycobacterium tuberculosis, which causes tuberculosis) and the difficulties that have plagued their genetic manipulation, mycobacteriophages are especially appealing subjects for discovery, genomic characterization, and manipulation. With more than 70 complete genome sequences available, the mycobacteriophages have provided a wealth of information on the diversity of phages that infect a common bacterial host, revealed the pervasively mosaic nature of phage genome architectures, and identified a huge number of genes of unknown function. Mycobacteriophages have provided key tools for tuberculosis genetics, and new methods for simple construction of mycobacteriophage recombinants will facilitate postgenomic explorations into mycobacteriophage biology.

    Annu. Rev. Microbiol. 2010.64:331-356. Downloaded from www.annualreviews.org by University of California - Santa Cruz on 01/11/11. For personal use only.

    Mycobacteriophage: a bacteriophage that infects mycobacterial hosts

    Contents

    INTRODUCTION
    GENERAL PROPERTIES OF MYCOBACTERIOPHAGES
      Mycobacteriophage Virion Morphologies
      Host Range and Host Range Determinants
      Life Cycles
    MYCOBACTERIOPHAGE GENOMICS
      Sequenced Mycobacteriophage Genomes
      Overview of Genomic Diversity
      Genome Organizations
      Genome Mosaicism
      Mechanisms for Generating Mosaic Genomes
      Transposons and Other Mobile Elements
    MYCOBACTERIAL GENE FUNCTION AND EXPRESSION
      Lysis
      Integration and Prophage Maintenance
      Gene Expression and Its Regulation
      Other Mycobacteriophage Gene Functions
    MYCOBACTERIOPHAGE GENETIC MANIPULATION
    SUMMARY

    INTRODUCTION

    Mycobacteriophages are viruses that infect mycobacterial hosts. Interest in mycobacteriophages began in the late 1940s with the isolation of phages that infect Mycobacterium smegmatis (31, 121), followed shortly by phages that infect Mycobacterium tuberculosis (27). A primary motivation of these early studies was to type mycobacterial clinical isolates, which was further advanced by collecting sizable numbers of mycobacteriophages from a variety of environmental and clinical sources (37, 57, 105). The use of mycobacteriophages for typing purposes dominated the literature over the next 35 years, although important advances were made in understanding mycobacteriophage biology, including the use of phage I3 as a generalized transducing phage for M. smegmatis (91), lysogeny in environmental and clinical strains (55, 72, 77), visualization by electron microscopy (100), and transfection of mycobacteriophage DNA (59, 114).

    Mycobacteriophages emerged in the late 1980s as key players in the establishment of a facile genetic system for the mycobacteria (50). A breakthrough came in 1987, when Jacobs et al. used phage TM4 to construct novel shuttle phasmids that replicate as large cosmids in Escherichia coli and as phages in mycobacteria (53). These shuttle phasmids can be manipulated in E. coli using standard genetic engineering approaches and used to efficiently introduce foreign genes into mycobacteria. In the absence of other methods for direct manipulation of mycobacteriophage genomes, shuttle phasmids have proven invaluable for specialized transduction (1), transposon delivery (2, 98), and diagnostic introduction of reporter genes (51, 88). They also facilitated the use of antibiotic selectable markers through temperate phage L1 shuttle phasmids (103) and characterization of high-efficiency transformation mutants of M. smegmatis (104).

    A notable feature of shuttle phasmid construction is that it does not require phage genomic information (52). However, realization of the full potential of mycobacteriophages for contributing to an understanding of their hosts clearly requires genomic characterization, and the first sequenced genome was that of mycobacteriophage L5 in 1993 (46). As the technologies for DNA sequencing advanced and became both quicker and cheaper, a large collection of complete mycobacteriophage genome sequences has emerged, revealing a delightfully complex, diverse, and interesting set of genomes. Seventy genome sequences are available in GenBank (Table 1), and a comparative analysis of 60 of these has been described (44).

    dsDNA: double-stranded DNA

    Mycobacteriophages hold considerable promise for elucidating phage diversity and evolution, gaining novel insights into the physiology and perhaps virulence of their mycobacterial hosts, and aiding the development of tools for mycobacterial genetics. In this review I focus primarily on the first of these, although the last two aspects have been greatly explored, providing insights into biofilm formation (80), cell wall composition (82, 87), tools for transposon delivery (2), reporter gene delivery (51), gene replacement (1, 118), point mutagenesis (119), single copy vectors (65), and non-antibiotic selectable markers (23), among others. Several additional reviews provide the reader with further information about mycobacteriophage genomics and applications (39-43, 75, 76). As our understanding of mycobacteriophage genomics expands, it will undoubtedly invigorate further utilities and insights.

    GENERAL PROPERTIES OF MYCOBACTERIOPHAGES

    Mycobacteriophage Virion Morphologies

    All the characterized mycobacteriophages are double-stranded DNA (dsDNA) tailed phages belonging to the order Caudovirales. Most (61 of 70) are of the family Siphoviridae, characterized by relatively long flexible noncontractile tails, whereas nine are of the family Myoviridae, containing contractile tails (44). There is a notable absence of phages from the family Podoviridae (containing short stubby tails), although it is unclear whether their absence is due to evolutionary constraints or to physical problems in traversing the complex and relatively thick mycobacterial cell envelope.

    Although the nine myoviral mycobacteriophages (Table 1) are morphologically indistinguishable, the siphoviruses show considerable variation. For example, the tail lengths vary by almost a factor of three (105 to 300 nm) and the structures at the tail tips are discernibly different in many of these phages (44). For the most part, the heads are isometric, although three (Corndog, Che9c, and Brujita) contain prolate heads, with the most extreme being Corndog, whose length-to-width ratio is almost four; the previously described but unsequenced phage R1 (106) has a prolate head similar to that of Che9c and Brujita (44). Those with isometric heads span a range of sizes, with the smallest being BPs and Halo (48 nm in diameter) and the largest being Bxz1 and its relatives (85 nm in diameter). In general, the capsid size correlates with genome size, suggesting there is a relatively constant DNA packaging density (44).

    Host Range and Host Range Determinants

    The early phage-typing studies showed that mycobacteriophages can have an almost endless variety of preferences for different bacterial hosts. Some phages (e.g., D29) have broad host ranges and infect many species of both fast-growing and slowly growing mycobacteria, including M. smegmatis and M. tuberculosis (94), whereas others (e.g., Barnyard) have very narrow preferences and infect only a single known host (94). At least one phage (DS6A) has been reported whose host range is restricted to strains composing the M. tuberculosis complex (10, 56), although only a partial genome sequence of this potentially extremely useful and interesting phage is available. Several phages discriminate between strains or isolates of a particular species, and we note that phage 33D differentiates between BCG strains and Mycobacterium bovis, and several phages have preferences for specific strains of M. smegmatis (C. Bowman, G. Broussard, D. Jacobs-Sera & G.F. Hatfull, unpublished observations).

    For the most part, the molecular and genetic barriers to mycobacteriophage host range preferences are not known. Presumably, differentiation occurs at the cell surface due to the presence or absence of specific receptors, from the need for particular metabolic requirements after DNA has been injected into the cell, or from specific phage protection mechanisms such as immunity and restriction.

    Table 1  Genometrics of 70 sequenced mycobacteriophage genomes (a)

    Phage         Size (bp)  GC%   ORFs  tRNA  tmRNA  Ends        Accession no.  Cluster  Origins             Reference
    Bethlehem     52,250     63.3  87    0     0      10-base 3'  AY500153       A1       Bethlehem, PA       45
    Bxb1          50,550     63.7  86    0     0      9-base 3'   AF271693       A1       Bronx, NY           76a
    DD5           51,621     63.4  87    0     0      10-base 3'  EU744252       A1       Upp. St. Clair, PA  44
    Jasper        50,968     63.7  94    0     0      10-base 3'  EU744251       A1       Lexington, MA       44
    KBG           53,572     63.6  89    0     0      10-base 3'  EU744248       A1       Kentucky            44
    Lockley       51,478     63.4  90    0     0      10-base 3'  EU744249       A1       Pittsburgh, PA      44
    Solon         49,487     63.8  86    0     0      10-base 3'  EU826470       A1       Solon, IA           44
    U2            51,277     63.7  81    0     0      10-base 3'  AY500152       A1       Bethlehem, PA       45
    Che12         52,047     62.9  98    3     0      10-base 3'  DQ398043       A2       Chennai, India      45
    D29           49,136     63.5  77    5     0      9-base 3'   AF022214       A2       California          24
    L5            52,297     62.3  85    3     0      9-base 3'   Z18946         A2       Japan               46
    Pukovnik      52,892     63.3  88    1     0      10-base 3'  EU744250       A2       Ft. Bragg, NC       44
    Peaches       51,376     63.9  86    0     0      10-base 3'  GQ303263.1     A2       Monroe, LA          Unpublished data
    Bxz2          50,913     64.2  86    3     0      10-base 3'  AY129332       A2       Bronx, NY           83
    Chah          68,450     66.5  104   0     0      Circ Perm   FJ174694       B1       Ruffsdale, PA       44
    Colbert       67,774     66.5  100   0     0      Circ Perm   GQ303259.1     B1       Corvallis, OR       Unpublished data
    Orion         68,427     66.5  100   0     0      Circ Perm   DQ398046       B1       Pittsburgh, PA      45
    PG1           68,999     66.5  100   0     0      Circ Perm   AF547430       B1       Pittsburgh, PA      45
    Puhltonio     68,323     66.4  97    0     0      Circ Perm   GQ303264.1     B1       Baltimore, MD       Unpublished data
    UncleHowie    68,016     66.5  98    0     0      Circ Perm   GQ303266.1     B1       St. Louis, MO       Unpublished data
    Qyrzula       67,188     69.0  81    0     0      Circ Perm   DQ398048       B2       Pittsburgh, PA      45
    Rosebush      67,480     69.0  90    0     0      Circ Perm   AY129334       B2       Latrobe, PA         83
    Phaedrus      68,090     67.6  98    0     0      Circ Perm   EU816589       B3       Pittsburgh, PA      44
    Phlyer        69,378     67.5  103   0     0      Circ Perm   FJ641182.1     B3       Pittsburgh, PA      Unpublished data
    Pipefish      69,059     67.3  102   0     0      Circ Perm   DQ398049       B3       Pittsburgh, PA      45
    Cooper        70,654     69.1  99    0     0      Circ Perm   DQ398044       B4       Pittsburgh, PA      45
    Nigel         69,904     68.3  94    1     0      Circ Perm   EU770221       B4       Pittsburgh, PA      44
    Bxz1          156,102    64.8  225   35    1      Circ Perm   AY129337       C1       Bronx, NY           83
    Cali          155,372    64.7  222   35    1      Circ Perm   EU826471       C1       Santa Clara, CA     44
    Catera        153,766    64.7  218   35    1      Circ Perm   DQ398053       C1       Pittsburgh, PA      45
    ET08          155,445    64.6  221   30    1      Circ Perm   GQ303260.1     C1       San Diego, CA       Unpublished data
    LRRHood       154,349    64.7  227   30    1      Circ Perm   GQ303262.1     C1       Santa Cruz, CA      Unpublished data
    Rizal         153,894    64.7  220   35    1      Circ Perm   EU826467       C1       Pittsburgh, PA      44
    Scott McG     154,017    64.8  221   35    1      Circ Perm   EU826469       C1       Pittsburgh, PA      44
    Spud          154,906    64.8  222   35    1      Circ Perm   EU826468       C1       Pittsburgh, PA      44
    Myrna         164,602    65.4  229   41    0      Circ Perm   EU826466       C2       Upp. St. Clair, PA  44
    Adjutor       64,511     59.7  86    0     0      Circ Perm   EU676000       D        Pittsburgh, PA      44
    Butterscotch  64,562     59.7  86    0     0      Circ Perm   FJ168660       D        Pittsburgh, PA      44
    Gumball       64,807     59.6  88    0     0      Circ Perm   FJ168661       D        Pittsburgh, PA      44
    P-lot         64,787     59.7  89    0     0      Circ Perm   DQ398051       D        Pittsburgh, PA      45
    PBI1          64,494     59.7  81    0     0      Circ Perm   DQ398047       D        Pittsburgh, PA      45
    Troll4        64,618     59.6  88    0     0      Circ Perm   FJ168662       D        Silver Springs, MD  44
    244           74,483     62.9  142   2     0      9-base 3'   DQ398041       E        Pittsburgh, PA      45
    Cjw1          75,931     63.1  141   2     0      9-base 3'   AY129331       E        Pittsburgh, PA      83
    Kostya        75,811     62.9  143   2     0      9-base 3'   EU816591       E        Washington, DC      44
    Porky         76,312     62.8  147   2     0      9-base 3'   EU816588       E        Concord, MA         44
    Pumpkin       74,491     63.0  143   2     0      9-base 3'   GQ303265.1     E        Holland, MI         Unpublished data
    Boomer        58,037     61.1  105   0     0      10-base 3'  EU816590       F1       Pittsburgh, PA      44
    Che8          59,471     61.3  112   0     0      10-base 3'  AY129330       F1       Chennai, India      83
    Fruitloop     58,471     61.8  102   0     0      10-base 3'  FJ174690       F1       Latrobe, PA         44
    Llij          56,852     61.5  100   0     0      10-base 3'  DQ398045       F1       Pittsburgh, PA      45
    Pacc40        58,554     61.3  101   0     0      10-base 3'  FJ174692       F1       Pittsburgh, PA      44
    PMC           56,692     61.4  104   0     0      10-base 3'  DQ398050       F1       Pittsburgh, PA      45
    Ramsey        58,578     61.2  108   0     0      10-base 3'  FJ174693       F1       White Bear, MN      44
    Tweety        58,692     61.7  109   0     0      10-base 3'  EF536069       F1       Pittsburgh, PA      86
    Che9d         56,276     60.9  111   0     0      10-base 3'  AY129336       F2       Chennai, India      83
    Angel         41,441     66.7  61    0     0      11-base 3'  EU568876.1     G        O'Hara Twp, PA      96
    BPs           41,901     66.6  63    0     0      11-base 3'  EU568876       G        Pittsburgh, PA      96
    Halo          42,289     66.7  64    0     0      11-base 3'  DQ398042       G        Pittsburgh, PA      45
    Hope          41,901     66.6  63    0     0      11-base 3'  GQ303261.1     G        Atlanta, GA         Unpublished data
    Konstantine   68,952     57.3  95    0     0      Circ Perm   FJ174691       H1       Pittsburgh, PA      44
    Predator      70,110     56.3  92    0     0      Circ Perm   EU770222       H1       Donegal, PA         44
    Barnyard      70,797     57.3  109   0     0      Circ Perm   AY129339       H2       Latrobe, PA         83
    Brujita       47,057     66.8  74    0     0      11-base 3'  FJ168659       I        Virginia            44
    Che9c         57,050     65.4  84    0     0      10-base 3'  AY129333       I        Chennai, India      83
    Corndog       69,777     65.4  99    0     0      4-base 3'   AY129335       Non      Pittsburgh, PA      83
    Giles         53,746     67.5  78    0     0      14-base 3'  EU203571       Non      Pittsburgh, PA      78
    Omega         110,865    61.4  237   2     0      4-base 3'   AY129338       Non      Upp. St. Clair, PA  83
    TM4           52,797     68.1  89    0     0      10-base 3'  AF068845       Non      Colorado            25
    Wildcat       78,296     56.9  148   24    1      11-base 3'  DQ398052       Non      Latrobe, PA         45
    TOTAL         5,078,090        7930  363
    AVERAGE       73,595.5   63.7  114.9 5.26

    (a) Colored shading corresponds to genome groupings according to cluster relationships.
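As a rough illustration of how the cluster groupings in Table 1 can be summarized, the following sketch computes the mean genome size and GC content per cluster. It uses only a hand-copied excerpt of the table rows; the names `ROWS` and `summarize_by_cluster` are hypothetical and chosen for this example, not taken from the review.

```python
from collections import defaultdict
from statistics import mean

# Excerpt of Table 1 (not the full 70 rows): (phage, size_bp, gc_percent, cluster).
ROWS = [
    ("Bxb1", 50550, 63.7, "A1"),
    ("L5", 52297, 62.3, "A2"),
    ("D29", 49136, 63.5, "A2"),
    ("Qyrzula", 67188, 69.0, "B2"),
    ("Rosebush", 67480, 69.0, "B2"),
    ("Bxz1", 156102, 64.8, "C1"),
    ("Cali", 155372, 64.7, "C1"),
    ("BPs", 41901, 66.6, "G"),
    ("Halo", 42289, 66.7, "G"),
]


def summarize_by_cluster(rows):
    """Group rows by cluster and return count, mean size, and mean GC% for each."""
    grouped = defaultdict(list)
    for _phage, size_bp, gc, cluster in rows:
        grouped[cluster].append((size_bp, gc))
    return {
        cluster: {
            "n": len(values),
            "mean_size_bp": mean(size for size, _ in values),
            "mean_gc": round(mean(gc for _, gc in values), 2),
        }
        for cluster, values in grouped.items()
    }


if __name__ == "__main__":
    for cluster, stats in sorted(summarize_by_cluster(ROWS).items()):
        print(cluster, stats)
```

Even on this small excerpt, phages within a cluster have similar sizes and GC contents while clusters differ widely from one another, which is the pattern the full table shows.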

    For many mycobacteriophages the barriers appear to be absolute, and no plaques are observed on a nonpermissive host even after plating large numbers of phage particles. For other phages plaques are observed at modest plating efficiencies (10^-4 to 10^-6) on a nonpermissive host, and phages BPs and Halo, which were isolated on M. smegmatis, form plaques on M. tuberculosis at a frequency of 10^-5 (96). Further characterization shows that the plaques recovered on M. tuberculosis are expanded host range mutants that infect both strains with equal plating efficiency (96).
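A plating efficiency of this kind is conventionally the ratio of the phage titer measured on the test host to the titer measured on the host used for isolation. The sketch below shows that arithmetic; the function names and the plaque counts and dilutions in the usage example are hypothetical illustrations, not data from the studies cited above.

```python
def titer_pfu_per_ml(plaque_count, dilution, volume_plated_ml):
    """Plaque-forming units per ml of undiluted stock.

    dilution is the fraction of the original stock in the plated sample
    (for example 1e-7 for a ten-million-fold dilution).
    """
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and volume must be positive")
    return plaque_count / (dilution * volume_plated_ml)


def efficiency_of_plating(titer_test_host, titer_reference_host):
    """Ratio of the titer on the test host to the titer on the reference host."""
    if titer_reference_host <= 0:
        raise ValueError("reference titer must be positive")
    return titer_test_host / titer_reference_host


if __name__ == "__main__":
    # Hypothetical counts: 50 plaques from 0.1 ml of a 1e-7 dilution on the
    # reference host, and 50 plaques from 0.1 ml of a 1e-2 dilution on the test host.
    reference = titer_pfu_per_ml(50, 1e-7, 0.1)
    test = titer_pfu_per_ml(50, 1e-2, 0.1)
    print(f"efficiency of plating: {efficiency_of_plating(test, reference):.1e}")
```

With these illustrative numbers the same stock gives a titer five orders of magnitude lower on the test host, which corresponds to the kind of frequency reported for BPs and Halo on M. tuberculosis.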

    Although mycobacteriophage host preferences are expected to be strongly dominated by the availability of specific cellular receptors, few have been identified or studied. Lipid extracts of M. smegmatis have been shown to inhibit infection by phages D29 and the uncharacterized D4 (113), and a specific peptidoglycolipid, mycoside C(sm), has been purified and proposed to play a role in phage D4 binding (29). Glycolipids may act as receptors for adsorption of mycobacteriophage Phlei (8), and a subset of lyxose-containing molecules has been further chemically characterized (60). More recently, a single methylated rhamnose residue on the M. smegmatis cell wall-associated glycopeptidolipid has been shown to be involved in adsorption of phage I3 (16).

    Isolation of spontaneous M. smegmatis mutants resistant to D29 infection is simplified by the high efficiency with which this phage kills its host. However, characterization of the mutants is complicated by their poor growth and genetic instability. Surprisingly, robust resistance to D29 can arise through simple overexpression of the wild-type M. smegmatis mpr gene when present on an extrachromosomal plasmid (4), from an integrated mpr gene expressed from a strong promoter (4), or by transposon activation (93). It is thus plausible that spontaneous D29 resistance occurs through localized gene amplification at the mpr locus, leading to enhanced expression of the Mpr protein, and that the locus reduces back to a single copy when selective pressure is removed. It is not clear what the normal cellular function of mpr is, or why mpr overexpression gives D29 resistance.

    CRISPR: clustered regularly interspaced short palindromic repeat

    In many bacterial hosts, clustered regularly interspaced short palindromic repeats (CRISPRs) play roles in phage resistance (3, 116). Most sequenced mycobacterial genomes do not appear to have CRISPRs, with the exceptions being M. tuberculosis H37Rv (and related strains) and M. avium strain 104. The CRISPRs are composed of short direct repeats (21-47 bp) separated by short (30-50 bp) unique spacer sequences, and in the well-characterized CRISPRs the spacers have near sequence identity with phage genomic sequences, an important component for phage resistance (117). The mycobacterial CRISPR spacer sequences do not have compellingly similar counterparts in any of the sequenced mycobacteriophage genomes, consistent with the idea that many phages of these hosts remain unidentified.
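The comparison described above, asking whether a spacer has a near-identical counterpart in a phage genome, can be sketched as an ungapped scan of both strands that counts mismatches. This is a minimal illustration under stated assumptions: the function names and the toy sequences are hypothetical, real analyses would normally use an alignment tool that tolerates gaps, and the scan treats the genome as linear.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string containing A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_match(spacer, genome):
    """Return (mismatches, position, strand) for the best ungapped placement.

    Both strands are scanned; position is the 0-based start on the given
    genome string. Returns None if the genome is shorter than the spacer.
    """
    spacer, genome = spacer.upper(), genome.upper()
    best = None
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(genome) - len(query) + 1):
            window = genome[start:start + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if best is None or mismatches < best[0]:
                best = (mismatches, start, strand)
    return best


def has_near_identical_match(spacer, genome, max_mismatches=2):
    """True if the spacer matches somewhere with at most max_mismatches (an arbitrary cutoff)."""
    hit = best_match(spacer, genome)
    return hit is not None and hit[0] <= max_mismatches


if __name__ == "__main__":
    toy_genome = "TTGACCGTAGGCTAACGT"   # made-up sequence, not a real phage
    toy_spacer = "CCGTAGGC"             # made-up spacer
    print(best_match(toy_spacer, toy_genome))
    print(has_near_identical_match("AAAAAAAA", toy_genome))
```

The mismatch cutoff is a free parameter of the sketch; what counts as "compellingly similar" in practice depends on spacer length and on the criteria of the particular study.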

    Life Cycles

    dsDNA tailed phages canonically are either temperate, forming stable lysogens at moderate frequencies (e.g., lambda), or lytic, such that all infections lead to phage growth and cell death (e.g., T4 and T7). Classification of mycobacteriophages into two such groups is, however, complex. A good example of a temperate phage is L5, which forms obviously turbid plaques from which stable lysogens immune to superinfection can be readily isolated (23); in contrast, D29 forms completely clear plaques in which virtually all host cells are killed. Genomic analysis, however, shows that D29 is a clear-plaque derivative of an L5-like temperate parent, not of a T4-like or T7-like phage (24). Of the genomically characterized phages, 12 others (the Cluster A phages; Table 1) behave similarly to L5. Most other mycobacteriophages form lightly turbid plaques, rather than clear or obviously turbid ones, and for Tweety, Giles, BPs, and Halo this reflects the ability to form lysogens at relatively low frequencies (3-5%) (78, 86, 96). Approximately one-half of the characterized mycobacteriophages (36 of 70) have an integration cassette and are candidates for forming lysogens, albeit at relatively low frequency. Phages such as Bxz1 and its relatives also form hazy plaques, although it is unclear whether the cellular survivors are uninfected cells, resistant mutants, or lysogens.

    MYCOBACTERIOPHAGE GENOMICS

    Sequenced Mycobacteriophage Genomes

    The first completely sequenced mycobacteriophage genome was that of phage L5 (46), a temperate phage isolated in Japan (22); it is a close relative of phage L1, which shares a similar restriction pattern but does not grow at 42°C (65). Both L5 and L1 infect fast-growing and slowly growing mycobacterial strains, although efficient infection of slow-growers by L5 requires the presence of high calcium concentrations (28). Although the sequence of L1 has not been determined, derivatives that grow at both 42°C and 30°C have been identified, followed by isolation and characterization of temperature-sensitive mutants (13, 15). The next complete genome reported was that of D29 (24), which was isolated in California from a soil sample by enrichment, infects both fast-growing and slowly growing strains, and is clearly lytic (27). D29 has considerable nucleotide sequence similarity to L5, especially in the left-most parts of the genomes that encode the virion structural genes (24). Whereas D29 forms distinctly clear plaques (perhaps more so than any other mycobacteriophage), the sequenced version is likely a recent derivative of a temperate parent, and Bowman (9) noted a mixture of plaque morphologies in his starting D29 stock; genomic comparison with L5 is consistent with this.

    The third sequenced mycobacteriophage, TM4, was isolated by induction of a strain of M. avium (112). It is unclear whether the original strain was lysogenic or pseudolysogenic, since TM4 is capable of lysing it as well as M. smegmatis and M. tuberculosis (112); it does not appear to form stable lysogens in either of these strains. Genomic analysis shows that it is distinct from L5 and D29 at the nucleotide sequence level (25), and it does not encode any known integration system or any readily identifiable phage repressor.

    All the other sequenced mycobacteriophage genomes were from phages isolated over the past 20 years, and all were isolated from environmental samples using M. smegmatis mc²155 as a host. At the time of writing, the total number of mycobacteriophage genomes deposited in GenBank is 70 (Table 1) and a detailed comparative genomic analysis of 60 has been described (44). These phages have come from a variety of geographic locations, although about half of them were isolated from the western Pennsylvania region. The isolation of new mycobacteriophages has been greatly spurred by the development of phage discovery and genomics as an educational platform (38, 45). It would be of considerable interest to take advantage of the faster and cheaper technologies to sequence the numerous mycobacteriophages isolated in the earlier period (1950–1980) for which detailed host range data have been reported, if these can still be recovered. We also note that the use of other mycobacterial strains for phage isolation will likely give landscapes of genetic diversity distinct from that described below for the current collection.

    Overview of Genomic Diversity

    The 70 sequenced mycobacteriophage genomes encompass substantial genetic diversity, and the genomic architectures are dominated by mosaic relationships. Although the overall diversity is high, it is not uniform, and any two particular phages may share either extensive nucleotide sequence similarity over the entire genome lengths with only a few base differences (e.g., phages Adjutor and PBI1), or as few as three genes whose products share greater than 25% amino acid identity (e.g., phages Barnyard and Giles) (Figure 1). Because of the mosaic nature of these genomes, many of the relationships lie between these extremes, with substantial numbers of genes shared among genomes that are not otherwise closely related.

    To recognize the heterogeneous nature of genome diversity, the 70 genomes can be grouped into clusters according to their relationships to each other (Figure 1) (44). Several different methods can be used for determining the cluster assignments, including nucleotide sequence similarities and gene content analyses. For many genomes the placement into a particular cluster is simple because of extensive and clear nucleotide sequence similarity, but for other genomes it is more complex either because there is extensive but weaker similarity or because there is high nucleotide sequence similarity that extends over only a small genome segment. An arbitrary cutoff measure has been proposed that any two genomes with evident nucleotide sequence similarity spanning more than 50% of the genome lengths should be included within the same cluster (44). Using these criteria, an analysis of 60 sequenced genomes placed 55 into nine major clusters (A–I), and the remaining 5 were singleton genomes with no close relatives (44); the additional 10 genomes available in GenBank all fit within the nine major clusters (Figure 1) (Table 1). Five of these clusters can be further subdivided into subclusters, and it is anticipated that as additional genomes are sequenced new clusters will be formed (because of expected discovery of relatives of genomes that are currently singletons), and that current clusters will undergo further subdivision. The global population of mycobacteriophages would seem more likely to form a continuum of relationships, and the observed clusters may emerge from biases imposed by the isolation procedures. It is also likely that additional genomes unrelated to any of those sequenced to date remain to be discovered. Note that this clustering primarily provides a convenient framework for further analysis and does not provide an accurate portrayal of whole genome phylogenies, which involve reticulate relationships due to genomic mosaicism (64, 69, 70).
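    The 50% span rule can be read as a linking criterion: any two genomes whose similarity spans more than half their lengths are joined, and clusters are the resulting connected groups. The sketch below illustrates that reading only. The function name, the genome labels, and the span fractions are hypothetical, and the published assignments (44) drew on several methods and on inspection rather than on code of this kind.

```python
from collections import defaultdict


def assign_clusters(genomes, shared_span, cutoff=0.5):
    """Group genomes into clusters as connected components.

    genomes: list of genome names.
    shared_span: dict mapping frozenset({a, b}) to the fraction (0 to 1) of
        genome length over which a and b show nucleotide similarity.
    Two genomes are linked when that fraction exceeds `cutoff`; genomes with
    no links are returned as singleton clusters.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for pair, span in shared_span.items():
        if span > cutoff:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for g in genomes:
        groups[find(g)].append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical span fractions, for illustration only.
example_spans = {
    frozenset({"phage_1", "phage_2"}): 0.95,
    frozenset({"phage_2", "phage_3"}): 0.60,
    frozenset({"phage_1", "phage_4"}): 0.10,
}
clusters = assign_clusters(
    ["phage_1", "phage_2", "phage_3", "phage_4"], example_spans
)
```

    In this toy input, phage_1 and phage_3 end up in the same cluster through phage_2 even though they are not compared directly, which is one reason a hard cutoff gives a convenient framework rather than a phylogeny.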

    An indication that the current collection of mycobacteriophages underrepresents their full diversity is provided by several prophages resident in mycobacterial genomes. Full-length prophages can be identified in the genomes of M. avium strain 104, M. abscessus (92), and M. marinum (108), and there are smaller prophage-like elements in M. tuberculosis (18, 49) and Mycobacterium ulcerans (107, 109). However, none of these is closely related to any of the sequenced mycobacteriophages, and they should be generally classified as singletons in the clustering scheme described above. The roles of any of the prophages or prophage-like sequences in virulence of their hosts are not clear, but they are of interest because many of the sequenced mycobacteriophage genomes do encode genes capable of influencing host physiology (83).

    Figure 1. Dotplot comparison of 70 sequenced mycobacteriophage genomes. The 70 sequenced mycobacteriophage genomes were concatenated into a single 5-Mbp sequence, which was compared with itself using Gepard (62). The genome order is the same as in Table 1, and the Cluster and Subcluster designations (A1, A2, B1–B4, C1, C2, D, E, F1, F2, G, H1, H2, I, and singletons) are shown along the axes.

    Genome Organizations

    Mycobacteriophage genome lengths vary greatly, from 41.4 (Angel) to 164.6 kbp (Myrna), with an average length of 73.6 kbp (Table 1). An example of genome organization is shown in Figure 2, in which the virion structure and assembly genes are arranged as an array in the left part of the genome, followed by the lysis cassette, an integration cassette, and a set of genes in the right part, some of which encode DNA replication or recombination functions, but most of which are of unknown function. However, there is considerable variation in genome organization and several themes emerge for different phage clusters (Figure 3). The most obvious is that all the mycobacteriophages with a siphoviral morphotype (all but Cluster C) share a syntenic group of genes encoding virion structure and assembly proteins, as seen in all siphoviruses regardless of their bacterial host and regardless of the lack of sequence similarity. For representational purposes these are shown in the left parts of the genomes (Figure 3).

    Figure 2. Organization of the mycobacteriophage Angel genome. The linear genome is represented by a horizontal bar with markers in kilobase pairs. Predicted genes are shown as colored boxes with the gene name shown inside the box; genes shown above the genome bar are transcribed rightward, and those below it are transcribed leftward. Each of the genes has been grouped into a phamily of related mycobacteriophage genes (44), with the Pham number designation shown above the gene. Putative gene functions are noted where known. Angel is a member of Cluster G (see Table 1), in which there are three other members, Halo, Hope, and BPs. These four genomes are similar at the nucleotide level, and differ in structure primarily by insertions of a putative mycobacteriophage mobile element (MPME). Angel contains no insertions, both Hope and BPs contain insertions of MPME1, and Halo contains an insertion of MPME2 as shown.

    Labels on the map: Virion structure and assembly; Lysis; Integration/immunity; Recombination; Terminase; Portal; Scaffold; Capsid; MTS; Tape measure protein; Minor tail proteins; Lysin A; Lysin B; Integrase; Repressor; RDF; HNH Endo; RuvC; RecT; RecE; MPME1 and MPME2 insertion positions (Halo, Hope, BPs).

    Clusters F, G, and I all contain genomes with defined ends with short single-stranded DNA extensions (Table 1), and the leftmost of the structure and assembly genes (terminase) is located close to the genome end (Figure 3). In contrast, Clusters A and E, together with singletons Corndog, Giles, Omega, TM4, and Wildcat, have defined genome ends but additional genes are present between the terminase and the end (Figure 3), most of which likely do not encode virion structure and assembly functions. The number of genes varies from 4 (Cjw1; Cluster E) to 31 in the singleton Corndog, and in Cluster A this is where the lysis genes are positioned.

    Figure 3. Schematic representations of mycobacteriophage genome architectures. The genomes of phages in the nine main clusters (A–I) and the five singleton genomes (Corndog, Giles, Omega, TM4, and Wildcat) are represented by black bars with gene regions shown as colored boxes. Genes transcribed rightward are shown above the bar, and those transcribed leftward are shown below it. Putative functions of the gene blocks are represented by different colors, with the key shown at the bottom of the figure (lysis, integration, immunity, structure, replication/recombination, other). In some clusters there is organizational variation within the cluster, and variations are given in parentheses. The genome organizations are schematic and are not drawn to scale.

    Clusters B, D, and H have circularly permuted genomes, and for purposes of gene numbering and representing the genomes as linear maps, an arbitrary position close to the terminase gene is chosen as nucleotide position #1. In some genomes (e.g., Subcluster B1) this corresponds to the first base of the putative small terminase subunit gene, whereas in others it is within an upstream noncoding interval. There is a close relationship between terminase phylogeny and the nature of phage genome ends (12), and this is also observed in mycobacteriophages (44).
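    Choosing nucleotide position #1 for a circularly permuted genome amounts to rotating the circular sequence to a new starting point. A minimal sketch, assuming the assembled sequence is held as a string and the chosen start is a 0-based index; the function name and the toy sequence are hypothetical and do not come from any phage genome.

```python
def linearize(circular_seq, start_index):
    """Rotate a circular sequence so that start_index becomes position 1."""
    if not circular_seq:
        return circular_seq
    start_index %= len(circular_seq)
    return circular_seq[start_index:] + circular_seq[:start_index]


toy = "GGCCATGAAATTT"      # toy sequence, not a real phage genome
start = toy.index("ATG")   # stand-in for the first base of a terminase gene
linear_map = linearize(toy, start)
```

    The rotation changes only the coordinate system: the sequence content and the gene order around the circle are untouched, which is why the choice of start can be arbitrary.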

    In many of the genomes (i.e., Clusters A, D, F, G, and I) the virion structure and assembly genes are in the canonical and largely uninterrupted order: terminase, portal, protease, scaffold, capsid, presumed head-tail joining genes, major tail subunit, G/T tail chaperones, tape measure protein, minor tail proteins (44). Many genomes encode both small and large terminase subunits (e.g., Clusters B1, B4, E, F, G, I, Corndog, TM4), whereas in others a small terminase subunit gene has not been identified (e.g., Clusters A, B2, B3, D, H). Not all genomes encode a scaffold protein and these functions may be incorporated into the capsid subunit as they are in coliphage HK97 (19, 89). The tape measure protein gene is typically the largest in the genomes of the mycobacterial siphoviruses, reflecting their rather long tails (from 107 nm in L5 to nearly 300 nm in Predator). There are, however, numerous genomes that contain additional genes in the structure and assembly gene array (Figure 3). These insertions occur at multiple locations, such as between the small and large terminase subunits (Cluster E), immediately following the major capsid subunit gene (Subcluster B1), and between the portal and protease genes (in Cluster H), and there are relatively large insertions in the singletons Corndog and Omega (Figure 3). The insertions in Clusters B, E, and H correspond to Holliday junction resolving enzymes (RuvC-like in Cluster B and Endo VII-like in E and H), consistent with a role for these genes in DNA packaging (36).

    As noted above, in the Cluster A genomes the lysis genes are located between the terminase gene and the left end. However, this is unusual, and the lysis cassette is more typically located immediately downstream of the tail genes (Figures 2 and 3) and transcribed in the same direction. This is a notable departure from the lambda prototype, where the lysis functions are located close to the right end of the genome (97). Clusters A, E, F, G, and singletons Giles and Omega encode integration cassettes that are near the centers of their genomes regardless of substantial differences in genome lengths (40). In Giles the integration cassette is in an atypical location to the left of the lysis genes (78). Although genes involved in DNA replication (including DNA Pol I, Pol III, and Holliday junction resolving enzymes) and DNA metabolism (such as ThyX and ribonucleotide reductase) can be identified (Figure 3), most other genes in the siphoviral genomes are of unknown function (44).

    All Cluster C mycobacteriophages have myoviral morphologies and relatively large genomes, and the virion structure and assembly genes do not appear to be organized into a well-defined array as they are in the siphoviruses. However, relatively few of the structure and assembly genes have been identified and the virion proteins are not well characterized. A striking feature of these genomes is that they encode a large number of tRNA genes (Table 1) organized into at least two large arrays. Myrna (Subcluster C2) is predicted to express 41 tRNAs, only modestly fewer than its M. smegmatis host (47 predicted tRNAs). The Subcluster C1 phages, as well as the singleton Wildcat, also encode a tmRNA gene (Table 1).

    Genome Mosaicism

    A notable feature of all bacteriophage genomes is their mosaic architectures, where each genome can be thought of as a specific assemblage of individual modules (81, 83, 101). Each module may correspond to a single gene or a group of genes, and its modular nature is reflected by its location in genomes that are otherwise not closely related. The exchange of modules may have occurred relatively recently in evolutionary time, in which case the modules may retain substantial similarity at the nucleotide sequence level, or it may have occurred at more distant times, with the only remaining evidence of common descent being weak but statistically significant amino acid sequence similarity. Examples of both extremes can be found among the mycobacteriophage genomes.

    An excellent example of a relatively recent exchange is seen in the Cluster B genomes (Figure 4). Cluster B genomes can be readily subdivided into four subclusters (B1–B4) such that genomes within each subcluster have high levels of nucleotide sequence similarity over their entire genome lengths, but nucleotide sequence similarity is poor between genomes of different subclusters. However, there is a 1.9-kbp DNA sequence segment that departs from this pattern and is shared at a level of 94% nucleotide sequence similarity between phages Rosebush (a Cluster B2 member) and all six of the Cluster B1 genomes; the only other member of the B2 subcluster (Qyrzula) has a quite distinct sequence in its place (Figure 4a). Because sufficient evolutionary time has passed to allow for the accumulation of about 100 nucleotide differences between the sequences, examination of the recombinant junctions should be interpreted cautiously. Nonetheless, at the right junction, which corresponds closely with the 3′ ends of gene 35, there is a short segment of interrupted sequence similarity between PG1 (and all of its five relatives) and Qyrzula that could have served as a site for recombination to give rise to the Rosebush structure (Figure 4b). The common sequence at the junction is not completely conserved and it is impossible to tell whether the differences have occurred subsequent to recombination, or whether they might have been present in the parent genomes (which were not necessarily Qyrzula or other known Cluster B1 phages). It has been proposed that homeologous recombination events (involving sequences that are similar but divergent) mediated by phage-encoded recombinases (such as lambda Red or the RecET systems) acting at partially conserved sequences could give rise to junctions such as these (74).

    Figure 4. Recombination between Cluster B mycobacteriophages. (a) Phages Orion and PG1 are members of Subcluster B1 and are closely related at the nucleotide level (Table 1). Phages Rosebush and Qyrzula are members of Subcluster B2 and are closely related at the nucleotide level across most of their genome spans. A short portion (7 kbp) of the genomes is shown and aligned, with sequence similarity represented as colored shading between the pairwise genomes. The strengths of the relationships are shown according to the color spectrum, with violet representing the closest similarity. Note the segment of Rosebush that is closely related to the Subcluster B1 genomes, but not to its B2 relative Qyrzula. Genes are shown as gray boxes, with the gene name within the box, the phamily assignment above the box, and the number of phamily members in parentheses. Figure was generated using the program Phamerator (S. Cresawn, R. Hendrix & G.F. Hatfull, unpublished data). (b) Alignment of PG1, Rosebush, and Qyrzula sequences at the rightmost recombinant junction. The arrow above the sequences shows the position of the 3′ ends of genes 35 of PG1 and Rosebush; the arrows below show the 3′ and 5′ ends of Qyrzula genes 33 and 34, respectively. The box shows a region of interrupted similarity between PG1 and Qyrzula within which recombination could have given rise to the Rosebush recombinant structure.

    Phamily (Pham): a group of mycobacteriophage genes related to each other as defined by BlastP and ClustalW

    The mycobacteriophages appear to have numerous examples in which individual modules correspond to single genes, with the relationships made evident by amino acid sequence similarity (83). When the phylogenies of individual genes are determined, they are often different, revealing distinct evolutionary paths to residence in any particular phage genome. To simplify the representation of this, we have utilized phamily circles, which have the advantage of displaying all genome members used in the analysis, including those that do not contain a particular gene member of the phamily being analyzed (83). Examples are shown in Figure 5 in which both Pham 233 and Pham 471 have a member in phage Omega but have Pham members in a variety of other genomes. In the Omega genome, genes 126 and 127 represent these two phamilies, respectively, and their distinct phylogenetic relationships strongly suggest they have evolved separately, and have been juxtaposed by a recombination event between them. This is further illustrated by examining the locations of the related pham members in other genomes. For example, Pham 233 has a member in phage Cjw1 (gene 73) that is flanked on both sides by genes unrelated to those flanking Omega gene 126. Likewise, Pham 471 has a member in phage KBG (gene 84) flanked by genes unrelated to those in Omega. This single-gene mosaicism, especially among the nonstructural genes, is a prominent feature of these genomes and underscores the dominant role of horizontal exchange processes in bacteriophage evolution.
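    The comparison of genomic contexts described here can be sketched in a few lines. Each genome is represented as an ordered list of pham identifiers, and a shared pham is flagged as a candidate single-gene module when its neighbors differ between genomes. The helper names are hypothetical, and every identifier other than 233 and 471 is a made-up placeholder rather than a value from the published maps.

```python
def flanks(pham_order, pham, width=1):
    """Return the set of phams within `width` genes of the first `pham` hit."""
    i = pham_order.index(pham)
    left = pham_order[max(0, i - width):i]
    right = pham_order[i + 1:i + 1 + width]
    return set(left) | set(right)


def shares_context(genome_a, genome_b, pham, width=1):
    """True if the two genomes share any flanking pham around `pham`."""
    return bool(flanks(genome_a, pham, width) & flanks(genome_b, pham, width))


# Placeholder pham orders; only 233 and 471 are taken from the text.
omega_like = [9001, 233, 471, 9002]
cjw1_like = [9101, 233, 9102]
kbg_like = [9201, 471, 9202]

mosaic_233 = not shares_context(omega_like, cjw1_like, 233)
mosaic_471 = not shares_context(omega_like, kbg_like, 471)
```

    A real analysis would also need to handle phams with more than one member per genome and to choose a sensible window width; the single-neighbor window here is only the simplest case.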

    Figure 5. Examples of mycobacteriophage mosaicism. (a) A segment of the Omega genome is shown that encodes genes 125–128. Genes 125 and 128 are orphams and have no known mycobacteriophage homologs, and genes 126 and 127 are members of Pham 233 and Pham 471, respectively, which have five and eight members, respectively. Members of Pham 233 and Pham 471 are found in phages Cjw1 and KBG, respectively, and in each case they are in distinct genomic contexts. Presumably, recombination events between these genes occurred in distant evolutionary time to generate these mosaic structures. (b) Phamily circle representations for Pham 233 and Pham 471. Each of the sequenced mycobacteriophage genomes is represented around the circumference of each circle, grouped according to cluster; the figure key distinguishes relationships identified by BlastP from those identified by ClustalW. Maps and circles were generated using the program Phamerator (S. Cresawn, R. Hendrix & G.F. Hatfull, unpublished data).

    MPME: mycobacteriophage mobile element

    Mechanisms for Generating Mosaic Genomes

    There has been considerable speculation regarding the specific molecular mechanisms that give rise to mosaic phage genomes (47, 48). An early model suggested that short, conserved boundary sequences located at gene boundaries may serve as targets for genetic exchange (110), and such boundary sequences have been described in coliphage HK620 (17). Boundary sequences are not, however, prevalent among mycobacteriophages (or other groups of phages) and thus seem unlikely to solely account for the pervasive mosaicism. A second view is that mosaicism results from events that are primarily illegitimate or nonsequence determined. Although most of these events will be destructive, they have the capacity to position two unrelated DNA segments together in a highly creative process. The generation of successful progeny would likely require multiple low-frequency events, coupled with selection either for gene function or for DNA segments of packageable size. The low frequency of such events would not seem to be a serious impediment in light of the dynamic nature of phage-host interactions (10²⁴ infections per second globally), the vast number of phage particles (10³¹), and probable early origins extending back perhaps 3 billion years (48, 111).
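    To give a sense of scale, the quoted infection rate (10^24 per second) can be multiplied out over the quoted time span (perhaps 3 billion years). This is a back-of-envelope sketch that assumes, unrealistically, that the present-day rate has held constant for the whole period; the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # about 3.16e7 seconds
infections_per_second = 1e24               # global estimate quoted in the text
years = 3e9                                # "perhaps 3 billion years"

# Total infections under the constant-rate assumption.
total_infections = infections_per_second * years * SECONDS_PER_YEAR
print(f"{total_infections:.1e}")
```

    On these assumptions the total is on the order of 10^41 infections, so even an event with an extremely small probability per infection has had a very large number of opportunities to occur.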

    A third view is that homeologous recombination plays an important role. Support for this is provided by the observation that lambda Red recombination is more proficient at recombination between divergent sequences than are host RecABCD pathways and can act at very short regions of sequence similarity (74). However, exchanges occurring at extremely short regions of sequence similarity may not be readily distinguishable from illegitimate recombination events, and exchanges at longer segments may not necessarily lead to disruptions of synteny (Figure 4). Nonetheless, the properties of phage-encoded recombination systems make them attractive for playing important roles in phage evolution, mediating exchange between short partially conserved sequences such as ribosome binding sites, transcriptional terminators, and repressor binding sites (11).

    A potential caveat for a general role of lambda Red-like recombinases in generating phage mosaicism is that not all genomes obviously encode such recombination systems. In the mycobacteriophages, Clusters G, I, and Giles encode Escherichia coli RecET-like proteins, some of which are active in recombination (118–120); Wildcat encodes an Erf-like recombinase, and a number of other mycobacteriophages (Clusters C and E) encode RecA homologs. But recombinase genes mediating homologous exchange cannot be readily identified in the remaining 48 genomes, suggesting that they are absent or that these activities lie within the large number of genes of unknown function. It is noteworthy that high levels of recombination among TM4-derived cosmids were observed during shuttle phasmid construction (53) even though no TM4 recombination genes have been identified.

    Transposons and Other Mobile Elements

    While not all phage genomes necessarily harbor transposons or other mobile elements, they are not uncommon, and transposition is expected to contribute to genomic mosaicism. Curiously, although dozens of transposons and insertion sequences have been identified in mycobacterial genomes, none occurs in any of the sequenced mycobacteriophages (44). However, comparative genomics has revealed a novel class of mycobacteriophage mobile elements (MPMEs) that are broadly distributed among mycobacteriophage genomes (primarily in Clusters F, G, and I) but absent from other phages and mycobacterial chromosomes (96). Two main subclasses (MPME1 and MPME2) share 79% nucleotide sequence identity with each other, although the copies within each subclass share 100% nucleotide identity (96). The MPMEs are atypically small (MPME1 is 439 bp and MPME2 is 440 bp) and generate unusual 6-bp insertions between target DNA and the left inverted repeat at the insertion site.

    There is good evidence to support one additional transposon insertion. In Llij (Cluster F), gp83 is related to transposases of the IS200 family and shares 73% amino acid identity with a putative transposase from Nocardia farcinica. A comparison of the Cluster F genomes at the nucleotide sequence level reveals a discontinuity 96 bp upstream of the beginning of gene 83 (coordinate 48209) that likely defines the junction between the left end of a putative IS200 family element and the target. The right end is not easy to identify, and Cluster F sequence similarities are less well defined, with possible junctions either at coordinate 49751 or at coordinate 49831. The ends of other IS200 family elements form hairpin loop structures 16 bp and 6 bp from the left and right end junctions, respectively, and a plausible structure is present to the left of Llij gene 83. A second structure corresponding to the right end is less clear, raising the possibility that this element may have undergone subsequent rearrangements and may no longer be mobile.

    RDF: recombination directionality factor

    Mycobacteriophage genomes are devoid of any clearly identifiable introns, although there are several inteins located within a variety of genes, all of which have inteinless counterparts. Five of these are terminases (encoded by phages Bethlehem, Cjw1, Kostya, Omega, and Pipefish), but the Pipefish terminase is distinct from the others in that it is circularly permuted and does not have a cos-packaging genome (44). An intein is also present in three genes related to the Bxb1 recombination directionality factor (RDF) (gene 47) and a related intein is present in a putative nucleotidyltransferase gene in Cali (gene 3). The inteins represent highly divergent sequences, and the intein in Bethlehem gene 51 has recently been shown to represent a novel functional class (115).

    MYCOBACTERIAL GENE FUNCTION AND EXPRESSION

    Lysis

    A lysis cassette was first described for mycobacteriophage Ms6 (30, 90) and was proposed to contain five genes (Orfs 1–5). Although the complete genome sequence for Ms6 is not available, approximately 5 kbp of a 6.2-kbp sequenced segment is closely related to Cluster F phages, with Fruitloop the nearest relative (98% nucleotide identity). Of the five open reading frames (ORFs) identified, three are implicated in lysis: lysin A (Orf 2), lysin B (Orf 3), and a holin (Orf 4). All the sequenced mycobacteriophages appear to encode an endolysin (lysin A), even though the endolysins are an unusual and complex group of protein sequences composed of a large number of modules assembled in multiple combinations. These modules contain many different peptidoglycan hydrolysis motifs including glycoside hydrolases, amidases, and peptidases, as well as peptidoglycan binding motifs. A direct role for these modules in lysis is demonstrated by the behavior of a lysin A-defective mutant of phage Giles, and in peptidoglycan hydrolysis by the endolysins of phages Corndog, Bxz1, and Che8 (82). The Ms6 lysin B has lipolytic activity (34) and a Giles lysin B-defective mutant forms small plaques and exhibits a lysis defect (82); the D29 lysin B protein is structurally related to cutinase-like enzymes and functions as a mycolylarabinogalactan esterase (82). Curiously, four mycobacteriophages (Che12, Rosebush, Qyrzula, and Myrna) lack a lysin B homolog yet do not exhibit small plaque morphotypes like the Giles lysB mutant. An intriguing possibility is that these phages have evolved a mechanism for utilizing a host-encoded cutinase-like enzyme for this function.

    Integration and Prophage Maintenance

    Thirty-six of the sequenced mycobacteriophages (Clusters A, E, F, G, I, and singletons Giles and Omega) harbor integration cassettes composed of an integrase gene, an attP attachment site, and an RDF. With the exception of Cluster A1 phages, and phages Bxz2 and Peaches (both in Cluster A2), which encode serine integrases, most of the integrases are of the tyrosine-recombinase family. In each genome encoding a tyrosine integrase, a putative attP site can be identified owing to a short 25- to 45-bp common core region shared between the attP and attB sites in the host chromosome. Frequently, the attB core overlaps a host tRNA gene and this is observed for all characterized mycobacteriophages. In phages L5, D29, Halo, Ms6, Tweety, and Giles the attB site has been confirmed experimentally (26, 65, 78, 85, 86, 96) and it has been predicted for Che8, Llij, PMC (which are closely related to Tweety), Che9d, Omega, Che9c, Cjw1, and 244 (86). Of the remaining phages, Fruitloop integrase (gp40) is closely related to that of Ms6 and presumably uses the same attB site; Boomer and Pacc40 integrases are similar to Tweety integrase and its relatives. A putative attP site for Brujita has yet to be identified. The M. tuberculosis prophage-like element phiRv2 is integrated into a host tRNA(Val) gene (49).
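    Because attP and attB share a common core, one simple way to look for candidate sites is to search for the longest exact match between a phage region near the integrase gene and the host chromosome. The following is only a naive sketch of that idea: the function name and the toy sequences are hypothetical, only one strand is searched, and a real analysis would more likely use a sequence alignment tool and would have to rule out matches that arise by chance.

```python
from typing import Tuple


def longest_common_core(phage_region: str, host_region: str) -> Tuple[str, int, int]:
    """Return (core, phage_start, host_start) for the longest exact shared substring.

    Uses row-by-row dynamic programming; ties keep the first match found.
    Coordinates are 0-based.
    """
    best_len = 0
    best_end_p = 0
    best_end_h = 0
    prev = [0] * (len(host_region) + 1)
    for i in range(1, len(phage_region) + 1):
        curr = [0] * (len(host_region) + 1)
        for j in range(1, len(host_region) + 1):
            if phage_region[i - 1] == host_region[j - 1]:
                # Extend the match that ended at the previous base in both sequences
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end_p = i
                    best_end_h = j
        prev = curr
    core = phage_region[best_end_p - best_len:best_end_p]
    return core, best_end_p - best_len, best_end_h - best_len


# Toy sequences (hypothetical, far shorter than a real 25- to 45-bp core)
toy_phage = "AAACCCGGG"
toy_host = "TTCCCGTT"
print(longest_common_core(toy_phage, toy_host))
```

    The quadratic running time is acceptable for short regions but not for whole chromosomes, which is one reason this should be read as an illustration of the common-core idea rather than a practical search method.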

    Integrase-mediated excisive recombination typically requires an RDF, and the best-characterized RDF for the mycobacteriophage-encoded tyrosine integrases is the 56-residue gp36 Xis of L5 (66). However, the RDF class of proteins is highly diverse (67) and only the fellow Cluster A2 phages D29, Che12, and Pukovnik encode closely related homologs. Ramsey (Subcluster F1) encodes a more distant relative (gp34), although its location adjacent to the Ramsey integrase strongly implicates it in recombination directionality control. The functions conferring directional control in the other Cluster F phages as well as those in Cluster E and in Brujita (Cluster I) remain elusive. In the Cluster G phages a putative RDF with similarity to other Xis proteins is located near the integrase gene (Figure 2), and a similar situation is observed in singletons Giles and Omega (genes 30 and 84, respectively). Che9c (Cluster I) encodes a putative RDF that is related to Giles gp30 but is located over 6 kbp away from the integrase gene.

    Identification of attP and associated attB sites of serine integrases is more complicated because the common core sequence can be as short as 2–3 bp (102). However, these systems are of interest because the attB sites do not overlap tRNA genes but are within host protein-coding genes and therefore have the capacity to influence host physiology through gene inactivation or modification. A good example of this is Bxb1, which integrates into the 3′ end of the groEL1 gene of M. smegmatis (61, 80). As a result, Bxb1 lysogens are unable to form normal mature biofilms, unveiling the role of the unusual GroEL1 chaperone in the regulation of mycolic acid biosynthesis (80). The other Subcluster A1 phages share closely related integrases (>95% amino acid identity) and likely integrate into the same site. The Bxz2 integrase is more distantly related (27% amino acid identity with Bxb1 integrase) and integrates into a different attB site within Msmeg_5156 (86). The phage Peaches also encodes a serine integrase that is most closely related to the Bxb1 integrase (59% amino acid identity) but whose integration site specificity remains undetermined. The M. tuberculosis phiRv1 prophage-like element encodes a serine integrase whose attB site is unusually located within a repetitive element that provides multiple potential integration sites (7). The partial sequence of the glycopeptidolipid biosynthesis gene cluster of M. avium strain A5 shows the presence of a related serine integrase (63) that may be part of a prophage in this strain.

    There are two types of RDFs associated with the mycobacteriophage-encoded serine integrases. The phiRv1 RDF is related to Xis proteins that are otherwise associated with tyrosine integrases, although its mechanism of action remains poorly defined (6). The Bxb1 RDF is not related to other known RDF proteins and was identified as the product of gene 47 through use of a genetic screen (33). Biochemical characterization shows that it is not a DNA binding protein but interacts directly with integrase-DNA complexes to promote formation of excisive synaptic complexes (32, 33). Bxb1 gene 47 is curious in that it is conserved among mycobacteriophages encoding tyrosine integrases including L5, for which all the components required for integrative and excisive recombination are known (and do not include the L5 gp54 homolog of Bxb1 gp47) (68, 84). It presumably is involved in some function other than recombinational control, and its genomic location among DNA replication genes is consistent with a replication function. Furthermore, Bxb1 gp47 has sequence similarity to proteins of the PP2A class of phosphatases, raising the question of whether phosphatase activity plays any role in recombination.

    Several mycobacteriophages encode proteins containing putative nuclease domains of the ParBc superfamily, including Cluster B3, C1, and Corndog (Cluster E phages encode genes with similarity to these but which do not include ParBc domains). None of these genomes encodes an integration cassette, and it is plausible that these form lysogens in which the genomes replicate extrachromosomally. However, none of these (or any of the 70) mycobacteriophages encodes ParA homologs and their mode of prophage maintenance, if any, remains unclear.

    Gene Expression and Its Regulation

    Little is known about gene expression in most mycobacteriophages. Perhaps the best understood is phage L5, where an early leftward promoter (Pleft) is under the control of the phage repressor (gp71) [the closely related L1 phage has an identical repressor (99)]. The Pleft promoter is similar to E. coli sigma-70 promoters, and a binding site (operator) for gp71 overlaps the promoter (11, 79). The gp71 repressor recognizes a 13-bp asymmetric sequence that is present 30 times in the L5 genome, mostly in small intergenic intervals and in one orientation relative to the direction of transcription; gp71 binding has been demonstrated for 24 of these sites (11). It is proposed that a repressor bound to these stoperator sites prevents unwanted transcripts from extending into cytotoxic phage genes during lysogeny (11). Bxb1 encodes a related repressor protein (gp69) that also binds to multiple operator and/or stoperator sites throughout the genome, although the binding site has a different consensus sequence and the phages are heteroimmune (54).
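    The count and orientation of an asymmetric site such as the 13-bp stoperator can be tallied by scanning both strands of a genome sequence. The sketch below assumes exact matching and uses a made-up 13-bp placeholder motif; it is not the gp71 recognition sequence, and real binding sites may tolerate mismatches that exact matching would miss. The function names are likewise illustrative.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_sites(genome: str, motif: str):
    """Return a sorted list of (position, strand) for exact motif matches.

    '+' means the motif itself occurs at that 0-based position;
    '-' means its reverse complement occurs there.
    Assumes an asymmetric motif; a palindromic one would be reported on both strands.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Placeholder motif and toy genome (hypothetical, for illustration only)
placeholder_motif = "GATTACAGGCTTC"
toy_genome = (
    "AAAA" + placeholder_motif + "CCCC"
    + reverse_complement(placeholder_motif) + "TT" + placeholder_motif
)
sites = find_sites(toy_genome, placeholder_motif)
print(sites)
print(Counter(strand for _, strand in sites))
```

    Relating each hit to the direction of transcription, as the text describes, would additionally require gene annotations, which this sketch does not model.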

    All the other Cluster A1 phages encode repressors that share at least 98% amino acid identity with Bxb1 gp69 and presumably are homoimmune. A rightward promoter, which is also repressor regulated, occurs at the right end of the Bxb1 genome (54). Multiple promoters for repressor synthesis in L1, L5, and Bxb1 are presumably required for establishment and maintenance of lysogeny (14, 54, 79), although their specific roles remain unclear. Transcriptional promoters for Ms6 lysis genes (30) are located 214 bp upstream of the first of the genes in that region, Orf 1; however, it is not clear whether this is a general feature of phages that share closely related lysis genes (in Cluster F), as the extent of sequence similarity ends approximately 60 bp upstream of Ms6 Orf 1. No late promoters for any mycobacteriophages have been identified, even though protein expression patterns suggest that these may be among the most active of all mycobacterial expression systems (25, 46). A mutant defective in late synthesis of phage L1 has been reported, but the specific genes involved are not known (21).

    Other Mycobacteriophage Gene Functions

    Several mycobacteriophage genes involved in DNA metabolism have been cloned and characterized. Phage L5 encodes both a thymidylate synthase (ThyX) and a ribonucleotide reductase (RNR) (gp48 and gp50, respectively), and they are expressed early in lytic growth and appear to function as a complex (5). A mutant defective in early gene expression influences expression of a proposed phage nuclease (20). Giri et al. (35) characterized an early nuclease encoded by gene 65 of phage D29 and showed that it is a structure-specific nuclease with a preference for forked structures.

    Initial studies of mycobacteriophage L5 identified at least two segments of the phage L5 genome that are not well tolerated in M. smegmatis and presumably encode cytotoxic proteins. Further analysis identified three cytotoxic proteins encoded by L5 genes 77, 78, and 79 (95) that prevent growth of M. smegmatis when expressed and presumably interrupt specific cellular processes, although these proteins remain ill-defined. We predict that the broader mycobacteriophage collection encodes numerous additional cytotoxic proteins with considerable potential for development of antituberculosis drugs as proposed for Staphylococcus phages (71).

    MYCOBACTERIOPHAGE GENETIC MANIPULATION

    As noted above, shuttle phasmids have been invaluable tools for constructing recombinant mycobacteriophages and for using them to deliver transposons, allelic exchange substrates, and reporter genes (1, 2, 51). However, with an average mycobacteriophage genome length of over 70 kbp and packaging constraints of 50 kbp in lambda particles, many mycobacteriophages are not amenable to this technology.

    MYCOBACTERIAL RECOMBINEERING

    M. tuberculosis is unusual among bacteria in that when linear DNA substrates are introduced by electroporation there is a high propensity for illegitimate recombination (58). Mycobacteriophages have provided two useful strategies for constructing gene replacement mutants. First, mycobacteriophage shuttle phasmids can be used to introduce allelic exchange substrate by infection, and after selection a high proportion of the progeny are the result of homologous replacement (1). Second, a mycobacterial-specific recombineering system has been developed in which RecET-like proteins encoded by mycobacteriophage Che9c are expressed to confer high levels of recombination (118). Introduction of dsDNA or single-stranded DNA substrates into recombineering strains of M. smegmatis or M. tuberculosis provides an efficient means of generating gene replacement mutants and point mutations (118, 119). Single-stranded DNA recombineering is particularly attractive for generating isogenic strains with defined point mutations with applicability to determining the contributions of single base substitutions to the drug resistant phenotypes of multiple-drug-resistant tuberculosis and extensively drug-resistant tuberculosis clinical strains.

    BRED: bacteriophage recombineering of electroporated DNA

    Bacteriophage recombineering of electroporated DNA (BRED) provides a technique for direct genetic manipulation of mycobacteriophages that takes advantage of a mycobacteria-specific recombineering system (73, 118). This recombineering approach is based on the use of the RecET-like recombination system encoded by phage Che9c, such that expression of genes 60 and 61 generates high levels of recombination in both M. smegmatis and M. tuberculosis (118, 119). In the BRED application, recombineering-proficient cells are co-electroporated with two DNA substrates; one is genomic DNA of the phage to be manipulated and the other is a short (typically 200 bp) substrate that contains the desired mutation (73). For example, a defined gene deletion can be constructed by creating a 200-bp substrate containing 100 bp homologous to each of the upstream and downstream regions (120). The mutation can be designed to minimize genetic polarity, and because recombination is efficient, there is no need to include a selectable marker or identification tag.
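    The 200-bp deletion substrate described above amounts to joining the 100 bp immediately upstream of the interval to be deleted with the 100 bp immediately downstream. A minimal sketch of that construction follows, assuming 0-based, half-open coordinates on a linear genome string; the function name, the coordinate convention, and the toy genome are assumptions made for illustration and do not come from the cited BRED protocol.

```python
def deletion_substrate(genome: str, del_start: int, del_end: int, arm: int = 100) -> str:
    """Build a deletion substrate from two flanking homology arms.

    genome[del_start:del_end] is the interval to delete (0-based, half-open).
    Returns `arm` bp upstream of del_start joined to `arm` bp downstream of del_end.
    """
    if not (0 <= del_start < del_end <= len(genome)):
        raise ValueError("invalid deletion interval")
    if del_start < arm or len(genome) - del_end < arm:
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = genome[del_start - arm:del_start]
    downstream = genome[del_end:del_end + arm]
    return upstream + downstream


# Toy genome (hypothetical): 150 bp of A, a 300-bp 'gene' of G, then 150 bp of T
toy_genome = "A" * 150 + "G" * 300 + "T" * 150
substrate = deletion_substrate(toy_genome, 150, 450)
print(len(substrate))
```

    Choosing the deletion endpoints so that the reading frame of the remaining gene fragment is preserved is one plausible way to limit polarity, but the sketch does not enforce that and the source does not specify how the endpoints are chosen.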

    Following co-electroporation, plaques are recovered by plating onto lawns of a permissive bacterial host (M. smegmatis) in an infectious center configuration, i.e., by plating prior to phage replication and lysis. Each plaque therefore derives from a single cell that has taken up phage genomic DNA. Screening of 12–18 plaques by PCR typically identifies at least one plaque that is mixed, containing both wild-type and mutant alleles. Importantly, this is typically observed whether or not the gene is essential for phage growth, because if the gene is essential, then the presence of wild-type helper phage supports mutant growth in the mixed plaque. Replating for isolated plaques and screening by PCR usually identify a homogeneous viable mutant (73). If a mutant is not viable, then it can be recovered with a complementing strain in which the essential gene is expressed from a plasmid (73). Because no selection is required, BRED can be used to make virtually any recombinant that is desired, including defined nonpolar deletions, insertions, point mutations, and addition of gene tags (73). BRED appears to be broadly applicable to mycobacteriophage genetic manipulation provided that plaques can be recovered by electroporation of phage genomic DNA. The BRED technology thus circumvents a major hurdle in mycobacteriophage manipulation, providing facile genetic approaches for addressing a multitude of questions in mycobacteriophage biology.
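    The observation that screening 12–18 plaques typically yields at least one mixed plaque can be related to the underlying recombination frequency with a simple calculation. If each plaque is assumed to be mixed independently with probability p, the chance of finding at least one mixed plaque among n screened is 1 - (1 - p)^n. The sketch below evaluates this; the independence assumption and the example value of p are hypothetical illustrations and are not measurements reported in the source.

```python
def prob_at_least_one(p_mixed: float, n_plaques: int) -> float:
    """Probability that at least one of n independently sampled plaques is mixed."""
    if not 0.0 <= p_mixed <= 1.0:
        raise ValueError("p_mixed must be between 0 and 1")
    if n_plaques < 0:
        raise ValueError("n_plaques must be non-negative")
    return 1.0 - (1.0 - p_mixed) ** n_plaques


# Hypothetical per-plaque frequency, chosen only to illustrate the arithmetic
example_p = 0.10
for n in (12, 18):
    print(n, round(prob_at_least_one(example_p, n), 3))
```

    Under this model, lower recombination frequencies would call for screening more plaques to keep the same chance of recovering a mixed plaque.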

    SUMMARY

    In conclusion, mycobacteriophage genomics reveals that the diversity of the population is large, and that substantial parts of the population at large remain unexplored. Five of the 70 sequenced genomes have no close relatives, prophages emerging from mycobacterial genome sequencing projects are not closely related to known phages, and the diversity of the CRISPR spacers in M. tuberculosis and M. avium genomes suggests there are many genomes yet to be discovered. With new technologies for global expression analyses and mycobacteriophage functional genomics, a new chapter in postgenomic mycobacteriophage biology is anticipated with considerable excitement.

    SUMMARY POINTS

    1. Mycobacteriophage genomes are genetically highly diverse.

    2. Mycobacteriophages can be grouped into clusters according to their sequence relationships.

    3. Mycobacteriophage genomes are architecturally mosaic.

    4. Approximately 80% of mycobacteriophage gene phamilies are of unknown function.

    5. Mycobacteriophages are sources of genetic novelty, including new classes of inteins and mobile elements.

    6. BRED recombineering provides a facile and general means for constructing recombinant and mutant forms of mycobacteriophages.

    7. Mycobacteriophages are rich resources for mycobacterial genetics.

    FUTURE ISSUES

    1. Newly discovered mycobacteriophages isolated on a variety of different mycobacterial strains are needed to fully understand mycobacteriophage genetic diversity.

    2. The potential for generating new tools for mycobacterial genetics and gaining insights into mycobacterial physiology is great and many advances await development.

    3. Elucidating the function of mycobacteriophage genes will provide a fuller understanding of their biology and their evolution.

    DISCLOSURE STATEMENT

    The author is not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

    ACKNOWLEDGMENTS

    I thank my colleagues and collaborators, Roger Hendrix, Jeffrey Lawrence, Craig Peebles, Steve Cresawn, Bill Jacobs, Debbie Jacobs-Sera, and the numerous members of the Hatfull laboratory who have participated in mycobacteriophage isolation, sequencing, and analysis.

    LITERATURE CITED

    1. Bardarov S, Bardarov S Jr, Pavelka MS Jr, Sambandamurthy V, Larsen M, et al. 2002. Specialized transduction: an efficient method for generating marked and unmarked targeted gene disruptions in Mycobacterium tuberculosis, M. bovis BCG and M. smegmatis. Microbiology 148:3007–17

    2. Bardarov S, Kriakov J, Carriere C, Yu S, Vaamonde C, et al. 1997. Conditionally replicating mycobacteriophages: a system for transposon delivery to Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. USA 94:10961–66

    3. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, et al. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709–12

    4. Barsom EK, Hatfull GF. 1996. Characterization of Mycobacterium smegmatis gene that confers resistance to phages L5 and D29 when overexpressed. Mol. Microbiol. 21:159–70

    5. Bhattacharya B, Giri N, Mitra M, Gupta