Supporting Information - PNAS · 25/6/2010  · Best PSI-BLAST result (identity %, E-value)...


  • Supporting Information Gloux et al. 10.1073/pnas.1000066107

    [Figure: distance tree of the H11G11-BG Blastp hits. Labeled groups: Proteobacteria, Enterobacteria, Crenarchaeotes, Firmicutes, Verrucomicrobia, Bacteroidetes/Chlorobi group, Mycoplasmas, Fusobacteria, and the Firmicutes H11G11-BG group. Labeled BGs: Escherichia coli K12 (BG activity); Anaerocellum thermophilum DSM 6725/Z-1320 YP_002573936 (BG annotation); Ruminococcus gnavus E1 GI:34581788 (BG activity); Lactobacillus rhamnosus LMS2-1 ZP_04439782 (BG annotation); Clostridium perfringens NCTC 8239 ZP_02642872 (BG activity).]

    Fig. S1. Distance tree view of H11G11-BG Blastp results and localization of the Firmicutes H11G11-BG group and of several BGs of the UidA group.

    [Figure: domain maps of H11G11-like BGs (Firmicutes): H11G11, C7D2, C. bartlettii ZP_02210507, S. variabile ZP_03776158, ZP_03776204, and ZP_03774451, B. formatexigens ZP_03687412, ZP_03686296, and ZP_03685645, R. gnavus ZP_02040206, ZP_02041835, and ZP_02042987, Paenibacillus sp. ZP_02847334, B. capillosus ZP_02038110, R. inulinivorans ZP_03753494, S. satelles ZP_04455048, F. prausnitzii ZP_02090684; and of known BGs: E. coli P05804, L. gasseri gi12802352, R. gnavus gi34581788, C. perfringens B1RQV9. ProDom domains shown: PD819476, PD002797, PD339503, PDA2N0N2, PD002163, PDA253F2, PD884831, PDA1N8L2.]

    Fig. S2. Comparative domain organization of unique Firmicutes BGs and known BGs. Domain arrangements of the protein families were determined using the ProDom database in September 2009. Conserved characterized domains are represented by colored boxes and documented in the ProDom database; uncharacterized conserved domains are numbered.

    Gloux et al. www.pnas.org/cgi/content/short/1000066107 1 of 9


  • New PS00719 pattern: [NT]-x-[LIVMFYWD]-R-[STACNL](2)-H-Y-[PQ]-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYWA](4)

    Firmicutes H11G11-BG group
      H11G11         CAT TAT CAG
      ZP_02210507    CAC TAT CAA   C. bartlettii
      ZP_03776158    CAC TAC CAG   S. variabile
      ZP_03687412    CAC TAT CAG   B. formatexigens
      C7D2           CAT TAT CAG
      ZP_02040206    CAT TAT CAG   R. gnavus
      ZP_02041835    CAT TAT CAG   R. gnavus
      ZP_02847334    CAT TAC CAG   Paenibacillus sp.
      ZP_02042987    CAT TAT CAG   R. gnavus
      ZP_02038110    CAC TAT CAG   B. capillosus
      ZP_03776204    CAC TAC CAG   S. variabile
      ZP_03686296    CAC TAC CAG   B. formatexigens
      ZP_03753494    CAC TAT CAG   R. inulinivorans
      ZP_03685645    CAT TAT CAG   B. formatexigens
      ZP_04455048    CAC TAT CAG   S. satelles
      ZP_03774451    CAC TAC CAG   S. variabile
      EDP22313       CAC TAC CAG   F. prausnitzii

    Bacteroidetes H11G11-BG group
      ZP_02065512    CAT TAT CCG   B. ovatus
      ZP_03478350    CAT TAC CCG   P. johnsonii
      ZP_02031489    CAT TAT CCG   P. merdae

    Known BGs (uidA homologs)
      P05804         CAT TAC CCT   E. coli
      C2JTS9         CAT TAT CCA   L. rhamnosus
      C1CAW0         CAT TAT CCA   S. pneumoniae
      Q3DSU4         CAT TAT CCT   S. agalactiae
      C2CG53         CAC TAC CCT   A. tetradius
      B1RQV9         CAT TAT CCA   C. perfringens
      B9MLH3         CAT TAT CCT   A. thermophilum
      Q6W7J7         CAT TAT CCT   R. gnavus

    Fig. S3. A specific “HYQ” motif in the H11G11-BG group from Firmicutes. Codon usage within the HYP conserved domain of pattern 1 (Prosite PS00719) was compared between H11G11-like BGs and known BGs (UidA homologs).
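The HYQ/HYP split in Fig. S3 follows from translating the listed codon triplets with the standard genetic code (CAT/CAC = His, TAT/TAC = Tyr, CAG/CAA = Gln, CCN = Pro). The Python sketch below is only an illustration, not the authors' analysis; it copies a few rows from the figure and groups the translated motifs by BG class.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering:
# product("TCAG", repeat=3) yields TTT, TTC, TTA, TTG, TCT, ... in the same
# order as the one-letter amino-acid string below ('*' = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codons: str) -> str:
    """Translate space-separated DNA triplets (e.g. 'CAT TAT CAG') to amino acids."""
    return "".join(CODON_TABLE[c] for c in codons.upper().split())


# A few rows copied from Fig. S3: (sequence ID, motif codons, BG group).
ROWS = [
    ("H11G11", "CAT TAT CAG", "Firmicutes H11G11-BG"),
    ("ZP_02210507", "CAC TAT CAA", "Firmicutes H11G11-BG"),
    ("ZP_03776158", "CAC TAC CAG", "Firmicutes H11G11-BG"),
    ("ZP_02065512", "CAT TAT CCG", "Bacteroidetes H11G11-BG"),
    ("ZP_03478350", "CAT TAC CCG", "Bacteroidetes H11G11-BG"),
    ("P05804", "CAT TAC CCT", "Known BG (UidA homolog)"),
    ("C2JTS9", "CAT TAT CCA", "Known BG (UidA homolog)"),
]

# Collect the distinct translated motifs observed in each group.
motifs_by_group = defaultdict(set)
for seq_id, codons, group in ROWS:
    motifs_by_group[group].add(translate(codons))

for group, motifs in motifs_by_group.items():
    print(f"{group}: {', '.join(sorted(motifs))}")
```

Running it prints HYQ for the Firmicutes rows and HYP for the Bacteroidetes and UidA rows. In every listed case the Gln/Pro difference comes from the second base of the third codon (CAG/CAA versus CCG/CCA/CCT).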

    [Figure: maps of the BG loci (red arrows) and associated symporter genes (green arrows) in metagenomic clones H11G11 (genes 20 and 21) and C7D2 (genes 6 and 7) and in Subdoligranulum variabile DSM 15176, Bryantella formatexigens DSM 14469, Ruminococcus gnavus ATCC 29149, Roseburia inulinivorans DSM 16841, Bacteroides capillosus ATCC 29799, Faecalibacterium prausnitzii M21/2, Shuttleworthia satelles DSM 14600, and Paenibacillus sp. JDR-2. Columns: BG; symporter; symporter best conserved domain hit (E-value); symporter group (internal group homologies), Types I to IV. Best conserved symporter domains: Na+/melibiose symporter and related transporters, MelB (COG2211) [Carbohydrate transport and metabolism]; sugar (glycoside-pentoside-hexuronide) transporter, gph (TIGR00792); Na+-driven multidrug efflux pump, NorM (COG0534) [Defense mechanisms]; glucuronate isomerase (PRK09252).]

    Fig. S4. Genetic environments of the unique Firmicutes BGs. The conserved genetic environments near the Firmicutes BG homologs were restricted to symporters, whose homologies were grouped into four types determined by ClustalW multiple alignment and homology search. Red arrows represent the different BG genes and green arrows the associated symporters. The best conserved domains and E values are those found by Blastp against nonredundant protein sequences (NCBI).


  • Fig. S5. Amino acid motifs conserved in symporters associated with the Firmicutes BGs. The different types of putative symporters and their identities/similarities within types were determined by homology search and ClustalW multiple alignment. The proposed patterns were obtained using the pattern discovery tool PRATT (http://www.expasy.ch/tools/pratt/).
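As a rough illustration of the within-type identity figures, percent identity between two rows of a multiple alignment can be computed as below. This is a hypothetical helper, not ClustalW's own code: tools differ in how they treat gaps, so the convention used here (skip any column with a gap in either row) is an assumption, and similarity would additionally need a substitution matrix or residue groups, which the sketch does not model.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two equal-length rows of a multiple alignment.

    Assumed convention: columns with a gap ('-') in either row are skipped,
    and identity = identical columns / compared columns * 100.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Toy alignment (hypothetical sequences, not the symporters of Fig. S5).
alignment = {
    "sym1": "MKLV-AGTF",
    "sym2": "MKLVQAGSF",
    "sym3": "MRLI-AGTF",
}
for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
    print(f"{n1} vs {n2}: {percent_identity(s1, s2):.1f}%")
```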


  • [Figure: frequency in metagenomes (hits/bp; axis 0,00E+00 to 2,50E-07) of each BG gene tested, in two panels, Infants and Adults and children. Individuals labeled: InA, InB, InD, InE, InMF1U, InR, F1S, F1T, F2V, F2W, F2X, F2Y, sub7, sub8 (n.d. for InB, InE, and InMF1U). Genes, grouped as Firmicutes H11G11-BG homologs: H11G11 (Clostridium, 41%), C. bartlettii ZP_02210507, S. variabile ZP_03776158, F. prausnitzii EDP22313, R. inulinivorans ZP_03753494, Paenibacillus sp. ZP_02847334; Bacteroidetes H11G11-BG homologs: B. ovatus ZP_02065512, P. johnsonii ZP_03478350, P. merdae ZP_02031489; UidA homologs (known BGs): Anaerococcus prevotii C1Q1E5, Desulfitobacterium hafniense B8FT70, R. gnavus Q6W7J7, Anaerocellum thermophilum B9MLH3, Clostridium perfringens B1RQV9, Anaerococcus tetradius C2CG53, Streptococcus agalactiae Q3DSU4, Streptococcus pneumoniae C1CAW0, Lactobacillus rhamnosus C2JTS9, E. coli K12 P05804.]

    Fig. S6. Details of the frequencies and distribution of the unique and known BGs among individuals. This figure presents the detailed results of Fig. 7, with the frequency of each BG gene tested.
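The y axis of Fig. S6 expresses frequency as hits per base pair of metagenome. A minimal sketch, assuming the frequency is the number of hits for one BG gene divided by the total number of sequenced base pairs of that metagenome (the exact normalization behind Fig. 7 is not restated here), with the tick formatting used in the figure (decimal comma, E notation). The counts are hypothetical.

```python
def hits_per_bp(hits: int, metagenome_bp: int) -> float:
    """Frequency of a BG gene in one metagenome, in hits per sequenced base pair (assumed definition)."""
    if metagenome_bp <= 0:
        raise ValueError("metagenome size must be positive")
    return hits / metagenome_bp


def axis_label(value: float) -> str:
    """Format a value like the Fig. S6 axis ticks, e.g. '2,00E-07'."""
    return f"{value:.2E}".replace(".", ",")


# Hypothetical counts, for illustration only (not data from the study).
freq = hits_per_bp(12, 60_000_000)
print(axis_label(freq))
```

With 12 hits in 60 Mbp the value is 2,00E-07, inside the plotted range.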


  • Table S1. Genes identified as potential BGs

    H3D4 (585 amino acids)
      Best PSI-BLAST result (identity %, E-value): gi29346282 aspartyl-tRNA synthetase (Bacteroides thetaiotaomicron VPI-5482) (91%, 0.0).
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: <50%, 2: <50%
      Activity, architecture, and genetic context: PNP-G deglucuronidation
      Putative function: Amino acid activation
      COG functional category: COG0173 Aspartyl-tRNA synthetase

    C11H2 (165 amino acids)
      Best PSI-BLAST result (identity %, E-value): gi118748750 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (Marinomonas sp. MWYL1) (59%, 1e-51). Pattern 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase signature (100%).
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: <50%, 2: <50%
      Activity, architecture, and genetic context: PNP-G deglucuronidation
      Putative function: Enzymes of the methylerythritol phosphate pathway (terpenoid biosynthesis). Interaction with the host.
      COG functional category: COG0245 Lipid transport and metabolism

    H11G11 (755 amino acids)
      Best PSI-BLAST result (identity %, E-value): gi29346167 β-galactosidase (Bacteroides thetaiotaomicron VPI-5482) (40%, 3e-118)
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: 82%, 2: 93%
      Activity, architecture, and genetic context: PNP-G deglucuronidation. Glucuronide permease upstream. No β-galactosidase activity.
      Putative function: β-Glucuronidase, glucuronide metabolism.
      COG functional category: COG3250 Carbohydrate transport and metabolism

    H11G11 (233 and ≥605 amino acids)
      Best PSI-BLAST result (identity %, E-value): ABC-type antimicrobial peptide transport system: gi15896819 ABC transporter, ATP-binding protein (Carnobacterium sp. AT7) (70%, 5e-93). ABC transporter, permease protein (Carnobacterium sp. AT7) (28%, 1e-54).
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: <50%, 2: <50% (for both proteins)
      Activity, architecture, and genetic context: PNP-G deglucuronidation permease. Pattern glycosyl hydrolase family 16 active site (65%). Xyloglucan endo-transglycosylase C-terminal domain.
      Putative function: SalX-ABC-type antimicrobial peptide transport system.
      COG functional category: COG1136.2 Defense mechanisms

    H11G11 (331 amino acids)
      Best PSI-BLAST result (identity %, E-value): gi122582656 sporulation protein and related proteins (Teth39DRAFT_0340) (Thermoanaerobacter ethanolicus ATCC 33223) (35%, 1e-44).
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: <50%, 2: 55%
      Activity, architecture, and genetic context: PNP-G deglucuronidation, cell wall hydrolase, MraY family signature 1 (53%)
      Putative function: Sporulation stage II, protein D Firmicutes (IPR014225).
      COG functional category: COG2385 Sporulation protein and related proteins

    H11B1 (916 amino acids)
      Best PSI-BLAST result (identity %, E-value): gi149811721 serine/threonine protein kinase (Roseobacter sp. AzwK-3b) (39%, 3e-44). Length: 444 amino acids.
      Similarity with glycosyl hydrolase family 2 Prosite patterns*: 1: <50%, 2: 51%
      Activity, architecture, and genetic context: PNP-G deglucuronidation. UDP-glycosyltransferase/glycogen phosphorylase superfamily. Domain glycosyltransferase group 1. TPR repeat SL1 domain. Pattern hexapeptide-repeat–containing transferase signature (68%)
      Putative function: Bifunctional protein, signal transduction mechanisms, polysaccharide production/transport, interaction with the host.
      COG functional category: General function prediction

    Genes identified on fosmid inserts conferring deglucuronidation activity were analyzed using PSI-BLAST searches on NCBI against the nonredundant protein sequences of GenBank (release 163.0, http://www.ncbi.nlm.nih.gov/BLAST/) and the EXPASY (http://www.expasy.ch/tools/blast/) databases. Domain architectures, potential functions, and associated patterns were analyzed using InterProScan (http://www.ebi.ac.uk/InterProScan/), MyHits from the Swiss Institute of Bioinformatics (http://www.isb-sib.ch/), PROSITE (http://www.expasy.org/prosite/), SMART (http://smart.embl-heidelberg.de/), PFAM (http://www.sanger.ac.uk/Software/Pfam/), MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan), and SUPERFAMILY (http://supfam.org/SUPERFAMILY/). Proteins implicated in transport systems were analyzed with the Transport Classification Database (http://www.tcdb.org/).
    *1 and 2 refer to the PS00719 and PS00608 Prosite patterns, respectively. Conserved patterns were found on PROSITE (http://www.expasy.org/prosite/), and similarities were determined on PBIL (http://npsa-pbil.ibcp.fr/) using PATTINPROT.


  • Table S2. Glycosyl hydrolase family 2 signature conservation in the unique BG and homologs

    Metagenomic clone or genome | NCBI reference (no. of amino acids) | Prosite pattern PS00719 (position, similarity %) | Prosite pattern PS00608 (position, similarity %)
    Metagenomic clone H11G11 |  | 309–334, 82% | 362–376, 93%
    Metagenomic clone C7D2 |  | 314–339, 86% | 369–383, 90%
    R. gnavus ATCC 29149 | ZP_02041835 (747 aa) | 304–329, 76% | 359–373, 90%
    R. gnavus ATCC 29149 | ZP_02040206 (757 aa) | 316–341, 86% | 371–385, 90%
    R. gnavus ATCC 29149 | ZP_02042987 (640 aa) | 298–323, 86% | 351–365, 89%
    S. variabile DSM 15176 | ZP_03776158 (735 aa) | 312–337, 86% | 365–379, 81%
    S. variabile DSM 15176 | ZP_03774451 (727 aa) | 319–344, 86% | 372–386, 93%
    S. variabile DSM 15176 | ZP_03776204 (639 aa) | 329–354, 86% | 382–396, 89%
    B. formatexigens DSM 14469 | ZP_03687412 (756 aa) | 322–347, 86% | 377–391, 90%
    B. formatexigens DSM 14469 | ZP_03685645 (756 aa) | 331–356, 86% | 384–398, 71%
    B. formatexigens DSM 14469 | ZP_03686296 (642 aa) | 305–330, 76% | 358–372, 77%
    F. prausnitzii M21/2 | EDP22313 (735 aa) | 302–327, 86% | 355–369, 93%
    R. inulinivorans DSM 16841 | ZP_03753494 (640 aa) | 298–323, 86% | 351–365, 93%
    C. bartlettii DSM 16795 | ZP_02210507 (746 aa) | 311–336, 76% | 364–378, 93%
    B. capillosus ATCC 29799 | ZP_02038110 (641 aa) | 299–324, 86% | 352–366, 89%
    S. satelles DSM 14600 | ZP_04455048 (766 aa) | 321–346, 86% | 374–388, 93%
    Paenibacillus sp. JDR-2 | ZP_02847334 (767 aa) | 326–351, 86% | 381–395, 90%

    Homologies between the signatures (Prosite patterns) of glycosyl hydrolase family 2 (galactosidases/glucuronidases) and the unique BG homologs were studied using the PATTINPROT tool (http://npsa-pbil.ibcp.fr/).
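PROSITE patterns such as the modified PS00719 pattern of Fig. S3 translate directly into regular expressions, which gives a quick way to locate exact matches and report 1-based positions like those in Table S2. The sketch below is an assumption-laden illustration, not PATTINPROT: it handles only x, [..], {..} and (n) or (n,m) repeats, and the peptide is synthetic.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles x, [..] (allowed residues), {..} (forbidden residues) and repeat
    counts (n) or (n,m). Terminal anchors (< and >) are not handled.
    """
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, count = m.group(1), m.group(2)
        if core == "x":
            rx = "."
        elif core.startswith("{") and core.endswith("}"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # single residue or [..] class: same meaning in regex syntax
        parts.append(rx + ("{" + count + "}" if count else ""))
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (start, end, match) for each non-overlapping hit, 1-based and inclusive."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in rx.finditer(sequence)]


# Modified PS00719 pattern as given in Fig. S3.
PS00719_NEW = ("[NT]-x-[LIVMFYWD]-R-[STACNL](2)-H-Y-[PQ]-x(4)-[LIVMFYWS](2)"
               "-x(3)-[DN]-x(2)-G-[LIVMFYWA](4)")

# Synthetic peptide (not a real BG sequence) carrying one HYQ-type site.
toy = "MKT" + "NALRSSHYQAAAALLAAADAAGLLLL" + "EEK"
print(prosite_to_regex(PS00719_NEW))
print(find_motifs(toy, PS00719_NEW))
```

The 26-residue match width agrees with the PS00719 spans in Table S2 (e.g. 309–334). Exact matching reports only complete hits; the similarity percentages in the table come from PATTINPROT's scoring of partial matches, which this sketch does not reproduce.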


  • Table S3. Search for UidA homologs in genomes possessing the unique BG sequence

    Columns, per known and confirmed BG-positive strain: protein source for homology search | best tBlastn hit, localization, and homology (% identity, % similarity) | amino acid sequence homology with H11G11 (localization, % identity, % similarity) | assigned as H11G11 BG-like group (position on contig, % H11G11 identity, % H11G11 similarity)

    S. variabile DSM 15176
      P05804 E. coli | S_variabile-1.0.1_Cont0.25 (19849–21483), 26%, 42% | (19873–22053), 48%, 61% | ZP_03774451 (19873–22056), 45%, 57%
      gi12802352 L. gasseri | S_variabile-1.0.1_Cont0.19 (59210–60709), 25%, 44% | (59090–61003), 55%, 69% | ZP_03776204 (59090–61009), 43%, 54%
      gi34581788 R. gnavus | S_variabile-1.0.1_Cont0.19 (59192–60709), 25%, 41%

    B. ovatus ATCC 8483
      P05804 E. coli | B_ovatus-MSIQ_Cont517 (133275–131608), 26%, 44% | (133134–131608), 25%, 44% | <50% similarity
      gi12802352 L. gasseri | B_ovatus-MSIQ_Cont460 (264053–265639), 27%, 42% | (263942–265777), 40%, 54% | ZP_02065512 (263912–265909), 34%, 46%
      gi34581788 R. gnavus | B_ovatus-MSIQ_Cont517 (133293–131590), 26%, 41% | (133134–131608), 25%, 44% | <50% similarity

    P. johnsonii DSM 18315
      P05804 E. coli | P_johnsonii-1.0_Cont16.36 (8883–7312), 28%, 42% | (9015–7090), 40%, 55% | ZP_03478350 (7039–9075), 34%, 46%
      gi12802352 L. gasseri | P_johnsonii-1.0_Cont16.36 (8946–7312), 27%, 43% | (9015–7090), 40%, 55%
      gi34581788 R. gnavus | P_johnsonii-1.0_Cont16.36 (8940–7318), 25%, 41% | (9015–7090), 40%, 55%

    F. prausnitzii M21/2
      P05804 E. coli | F_prausnitzii_M212-2.0.1_Cont750 (31470–33260), 35%, 53% | (31497–33233), 25%, 41% | <50% similarity
      gi12802352 L. gasseri | F_prausnitzii_M212-2.0.1_Cont750 (31473–33242), 42%, 57%
      gi34581788 R. gnavus | F_prausnitzii_M212-2.0.1_Cont750 (31473–33233), 38%, 53%

    R. gnavus ATCC 29149
      P05804 E. coli | R_gnavus-1.0.1_Cont39.1 (2674–4113), 27%, 41% | (2491–4407), 53%, 68% | ZP_02042987 (2491–4410), 45%, 57%
      gi12802352 L. gasseri | R_gnavus-1.0.1_Cont39.1 (2611–4113), 26%, 45%
      gi34581788 R. gnavus | R_gnavus-1.0.1_Cont244 (46895–48577), 25%, 41% | (46892–49153), 46%, 60% | ZP_02040206 (46886–49156), 44%, 57%

    B. capillosus ATCC 29799
      P05804 E. coli | B_capillosus-2.0.1_Cont317 (45901–47340), 27%, 39% | (45721–47634), 52%, 67% | ZP_02038110 (45715–47637), 44%, 56%
      gi12802352 L. gasseri | B_capillosus-2.0.1_Cont317 (45730–47340), 26%, 44%
      gi34581788 R. gnavus | B_capillosus-2.0.1_Cont317 (45724–47352), 24%, 41%


  • Table S3. Cont.

    Columns, per known and confirmed BG-positive strain: protein source for homology search | best tBlastn hit, localization, and homology (% identity, % similarity) | amino acid sequence homology with H11G11 (localization, % identity, % similarity) | assigned as H11G11 BG-like group (position on contig, % H11G11 identity, % H11G11 similarity)

    B. formatexigens DSM 14469
      P05804 E. coli | B_formatexigens-1.0.1_Cont23.1 (74444–76252), 43%, 57% | (74696–76225), 26%, 41% | <50% similarity
      gi12802352 L. gasseri | B_formatexigens-1.0.1_Cont2.1 (232634–234424), 37%, 55% | (232901–234415), 28%, 44%
      gi34581788 R. gnavus | B_formatexigens-1.0.1_Cont2.1 (232643–234421), 40%, 54%

    P. merdae ATCC 43184
      P05804 E. coli | P_merdae-MSIQ_Cont63 (214079–212508), 28%, 42% | (214211–212286), 41%, 55% | ZP_02031489 (214271–212238), 34%, 46%
      gi12802352 L. gasseri | P_merdae-MSIQ_Cont63 (214211–212508), 26%, 43%
      gi34581788 R. gnavus | P_merdae-MSIQ_Cont63 (214091–212514), 26%, 41%

    C. bartlettii DSM 16795
      P05804 E. coli | C_bartlettii-2.0.1_Cont15.1 (40677–42332), 24%, 39% | (40674–42899), 58%, 70% | ZP_02210507 (40674–42911), 54%, 68%
      gi12802352 L. gasseri | C_bartlettii-2.0.1_Cont15.1 (40656–42332), 22%, 39%
      gi34581788 R. gnavus | C_bartlettii-2.0.1_Cont15.1 (40722–42332), 23%, 39%

    R. inulinivorans DSM 16841
      P05804 E. coli | R_inulinivorans-1.0.1_Cont419.1 (75233–73734), 26%, 42% | (75356–73458), 53%, 67% | ZP_03753494 (75356–73437), 44%, 57%
      gi12802352 L. gasseri | R_inulinivorans-1.0.1_Cont419.1 (75236–73734), 27%, 45%
      gi34581788 R. gnavus | R_inulinivorans-1.0.1_Cont419.1 (75233–73731), 25%, 40%

    To determine the relationship between the activities of these strains and their sequence potential for both UidA (E. coli K12 P05804) and H11G11-BG homologs, UidA homologs were searched using tblastn in the genomes concerned and then aligned to H11G11-BG homologs. Comparative analysis of identity percentages between the two BG classes allowed identification of strains for which an H11G11-BG homolog was the only known BG gene able to generate an activity. The same approach was applied to the three Bacteroidetes strains studied (unique BG group), except that the incomplete 16S rRNA sequence of Parabacteroides merdae ATCC 43184T was replaced by the closest sequence from uncultured bacterium clone TS4_a02f06.
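The assignment column of Table S3 can be read as a simple rule on the H11G11 homology column. The table does not state the cutoff; the "<50% similarity" entries suggest a 50% similarity threshold, which this sketch assumes. The class and function names are hypothetical, and the rows are copied from Table S3.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    strain: str
    probe: str              # protein used for the tBlastn search
    h11g11_identity: int    # % identity with H11G11 (kept for reference, not used by the rule)
    h11g11_similarity: int  # % similarity with H11G11


SIMILARITY_CUTOFF = 50  # assumption inferred from the "<50% similarity" entries


def assign(hit: Hit) -> str:
    """Label a locus as H11G11 BG-like if its similarity to H11G11 reaches the cutoff."""
    if hit.h11g11_similarity >= SIMILARITY_CUTOFF:
        return "H11G11 BG-like"
    return "<50% similarity"


hits = [
    Hit("S. variabile DSM 15176", "P05804 E. coli", 48, 61),
    Hit("B. ovatus ATCC 8483", "P05804 E. coli", 25, 44),
    Hit("B. ovatus ATCC 8483", "gi12802352 L. gasseri", 40, 54),
    Hit("F. prausnitzii M21/2", "P05804 E. coli", 25, 41),
    Hit("C. bartlettii DSM 16795", "P05804 E. coli", 58, 70),
    Hit("B. formatexigens DSM 14469", "P05804 E. coli", 26, 41),
]

for hit in hits:
    print(f"{hit.strain:28s} {hit.probe:24s} {assign(hit)}")
```

On these rows the rule reproduces the table's assignments; it is a reading aid, not the authors' stated criterion.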


  • Table S4. Oligonucleotides and PCR conditions used in this study

    Gene (metagenomic clone) | Primer | Sequence
    Putative aspartyl-tRNA synthetase H3D4 | F-H3D4-Synth-Gt | 5′-GGA TCC AAT GTA TAG ATC ACA CAC CTG CGG AGA ATT G-3′
    Putative aspartyl-tRNA synthetase H3D4 | R-H3D4-Synth-Gt | 5′-CTG CAG TCA CTT AAA AGT GAC CTT TTT ATT CAT CAG GAA-3′
    Putative 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase C11H2 | F-C11H2-Ery | 5′-GGA TCC AAT GAG TTC GGG AAT GAT GAA ATT CAG A-3′
    Putative 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase C11H2 | R-C11H2-Ery | 5′-CTG CAG CTA TTC CCT GTA AAT CAG GGC GAC CGC-3′
    Putative serine/threonine protein kinase H11B1 | F-H11B1 | 5′-GAA TTC GAT GCC AGC AGA AGG CCA GCC AGA C-3′
    Putative serine/threonine protein kinase H11B1 | R-H11B1 | 5′-CTG CAG TTA ATT GAT TTC AAT TCC GGA GTT TTT-3′
    Putative sporulation protein H11G11 | F-H11G11-Spo | 5′-GGA TCC AAT GAA AAC ATA CGG TAT TTT ATG CTT-3′
    Putative sporulation protein H11G11 | R-H11G11-Spo | 5′-CTG CAG TCA AAT AAT TTC GGT GTT CGG ATA GTA-3′
    Putative ABC transporter, permease protein H11G11 | F-H11G11-Perm | 5′-GGA TCC AAT GAG AAA AGA TTT TCA GCG GGA AAT A-3′
    Putative ABC transporter, permease protein H11G11 | R-H11G11-Perm | 5′-CTG CAG TTA TAC CAA AGC GGC AAC AAG GAA GAA TAT-3′
    Putative β-galactosidase H11G11 | F-H11G11-Bgal | 5′-GGA TCC AAT GCG AGA AGT AAT AAA TAT AAA CAA-3′
    Putative β-galactosidase H11G11 | R-H11G11-Bgal | 5′-CTG CAG TTA TTT CTG CTT TTT CTT AAT CTT GTT-3′

    Underlining indicates restriction enzyme sites (BamHI, GGATCC, and PstI, CTGCAG, except for the putative serine/threonine protein kinase H11B1: EcoRI, GAATTC, and PstI). PCR was conducted under the following conditions (primer concentrations and additives; cycling program):
    Putative aspartyl-tRNA synthetase H3D4 (F 0.3 μM, R 0.3 μM): 15′ at 95 °C; 5 cycles of 94 °C 1′, 62 °C 1′, 72 °C 2′; 5 cycles of 94 °C 1′, 57 °C 1′, 72 °C 2′; 10 cycles of 94 °C 1′, 52 °C 1′, 72 °C 2′10′′; and a 10-min elongation step at 72 °C.
    Putative serine/threonine protein kinase H11B1 (F 0.2 μM, R 0.2 μM): 15′ at 95 °C; 5 cycles of 94 °C 1′, 68 °C 1′, 72 °C 2′; 5 cycles of 94 °C 1′, 63 °C 1′, 72 °C 2′; 20 cycles of 94 °C 1′, 58 °C 1′, 72 °C 3′; and a 10-min elongation step at 72 °C.
    Putative ABC transporter, permease protein H11G11 (F 0.2 μM, R 0.2 μM): 15′ at 95 °C; 5 cycles of 94 °C 1′, 66 °C 1′, 72 °C 1′30′′; 20 cycles of 94 °C 1′, 61 °C 1′, 72 °C 1′30′′; and a 10-min elongation step at 72 °C.
    Putative 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase C11H2 (F 0.3 μM, R 0.3 μM, 3% DMSO): 15′ at 95 °C; 5 cycles of 94 °C 1′, 68 °C 1′, 72 °C 30′′; 20 cycles of 94 °C 1′, 62 °C 1′, 72 °C 45′′; and a 10-min elongation step at 72 °C.
    Putative sporulation protein H11G11 (F 0.3 μM, R 0.3 μM): 15′ at 95 °C; 5 cycles of 94 °C 1′, 63 °C 1′, 72 °C 1′; 20 cycles of 94 °C 1′, 58 °C 1′, 72 °C 1′; and a 10-min elongation step at 72 °C.
    Putative β-galactosidase H11G11 (F 0.6 μM, R 0.6 μM, 3% DMSO): 15′ at 95 °C; 5 cycles of 94 °C 1′, 52 °C 1′, 72 °C 2′; 20 cycles of 94 °C 1′, 47 °C 1′, 72 °C 2′; and a 10-min elongation step at 72 °C.
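Because the underlining that marks the restriction sites does not survive in plain text, the sites can be checked programmatically. The sketch below is a hypothetical helper, not part of the study: it identifies the 5′ site of each Table S4 primer and checks the reading-frame layout these primers appear to share (forward: site, one spacer base, then the ATG start codon; reverse: site, then the reverse complement of a stop codon). That layout is inferred from the sequences, not stated in the source.

```python
from typing import Optional, Tuple

# Recognition sequences of the enzymes named in the Table S4 footnote.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "PstI": "CTGCAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_site(primer: str) -> Tuple[Optional[str], str]:
    """Return (enzyme, rest of primer) if the 5' end is one of SITES, else (None, primer)."""
    seq = primer.replace(" ", "").upper()
    for enzyme, site in SITES.items():
        if seq.startswith(site):
            return enzyme, seq[len(site):]
    return None, seq


def forward_has_start(primer: str) -> bool:
    """Assumed layout: site, one spacer base, then the ATG start codon."""
    _, rest = split_site(primer)
    return rest[1:4] == "ATG"


def reverse_has_stop(primer: str) -> bool:
    """First triplet after the site should be the reverse complement of a stop codon."""
    _, rest = split_site(primer)
    return reverse_complement(rest[:3]) in STOP_CODONS


FORWARD = {
    "F-H3D4-Synth-Gt": "GGA TCC AAT GTA TAG ATC ACA CAC CTG CGG AGA ATT G",
    "F-C11H2-Ery": "GGA TCC AAT GAG TTC GGG AAT GAT GAA ATT CAG A",
    "F-H11B1": "GAA TTC GAT GCC AGC AGA AGG CCA GCC AGA C",
    "F-H11G11-Spo": "GGA TCC AAT GAA AAC ATA CGG TAT TTT ATG CTT",
    "F-H11G11-Perm": "GGA TCC AAT GAG AAA AGA TTT TCA GCG GGA AAT A",
    "F-H11G11-Bgal": "GGA TCC AAT GCG AGA AGT AAT AAA TAT AAA CAA",
}
REVERSE = {
    "R-H3D4-Synth-Gt": "CTG CAG TCA CTT AAA AGT GAC CTT TTT ATT CAT CAG GAA",
    "R-C11H2-Ery": "CTG CAG CTA TTC CCT GTA AAT CAG GGC GAC CGC",
    "R-H11B1": "CTG CAG TTA ATT GAT TTC AAT TCC GGA GTT TTT",
    "R-H11G11-Spo": "CTG CAG TCA AAT AAT TTC GGT GTT CGG ATA GTA",
    "R-H11G11-Perm": "CTG CAG TTA TAC CAA AGC GGC AAC AAG GAA GAA TAT",
    "R-H11G11-Bgal": "CTG CAG TTA TTT CTG CTT TTT CTT AAT CTT GTT",
}

for name, seq in {**FORWARD, **REVERSE}.items():
    enzyme, _ = split_site(seq)
    check = forward_has_start(seq) if name.startswith("F-") else reverse_has_stop(seq)
    print(f"{name:16s} {enzyme or '-':6s} frame check: {check}")
```

All forward primers report BamHI except F-H11B1 (EcoRI), all reverse primers report PstI, matching the footnote.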

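The cycling programs in the Table S4 footnote lower the annealing temperature stepwise from one cycle block to the next (a touchdown-style design). One way to make such a program explicit and check its totals is sketched below for the H11B1 program. The class names are hypothetical, and the time total counts programmed holds only, not ramping between temperatures.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN = 60  # seconds per minute


@dataclass
class CycleBlock:
    cycles: int
    denature: tuple[float, int]  # (temperature in °C, hold in seconds)
    anneal: tuple[float, int]
    extend: tuple[float, int]


@dataclass
class Program:
    initial: tuple[float, int]
    blocks: list[CycleBlock]
    final: tuple[float, int]

    def total_cycles(self) -> int:
        return sum(b.cycles for b in self.blocks)

    def hold_seconds(self) -> int:
        """Sum of programmed hold times; ramping between temperatures is ignored."""
        cycling = sum(b.cycles * (b.denature[1] + b.anneal[1] + b.extend[1]) for b in self.blocks)
        return self.initial[1] + cycling + self.final[1]

    def annealing_temperatures(self) -> list[float]:
        return [b.anneal[0] for b in self.blocks]


# Putative serine/threonine protein kinase H11B1, as listed in the Table S4 footnote.
h11b1 = Program(
    initial=(95, 15 * MIN),
    blocks=[
        CycleBlock(5, (94, MIN), (68, MIN), (72, 2 * MIN)),
        CycleBlock(5, (94, MIN), (63, MIN), (72, 2 * MIN)),
        CycleBlock(20, (94, MIN), (58, MIN), (72, 3 * MIN)),
    ],
    final=(72, 10 * MIN),
)

print(h11b1.total_cycles(), h11b1.hold_seconds() / MIN, h11b1.annealing_temperatures())
```

For H11B1 this gives 30 cycles and 165 min of programmed holds, with annealing at 68, 63, and 58 °C in successive blocks.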