Supporting Materials - PNAS · PMY068 S1278b (MLY61) Lab 23.2 (15) SRX030135 1.2 Notes on sequenced...


Supporting Materials

Outcrossing, mitotic recombination, and life-history tradeoffs shape genome evolution in Saccharomyces cerevisiae

Paul M. Magwene, Ömür Kayıkçı, Joshua A. Granek, Jennifer Reininga, Debra Murray

1 Supplemental Methods

1.1 SRA accession information for Illumina sequencing reads

Short-read sequences, generated on the Illumina GAII platform, have been deposited in the NIH Sequence Read Archive (SRA) under Submission accession number SAR025864 (Study accession number SRP004040). The read files for each genome are in FASTQ format (3). Table S1 provides the experiment accession numbers for each strain.

Table S1: Additional information regarding the strains that were analyzed by whole genome sequencing on the Illumina GAII platform. Alternate strain designations are given in parentheses. The average read depth was calculated as the mean number of short reads that were mapped to each site in the reference genome.

Strain Number  Strain Name        Origin           Avg. Read Depth  Reference  SRA #
PMY070         EM93               Fig              14.0             (20)       SRX030121
PMY127         YJM128             Clinical         23.8             (18)       SRX030122
PMY132         YJM223             Clinical         55.0             (17)       SRX030123
PMY141         YJM308             Clinical         14.4             (18)       SRX030124
PMY142         YJM309             Clinical         26.9             (18)       SRX030125
PMY144         YJM311             Clinical         14.2             (18)       SRX030126
PMY110         PMY110 (ARA316a)   Vineyard         23.4             (23)       SRX030131
PMY112         PMY112 (ARA412a)   Vineyard         48.3             (23)       SRX030579
PMY093         PMY093 (ARN239a)   Muscadine grape  15.1             (23)       SRX030132
PMY131         YJM222             Clinical         26.3             (18)       SRX030133
PMY017         YPS670             Oak              16.3             (21)       SRX030134
PMY068         Σ1278b (MLY61)     Lab              23.2             (15)       SRX030135

1.2 Notes on sequenced strains

To facilitate comparison to previous and future studies, we include some notes below that highlight the relationships of the strains we sequenced to other commonly used strains. Key to previous studies: FB2005 = Fay and Benavides (7); DD2009 = Diezmann and Dietrich (5); L2009 = Liti et al. (14); S2009 = Schacherer et al. (24).

• EM93 – EM93 is a diploid strain isolated from a fig in the 1930s and is the primary contributor to the genome of the reference strain S288c (20). A lab-engineered version of EM93 was included in S2009.

• YJM128 – YJM128 is the diploid ancestor of strains YJM145 and YJM789 (18), which are common backgrounds used for studies of fungal virulence. The genome sequence of YJM789 is described in Wei et al. (28). YJM789 or YJM145 has been included in multiple population genetic studies: FB2005, DD2009, L2009, S2009.


• YJM223 – clinical isolate, collected in southern California in 1990. Described in McCusker et al. (17).

• YJM308 – clinical isolate, collected in southern California in 1990. Described in McCusker et al. (18). Included in: DD2009.

• YJM309 – clinical isolate, collected in the San Francisco Bay Area in 1991. Described in McCusker et al. (18). Included in: DD2009; derivative strain YJM320 included in: FB2005, S2009.

• YJM311 – clinical isolate, collected in the San Francisco Bay Area in 1991. Described in McCusker et al. (18). Included in: DD2009.

• PMY110 (ARA316a) – vineyard isolate, collected in Adelaide, Australia, by Ann Rouse. Described in Rouse (23). Included in: DD2009.

• PMY112 (ARA412a) – vineyard isolate, collected in Adelaide, Australia, by Ann Rouse. Described in Rouse (23). Included in: DD2009.

• PMY093 (ARN239a) – strain isolated from a muscadine grape (Vitis rotundifolia) in North Carolina, USA; collected by Ann Rouse. Described in Rouse (23). Included in: DD2009.

• YJM222 – clinical isolate, collected in the San Francisco Bay Area in 1989. Described in McCusker et al. (18). YJM222 and YJM454 are subclones that differ in virulence (2). Closely related strain YJM454 included in: FB2005, DD2009, S2009.

• YPS670 – isolated from exudate of an oak tree, Pennsylvania, USA. Described in Kuehne (9).

• Σ1278b (MLY61) – obtained from the laboratory of Joseph Heitman, Duke University. This Σ1278b isolate has a different history than the Σ1278b isolate sequenced in Dowell et al. (6). In particular, this Σ1278b does not have a history of backcrossing to S288c, as described in Styles, C. “The History of Sigma1278b and notes on other Sigma1278b derivative sets” (http://wiki.yeastgenome.org/index.php/History_of_Sigma). For an example of one of several genetic differences between the Heitman Σ1278b and Fink Σ1278b strain backgrounds see Granek and Magwene (8).

1.3 Aligning reads and calling SNPs

Short reads were mapped to the standard S. cerevisiae reference genome (obtained from the Saccharomyces Genome Database, January 2010) using the short-read mapping software MAQ, version 0.7.1 (12), and BWA, version 0.5.0 (11), together with SAMtools, version 0.1.7 (13).

MAQ Settings. MAQ alignments were done using the ‘easyrun’ option of the maq.pl script with the default parameter settings. Key parameters for the SNP filtering used in this script are a minimum read depth (-e 3), minimum consensus quality for SNPs (-q 30), minimum neighbor consensus quality (-E 20), and maximum number of SNPs in a window (-B 2).

BWA / SAMtools Settings. BWA alignments were generated using the ‘samse’ (single-end reads) option of the BWA aligner to produce alignments in the SAM format. The SAMtools (13) software package was used to generate raw pileup files, which were further filtered using the ‘varFilter’ option of the samtools.pl script with the maximum read depth set to 100 (-D 100). As recommended in the SAMtools documentation, we used an awk script to further filter out short indel calls with quality scores less than 50 and SNP calls with quality scores less than 20.
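The final quality filter can be sketched in Python instead of awk. This is a minimal illustration, not the script used in the study. It assumes a tab-separated pileup layout in which the third column holds the reference base (with ‘*’ marking an indel line) and the sixth column holds the quality score to be thresholded. The function names are hypothetical.

```python
def keep_pileup_line(line, min_indel_q=50, min_snp_q=20):
    """Return True if a filtered-pileup line passes the quality thresholds.

    Assumed layout (not confirmed by the text): column 3 is the reference
    base, '*' for indel lines; column 6 is the variant quality score.
    """
    fields = line.rstrip("\n").split("\t")
    ref_base = fields[2]
    quality = int(fields[5])
    if ref_base == "*":
        # Indel call: keep only if quality is at least 50.
        return quality >= min_indel_q
    # SNP call: keep only if quality is at least 20.
    return quality >= min_snp_q


def filter_pileup(lines, min_indel_q=50, min_snp_q=20):
    """Keep the pileup lines that pass the indel and SNP thresholds."""
    return [ln for ln in lines if keep_pileup_line(ln, min_indel_q, min_snp_q)]
```

Because indels are excluded from the heterozygosity estimates anyway, the indel threshold only matters for analyses that retain indel calls.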


Combining MAQ and BWA/SAMtools. We used custom Python scripts to combine the ‘final’ pileup files generated separately by the MAQ and BWA pipelines described above. BWA/SAMtools makes short indel calls, while MAQ does not. Indels typically represent 4-5% of the total SNP calls from BWA. However, indels are likely to have a much higher error rate for short reads, so we chose to exclude them for the purposes of estimating genomic heterozygosity.

Subsequent analyses of heterozygosity and loss-of-heterozygosity were carried out on the subset of SNPs that were identically called by both mapping algorithms. For SNPs as a whole this reduced data set represents, on average, ∼96% of the calls made by each algorithm individually. For the heterozygous strains, the corresponding figure for the heterozygous sites is ∼92%.
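The consensus step amounts to a set intersection on (site, genotype) pairs. The sketch below is illustrative and is not the authors' custom script. It assumes each caller's output has already been parsed into a dictionary keyed by (chromosome, position) with a genotype string as the value. The function names and the toy calls are hypothetical.

```python
def consensus_calls(maq_calls, bwa_calls):
    """Keep only sites where both callers report the same genotype.

    Each argument maps (chromosome, position) -> genotype string,
    for example an IUPAC code such as "A" or "R".
    """
    return {
        site: genotype
        for site, genotype in maq_calls.items()
        if bwa_calls.get(site) == genotype
    }


def fraction_retained(calls, consensus):
    """Fraction of one caller's calls that survive in the consensus set."""
    return len(consensus) / len(calls) if calls else 0.0


# Toy example: one site agrees, one disagrees, one is MAQ-only.
maq = {("chrIV", 1000): "R", ("chrIV", 2000): "T", ("chrV", 50): "A"}
bwa = {("chrIV", 1000): "R", ("chrIV", 2000): "C", ("chrXII", 7): "G"}
shared = consensus_calls(maq, bwa)
```

Computing fraction_retained for each caller separately gives the kind of per-algorithm percentage quoted above.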

Assessing the effect of repetitive sequences on SNP calls. Our data exploration suggested that SNP calls based on the consensus of the MAQ and BWA alignments tend to be conservative with respect to the number of SNPs identified. We explored the effects of using different SNP quality thresholds, minimum read depths, filtering of repetitive regions, etc. on the number of called SNPs. These factors change the absolute number of SNPs called, but do not substantively affect the genomic patterns we report. For example, in Table S2 we give the number of SNPs and heterozygous sites estimated after filtering out all SNP calls in repetitive regions of the reference genome. Repetitive regions were identified in two ways: 1) genomic features identified as belonging to one of the following feature types in the GFF3 sequence feature file ‘saccharomyces_cerevisiae.gff’ available from the Saccharomyces Genome Database: (LTR_retrotransposon, centromere, long_terminal_repeat, repeat_regions, telomere, ARS, transposable_element_gene); and 2) by using the repeat masking algorithm implemented in the Tandem Repeats Finder (TRF) program (1). Combining these two criteria masks ∼727 Kb across the ∼12 Mb reference genome. The percentage change in the number of heterozygous SNPs called in the heterozygous backgrounds after masking is on the order of 1.8-5.75% (compare to Table 1).
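Masking of this kind can be sketched as an interval lookup. The code below is a minimal illustration under stated assumptions, not the pipeline used in the study. It assumes the repeat annotations from both sources have already been reduced to 1-based inclusive (start, end) intervals per chromosome. The function names and the coordinates in the example are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def in_intervals(pos, merged):
    """True if pos falls inside one of the merged, sorted intervals."""
    # Locate the last interval whose start is <= pos.
    i = bisect_right(merged, (pos, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= pos <= merged[i][1]


def mask_snps(snps, repeats_by_chrom):
    """Drop SNP sites, given as (chromosome, position), that lie in a repeat."""
    merged = {c: merge_intervals(iv) for c, iv in repeats_by_chrom.items()}
    return [(c, p) for c, p in snps if not in_intervals(p, merged.get(c, []))]


# Hypothetical repeat annotation and SNP sites.
repeats = {"chrI": [(100, 200), (150, 300), (500, 600)]}
sites = [("chrI", 50), ("chrI", 250), ("chrI", 301), ("chrI", 550), ("chrII", 150)]
unmasked = mask_snps(sites, repeats)
```

Merging the intervals first means the GFF3-derived features and the TRF-derived repeats can simply be concatenated before masking, and the total masked length is the sum of the merged interval lengths.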

Table S2: The number of SNPs and heterozygous sites identified after masking out repetitive genomic features.

Strain   Origin           Filtered SNPs  Filtered Het. Sites
EM93     Fig              29,376         23,841
YJM128   Clinical         62,975         32,849
YJM223   Clinical         61,007         36,177
YJM308   Clinical         49,126         21,825
YJM309   Clinical         58,986         22,362
YJM311   Clinical         50,180         23,309
PMY110   Vineyard         41,576         5,849
PMY112   Vineyard         42,423         6,177
PMY093   Muscadine grape  53,768         3,851
YJM222   Clinical         47,607         610
YPS670   Oak              56,630         403
Σ1278b   Lab              27,622         231

1.4 Confirmatory sequencing to verify heterozygosity

In order to estimate the reliability of the heterozygous SNP calls we used conventional Sanger sequencing to genotype 20 loci in each of five strains showing evidence of heterozygosity. Each locus was predicted to have either two or three heterozygous SNPs. In total, we genotyped 400 heterozygous SNPs. We failed to confirm the heterozygosity of only a single site in a single strain. Based on this, we estimate the false positive rate of heterozygous SNP calls to be less than 0.005. In addition, examination of these sequence data revealed a number of additional heterozygous sites that were not classified as SNPs in our genomic analyses because the two mapping algorithms did not agree completely. For those false negative heterozygous calls identified by Sanger sequencing, BWA had a lower false negative rate than MAQ for the settings we used.

1.5 Determining LOH regions

To estimate the chromosomal coordinates of large loss-of-heterozygosity (LOH) regions we developed an LOH calling algorithm. The algorithm works as follows:

1. For each chromosome, estimate the number of heterozygous sites, at intervals d_i, per bin of width δ = h/m using an average shifted histogram (ASH) with m shifts (26). For the estimation of LOH regions we used the parameters h = 20 Kb, m = 16, and d_i = 1 Kb.

The ASH is the equally weighted average of m shifted histograms with bin width h. The ASH is a useful non-parametric density and frequency estimator that is robust to the choice of bin origin (25). The ASH provides smoothed estimates of the frequency of heterozygous and homozygous sites and was used to generate Figures S1-S12 below.

2. Based on the ASH estimate of the frequency of heterozygous sites, define a ‘homozygous interval’ as an ASH bin with an estimated frequency of no more than t_h heterozygous sites. t_h corresponds to a maximum threshold of heterozygous sites per bin interval, δ.

3. Estimate LOH regions as runs of ‘homozygous intervals’ spanning at least W_L bases.

For the estimates reported below we used t_h = 3, corresponding to a threshold of three heterozygous sites per 20 Kb window, and W_L = 190,000, corresponding to a minimum LOH region size of 190 Kb.

The estimated LOH regions for each strain are given in Table S3.
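The three steps above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation. It assumes the ASH is evaluated on the fine grid of width δ = h/m (rather than at separate intervals d_i), that the ASH value is expressed as the average count per window of width h, and that heterozygous sites are given as 0-based positions on a single chromosome. The function names and the synthetic data are hypothetical.

```python
def ash_counts(positions, chrom_len, h=20_000, m=16):
    """Average shifted histogram of site counts per window of width h.

    Returns (delta, ash), where ash[k] is the average, over the m shifted
    histograms, of the count in the width-h bin covering fine bin k.
    """
    delta = h / m
    n_fine = int(chrom_len // delta) + 1
    fine = [0] * n_fine
    for p in positions:
        fine[int(p // delta)] += 1
    ash = []
    for k in range(n_fine):
        total = 0.0
        # Triangular weights are equivalent to averaging m shifted histograms.
        for i in range(1 - m, m):
            j = k + i
            if 0 <= j < n_fine:
                total += (1 - abs(i) / m) * fine[j]
        ash.append(total)
    return delta, ash


def call_loh_regions(positions, chrom_len, h=20_000, m=16, t_h=3, w_l=190_000):
    """Return (start, end) runs of homozygous intervals spanning >= w_l bases."""
    delta, ash = ash_counts(positions, chrom_len, h, m)
    regions = []
    run_start = None
    # The trailing sentinel closes a run that reaches the chromosome end.
    for k, value in enumerate(ash + [float("inf")]):
        if value <= t_h:
            if run_start is None:
                run_start = k
        elif run_start is not None:
            start = run_start * delta
            end = min(k * delta, chrom_len)
            if end - start >= w_l:
                regions.append((int(start), int(end)))
            run_start = None
    return regions


# Synthetic 1 Mb chromosome: one heterozygous site per Kb, except for a
# 300 Kb stretch with no heterozygous sites.
hets = list(range(0, 300_000, 1000)) + list(range(600_000, 1_000_000, 1000))
loh = call_loh_regions(hets, 1_000_000)
```

Because the ASH smooths over neighboring bins, the called region is slightly narrower than the true gap in heterozygous sites. That is one reason to report coordinates rounded to the nearest thousand.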

1.6 Estimating the number of clonal generations since outcrossing

If we assume LOH events are selectively neutral, we can use the number of LOH regions observed per strain to estimate the number of generations since outcrossing based on simple ‘molecular clock’ reasoning (10). We assume that all LOH regions are due to mitotic recombination, and we used the observed number of LOH regions from Table S3 and published estimates of the mitotic recombination rate to carry out this estimation.

Mandegar and Otto (16) used multiple published estimates of the mitotic recombination rate in S. cerevisiae to derive a regression model describing mitotic recombination rate per cell per generation as a function of distance from the centromere. Given an average distance to the centromere of 224 Kb in S. cerevisiae, they predicted an average mitotic recombination rate of 0.8 × 10^-4 per cell per generation. This rate is similar to the rate of LOH on the right arm of chromosome IV estimated by McMurray and Gottschling (19) (average rate: 1 × 10^-4 per cell division). McMurray and Gottschling showed that rates of LOH are significantly higher on the right arm of chromosome XII, perhaps as much as 10-fold higher than on chromosome IV (average rate: 7 × 10^-4 per cell division).

In Table S4 we provide two estimates of the number of generations since outcrossing: one that excludes from the LOH counts the right arm of chromosome XII and one that includes chromosome XII LOH regions. The exclusion of chromosome XII is motivated by the observation that because LOH events on chromosome XII occur at a much higher frequency, they are likely to arise and be fixed (or lost) before LOH events involving other chromosomal regions. We used an estimated LOH rate of 0.8 × 10^-4 (see above) for both calculations.
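The values in Table S4 are consistent with dividing the number of LOH regions by the assumed rate. The sketch below shows that arithmetic. It is an illustration rather than the authors' code, and it simplifies the exclusion rule by dropping every region on chromosome XII rather than only those on the right arm. The function name is hypothetical. The chromosome list is the YJM308 row of Table S3.

```python
LOH_RATE = 0.8e-4  # per cell per generation, the rate quoted in the text


def generations_since_outcrossing(loh_chromosomes, rate=LOH_RATE, exclude=()):
    """Estimate clonal generations as (number of LOH regions) / rate.

    loh_chromosomes lists the chromosome number of each LOH region;
    regions on chromosomes in `exclude` are not counted.
    """
    n_regions = sum(1 for chrom in loh_chromosomes if chrom not in exclude)
    return n_regions / rate


# YJM308 has LOH regions on chromosomes 2, 10, 12, 13 and 16 (Table S3).
yjm308 = [2, 10, 12, 13, 16]
excluding_xii = round(generations_since_outcrossing(yjm308, exclude={12}))
including_xii = round(generations_since_outcrossing(yjm308))
```

With four and five regions respectively, the two estimates round to 50,000 and 62,500 generations, matching the YJM308 row of Table S4.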


Table S3: Estimated LOH regions for each of the heterozygous strains. LOH coordinates have been rounded to the nearest thousand.

Strain Number  Strain Name  Chrom.  LOH start  LOH end
PMY070         EM93         4       682,000    1,532,000
                            5       377,000    577,000
                            12      459,000    1,078,000
                            16      748,000    948,000
PMY127         YJM128       7       22,000     258,000
                            12      463,000    1,045,000
PMY132         YJM223       4       693,000    956,000
PMY141         YJM308       2       1          191,000
                            10      408,000    726,000
                            12      867,000    1,078,000
                            13      479,000    907,000
                            16      43,000     316,000
PMY142         YJM309       2       304,000    813,000
                            4       26,000     242,000
                            4       513,000    1,509,000
                            9       54,000     328,000
PMY144         YJM311       2       581,000    784,000
                            12      466,000    1,052,000
PMY110         PMY110       12      240,000    1,078,000
                            15      784,000    1,059,000
PMY112         PMY112       4       774,000    1,493,000
                            12      244,000    1,078,000
PMY093         PMY093       –       –          –

Table S4: Estimated number of clonal generations since outcrossing based on the observed number of LOH regions.

Strain Number  Strain Name  Excluding Chr XII  Including Chr XII
PMY070         EM93         37,500             50,000
PMY127         YJM128       12,500             25,000
PMY132         YJM223       12,500             12,500
PMY141         YJM308       50,000             62,500
PMY142         YJM309       50,000             50,000
PMY144         YJM311       12,500             25,000
PMY110         PMY110       12,500             25,000
PMY112         PMY112       12,500             25,000

Average                     25,000             34,375

1.7 Estimating F_IS

We estimated F_IS values (22) by treating the strains as representing either two or three subpopulations, based on cluster analyses and neighbor-joining trees estimated from the SNP data (Fig. S15). This population structure is consistent with that estimated by Fay and Benavides (7) and Diezmann and Dietrich (5). F_IS values were calculated using an R script written by Eva Chan (http://www.evachan.org). Each F_IS estimate below is the average of 100 pseudo-replicates, each estimated by drawing a random sample of 1000 SNP sites from the total set of SNPs identified across the 11 genomes of interest.

For the two-group population structure the groups were:

1. ‘Natural’ – YPS670, PMY093, YJM222

2. ‘Domesticated’ – YJM223, PMY110, PMY112, EM93, YJM308, YJM309, YJM128, YJM311.

Group         F_IS
Natural       0.95
Domesticated  0.47

For the three-group population structure the groups were:

1. ‘Natural’ – YPS670, PMY093, YJM222

2. ‘Vineyard’ – YJM223, PMY110, PMY112

3. ‘Clinical’ – EM93, YJM308, YJM309, YJM128, YJM311

Group     F_IS
Natural   0.95
Vineyard  0.44
Clinical  0.31
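The pseudo-replicate procedure can be sketched in Python. This is not the R script used in the study, and the estimator here is an assumption: the simple ratio F_IS = 1 − H_obs/H_exp, summed over sites within one subpopulation, with no small-sample correction. Genotypes are assumed to be coded as the number of non-reference alleles (0, 1 or 2) per diploid strain. The function names are hypothetical.

```python
import random


def f_is(genotypes_by_site):
    """F_IS = 1 - H_obs / H_exp for one subpopulation.

    genotypes_by_site is a list of sites; each site is a list of
    alt-allele counts (0, 1 or 2), one per diploid strain.
    """
    h_obs = 0.0
    h_exp = 0.0
    for site in genotypes_by_site:
        n = len(site)
        p = sum(site) / (2 * n)                      # alt-allele frequency
        h_obs += sum(1 for g in site if g == 1) / n  # observed heterozygosity
        h_exp += 2 * p * (1 - p)                     # Hardy-Weinberg expectation
    if h_exp == 0:
        return float("nan")  # no polymorphic sites in this sample
    return 1 - h_obs / h_exp


def f_is_pseudoreplicates(genotypes_by_site, n_sites=1000, n_reps=100, seed=0):
    """Average F_IS over n_reps random samples of n_sites sites each."""
    rng = random.Random(seed)
    k = min(n_sites, len(genotypes_by_site))
    estimates = [f_is(rng.sample(genotypes_by_site, k)) for _ in range(n_reps)]
    return sum(estimates) / len(estimates)
```

Under this estimator, a subpopulation of fully homozygous strains that differ from one another gives a value of 1, which is the direction of the high ‘Natural’ value reported above. A different estimator or sample-size correction would change the absolute values.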

1.8 Phenotyping

Sporulation. Sporulation was assayed 48 hours after transfer of cells to LA-SPO medium. Sporulation was quantified as the percentage of sporulated cells (including tetrads, triads, and dyads), based on counts of at least 200 cells, typically with at least three replicates per strain.

To rule out the possibility that heterozygosity itself is the cause of poor sporulation, we scored eight haplo-selfed segregants of EM93 for sporulation ability. Even after 7 days in LA-SPO medium, fewer than 1% of cells had sporulated. This strain background is not incapable of sporulation, however, since we can ‘coax’ it to sporulate by growing it in a 6% glucose + YPD medium overnight and then transferring it to sporulation medium for an extended period of time. This observation is consistent with our suggestion that this and other ‘poor sporulating’ strains are not obligately asexual, but rather are slow to sporulate.

Pseudohyphal growth. Pseudohyphal growth was assayed in two different ways, using a quantitative assay and a qualitative (binary) assay. In both cases cells were grown on SLAD medium. For the quantitative assay (see Figure S13), pseudohyphal growth was quantified as the percentage of microcolonies that exhibited elongate cells and filamentous chains of cells after 48 hours of growth on SLAD. For the qualitative assay, pseudohyphal growth was scored based on the appearance of elongated cells, unipolar budding, and agar invasion.


1.9 A broader survey of heterozygosity

We sequenced nine loci in 18 additional S. cerevisiae strains to assess broader patterns of heterozygosity among diploid isolates of S. cerevisiae. For each locus we sequenced the designated coding region plus 500 bp of upstream sequence. The number of heterozygous sites per locus is reported in Table S5 below along with strain information and GenBank accession numbers for the corresponding sequence data.

1.10 Classification of heterozygosity

We assigned strains to one of two heterozygosity classes based on the amount of observed heterozygosity. For strains for which we generated whole genome sequence data, we classified strains with > 3000 heterozygous sites as being in the ‘high’ heterozygosity class; all other strains were assigned to the ‘low’ class. For strains for which we had genotype data only on the nine loci described above, we assigned strains to the ‘high’ class if they were heterozygous at three or more loci.
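The two classification rules can be written as a pair of small functions. This is a minimal sketch of the rules as stated. The function names are hypothetical, and a locus is assumed to count as heterozygous when it has at least one heterozygous site. The example counts are the PMY072 and PMY129 rows of Table S5.

```python
def classify_genome(n_het_sites, threshold=3000):
    """Whole-genome rule: 'high' if more than `threshold` heterozygous sites."""
    return "high" if n_het_sites > threshold else "low"


def classify_loci(het_sites_per_locus, min_het_loci=3):
    """Nine-locus rule: 'high' if heterozygous at three or more loci."""
    n_het_loci = sum(1 for count in het_sites_per_locus if count > 0)
    return "high" if n_het_loci >= min_het_loci else "low"


# Heterozygous-site counts per locus, in the column order of Table S5.
pmy072 = [0, 0, 0, 1, 1, 2, 0, 1, 0]
pmy129 = [2, 0, 0, 0, 0, 0, 0, 0, 0]
classes = {"PMY072": classify_loci(pmy072), "PMY129": classify_loci(pmy129)}
```

Note that the nine-locus rule counts loci rather than sites, so a strain with several heterozygous sites at a single locus is still assigned to the ‘low’ class.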

1.11 RME1 genotyping

Deutschbauer and Davis (4) identified a polymorphic site in the promoter of RME1 (positions 308-310 bp upstream of the start codon), which they showed was a QTN for sporulation ability in a cross between S288c and SK1. They genotyped this site in a number of other strain backgrounds and identified three alleles. We sequenced the RME1 locus (coding sequence + 500 bp upstream) using Sanger sequencing and assigned an RME1 genotype based on the observed allele.


Table S5: Broader survey of heterozygosity. Each entry in the table indicates the number of heterozygous sites observed for a given strain at the corresponding locus.

Strain Number  Strain Name  Origin                Reference   FLO8  IME1  IME2  MEP2  PHD1  RME1  SOK2  STE12  TEC1
PMY011         YPS602       Oak, PA, USA          (9)         0     0     0     0     0     0     0     0      0
PMY015         YPS630       Oak, PA, USA          (9)         0     0     0     0     0     0     0     0      0
PMY018         YPS681       Oak, PA, USA          (9)         0     0     0     0     0     0     0     0      0
PMY072         SGU109       Vineyard, Italy       (27)        0     0     0     1     1     2     0     1      0
PMY074         SGU112       Vineyard, Italy       (27)        2     0     2     0     0     0     0     0      1
PMY083         ARN056A      Vineyard, NC, USA     (23)        1021000207 (digits as extracted; ten digits for nine loci, column boundaries not recoverable)
PMY084         ARN056B      Vineyard, NC, USA     This study  0     0     1     0     1     0     2     2      7
PMY086         ARN179B      Vineyard, NC, USA     This study  0     0     0     0     0     0     0     0      0
PMY087         ARN202A      Vineyard, NC, USA     This study  0     0     0     0     0     0     0     0      0
PMY088         ARN202B      Vineyard, NC, USA     (23)        0     0     0     0     0     0     0     0      0
PMY094         ARN244-1     Muscadine grape, NC   (23)        0     0     0     0     0     0     0     0      0
PMY095         ARN244A      Muscadine grape, NC   (23)        0     0     0     0     0     0     0     0      0
PMY111         ARA324A      Vineyard, Australia   (23)        3     0     1     1     1     0     0     1      0
PMY112         ARA412A      Vineyard, Australia   (23)        3     0     1     1     1     0     0     1      1
PMY113         YJM336       Wine yeast            (18)        0     0     0     0     3     1     1     0      0
PMY147         YJM334       Natural fermentation  (18)        1     1     1     0     1     0     1     0      0
PMY129         YJM210       Clinical              (18)        2     0     0     0     0     0     0     0      0
PMY133         YJM224       Distillery            (18)        2     1     5     3     0     1     3     0      0


Supplemental Figures

Supplemental Figures S1-S12 illustrate the distribution of SNPs and heterozygous sites across the 16 nuclear chromosomes of each of the yeast strains sequenced in this study. The blue curves indicate the number of SNPs (both homozygous and heterozygous) per 10 Kb, relative to the reference genome. Red curves indicate the number of heterozygous SNPs per 10 Kb. Green triangles indicate the location of the centromeres.

[Plot: PMY070 SNPs and Hets]

Figure S1: Strain EM93.

[Plot: PMY127 SNPs and Hets]

Figure S2: Strain YJM128.

[Plot: PMY132 SNPs and Hets]

Figure S3: Strain YJM223.

[Plot: PMY141 SNPs and Hets]

Figure S4: Strain YJM308.

[Plot: PMY142 SNPs and Hets]

Figure S5: Strain YJM309.

[Plot: PMY144 SNPs and Hets]

Figure S6: Strain YJM311.

[Plot: PMY110 SNPs and Hets]

Figure S7: Strain PMY110.

[Plot: PMY112 SNPs and Hets]

Figure S8: Strain PMY112.

[Plot: PMY093 SNPs and Hets]

Figure S9: Strain PMY093.

[Plot: PMY131 SNPs and Hets]

Figure S10: Strain YJM222.

[Plot: PMY017 SNPs and Hets]

Figure S11: Strain YPS670.

[Plot: PMY068 SNPs and Hets]

Figure S12: Strain Σ1278b.

[Plot: x-axis, Sporulation % (0 to 100); y-axis, Pseudohyphal % (0 to 100)]

Figure S13: The joint distribution of sporulation and pseudohyphal growth phenotypes for 71 yeast strains. Sporulation and pseudohyphal growth were assayed 48 hours after transfer to the appropriate medium. Sporulation was quantified as the percentage of sporulated cells (including tetrads, dyads, and triads) and pseudohyphal growth was quantified as the percentage of microcolonies that exhibited elongate cells and filamentous chains of cells.

[Plot panels: (A) log2 RME1 expression by RME1 Genotype (S288c/S288c, S288c/SK1, SK1/SK1); (B) Sporulation % by RME1 Genotype (S288c/S288c, S288c/SK1, SK1/SK1); (C) Sporulation % against log2 RME1 expression]

Figure S14: (A) RME1 expression as a function of RME1 promoter genotype; (B) sporulation as a function of RME1 promoter genotype; (C) sporulation efficiency vs. RME1 expression. Analyses based on 30 S. cerevisiae strains. The allelic names follow those in (4).

[Tree tip labels, in plotted order: YJM311, YJM223, RM11-1a, PMY112, PMY110, EM93, Sigma 1278b, YJM308, YJM309, YJM128, YJM222, PMY093, YPS670]

Figure S15: Neighbor-joining tree estimated from SNP data for strains included in this study plus the strain RM11-1a (Saccharomyces cerevisiae RM11-1a Sequencing Project. Broad Institute of Harvard and MIT).


References

[1] G. Benson. Tandem repeats finder: a program to analyze dna sequences. Nucleic Acids Res, 27(2):573–580, Jan 1999.

[2] K. V. Clemons, J. H. McCusker, R. W. Davis, and D. A. Stevens. Comparative pathogenesis ofclinical and nonclinical isolates of saccharomyces cerevisiae. J Infect Dis, 169(4):859–867, Apr1994.

[3] Peter J A Cock, Christopher J Fields, Naohisa Goto, Michael L Heuer, and Peter M Rice.The sanger fastq file format for sequences with quality scores, and the solexa/illumina fastqvariants. Nucleic Acids Res, 38(6):1767–1771, Apr 2010. doi: 10.1093/nar/gkp1137. URLhttp://dx.doi.org/10.1093/nar/gkp1137.

[4] Adam M Deutschbauer and Ronald W Davis. Quantitative trait loci mapped to single-nucleotide resolution in yeast. Nat Genet, 37(12):1333–1340, Dec 2005. doi: 10.1038/ng1674.URL http://dx.doi.org/10.1038/ng1674.

[5] Stephanie Diezmann and Fred S Dietrich. Saccharomyces cerevisiae: population divergenceand resistance to oxidative stress in clinical, domesticated and wild isolates. PLoS One, 4(4):e5317, 2009. doi: 10.1371/journal.pone.0005317. URL http://dx.doi.org/10.1371/journal.pone.0005317.

[6] Robin D Dowell, Owen Ryan, An Jansen, Doris Cheung, Sudeep Agarwala, Timothy Danford,Douglas A Bernstein, P. Alexander Rolfe, Lawrence E Heisler, Brian Chin, Corey Nislow, GuriGiaever, Patrick C Phillips, Gerald R Fink, David K Gifford, and Charles Boone. Genotype tophenotype: a complex problem. Science, 328(5977):469, Apr 2010. doi: 10.1126/science.1189015.URL http://dx.doi.org/10.1126/science.1189015.

[7] Justin C Fay and Joseph A Benavides. Evidence for domesticated and wild populations of Saccharomyces cerevisiae. PLoS Genet, 1(1):66–71, Jul 2005. doi: 10.1371/journal.pgen.0010005. URL http://dx.doi.org/10.1371/journal.pgen.0010005.

[8] Joshua A Granek and Paul M Magwene. Environmental and genetic determinants of colony morphology in yeast. PLoS Genet, 6(1):e1000823, 2010. doi: 10.1371/journal.pgen.1000823. URL http://dx.doi.org/10.1371/journal.pgen.1000823.

[9] H. A. Kuehne. The genetic structure and biogeography of natural Saccharomyces populations. PhD thesis, University of Pennsylvania, 2005.

[10] Robert Lanfear, John J Welch, and Lindell Bromham. Watching the clock: studying variation in rates of molecular evolution between species. Trends Ecol Evol, 25(9):495–503, Sep 2010. doi: 10.1016/j.tree.2010.06.007. URL http://dx.doi.org/10.1016/j.tree.2010.06.007.

[11] Heng Li and Richard Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14):1754–1760, Jul 2009. doi: 10.1093/bioinformatics/btp324. URL http://dx.doi.org/10.1093/bioinformatics/btp324.

[12] Heng Li, Jue Ruan, and Richard Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res, 18(11):1851–1858, Nov 2008. doi: 10.1101/gr.078212.108. URL http://dx.doi.org/10.1101/gr.078212.108.

[13] Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin, and 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, Aug 2009.

[14] Gianni Liti, David M Carter, Alan M Moses, Jonas Warringer, Leopold Parts, Stephen A James, Robert P Davey, Ian N Roberts, Austin Burt, Vassiliki Koufopanou, Isheng J Tsai, Casey M Bergman, Douda Bensasson, Michael J T O’Kelly, Alexander van Oudenaarden, David B H Barton, Elizabeth Bailes, Alex N Nguyen, Matthew Jones, Michael A Quail, Ian Goodhead, Sarah Sims, Frances Smith, Anders Blomberg, Richard Durbin, and Edward J Louis. Population genomics of domestic and wild yeasts. Nature, 458(7236):337–341, Mar 2009. doi: 10.1038/nature07743. URL http://dx.doi.org/10.1038/nature07743.

[15] M. C. Lorenz and J. Heitman. Yeast pseudohyphal growth is regulated by GPA2, a G protein alpha homolog. EMBO J, 16(23):7008–7018, Dec 1997. doi: 10.1093/emboj/16.23.7008. URL http://dx.doi.org/10.1093/emboj/16.23.7008.

[16] Mohammad A Mandegar and Sarah P Otto. Mitotic recombination counteracts the benefits of genetic segregation. Proc Biol Sci, 274(1615):1301–1307, May 2007. doi: 10.1098/rspb.2007.0056. URL http://dx.doi.org/10.1098/rspb.2007.0056.

[17] J. H. McCusker, K. V. Clemons, D. A. Stevens, and R. W. Davis. Saccharomyces cerevisiae virulence phenotype as determined with CD-1 mice is associated with the ability to grow at 42 degrees C and form pseudohyphae. Infect Immun, 62(12):5447–5455, Dec 1994.

[18] J. H. McCusker, K. V. Clemons, D. A. Stevens, and R. W. Davis. Genetic characterization of pathogenic Saccharomyces cerevisiae isolates. Genetics, 136(4):1261–1269, Apr 1994.

[19] Michael A McMurray and Daniel E Gottschling. An age-induced switch to a hyper-recombinational state. Science, 301(5641):1908–1911, Sep 2003. doi: 10.1126/science.1087706. URL http://dx.doi.org/10.1126/science.1087706.

[20] R. K. Mortimer and J. R. Johnston. Genealogy of principal strains of the yeast genetic stock center. Genetics, 113(1):35–43, May 1986.

[21] Helen A Murphy, Heidi A Kuehne, Chantal A Francis, and Paul D Sniegowski. Mate choice assays and mating propensity differences in natural yeast populations. Biol Lett, 2(4):553–556, Dec 2006. doi: 10.1098/rsbl.2006.0534. URL http://dx.doi.org/10.1098/rsbl.2006.0534.

[22] M. Nei. F-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet, 41(2):225–233, Oct 1977.

[23] A. L. Rouse. Evolution of a yeast transcriptional regulatory network: Genetic, phenotypic and fitness variation. PhD thesis, Duke University, 2007.

[24] Joseph Schacherer, Joshua A Shapiro, Douglas M Ruderfer, and Leonid Kruglyak. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature, 458(7236):342–345, Mar 2009. doi: 10.1038/nature07670. URL http://dx.doi.org/10.1038/nature07670.

[25] D. W. Scott. Averaged shifted histogram. Wiley Interdisciplinary Reviews: Computational Statistics, 2:160–164, 2010. doi: 10.1002/wics.54.

[26] David W. Scott. Averaged shifted histograms: Effective nonparametric density estimators in several dimensions. The Annals of Statistics, 13(3):1024–1040, 1985. ISSN 00905364. URL http://www.jstor.org/stable/2241123.

[27] F. Sebastiani, F. Pinzauti, E. Casalone, I. Rosi, G. Fia, M. Polsinelli, and C. Barberio. Biodiversity of Saccharomyces cerevisiae strains isolated from Sangiovese grapes of Chianti area. Annals of Microbiology, 54:415–426, 2004. URL http://www.annmicro.unimi.it/full/54/sebastiani_54_415-426.pdf.

[28] Wu Wei, John H McCusker, Richard W Hyman, Ted Jones, Ye Ning, Zhiwei Cao, Zhenglong Gu, Dan Bruno, Molly Miranda, Michelle Nguyen, Julie Wilhelmy, Caridad Komp, Raquel Tamse, Xiaojing Wang, Peilin Jia, Philippe Luedi, Peter J Oefner, Lior David, Fred S Dietrich, Yixue Li, Ronald W Davis, and Lars M Steinmetz. Genome sequencing and comparative analysis of Saccharomyces cerevisiae strain YJM789. Proc Natl Acad Sci U S A, 104(31):12825–12830, Jul 2007. doi: 10.1073/pnas.0701291104. URL http://dx.doi.org/10.1073/pnas.0701291104.
