Stepwise selection on homeologous PRR genes controlling flowering and maturity during soybean domestication


Articles

https://doi.org/10.1038/s41588-020-0604-7

1 Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, China.
2 The Innovative Academy of Seed Design, Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin, China.
3 State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China.
4 University of Chinese Academy of Sciences, Beijing, China.
5 Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou, China.
6 Anhui Academy of Agricultural Sciences, Hefei, China.
7 State Key Laboratory of Agricultural Microbiology, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China.
8 Root Biology Center, College of Resources and Environment, Fujian Agriculture and Forestry University, Fuzhou, China.
9 National Center for Soybean Improvement, National Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China.
10 Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China.
11 Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.
12 Key Laboratory of Plant Stress Biology, School of Life Sciences, Henan University, Kaifeng, China.
13 School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China.
14 School of Natural Sciences, University of Tasmania, Hobart, Tasmania, Australia.
15 These authors contributed equally: Sijia Lu, Lidong Dong, Chao Fang, Shulin Liu, Lingping Kong, Qun Cheng, Liyu Chen.
✉ e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Modern crops are shaped by a long history of conscious and unconscious human selection. Understanding the associated genetic and physiological changes provides a valuable window into the history of agriculture, and important insights that are vital to address ongoing demand for improved crop yield and quality. Crops are distinguished from their wild progenitors by changes collectively known as the domestication syndrome (refs. 1–5). These changes typically include loss of seed dormancy and dispersal mechanisms, along with reduced branching, increased fruit or seed size, changes in photoperiod sensitivity and early, more uniform flowering and maturation (refs. 4,5). Several of these are considered to have arisen and become fixed at an early stage in the domestication process, facilitating the initial intensification of cultivation through increased efficiency, ease and synchrony of harvest (refs. 4,5). In contrast, other traits that are typically variable across domesticated germplasm, such as adaptation to specific climates or latitudes, seed composition, fruit pigmentation and fruit morphology, are generally considered to be diversification traits (ref. 6).

Soybean (Glycine max (L.) Merr.) is one of the most economically important oil and protein crops, and provides >25% of the world’s protein for food and animal feed (ref. 7). Cultivated soybean was domesticated from wild G. soja (Sieb. & Zucc.) 8,000 years ago (refs. 8,9) in temperate regions of China between 32° N and 40° N, where its genomic diversity is highest (ref. 10). The development of early flowering and maturity is widely recognized as a key factor in soybean crop evolution, and a growing number of gene variants contributing to these changes have been identified (ref. 11). However, none of these are known to have been central in the initial stages of domestication and diversification (ref. 11), and we have only a limited understanding of the sequence of events leading to their appearance and incorporation.

In this study we investigated domestication-related changes in soybean phenology, combining whole-genome resequencing, genome-wide association studies (GWAS) and positional cloning of quantitative trait loci (QTL) to identify two genes, Tof11 and Tof12, that contributed to changes in flowering and maturity early in soybean crop evolution. Tof11 and Tof12 encode homeologous

Stepwise selection on homeologous PRR genes controlling flowering and maturity during soybean domestication

Sijia Lu 1,2,15, Lidong Dong 1,15, Chao Fang 1,15, Shulin Liu 3,4,15, Lingping Kong 1,15, Qun Cheng 1,15, Liyu Chen 1,15, Tong Su 2,4, Haiyang Nan 1, Dan Zhang 5, Lei Zhang 6, Zhijuan Wang 7, Yongqing Yang 8, Deyue Yu 9, Xiaolei Liu 10, Qingyong Yang 11, Xiaoya Lin 1, Yang Tang 1, Xiaohui Zhao 1, Xinquan Yang 1, Changen Tian 1, Qiguang Xie 12, Xia Li 7, Xiaohui Yuan 13 ✉, Zhixi Tian 3,4 ✉, Baohui Liu 1,2 ✉, James L. Weller 14 ✉ and Fanjiang Kong 1,2,4 ✉

Adaptive changes in plant phenology are often considered to be a feature of the so-called ‘domestication syndrome’ that distinguishes modern crops from their wild progenitors, but little detailed evidence supports this idea. In soybean, a major legume crop, flowering time variation is well characterized within domesticated germplasm and is critical for modern production, but its importance during domestication is unclear. Here, we identify sequential contributions of two homeologous pseudo-response-regulator genes, Tof12 and Tof11, to ancient flowering time adaptation, and demonstrate that they act via LHY homologs to promote expression of the legume-specific E1 gene and delay flowering under long photoperiods. We show that Tof12-dependent acceleration of maturity accompanied a reduction in dormancy and seed dispersal during soybean domestication, possibly predisposing the incipient crop to latitudinal expansion. Better understanding of this early phase of crop evolution will help to identify functional variation lost during domestication and exploit its potential for future crop improvement.

Nature Genetics | VOL 52 | APRIL 2020 | 428–436 | www.nature.com/naturegenetics


pseudo-response-regulator (PRR) proteins, and we demonstrate that their effects on flowering are genetically dependent on E1, a key suppressor in soybean photoperiodic flowering, and are mediated through downregulation of LHY genes. We show that accumulated impairments in Tof11 and Tof12 function are likely to have permitted an earlier harvest and improved adaptation to the limited summer growth period at higher latitudes during soybean domestication.

Results

Resequencing of soybean accessions. Whole-genome resequencing was conducted for a panel of 424 soybean accessions, mainly collected from across the Huanghuai region and north region of China, the inferred soybean domestication center (ref. 10) (Fig. 1a and Supplementary Table 1). Analysis of relationships among these accessions using phylogenetic and principal component analysis (PCA) of the whole-genome SNP marker set (Supplementary Note) identified three groups (wild soybeans, landraces and improved cultivars) (Fig. 1b,c and Supplementary Table 1), consistent with findings from previous reports (ref. 12). In all three groups, linkage disequilibrium (r²) decreased with physical distance between SNPs (Fig. 1d).
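The r² statistic behind the linkage disequilibrium decay curves can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) per accession, which is an assumption about encoding and not a description of the authors' pipeline (that is detailed in the Supplementary Note). The genotype vectors are made up for illustration.

```python
from statistics import mean

def ld_r2(g1, g2):
    """Squared Pearson correlation between two SNP genotype vectors."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov * cov / (var1 * var2)

# Hypothetical genotypes for eight accessions at three SNPs
snp_a = [0, 0, 2, 2, 0, 2, 0, 2]
snp_b = [0, 0, 2, 2, 0, 2, 0, 2]  # identical to snp_a: complete LD
snp_c = [0, 2, 0, 2, 0, 2, 0, 2]  # only partly correlated with snp_a

print(ld_r2(snp_a, snp_b))  # 1.0
print(ld_r2(snp_a, snp_c))  # 0.25
```

A decay curve such as the one in Fig. 1d would then be obtained by averaging r² over SNP pairs binned by physical distance, separately for each group.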

Field evaluations in five locations spanning 23° N (Guangzhou) to 45° N (Harbin) revealed significantly earlier flowering and maturity in domesticated compared to wild accessions in all locations (Extended Data Fig. 1), with the earliest phenology seen in improved cultivars. This pattern suggests that selection for early flowering and maturity was a feature of soybean domestication and continued during diversification and cultivar improvement.

Identification of Tof11 and Tof12 loci. GWAS analysis identified three significant association loci (P < 10⁻⁸), on chromosomes 6, 11 and 12 (Fig. 1e,g and Supplementary Note). A similar analysis of phenology data from two field seasons for an 809-accession panel that had been resequenced previously (ref. 13) also identified significant associations in similar regions of chromosomes 11 and 12 (Fig. 1f,h and Supplementary Note). The detection of loci in the same regions of chromosomes 11 and 12 in both panels and across different environments indicates that these loci make a robust and important contribution to control of flowering time (Fig. 1e,f). Other significant (P < 10⁻⁵) associations found in this second panel, on chromosomes 5, 7, 10, 16 and 19 (Fig. 1f,h), are likely to represent known (for example, E2, E3, E4 and FT2a) (ref. 11) and novel loci (Supplementary Tables 2 and 3), but are not explored further here.
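As a rough illustration of how the two significance thresholds quoted above partition association signals, the sketch below filters SNP-level P values on the −log10 scale. The function name and the (chromosome, P value) pairs are hypothetical and are not taken from the paper's data.

```python
import math

STRICT = 1e-8   # threshold quoted for the three loci on chromosomes 6, 11 and 12
RELAXED = 1e-5  # threshold quoted for the additional loci in the second panel

def significant_chromosomes(hits, threshold):
    """Return sorted chromosomes with at least one SNP below the P threshold."""
    cutoff = -math.log10(threshold)
    return sorted({chrom for chrom, p in hits if -math.log10(p) > cutoff})

# Hypothetical (chromosome, P value) pairs for lead SNPs
hits = [(6, 3e-9), (11, 1e-12), (12, 5e-15), (5, 2e-6), (3, 4e-3)]

print(significant_chromosomes(hits, STRICT))   # [6, 11, 12]
print(significant_chromosomes(hits, RELAXED))  # [5, 6, 11, 12]
```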

Previous studies of wild × domesticated recombinant inbred line (RIL) populations (refs. 14–17) identified growth period QTLs in regions around the GWAS peaks on chromosomes 11 and 12. We also confirmed flowering time QTLs in these locations in an F2 population between domesticated accessions H3 and Harosoy (Supplementary Fig. 1). These analyses across multiple biparental and natural populations indicate that variation at these two loci (subsequently referred to as Time of Flowering 11 (Tof11) and Tof12) is widespread and made a substantial contribution to flowering time adaptation during soybean crop evolution.

Tof11 and Tof12 encode PSEUDO-RESPONSE-REGULATOR proteins. We next refined Tof11 and Tof12 genomic locations in F6 heterozygous inbred populations (n = 1,737 and 2,859, respectively), locating Tof11 within a 267-kilobase (kb) region that harbors 20 genes in the Zhonghuang 13 (ZH 13) reference genome (ref. 18), and Tof12 within a 155-kb region containing 10 genes (Extended Data Fig. 2a,b and Supplementary Tables 4 and 5). These partially overlapping homeologous regions both included a paralog of PSEUDO-RESPONSE-REGULATOR 3 (PRR3): SoyZH13_11G141200 (PRR3a) and SoyZH13_12G067700 (PRR3b) (Supplementary Tables 4 and 5 and Supplementary Fig. 2). Both genes encode full-length proteins in H3, but in Harosoy they carry frameshift and nonsense mutations, respectively, that predict protein truncation and loss of function (Extended Data Fig. 2b,d). These results are consistent with the recent identification of these PRR3 paralogs as candidates for the growth period QTLs Gp11 and Gp12 (ref. 15). They are also consistent with our observation that the dominant H3 alleles of Tof11 and Tof12 conferred late flowering and maturity relative to Harosoy, which carries recessive alleles at both loci (Extended Data Fig. 2a,c). Together, these results indicate that PRR3a and PRR3b are the genes responsible for the effects of the Tof11 and Tof12 loci. This conclusion was further tested and confirmed by transgenic complementation (Supplementary Note).

To examine the genetic relationship between Tof11 and Tof12, we generated a Tof11/Tof12 near-isogenic line (NIL) set from the H3 × Harosoy cross (Fig. 2a), on an e1as E2 genetic background. NILs possessing a dominant allele at either Tof11 or Tof12 flowered later than double-recessive tof11-1 tof12-1 lines (Fig. 2a), while plants carrying both dominant alleles were even later flowering (Fig. 2a). These results suggested that Tof11 and Tof12 function independently, but to some extent redundantly, to control flowering time and maturity, which is consistent with findings from a previous report (ref. 15).

Tof11 and Tof12 are genetically dependent on E1. The legume-specific E1 gene is the key suppressor of soybean photoperiod flowering and maturity under long-day (LD) conditions (refs. 11,19). In our earlier analysis of a wild × domesticated population (ref. 17), QTLs corresponding to Tof11 and Tof12 were detected only in RIL subpopulations carrying the functional E1 allele, but not in subpopulations carrying the null allele e1nl (ref. 17), implying that Tof11 and Tof12 might depend genetically on E1. We determined the genotype for Tof11 and Tof12 in the entire population to evaluate their interaction with E1 in LD conditions (Fig. 2b,c). Both tof11-1 and tof12-1 alleles conferred early flowering in an E1 background, but had no significant effect in the e1nl null background. This epistatic interaction implies that Tof11 and Tof12 act through E1 to delay flowering in LD conditions. Consistent with this result, analysis of NIL sets for E1/Tof11 or E1/Tof12 in the recessive e2 background derived from H3 × Harosoy showed that tof11-1 or tof12-1 effects were weaker in the hypomorphic e1as background (ref. 19) than in the E1 background (Fig. 2d,e).
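The epistatic pattern described above can be expressed as a difference of differences between genotype-class means. The following is a minimal sketch with invented flowering times (days after emergence); the helper name and the values are assumptions for illustration, not measurements from the RIL population.

```python
from statistics import mean

def allele_effect(functional_class, mutant_class):
    """Delay in flowering (days) conferred by the functional Tof allele."""
    return mean(functional_class) - mean(mutant_class)

# Hypothetical flowering times keyed by (E1 genotype, Tof11 genotype)
dae = {
    ("E1", "Tof11"): [98, 102, 100],
    ("E1", "tof11-1"): [79, 81, 80],
    ("e1nl", "Tof11"): [41, 39, 40],
    ("e1nl", "tof11-1"): [40, 40, 40],
}

effect_in_E1 = allele_effect(dae[("E1", "Tof11")], dae[("E1", "tof11-1")])
effect_in_e1nl = allele_effect(dae[("e1nl", "Tof11")], dae[("e1nl", "tof11-1")])

# A nonzero interaction means the Tof11 effect depends on the E1 genotype
interaction = effect_in_E1 - effect_in_e1nl

print(effect_in_E1, effect_in_e1nl, interaction)  # 20 0 20
```

In this toy data set the Tof11 allele delays flowering only when E1 is functional, which is the qualitative signature of Tof11 acting through E1.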

The central role of E1 in the photoperiod regulation of soybean flowering largely reflects its repression of two key FT homologs, FT2a and FT5a, under noninductive LD conditions (refs. 19–21). We evaluated the effect of Tof11 and Tof12 on transcriptional regulation of E1, FT2a and FT5a under LD (16 h light/8 h dark) using the transformants of Tof11 and Tof12 in DN50 and in the single-locus NILs described above. Overexpression of either gene resulted in increased E1 expression and reduced expression of FT2a and FT5a (Fig. 2h–m), consistent with the observed epistatic interaction (Fig. 2b–e). A similar result was obtained in the NILs (Supplementary Fig. 4c–h), showing that functional alleles of Tof11 or Tof12, relative to the respective mutant alleles, increased E1 expression and decreased FT2a and FT5a expression.

Tof11 and Tof12 enhance E1 expression via repression of LHY. The two PHYTOCHROME A (PHYA) homologs E3 and E4 are also known to have key roles in soybean photoperiod response, acting to delay flowering under noninductive LD conditions through positive regulation of E1 (ref. 11). Expression analysis in previously described E3/E4 NILs (ref. 20) showed that both Tof11 and Tof12 genes were positively regulated by both E3 and E4 (Supplementary Fig. 5), indicating that the effect of Tof11 and Tof12 on flowering time and maturity under LD conditions is at least partly under the control of these two photoreceptors.


[Fig. 1 image, panels a–h. Recoverable labels: groups Wild, Landrace and Cultivar; PC1 versus PC2 (c); r² versus distance (kb), 0–300 kb (d); −log10(P) by chromosome (Chr 1–20) with peaks labeled Tof11 and Tof12, and plots of observed versus expected −log10(P) (e,f); flowering time (DAE) in Zhengzhou 2018 versus Hefei 2018 and in Beijing 2014 versus Beijing 2013, with regression fits y = 0.908x + 0.1989 (R² = 0.8329) and y = 0.9832x − 4.4138 (R² = 0.8470) (g,h).]

Fig. 1 | Characterization and GWAS for flowering time in soybean diversity panels. a, Geographic origins of a newly assembled 424-accession diversity panel. The map was drawn using ArcGIS v.10.3 software for desktop (https://desktop.arcgis.com/en/). b, Phylogenetic structure of the diversity panel based on analysis of genome-wide SNPs. The apparent misclassification reflects an identification error in the passport data. c, PCA of genetic diversity within the panel. d, Linkage disequilibrium decay for the three subgroups within the diversity panel. Linkage disequilibrium decay is determined by squared correlations of allele frequencies (r²) against the distance between polymorphic sites in wild soybeans (gray), landraces (green) and cultivars (blue). e, GWAS scan for flowering time (R1 stage) using mean data from the 424-accession panel grown in two locations (Zhengzhou and Hefei, China) in 2018. f, GWAS scan of 809 accessions from a previously described (ref. 13) diversity panel using mean flowering time (R1 stage) over the 2013 and 2014 field seasons in Beijing, China. g, Regression correlations of flowering time (DAE, days after emergence) used in e (Zhengzhou 2018 and Hefei 2018). h, Regression correlations of flowering time used in f (Beijing 2013 and Beijing 2014). Strong correlations in g and h were supportive of the GWAS results from e and f in which the phenotypic mean values were used for analysis.


In Arabidopsis, PRR proteins act as transcriptional repressors in the circadian clock output pathways, and associate with the CCT-binding motif CACGTG in promoters of two key circadian clock genes, LATE ELONGATED HYPOCOTYL (LHY) and CIRCADIAN CLOCK-ASSOCIATED 1 (CCA1), to reduce their expression (ref. 22). The ZH 13 soybean reference genome contains four LHY/CCA1 homologs (termed LHY1a, LHY1b, LHY2a and LHY2b), which all feature CCT promoter motifs (Fig. 3a) and showed lower transcript levels in Tof11 and Tof12 complementation lines (Extended Data Fig. 3). Physical association of Tof11 and Tof12 with LHY promoters near the CCT motif was demonstrated using chromatin immunoprecipitation (ChIP)–PCR assays and electrophoretic mobility shift assays (EMSA) (Fig. 3a,b and Supplementary Note), and the functional significance of this association was confirmed in a tobacco transient assay system using wild-type and mutant versions of both proteins (Extended Data Fig. 4 and Supplementary Note).

To test the functional significance of this interaction, we generated a lhy1a lhy1b lhy2a lhy2b quadruple knockout mutant in the Harosoy background by using a CRISPR–Cas9 approach (Supplementary Fig. 6). Complete loss of LHY function in the quadruple mutant significantly delayed flowering under LD conditions (Fig. 3c,d), relieved the transcriptional suppression of E1 (Fig. 3e) and reduced the expression of FT2a and FT5a (Supplementary Fig. 7). A possible direct influence of LHY proteins on E1 promoter activity was also tested by EMSA and in-tobacco transient assays, revealing that all four LHY proteins could directly bind to the AATATC motif in the E1 promoter and that LHY1a could reduce expression from the E1 promoter (Fig. 3f–h). These results indicate that LHY proteins promote early flowering and maturity by associating with the E1 promoter to suppress E1 expression.

Taken together, these results support a model (Fig. 4) in which the phyA photoreceptors E3 and E4 promote Tof11 and Tof12 expression, and the Tof11 and Tof12 proteins then physically associate with the promoters of LHY genes to suppress their expression. LHY proteins bind to the promoter of E1 to suppress its transcription, and in turn release its transcriptional suppression of the two key soybean FT homologs, ultimately resulting in the promotion of flowering and earlier maturity (Fig. 4).
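The direction of each component's net effect in this model follows from multiplying the signs along the pathway. The sketch below encodes the chain as described in the text (+1 for activation, −1 for repression); treating the pathway as strictly linear, and the node labels themselves, are simplifying assumptions made for illustration.

```python
# Signed regulatory edges as described in the text: +1 activates, -1 represses.
EDGES = {
    ("E3/E4", "Tof11/Tof12"): +1,
    ("Tof11/Tof12", "LHY"): -1,
    ("LHY", "E1"): -1,
    ("E1", "FT2a/FT5a"): -1,
    ("FT2a/FT5a", "flowering"): +1,
}
PATH = ["E3/E4", "Tof11/Tof12", "LHY", "E1", "FT2a/FT5a", "flowering"]

def net_effect(source, target):
    """Multiply edge signs along the linear pathway from source to target."""
    i, j = PATH.index(source), PATH.index(target)
    sign = 1
    for a, b in zip(PATH[i:j], PATH[i + 1:j + 1]):
        sign *= EDGES[(a, b)]
    return sign

print(net_effect("Tof11/Tof12", "E1"))         # 1: Tof11/Tof12 raise E1 expression
print(net_effect("Tof11/Tof12", "flowering"))  # -1: Tof11/Tof12 delay flowering
print(net_effect("LHY", "flowering"))          # 1: LHY promotes flowering
```

The signs agree with the observations reported above: functional Tof11 and Tof12 increase E1 expression and delay flowering, whereas loss of LHY function delays flowering.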

Sequential selection of Tof11 and Tof12 during evolution. To gain initial insight into the molecular history of Tof11 and Tof12, we compared nucleotide diversity (FST and π) across the 2-megabase (Mb) genomic regions spanning Tof11 and Tof12 genes in the two diversity panels described above (Fig. 5a,b). We identified strong

[Fig. 2 image, panels a–m. Recoverable labels: flowering time (DAE) for NILs with different Tof11 and Tof12 allelic combinations (a), RILs with E1 or e1nl combined with Tof11 or Tof12 alleles (b,c) and NILs with E1 or e1as combined with Tof11 or Tof12 alleles (d,e), with lowercase significance letters above the bars; transcript levels of Tof11, Tof12, E1, FT2a and FT5a relative to tubulin over a 0–24 time axis in DN50 (tof11-1 tof12-1), pTof11:Tof11 (TC#4) and pTof12:Tof12 (TC#7) (f–m).]

Fig. 2 | Genetic and regulatory interactions of Tof11, Tof12 and E1. a, Flowering time under LD (14 h light/10 h dark) of NILs (under the e1as E2 background) possessing different allelic combinations at Tof11 and Tof12. All data are given as mean ± s.e.m. (n = 10 plants); the value of each plant is represented by a dot. b,c, Flowering time under LD (16 h light/8 h dark) of RILs possessing different allelic combinations at E1 and Tof11 (b) or E1 and Tof12 (c). e1nl, null allele of E1. The lower and upper box edges correspond to the first and third quartiles (the 25th and 75th percentiles); the horizontal line indicates the median value; the lower and upper whiskers extend to the smallest and largest values lying within 1.5× the interquartile range of the box edges. d,e, Flowering time under LD (14 h light/10 h dark) of NILs (under the recessive e2 background) possessing different allelic combinations at E1 and Tof11 (d) or E1 and Tof12 (e). e1as, partially functional e1 allele. All data are given as mean ± s.e.m. (n = 10 plants); the value of each plant is represented by a dot. The presence of the same lowercase letter above the histogram bars in a–e denotes nonsignificant differences across the two panels (P > 0.05). A Student’s t-test was used to generate the P values. f–m, Diurnal variation in transcript levels of Tof11 (f), E1 (h), FT2a (j) and FT5a (l) in DN50 and Tof11 (TC#4) transformants, and Tof12 (g), E1 (i), FT2a (k) and FT5a (m) in DN50 and Tof12 (TC#7) transformants under LD (16 h light/8 h dark). All data are given as mean ± s.e.m. (n = 5 plants).


[Fig. 3 image, panels a–h. Recoverable labels: gene models of LHY1a, LHY1b, LHY2a and LHY2b showing coding region, UTR, CACGTG motifs and amplicons P1–P8 (a); relative enrichment (per input) for P1–P8 and EF1b in W82 and Tof12:HA-OE2 (b); flowering time (DAE) of Harosoy and the lhy1a lhy1b lhy2a lhy2b mutant, P = 9.77 × 10⁻⁷ (d); E1 transcript level relative to tubulin over a 0–24 time axis in Harosoy and the quadruple mutant (e); effector constructs (35S promoter, FLAG, LHY1a, GFP, NOS), reporter (E1 promoter, LUC) and internal control (GUS) (f); relative LUC/GUS, P = 0.0015 (g); E1 gene model showing pE1, coding region, UTR and the AATATC motif, and EMSA lanes for LHY1a-HIS, LHY1b-HIS, LHY2a-HIS and LHY2b-HIS with labeled pE1 and unlabeled pE1 (50×), showing complex and free probes (h).]

Fig. 3 | Tof11 and Tof12 promote E1 transcription through direct downregulation of LHY genes. a, Location of CACGTG motifs in the promoters of the four soybean LHY homeologs and amplicons targeted in the ChIP–PCR assay. LHY1a, SoyZH13_16G015500; LHY1b, SoyZH13_07G046900; LHY2a, SoyZH13_19G245000; LHY2b, SoyZH13_03G240700. b, Results of the ChIP–PCR assay on LHY amplicons in wild type (W82) and Tof12-HA (OE2) overexpression lines fused with HA tags. A monoclonal HA antibody was used for ChIP. Results were from three independent replications and the value of each replication is represented by a dot. The presence of the same lowercase letter above the histogram bars denotes nonsignificant differences across the two panels (P > 0.05). A Student’s t-test was used to generate the P values. c–e, Characterization of a quadruple lhy1a lhy1b lhy2a lhy2b mutant in a cultivar Harosoy background. c, Plant phenotypes at 20 d after emergence; scale bar, 10 cm. d, Flowering time under LD (16 h light/8 h dark) conditions (mean ± s.e.m. for n = 10 plants); the value of each plant is represented by a dot. A Student’s t-test was used to generate the P values. e, Diurnal expression of the E1 gene (mean ± s.e.m. for n = 5 plants, 20 d after emergence). f, Constructs used for the transient transfection assay. NOS, nopaline synthase terminator; LUC, luciferase; GUS, β-glucuronidase. g, Luciferase activity under control of E1 promoter showing the results from three independent replications; the value of each replication is represented by a dot. A Student’s t-test was used to generate the P values. h, Location of the AATATC LHY protein-binding motif in the E1 gene promoter, and EMSA results demonstrating the binding of LHY proteins to this motif. The original gel blot image of the EMSA is available as Source Data Fig. 1.

Nature Genetics | VOL 52 | APRIL 2020 | 428–436 | www.nature.com/naturegenetics


evidence of selection in a region of 144 kb around the Tof12 gene, comparable to that seen for domestication genes G, Shat1-5 and Hs1-1 (refs. 23–25; Fig. 5b, Extended Data Fig. 5 and Supplementary Fig. 8). These results suggested that the Tof12 region might have conferred some advantage during early cultivation, possibly including earlier maturity caused by loss of Tof12 function.

Analysis of variation in the Tof11 and Tof12 coding sequences across the two diversity panels defined 25 Tof12 haplotypes, including four distinct loss-of-function alleles (Extended Data Fig. 6 and Supplementary Fig. 9). Surprisingly, the most common haplotype H1 (tof12-1) was present in all improved cultivars (532/532) and most landraces (406/450), and originated from the wild haplotype 25 by a single SNP (Supplementary Table 6 and Extended Data Fig. 6c). This result was consistent with the genomic selection signal detected around Tof12 and suggested that tof12-1 was under strict artificial selection: it was strongly favored in landraces and subsequently widely incorporated at a very early stage of modern soybean breeding (Fig. 5c). Three other tof12 mutant haplotypes appeared at low frequency in wild lineages and landraces (Supplementary Table 6 and Extended Data Fig. 6c). Together, these results indicate that the tof12-1 mutation has been central in the origin of all modern soybean cultivars.

For Tof11, 11 putative loss-of-function mutations were identified (Extended Data Fig. 7 and Supplementary Fig. 10), and among these the tof11-1 haplotype H1 was the most abundant, both in landraces (192/520) and in improved cultivars (507/552) (Supplementary Table 6 and Extended Data Fig. 7c). Loss of Tof11 function appears to have arisen independently multiple times within three distinct lineages of landraces, but only one mutation, tof11-1, became widespread in landraces and was subsequently also strongly selected at an early stage of modern soybean breeding (Fig. 5c). Interestingly, almost no accessions carried the tof11-1 mutation alone (6/463) (Supplementary Table 6 and Fig. 5c), strongly implying that it arose in a tof12-1 genetic background. Molecular dating analysis also suggested that tof11-1 mutations occurred around 8,000 years ago, approximately 2,500 years after tof12-1 mutations (Fig. 5e). These observations suggest that tof12-1 and tof11-1 mutations were sequentially incorporated within the cultivated soybean gene pool, and together were likely to have conferred a shift toward an earlier phenology.

Comparison of soybean domesticated genes. Previous studies have identified causal genes for several classic domestication-related traits in soybean, including seed dispersal and dormancy (ref. 26), but of these only three (Shat1-5, Hs1-1 and G) are considered to be domestication genes, on the basis of evidence of strong selection on key variants in comparisons of wild accessions to landraces (refs. 23–25). We therefore compared the changes in their allele frequencies across the three groups of accessions in our combined dataset. In contrast to Tof11 and Tof12, the domesticated alleles for all three genes were present to some extent in our wild material, with frequencies ranging from 4% for g to 18% for Shat1-5G (Fig. 5d). However, wild alleles were also substantially present in landraces, with frequencies from 4% (Shat1-5) to 24% (G), similar to the frequency of 10% observed for Tof12 (Fig. 5c). This is also supported by similar negative values of Tajima’s D statistic for Tof12, G and Hs1-1 in domesticated soybean (Supplementary Table 7), indicating that Tof12 (and in particular the tof12-1 allele) experienced a generally similar intensity of selection to the Shat1-5, Hs1-1 and G loci during the transition from wild soybean to landrace. This implies that some phenotypic consequence of tof12-1 may have been selected in parallel with the reduced dormancy and dispersal conferred by the Shat1-5G, hs1-1 and g alleles.

Natural variation in Tof11 and Tof12 under adaptation. To further explore the functional significance of tof11-1 and tof12-1, we examined their association with flowering time and maturity in the 424-accession panel at five field sites in China spanning 23° N in Guangzhou to 45° N in Harbin (Extended Data Fig. 8). Classification according to Tof11 or Tof12 genotype clearly showed that accessions carrying functional alleles flowered and matured significantly later than accessions carrying the corresponding loss-of-function alleles (Extended Data Fig. 8a,b). In all locations, accessions carrying a mutant allele at either locus (that is, Tof11 tof12-1 or tof11-1 Tof12) flowered similarly to each other, but significantly earlier than those carrying a functional allele at both (Tof11 Tof12) (Extended Data Fig. 8c–h). Accessions carrying both mutations were significantly earlier than the other three genotypic groups at all sites, except at the lowest-latitude site in Guangzhou (Extended Data Fig. 8h). These results were consistent with the similar comparison of the tof11-1 and tof12-1 interaction in NILs (Fig. 2a), and collectively confirm significant additive roles for Tof11 and Tof12 mutations in the promotion of flowering and maturity.

We next examined how the distributions of the major Tof11/tof11-1, Tof12/tof12-1 and E1/e1-as (refs. 27,28) alleles within the subset of Chinese accessions differed according to their geographic origins (Fig. 5f). As observed for the entire panel (Fig. 5c), tof12-1 was strongly enriched in landraces, particularly in the northeast region (Fig. 5f), and fixed in improved cultivars regardless of origin. Within both landraces and improved cultivars, tof11-1 alleles were more frequent in northern regions (north and northeast) than in southern regions (Huanghuai and south), and approached fixation in improved cultivars from the northeast (Fig. 5f). In contrast to tof11-1 and tof12-1, e1-as was much less common in both landraces and improved cultivars, and almost absent in accessions from the two southern regions.

Finally, we considered the geographic distributions of the four Tof11/Tof12 allelic combinations in Chinese accessions (Extended Data Fig. 9). Landraces carrying either tof12-1 or tof11-1 showed a higher mean latitude of origin relative to accessions carrying wild-type alleles at both loci, and the distribution of accessions carrying both mutations was shifted even further north (Extended Data Fig. 9d).


Fig. 4 | Model summarizing the mechanism of Tof11 and Tof12 action under LD conditions. Light acts to induce Tof11 and Tof12 expression in part through the two phytochrome A photoreceptors E3 and E4. Tof11 and Tof12 proteins directly bind to the promoters of LHY genes to suppress their transcription. LHY proteins in turn physically associate with the promoter of E1 to suppress its expression, thus mediating the transcriptional activation of E1 by Tof11 and Tof12. Consequently, E1 represses expression of two FT homologs to delay flowering and maturity and enhance grain yield.


This result was further supported by the observation that the effect of tof11-1 and tof12-1 alleles became more pronounced at higher latitude sites (Fig. 5f,g and Extended Data Fig. 10). In view of the likely origins of domesticated soybean in the Huanghuai/north regions, and the observed patterns and effects of allelic change in the Tof12, Tof11 and E1 loci, we conclude that these three genes are of sequential importance during soybean domestication and the initial expansion toward higher latitudes.

Discussion

Modification of phenology has been a central feature of crop evolution, and is understood to have been driven by selective pressures to maximize yield and synchronize development across diverse environments and farming practices. Although the importance of flowering time in the adaptation of cultivated soybean is well established, and several major genes have been characterized (refs. 11,20), little is known about the changes in phenology that accompanied the earliest steps in soybean crop evolution. Our detailed examination shows that two PRR3 homeologs, Tof12 and Tof11, have played sequential roles in this adaptation. Compared to wild soybean, in which flowering and maturity show substantial delay under LD conditions relative to short-day conditions, soybean lacking Tof11 and Tof12 function shows significantly reduced photoperiod sensitivity and a significantly shorter time to flowering and maturity under LD conditions (Fig. 2 and Extended Data Fig. 2). Furthermore, a molecular footprint of strong selection at Tof12 during domestication (Fig. 5b, Extended Data Fig. 5 and Supplementary Fig. 8) and the high frequency of a specific loss-of-function tof12-1 allele in an extensive landrace collection (Fig. 5c,d) together provide strong evidence that a change in phenology related to loss of Tof12 function had a significant role in adaptation of wild soybean during a phase of initial cultivation and domestication.

[Fig. 5 graphic not reproduced. Panel e labels the estimated ages as 8,003 BP (Tof11) and 10,469 BP (Tof12).]

Fig. 5 | Stepwise selection on Tof12 and Tof11 during soybean evolution. a,b, FST and π values in wild soybeans, landraces and improved cultivars across the 2 Mb genomic regions surrounding Tof11 (a) and Tof12 (b). Blue dashed lines indicate the genome-wide threshold. c, Proportions of tof11-1 and tof12-1 alleles and their co-occurrence within each of the three germplasm groups. Data are combined from the 424- and 871-accession diversity panels (1,295 accessions in total). d, Proportions of alleles of three putative domestication genes within each of the three germplasm groups. Data are combined from the 424- and 871-accession diversity panels (1,295 accessions in total). e, Molecular domestication dating of Tof11 and Tof12. Estimation of the expected evolutionary ages is described in the Methods. BP, before present. f, Allelic distributions of Tof12/tof12-1, Tof11/tof11-1 and E1/e1-as in subsets of Chinese landraces and cultivars (from c) according to region of origin. NE, northeast region of China; NR, north region of China; HR, Huanghuai region of China; SR, south region of China.


The PRR gene family was first functionally characterized in the LD plant Arabidopsis, in which three of the five family members have partially redundant roles in the promotion of flowering under LD conditions (ref. 29). PRR genes also influence geographical and seasonal adaptation in several cereal crops, through effects on photoperiod sensitivity (refs. 30–35). Our results show that PRR genes in the PRR3/7 subclade are also functionally and adaptively important in soybean, a major crop legume, and suggest that they may also have adaptive roles in other legume crop species. However, despite this growing understanding of the significance of PRR genes in flowering time control and adaptation, their mechanism of action is not well defined. Our results show that in soybean, Tof11 and Tof12 act to inhibit FT gene expression and flowering under LD conditions by enhancing expression of the legume-specific FT suppressor E1 (refs. 11,19). This differs from a recent conclusion that the effects of these loci are independent of E1 (ref. 15), a difference that is likely attributable to our use of more closely isogenic material. We also show that the LHY family of MYB transcription factors are key mediators of this regulation, serving as direct targets of transcriptional repression by Tof11 and Tof12 and direct repressors of E1 expression (Fig. 3c–h). This conclusion is strengthened by evidence that LHY genes act redundantly to repress E1 expression and promote flowering in LD conditions (Figs. 3c–e and 4). How Tof11 and Tof12 effects are integrated with those of other photoperiod-response loci to determine photoperiod-specific regulation of E1 activity is an intriguing question for future studies.

The ability to conduct detailed genetic and genomic analyses in well-characterized and comprehensive germplasm collections is providing increasing resolution in our ability to distinguish the sequence of molecular events that have contributed to the evolution of modern crops (ref. 36). Our analyses of domestication-related changes in soybean flowering time extend a number of previous studies focusing on hard seededness, seed dormancy and pod shattering traits (refs. 23–25), by demonstrating that selection for the early-flowering tof12-1 allele coincided with selection for alleles at the Hs1-1, G and Shat1-5 loci (Fig. 5). Although broad discussions of plant domestication have often included flowering phenology as a prominent component in the domestication syndrome, and highlighted in particular the importance of early, synchronous maturity (refs. 4,5), there are few cases where changes in phenology during domestication have been clearly demonstrated, and even fewer in which the molecular basis has been conclusively defined. In most well-known examples of ancient flowering time adaptation, including the barley, rice and sorghum PRR genes noted above, and in legume examples, such as pea HR (ref. 37), common bean PPD (ref. 38) and soybean E1 (ref. 19), wild-type alleles are present at substantial frequencies in domesticated germplasm, indicating a predominantly postdomestication role in crop expansion and diversification (ref. 6). In other examples from sunflower and sugarbeet, identified variants have been strongly selected during domestication, but confer the delayed flowering that is typical of domesticated material (refs. 39–41). In one recent example, a regulatory mutation in the promoter of the maize FT gene, ZCN8, is closely associated with elevated expression and earlier flowering, and has been strongly selected from a frequency of around 20% in wild teosinte to 80% in tropical maize (ref. 42). Our characterization of Tof12 reveals an even more comprehensive replacement of a functional wild allele with a nonfunctional allele in primitive domesticated material, and thus provides perhaps the strongest evidence for a shift to earlier phenology that dates to the domestication phase of crop evolution.

A comparison of Tof12 and Tof11 with other major soybean phenology loci suggests a model in which initial selection for tof12-1 loss of function during domestication was followed by successive incorporation of the major loss-of-function alleles for Tof11 (tof11-1) and E1 (e1-as; Fig. 5f), and likely for other maturity genes e3, e4 and j. The geographic distribution of this variation shows that enrichment for these alleles is associated with increased latitude (Fig. 5f and Extended Data Figs. 8–10), suggesting that it may have permitted gradual expansion and improvement at the northern limit of early soybean cultivation, where accessions carrying functional alleles flower too late to reliably mature in the growing season (Extended Data Fig. 8e). However, the substantial overlap in allele distributions, and the presence of tof11-1 and tof12-1 alleles across a wide latitudinal range, also suggest that the shorter growth cycle and greater synchrony in maturity that they provide may have conferred advantages even in middle latitudes, conceivably for early intensification of cultivation and/or in marginal environments. One consequence of this early selection is that functional Tof11 and Tof12 alleles have been effectively lost to modern soybean breeding. This observation suggests a new approach to reconfigure and fine-tune soybean phenology, in which these alleles might be used in combination with variation at other major maturity loci to broaden and improve adaptation and increase yield. It also establishes a platform from which more detailed physiological and agronomic analyses can explore and model the adaptive advantage of variation at these loci, to provide new insight into the history of soybean cultivation and opportunities for future improvement.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-020-0604-7.

Received: 9 October 2019; Accepted: 27 February 2020; Published online: 30 March 2020

References

1. Darwin, C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (John Murray, 1859).

2. Hammer, K. Das domestikations syndrom. Kulturpflanze 32, 11–34 (1984).

3. Harlan, J. R. Crops and Man 2nd edn (American Society of Agronomy, 1992).

4. Doebley, J. F., Gaut, B. S. & Smith, B. D. The molecular genetics of crop domestication. Cell 127, 1309–1321 (2006).

5. Olsen, K. M. & Wendel, J. F. A bountiful harvest: genomic insights into crop domestication phenotypes. Annu. Rev. Plant Biol. 64, 47–70 (2013).

6. Meyer, R. S. & Purugganan, M. D. Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14, 840–852 (2013).

7. Graham, P. H. & Vance, C. P. Legumes: importance and constraints to greater use. Plant Physiol. 131, 872–877 (2003).

8. Hymowitz, T. On the domestication of the soybean. Econ. Bot. 24, 408–421 (1970).

9. Carter, T. E., Nelson, R., Sneller, C. H. & Cui, Z. in Soybeans: Improvement, Production and Uses 3rd edn (eds Shibbles, R. M. et al.) Ch. 8 (American Society of Agronomy, 2004).

10. Li, Y. et al. Genetic structure and diversity of cultivated soybean (Glycine max (L.) Merr.) landraces in China. Theor. Appl. Genet. 117, 857–871 (2008).

11. Cao, D. et al. Molecular bases of flowering under long days and stem growth habit in soybean. J. Exp. Bot. 68, 1873–1884 (2017).

12. Zhou, Z. et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotechnol. 33, 408–414 (2015).

13. Fang, C. et al. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol. 18, 161 (2017).

14. Qi, X. et al. Identification of a novel salt tolerance gene in wild soybean by whole-genome sequencing. Nat. Commun. 5, 4340 (2014).

15. Li, M. W., Liu, W., Lam, H. M. & Gendron, J. M. Characterization of two growth period QTLs reveals modification of PRR3 genes during soybean domestication. Plant Cell Physiol. 60, 407–420 (2019).

16. Li, S., Cao, Y., He, J., Zhao, T. & Gai, J. Detecting the QTL-allele system conferring flowering date in a nested association mapping population of soybean using a novel procedure. Theor. Appl. Genet. 130, 2297–2314 (2017).

17. Lu, S. et al. Identification of additional QTLs for flowering time by removing the effect of the maturity gene E1 in soybean. J. Integr. Agr. 15, 42–49 (2016).

18. Shen, Y. et al. De novo assembly of a Chinese soybean genome. Sci. China Life Sci. 61, 871–884 (2018).


19. Xia, Z. et al. Positional cloning and characterization reveal the molecular basis for soybean maturity locus E1 that regulates photoperiodic flowering. Proc. Natl Acad. Sci. USA 109, E2155–E2164 (2012).

20. Lu, S. et al. Natural variation at the soybean J locus improves adaptation to the tropics and enhances yield. Nat. Genet. 49, 773–779 (2017).

21. Kong, F. et al. Two coordinately regulated homologs of FLOWERING LOCUS T are involved in the control of photoperiodic flowering in soybean. Plant Physiol. 154, 1220–1231 (2010).

22. Nakamichi, N. et al. Transcriptional repressor PRR5 directly regulates clock-output pathways. Proc. Natl Acad. Sci. USA 109, 17123–17128 (2012).

23. Wang, M. et al. Parallel selection on a dormancy gene during domestication of crops from multiple families. Nat. Genet. 50, 1435–1441 (2018).

24. Dong, Y. et al. Pod shattering resistance associated with domestication is mediated by a NAC gene in soybean. Nat. Commun. 5, 3352 (2014).

25. Sun, L. et al. GmHs1-1, encoding a calcineurin-like protein, controls hard-seededness in soybean. Nat. Genet. 47, 939–943 (2015).

26. Sedivy, E. J., Wu, F. & Hanzawa, Y. Soybean domestication: the origin, genetic architecture and molecular bases. New Phytol. 214, 539–553 (2017).

27. Jiang, B. et al. Allelic combinations of soybean maturity loci E1, E2, E3 and E4 result in diversity of maturity and adaptation to different latitudes. PLoS ONE 9, e106042 (2014).

28. Ogiso-Tanaka, E., Shimizu, T., Hajika, M., Kaga, A. & Ishimoto, M. Highly multiplexed AmpliSeq technology identifies novel variation of flowering time-related genes in soybean (Glycine max). DNA Res. 26, 243–260 (2019).

29. Nakamichi, N. et al. Arabidopsis clock-associated pseudo-response regulators PRR9, PRR7 and PRR5 coordinately and positively regulate flowering time through the canonical CONSTANS-dependent photoperiodic pathway. Plant Cell Physiol. 48, 822–832 (2007).

30. Turner, A., Beales, J., Faure, S., Dunford, R. P. & Laurie, D. A. The pseudo-response regulator Ppd-H1 provides adaptation to photoperiod in barley. Science 310, 1031–1034 (2005).

31. Beales, J., Turner, A., Griffiths, S., Snape, J. W. & Laurie, D. A. A pseudo-response regulator is misexpressed in the photoperiod insensitive Ppd-D1a mutant of wheat (Triticum aestivum L.). Theor. Appl. Genet. 115, 721–733 (2007).

32. Nishida, H. et al. Structural variation in the 5′ upstream region of photoperiod-insensitive alleles Ppd-A1a and Ppd-B1a identified in hexaploid wheat (Triticum aestivum L.), and their effect on heading time. Mol. Breed. 31, 27–37 (2013).

33. Koo, B.-H. et al. Natural variation in OsPRR37 regulates heading date and contributes to rice cultivation at a wide range of latitudes. Mol. Plant 6, 1877–1888 (2013).

34. Murphy, R. L. et al. Coincident light and clock regulation of pseudoresponse regulator protein 37 (PRR37) controls photoperiodic flowering in sorghum. Proc. Natl Acad. Sci. USA 108, 16469–16474 (2011).

35. Klein, R. R. et al. Allelic variants in the PRR37 gene and the human-mediated dispersal and diversification of sorghum. Theor. Appl. Genet. 128, 1669–1683 (2015).

36. Purugganan, M. D. Evolutionary insights into the nature of plant domestication. Curr. Biol. 29, R705–R714 (2019).

37. Weller, J. L. et al. A conserved molecular basis for photoperiod adaptation in two temperate legumes. Proc. Natl Acad. Sci. USA 109, 21158–21163 (2012).

38. Weller, J. L. et al. Parallel origins of photoperiod adaptation following dual domestications of common bean. J. Exp. Bot. 70, 1209–1219 (2019).

39. Blackman, B. K., Strasburg, J. L., Raduski, A. R., Michaels, S. D. & Rieseberg, L. H. The role of recently derived FT paralogs in sunflower domestication. Curr. Biol. 20, 629–635 (2010).

40. Blackman, B. K. et al. Contributions of flowering time genes to sunflower domestication and improvement. Genetics 187, 271–287 (2011).

41. Pin, P. A. et al. The role of a pseudo-response regulator gene in life cycle adaptation and domestication of beet. Curr. Biol. 22, 1095–1101 (2012).

42. Guo, L. et al. Stepwise cis-regulatory changes in ZCN8 contribute to maize flowering time adaptation. Curr. Biol. 28, 3005–3015 (2018).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2020


Methods

Resequencing, mapping and variation calling. For each of the accessions in the 424-accession panel, at least 5 µg of DNA was used to construct a sequencing library with an Illumina TruSeq DNA Sample Prep Kit, according to the manufacturer’s instructions. Paired-end sequencing (150 base pairs (bp)) of each library was performed on an Illumina HiSeq X Ten system. For the 809-accession panel, resequencing data for landraces and improved cultivars were downloaded from the Genome Sequence Archive database in BIG Data Center (http://gsa.big.ac.cn/index.jsp) under accession number PRJCA000205 (ref. 13). The resequencing data for 62 wild soybeans were downloaded from the NCBI database under SRA accession number SRP045129, and the sequence data newly generated in this study are deposited in the SRA database in NCBI under accession number PRJNA394629 (ref. 12). Paired-end resequencing reads of the 424 accessions in this study and the 871 accessions previously sequenced were mapped to the ZH 13 genome with BWA software with default parameters (ref. 18). Duplicate reads for each accession were filtered with the Picard program, and uniquely mapping reads were retained in BAM format. Reads around indels from the BWA alignment were realigned with the IndelRealigner option in the Genome Analysis Toolkit (GATK) (refs. 43,44). SNP and indel calling was performed with GATK and SAMtools software (ref. 45). SNPs with a minor allele frequency (MAF) less than 1% were discarded, and indels with a maximum length of 10 bp were included. SNP annotation was carried out based on that of the ZH 13 genome, using snpEff software (ref. 46), and SNPs were categorized as being in intergenic regions, upstream (that is, within a 2-kb region upstream of the transcription start site) and downstream (within a 2-kb region downstream of the transcription termination site) regions, in exons or introns. SNPs in coding sequences were further classified as synonymous SNPs or nonsynonymous SNPs. Indels in exons were classified according to whether they led to a frameshift effect.
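The MAF filter described above can be illustrated with a minimal Python sketch. The genotype encoding (alternate-allele counts 0, 1 or 2 per accession, with None for a missing call) and the function names are assumptions made for illustration; the study itself called and filtered variants with GATK and SAMtools.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF of one biallelic SNP.

    genotypes holds alternate-allele counts (0, 1 or 2) per diploid
    accession; None marks a missing call (hypothetical encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(genotypes: Sequence[Optional[int]], threshold: float = 0.01) -> bool:
    """Keep a SNP unless its MAF is below the threshold (1% in the text)."""
    return minor_allele_frequency(genotypes) >= threshold


# One heterozygote among 200 accessions gives MAF = 1/400, so the SNP is dropped.
rare_snp = [0] * 199 + [1]
print(passes_maf(rare_snp))
```

The same function can be reused with threshold=0.05 for the analyses below that require MAF > 0.05, bearing in mind that those use a strict inequality.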

Population genetic analyses. To conduct the phylogenetic analysis, SNPs of all accessions were filtered at a MAF threshold of 0.05. These SNPs were used to construct a neighbor-joining tree with PHYLIP software, which was visualized with the online tool iTOL (https://itol.embl.de). PCA was performed on this SNP set with the smartpca program embedded in the EIGENSOFT package (refs. 47,48).

Linkage disequilibrium analysis. Linkage disequilibrium was calculated for each subpopulation using SNPs with MAF > 0.05. The calculation was performed with plink software using the parameters --ld-window-r2 0 --ld-window 99999 --ld-window-kb 1000. Linkage disequilibrium decay was calculated on the basis of r² between two SNPs and averaged in 1-kb windows with a maximum distance of 1 Mb (ref. 49).
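The averaging step in the decay calculation can be sketched as follows. This is an illustrative outline only: the input format (tuples of two SNP positions in bp and their r² value) and the function name are assumptions, not the pipeline used in the study.

```python
from collections import defaultdict


def ld_decay(pairs, bin_size=1_000, max_dist=1_000_000):
    """Average r2 by inter-SNP distance.

    pairs: iterable of (pos_a, pos_b, r2) with positions in bp
    (hypothetical input layout). Returns {bin_start_bp: mean r2},
    using 1-kb bins up to a maximum distance of 1 Mb by default.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos_a, pos_b, r2 in pairs:
        dist = abs(pos_b - pos_a)
        if dist > max_dist:
            continue  # pairs further apart than the maximum are ignored
        bin_start = (dist // bin_size) * bin_size
        sums[bin_start] += r2
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


example = [(100, 600, 0.75), (100, 900, 0.25), (100, 1_600, 0.5)]
print(ld_decay(example))
```

Plotting the returned means against the bin start positions gives the decay curve for one subpopulation; repeating the call per subpopulation allows the curves to be compared.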

GWAS for flowering time and maturity. We used 3,582,767 high-quality SNPs (MAF > 0.05) to perform GWAS for flowering time in 424 accessions, and 3,024,773 high-quality SNPs (MAF > 0.05) to perform GWAS for flowering time and maturity in 809 accessions. Association analyses were performed with a mixed linear model (MLM) implemented in efficient mixed-model association expedited (EMMAX) software (ref. 13). Kinship was derived from all these SNPs. The significant association threshold was set as 1/n (n, total SNP number). The significant association regions were manually verified from the aligned resequencing reads against the ZH 13 genome with SAMtools (ref. 45).
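As a worked illustration of the 1/n threshold, the short Python sketch below converts the two SNP counts given above into P-value cutoffs (roughly 2.8 × 10⁻⁷ and 3.3 × 10⁻⁷). The function names are hypothetical, and treating a P value strictly below the cutoff as significant is an assumption.

```python
import math


def significance_threshold(n_snps: int) -> float:
    """P-value cutoff of 1/n, where n is the number of SNPs tested."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return 1.0 / n_snps


def is_significant(p_value: float, n_snps: int) -> bool:
    """True when the association P value falls below the 1/n cutoff."""
    return p_value < significance_threshold(n_snps)


for n in (3_582_767, 3_024_773):
    cutoff = significance_threshold(n)
    print(f"n = {n:,}: P < {cutoff:.2e} (-log10 P > {-math.log10(cutoff):.2f})")
```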

Genetic diversity analysis and molecular evolution. SNP data were used for the genetic diversity analysis. SNPs with missing data >10% or MAF < 5% were filtered out, and the pairwise genomic differentiation values for wild, landrace and cultivated populations of soybean were calculated using a 10-k–10-k sliding window in VCFtools (ref. 50). Tajima’s D was calculated, also using VCFtools, to test for departure of the sequences from a neutral model of evolution.
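For readers unfamiliar with the statistic, the sketch below computes Tajima's D for a small set of aligned sequences using the standard textbook estimator. It is an illustration under stated assumptions, not the VCFtools implementation used in the study: the input is a list of equal-length strings with no missing data, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of differences per pair of aligned sequences (pi)."""
    n = len(seqs)
    total = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    return total / (n * (n - 1) / 2)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D; returns None when the statistic is undefined."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None  # the variance term is zero for n < 4
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    # Difference between pi and Watterson's estimate S/a1, scaled by its s.d.
    return (pairwise_diversity(seqs) - s / a1) / math.sqrt(variance)


# An excess of rare variants (every mutation is a singleton) gives a negative D.
sample = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(sample))
```

Negative values of this kind, as reported for Tof12, G and Hs1-1 in domesticated soybean, indicate an excess of low-frequency variants relative to neutral expectation.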

Soybean accessions, growth conditions and phenotyping. To evaluate flowering time and maturity, the 424-accession panel was grown in China under natural day length conditions in Zhengzhou (34° 44′ N, 113° 42′ E) and Hefei (31° 51′ N, 117° 15′ E) in 2018, and in Guangzhou (23° 16′ N, 113° 23′ E), Wuhan (30° 52′ N, 114° 31′ E), Zhengzhou and Harbin (45° 75′ N, 126° 63′ E) in 2019. For QTL mapping, an F2 population (n = 145) was generated from the cross H3 × Harosoy. For map-based cloning, two heterozygous inbred progenies of 1,737 and 2,859 individuals segregating at the Tof11 and Tof12 loci, respectively, were subsequently developed. NILs for the loci E1 and Tof11, E1 and Tof12, and Tof11 and Tof12 were also selected from F6 progeny of this same cross, using molecular markers for E1, Tof11 and Tof12. The F2 population, heterozygous inbred progenies, NILs, CRISPR–Cas9 knockout mutants and transformants for phenotyping were grown under an artificial long day in the field (day length 14 h light/10 h dark) or natural long day (day length 16 h light/8 h dark) from 2013 to 2018 at the Experimental Station of the Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin. Artificial 14 h LD conditions in the field were imposed using a large steel frame, which was manually covered with black plastic from 19:00 to 5:00 on a daily basis. Plants were sown at the beginning of May, spaced 0.15 m apart in rows 5 m long, with 0.7 m between rows, and harvested in September or October of each year. Plants for expression analysis and ChIP assay were grown under LD conditions (day length 16 h light/8 h dark) in a plant growth cabinet (Conviron Adaptis A1000) with light intensity of 500 µmol m−2 s−1.

Flowering time was recorded at the R1 stage (days from emergence to the appearance of the first open flower on 50% of the plants). Maturity was recorded at the R8 stage (days from emergence to the time at which 95% of pods attained mature color) (ref. 51). Plant height, number of branches, number of nodes, average internode length, pods per plant, grains per plant and yield per plant were all recorded at the R8 stage (ref. 20).

DNA isolation, molecular mapping and map-based cloning. Genomic DNA was extracted from fresh trifoliate leaves of 2-week-old seedlings with the SurePlant DNA Kit (CWBIO) and used for amplifying indel markers. Linkage map construction was conducted according to previous reports52,53 and QTL analysis used MapQTL v.5.0 (ref. 54). Primer sequences of the markers for mapping are listed in Supplementary Table 8. For fine mapping, indel markers were developed in the regions of Tof11 and Tof12 from the resequencing data of the two parents, H3 and Harosoy. For Tof11, seven recombinants were identified in the fine-mapping population using nine markers, and for Tof12, eight recombinants were identified in the fine-mapping population using eight markers. Flowering time of the progeny of these recombinants was evaluated to delimit the genomic intervals containing Tof11 and Tof12.

RNA extraction and quantitative PCR with reverse transcription. Sample tissues were immediately frozen in liquid N2 and stored at −80 °C. Total RNA was isolated from frozen tissues using the Ultrapure RNA Kit (CWBIO). Synthesis of complementary DNA and removal of genomic DNA were performed with the PrimeScript RT Reagent Kit with genomic DNA Eraser (Takara). The PCR mixture contained 10 ng of cDNA, 5 μl of 1.2 mM primer premix (Supplementary Table 8), 10 μl of SYBR Premix ExTaq Perfect Real Time (Takara) and water to a final volume of 20 μl. Quantitative PCR with reverse transcription (qRT–PCR) was performed using the LightCycler 480 (Roche). The PCR cycling conditions were 95 °C for 30 s followed by 40 cycles of 95 °C for 10 s and 60 °C for 60 s. The expression level of GmTubulin (Glyma.05G157300) was used as the reference to calculate the relative expression levels of genes. Three biological replicates were used in all assays.
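The normalization to GmTubulin can be illustrated with a short sketch. The text does not state which formula was used, so the common 2^−ΔCt calculation is assumed here; the function name and all Ct values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression of a target gene for one sample.

    Assumes the 2^-dCt method (roughly 100% amplification efficiency),
    where dCt = Ct(target) - Ct(reference gene, here GmTubulin).
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical Ct values for three biological replicates (illustration only).
target_ct = [24.1, 23.8, 24.4]   # gene of interest
tubulin_ct = [18.0, 17.9, 18.2]  # GmTubulin reference

per_replicate = [relative_expression(t, r) for t, r in zip(target_ct, tubulin_ct)]
mean_expression = mean(per_replicate)

print("relative expression per replicate:", per_replicate)
print("mean relative expression:", mean_expression)
```

A lower Ct for the target relative to the reference gives a larger relative expression value; equal Ct values give a value of 1.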

Plasmid construction and plant transformation. The CDS of the candidate genes (Tof11 and Tof12) and the 3-kb promoter region upstream of the start codon of each candidate gene were amplified from H3 using KOD-Plus-Neo (TOYOBO) with the primers listed in Supplementary Table 8. PCR products were separated on 1.0% (wt/vol) agarose and recovered using the E.Z.N.A. Gel Extraction Kit (Omega). The isolated DNA was ligated into the pGEM-T Easy vector (Promega) and sequenced. Clones with verified inserts were digested with restriction enzymes and introduced into the pTF101-Gene vector (containing the bar gene for glufosinate resistance) to make two constructs: the pTof11 promoter plus the CDS of Tof11, and the pTof12 promoter plus the CDS of Tof12. In addition, we added six repeats of the hemagglutinin (HA) sequence to the C terminus of p35S-Tof12 to construct p35S-Tof12-6HA. All vectors were introduced into the Agrobacterium strain EHA101, and Agrobacterium-mediated transformation was performed as described previously55,56. pTof11-Tof11 and pTof12-Tof12 were introduced into the soybean accession DN50 and p35S-Tof12-6HA was introduced into Williams 82 (W82).

CRISPR–Cas9 vector construction. The pYLCRISPR–Cas9P35S-B vector57 was reconstructed by replacing the kanamycin resistance gene with the spectinomycin resistance gene, resulting in pYLCRISPR–Cas9P35S-BS. The target sequence adapters for the four GmLHY genes were designed using the web tool CRISPR-P (http://cbi.hzau.edu.cn/crispr/). On the basis of location within the gene and GC content, six targets were selected and integrated into different single guide RNA (sgRNA) expression cassettes, which were then introduced into the pYLCRISPR–Cas9P35S-BS vector according to the protocol reported previously51. The resulting CRISPR–Cas9 plasmid carrying the six sgRNA cassettes was transformed into the soybean accession Harosoy following the previously described approach55,56.

Transient expression assay. The ~3-kb promoter sequences from each of the four soybean LHY genes and the E1 gene were amplified from Harosoy and its corresponding NIL-E1 E3 E4, and were introduced into pBI121-LUC to generate the constructs pGmLHY1a-LUC and pE1-LUC. The CDS of Tof11 and Tof12 from H3 and the CDS of the four GmLHY homologous genes from W82 were introduced into the pTF101-Gene-3Flag vector to generate the constructs p35S-Tof11-3Flag, p35S-Tof12-3Flag, p35S-Gmtof11-3Flag and p35S-Gmtof12-3Flag. The pGmLHY1a-LUC construct was used as the reporter and the p35S-Tof11-3Flag/p35S-Tof12-3Flag constructs were used as the effectors in the transient tobacco expression system to test whether Tof11 and Tof12 suppress the transcription of GmLHY1a. The regulatory relationship of GmLHY1a with E1 was also confirmed in the transient tobacco expression system, with pE1-LUC as the reporter and p35S-GmLHY1a-3Flag as the effector.

Western blot. To analyze protein expression in transgenic plants, total proteins of W82 and p35S-Tof12-6HA transgenic lines were extracted for immunoblot analyses with protein extraction buffer (50 mM Tris–HCl pH 7.5, 150 mM NaCl, 5 mM EDTA, 0.1% Triton X-100 and protease inhibitor cocktail). The antibody anti-HA (ab18181) was obtained from Abcam. Western blot was performed as described previously20,58.

ChIP assay. Leaf samples were collected from 20-day-old seedlings of W82 and p35S-Tof12-6HA transgenic lines at Zeitgeber time 8 under LD conditions. Samples were fixed on ice for 20 min in 1% formaldehyde under vacuum. Nuclei were isolated and sonicated as previously described20. The solubilized chromatin was immunoprecipitated with anti-HA (ab18181) or mouse IgG (Sigma, catalog no. I5381) and Protein G PLUS agarose (Santa Cruz Biotechnology, catalog no. sc-2002). The coimmunoprecipitated DNA was recovered and analyzed by qRT–PCR in triplicate. Relative fold enrichment was calculated by normalizing the amount of a target DNA fragment against that of a genomic fragment of a reference gene, ELONGATION FACTOR 1 GmELF1B (Glyma.02G276600.1), and then by normalizing the value for immunoprecipitation using the specific antibody against that for mouse IgG20. The primers used for amplification are listed in Supplementary Table 8.
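The two-step normalization described above can be sketched as follows. This is an illustration only: it assumes that DNA amounts are derived from Ct values as 2^−Ct (that is, near-100% amplification efficiency), which the text does not specify, and the function names and Ct values are hypothetical.

```python
def normalized_amount(ct_target, ct_reference):
    """Amount of the target fragment relative to the reference fragment.

    Assumes amount is proportional to 2^-Ct, so the ratio is 2^-(Ct_target - Ct_ref).
    The reference here stands for the GmELF1B genomic fragment.
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_enrichment(ct_target_ip, ct_ref_ip, ct_target_igg, ct_ref_igg):
    """Relative fold enrichment of the anti-HA ChIP over the mouse IgG control.

    Step 1: normalize target against reference within each immunoprecipitation.
    Step 2: divide the antibody value by the IgG value.
    """
    ip_value = normalized_amount(ct_target_ip, ct_ref_ip)
    igg_value = normalized_amount(ct_target_igg, ct_ref_igg)
    return ip_value / igg_value


# Hypothetical Ct values (illustration only, not data from the paper).
enrichment = fold_enrichment(
    ct_target_ip=25.0, ct_ref_ip=27.0,    # anti-HA immunoprecipitation
    ct_target_igg=28.0, ct_ref_igg=27.0,  # mouse IgG control
)
print("relative fold enrichment:", enrichment)
```

A value above 1 means the target fragment is more abundant, relative to the reference fragment, in the anti-HA sample than in the IgG control.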

EMSA. The full-length coding regions of the four GmLHYs, Tof11 and Tof12 were amplified by PCR using the primer pairs in Supplementary Table 8. The PCR products were ligated into the pET29b plasmid (Novagen). The recombinant fusion plasmids were transformed into Escherichia coli BL21 (DE3) cells. The fusion proteins were purified at 4 °C and quantified according to the pET System Manual (Novagen). Four GmLHY promoter fragments including the CCT domain-binding motif CACGTG, and E1 promoter fragments including the EE motif, were produced by annealing oligonucleotides. EMSA was conducted with the LightShift Chemiluminescent RNA EMSA Kit (Thermo Scientific, catalog no. 20158). Details of the approach can be found in a previous report59.

Estimating the expected evolutionary ages of tof11 and tof12 mutations. We considered a dataset of DNA sequence variations at the Tof11 and Tof12 genes in worldwide soybean samples. We focused on estimating the expected time to the most recent common ancestor and the expected ages of certain mutations with interesting geographic distributions. We used a conservative method described previously60,61 to estimate the age of the variation at Tof11 and Tof12. The age can be estimated as T = S(nμL)⁻¹, where T is time in generations, S is the number of singletons, n is the number of sequences carrying mutations, L is the length of the sequence in base pairs and μ is the per generation mutation rate per base pair62. The mutation rate was estimated as 6.1 × 10⁻⁹ (ref. 47), giving T = 17 × (27 × 6.1 × 10⁻⁹ × 2,276)⁻¹ for tof11 and T = 20 × (25 × 6.1 × 10⁻⁹ × 1,879)⁻¹ for tof12.
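The estimate T = S(nμL)⁻¹ can be evaluated directly from the values given above. The sketch below is only a worked evaluation of that formula; the function and variable names are introduced here for illustration.

```python
MU = 6.1e-9  # per-generation mutation rate per base pair, as given in the text


def mutation_age(singletons, n_sequences, length_bp, mu=MU):
    """Expected age in generations: T = S / (n * mu * L)."""
    return singletons / (n_sequences * mu * length_bp)


# Values for S, n and L are those stated in the text for each locus.
t_tof11 = mutation_age(singletons=17, n_sequences=27, length_bp=2276)
t_tof12 = mutation_age(singletons=20, n_sequences=25, length_bp=1879)

print(f"tof11: about {t_tof11:,.0f} generations")
print(f"tof12: about {t_tof12:,.0f} generations")
```

The result is in generations; converting it to calendar time would require an assumption about generation length that is not made here.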

Statistical analyses. For phenotypic evaluation, at least ten individual plants were analyzed per accession, and exact numbers of individuals (n) are presented in the figure legends. For expression analyses using qRT–PCR, at least three individual plants were pooled per tissue sample and at least three qRT–PCR reactions (technical replicates) were performed, with the exact number of replicates provided in the figure legends. Mean values for each measured parameter were compared using one-way analysis of variance in SPSS (v.20, IBM) or one-tailed, two-sample Student’s t-tests in Microsoft Excel, as appropriate and as indicated in the relevant figure legends.
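For readers unfamiliar with the two-sample Student’s t-test, the statistic can be sketched with the Python standard library. This is a minimal illustration assuming equal variances in the two groups (the classical pooled-variance form); it is not the SPSS or Excel procedure used by the authors, it computes only the t statistic and degrees of freedom (not the P value), and the flowering-time values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical days to flowering for two genotypes (illustration only).
functional_allele = [62.0, 65.0, 63.0, 66.0, 64.0]
loss_of_function = [55.0, 57.0, 54.0, 58.0, 56.0]

t_stat, dof = student_t(functional_allele, loss_of_function)
print("t statistic:", t_stat, "degrees of freedom:", dof)
```

The P value would then be read from the t distribution with the reported degrees of freedom, one-tailed in the case described above.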

Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The sequencing data used in this study have been deposited into the Genome Sequence Archive (GSA) database in BIG Data Center (http://gsa.big.ac.cn/index.jsp) under accession number PRJCA001691 and into the NCBI database under accession number PRJNA608146. The previously reported sequence data were deposited into the NCBI database under accession number SRP045129 and into the GSA database in BIG Data Center under accession number PRJCA000205. Source data for Fig. 3 are provided with the paper.

References

43. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

44. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

45. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

46. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

47. Felsenstein, J. PHYLIP-Phylogeny Inference Package (version 3.2). Cladistics 5, 164–166 (1989).

48. Falush, D., Stephens, M. & Pritchard, J. K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587 (2003).

49. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

50. Wang, M. et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat. Genet. 49, 579–587 (2017).

51. Fehr, W. R. & Caviness, C. E. Stages of Soybean Development Special Report (Iowa State Univ., 1977).

52. Fang, C. et al. Rapid identification of consistent novel QTLs underlying long-juvenile trait in soybean by multiple genetic populations and genotyping-by-sequencing. Mol. Breed. 39, 80 (2019).

53. Kong, L. et al. Quantitative trait locus mapping of flowering time and maturity in soybean using next-generation sequencing-based analysis. Front. Plant Sci. 9, 995 (2018).

54. Van Ooijen, J. MapQTL 5 Software for the Mapping of Quantitative Trait Loci in Experimental Populations (Kyazma, 2004).

55. Nan, H. et al. GmFT2a and GmFT5a redundantly and differentially regulate flowering through interaction with and upregulation of the bZIP transcription factor GmFDL19 in soybean. PLoS ONE 9, e97669 (2014).

56. Cao, D. et al. GmCOL1a and GmCOL1b function as flowering repressors in soybean under long-day conditions. Plant Cell Physiol. 56, 2409–2422 (2015).

57. Ma, X. et al. A robust CRISPR/Cas9 system for convenient, high-efficiency multiplex genome editing in monocot and dicot plants. Mol. Plant 8, 1274–1284 (2015).

58. Ren, S. et al. CLE25 peptide regulates phloem initiation in Arabidopsis through a CLERK-CLV2 receptor complex. J. Integr. Plant Biol. 61, 1043–1061 (2019).

59. Hou, X. et al. Nuclear factor Y-mediated H3K27me3 demethylation of the SOC1 locus orchestrates flowering responses of Arabidopsis. Nat. Commun. 5, 4601 (2014).

60. Studer, A., Zhao, Q., Ross-Ibarra, J. & Doebley, J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nat. Genet. 43, 1160–1163 (2011).

61. Huang, C. et al. ZmCCT9 enhances maize adaptation to higher latitudes. Proc. Natl Acad. Sci. USA 115, E334–E341 (2018).

62. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).

Acknowledgements

This work was supported by the National Key Research and Development Program (grant no. 2016YFD0100401 to F.K.), National Natural Science Foundation of China (grant nos 31725021, 31571686 and 31701445 to F.K., 31930083 to B.L. and 31801384 to S. Lu) and the National Key Research and Development Program (grant nos 2017YFE0111000 to F.K. and 2016YFD0101900 to X.Z.).

Author contributions

F.K. coordinated the project, and designed and interpreted experiments with input from J.L.W. S. Lu, L.D., C.F., S. Liu, L.K., Q.C., L.C., T.S., H.N., D.Z., L.Z., Z.W. and Y.Y. performed the experiments. X. Liu, Q.Y., D.Y., Q.X., X. Lin, X. Yang, C.T., X. Li, Y.T., X.Z., X. Yuan, Z.T., B.L. and F.K. performed the data analysis. F.K. and J.L.W. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Extended data is available for this paper at https://doi.org/10.1038/s41588-020-0604-7.

Supplementary information is available for this paper at https://doi.org/10.1038/s41588-020-0604-7.

Correspondence and requests for materials should be addressed to X. Yuan, Z.T., B.L., J.L.W. or F.K.

Reprints and permissions information is available at www.nature.com/reprints.


Extended Data Fig. 1 | Flowering time and maturity variations in 424 accessions. The 424 accessions (85 wild soybeans, 153 landraces and 186 improved cultivars) were evaluated in 2018 in Zhengzhou and Hefei, and in 2019 in Harbin, Zhengzhou, Wuhan and Guangzhou, China. a–d, Average flowering time of the 85 wild soybeans, 153 landraces and 186 improved cultivars in Zhengzhou 2018 (a) and Hefei 2018 (b), and average maturity in Zhengzhou 2018 (c) and Hefei 2018 (d). e–h, Average flowering time of the 85 wild soybeans, 153 landraces and 186 improved cultivars in Guangzhou 2019 (e), Wuhan 2019 (f), Zhengzhou 2019 (g) and Harbin 2019 (h). Some wild soybeans in Harbin had not flowered by the end of the season and were assigned a value of 130 d (the entire growth period). In a–h, the lower and upper box edges correspond to the first and third quartiles (the twenty-fifth and seventy-fifth percentiles); the horizontal line indicates the median value; the lower and upper whiskers extend to the smallest and largest values within 1.5× the interquartile range of the box edges; points indicate outliers. i, Black circles indicate the geographic locations in China. The map was drawn using ArcGIS 10.3 for Desktop (https://desktop.arcgis.com/en/).


Extended Data Fig. 2 | Positional cloning of Tof11 and Tof12. a, Characterization of key recombinants in the immediate vicinity of the Tof11 locus, showing recombination break points (left panel) and mean flowering time of progeny (right panel). b, Gene structure of Tof11, showing the location of the loss-of-function tof11-1 mutation. c, Characterization of key recombinants in the immediate vicinity of the Tof12 locus, showing recombination break points (left panel) and mean flowering time of progeny (right panel). d, Gene structure of Tof12, showing the location of the loss-of-function tof12-1 mutation. e, Transgenic complementation of the tof11-1 mutation, showing phenotypes of two independent transformants, TC#2 and TC#4, relative to the untransformed control DN50 tof11-1 plants under LD (14 h light/10 h dark) conditions. Scale bar, 10 cm. f–h, Flowering time (f), time to maturity (g) and grain yield per plant (h) of control and transgenic lines. i, Transgenic complementation of the tof12-1 mutation, showing phenotypes of two independent transformants, TC#7 and TC#8, relative to the untransformed control DN50 tof12-1 plants under LD (14 h light/10 h dark) conditions. Scale bar, 10 cm. j–l, Flowering time (j), time to maturity (k) and grain yield per plant (l) of control and transgenic lines. All data are given as mean ± s.e.m. (n = 10 plants); the value for each plant is represented by a dot. A Student’s t-test was used to generate the P values.


Extended Data Fig. 3 | Expression of LHY homologues in DN50 and complementation transgenic lines of Tof11 and Tof12. Expression of LHY2b in Tof11 (a) and Tof12 (b) complementation transgenic lines. Expression of LHY1b in Tof11 (c) and Tof12 (d) complementation transgenic lines. Expression of LHY1a in Tof11 (e) and Tof12 (f) complementation transgenic lines. Expression of LHY2a in Tof11 (g) and Tof12 (h) complementation transgenic lines. All data are given as mean ± s.e.m. (n = 5 plants). Plants were grown under LD (16 h light/8 h dark) and sampled at 20 DAE.


Extended Data Fig. 4 | Tobacco transient assay of different alleles of Tof11/tof11-1 and Tof12/tof12-1. a, Constructs used for the transient transfection assay. b, Luciferase activity under control of the LHY1a promoter, showing the results from three independent replications. The value of each replication is represented by a dot. The presence of the same lowercase letter above the histogram bars denotes nonsignificant differences across each panel (P > 0.05). A Student’s t-test was used to generate the P values.


Extended Data Fig. 5 | Fst and Pi in the wild soybeans, landraces and improved cultivars spanning the 2-megabase genome regions of putative domestication genes. The domestication genes G (a), Shat1-5 (b) and Hs1-1 (c) were analyzed using the 1,295-accession panel of 146 wild soybeans, 575 landraces and 574 improved cultivars. Blue dashed lines indicate the genome-wide threshold.


Extended Data Fig. 6 | Haplotypes and their origins of Tof12. a, Haplotypes of Tof12. b, Loss-of-function alleles of tof12. c, Haplotype origins of Tof12. Grey represents the wild soybeans, green represents the landraces and blue represents the improved cultivars. Pink triangles represent the loss-of-function alleles. Haplotypes were extracted from the 1,295-accession panel of 146 wild soybeans, 575 landraces and 574 improved cultivars.


Extended Data Fig. 7 | Haplotypes and their origins of Tof11. a, Haplotypes of Tof11. b, Loss-of-function alleles of tof11. c, Haplotype origins of Tof11. Grey represents the wild soybeans, green represents the landraces and blue represents the improved cultivars. Pink triangles represent the loss-of-function alleles. Haplotypes were extracted from the 1,295-accession panel of 146 wild soybeans, 575 landraces and 574 improved cultivars.


Extended Data Fig. 8 | Flowering time variations of different alleles of Tof11 and Tof12 in the 424 accessions. a, Flowering time (R1) and maturity (R8) variations in the 424 accessions possessing Tof11 or tof11-1 in Zhengzhou (ZZ) and Hefei (HF). b, Flowering time (R1) and maturity (R8) variations in the 424 accessions possessing Tof12 or tof12-1 in Zhengzhou (ZZ) and Hefei (HF). c–h, Flowering time variations of four allelic combinations of Tof11 and Tof12 in Zhengzhou 2018 (c), Hefei 2018 (d), Harbin 2019 (e), Zhengzhou 2019 (f), Wuhan 2019 (g) and Guangzhou 2019 (h). The horizontal dashed lines in c–h indicate the growth period in each location. The lower and upper box edges correspond to the first and third quartiles (the twenty-fifth and seventy-fifth percentiles); the horizontal line indicates the median value; and the lower and upper whiskers extend to the smallest and largest values within 1.5× the interquartile range of the box edges. The presence of the same lowercase letter above the histogram bars in c–h denotes nonsignificant differences across the two panels (P > 0.05). A Student’s t-test was used to generate the P values.


Extended Data Fig. 9 | Loss of Tof11 and Tof12 function improves soybean adaptation to high latitudes. a–c, Geographic origins of soybean wild accessions (a), landraces (b) and improved cultivars (c) possessing different allelic combinations at Tof11 and Tof12. NE, North East region of China; NR, North region of China; HR, Huanghuai region of China; SR, South region of China. Data are for both diversity panels (424- and 809-accession panels) but only accessions from China are shown. The maps were drawn using ArcGIS 10.3 for Desktop (https://desktop.arcgis.com/en/). d, Latitudinal distribution of all the landraces from China in b possessing different allelic combinations at Tof11 and Tof12. The lower and upper box edges correspond to the first and third quartiles (the twenty-fifth and seventy-fifth percentiles); the horizontal line indicates the median value; and the lower and upper whiskers extend to the smallest and largest values within 1.5× the interquartile range of the box edges. The presence of the same lowercase letter above the histogram bars in d denotes nonsignificant differences across the two panels (P > 0.05). A Student’s t-test was used to generate the P values.


Extended Data Fig. 10 | Flowering time variation in two panels along a latitudinal cline. Flowering time variation in the 809-accession domesticated diversity panel at different latitudes, according to Tof11 (a) or Tof12 (b) genotype. Flowering time variation in the 424-accession diversity panel at different latitudes, according to Tof11 (c) or Tof12 (d) genotype. The lower and upper box edges correspond to the first and third quartiles (the twenty-fifth and seventy-fifth percentiles); the horizontal line indicates the median value; and the lower and upper whiskers extend to the smallest and largest values within 1.5× the interquartile range of the box edges. A Student’s t-test was used to generate the P values.


nature research | reporting summary | October 2018

Corresponding author(s): Fanjiang Kong; James L Weller; Baohui Liu; Zhixi Tian; Xiaohui Yuan

Last updated by author(s): 25 Feb. 2020

Reporting Summary

Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Please do not complete any field with "not applicable" or n/a. Refer to the help text for what text to use if an item is not relevant to your study.  For final submission: please carefully check your responses for accuracy; you will not be able to make changes later.

Statistics

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement

A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly

The statistical test(s) used AND whether they are one‐ or two‐sided

Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons

A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)  AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings

For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes  

Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code

Policy information about availability of computer code

Data collection The reference genome sequence Zhonghuang 13 (Sci. China Life Sci. 61: 871‐884, 2018) was used in this paper. Sequencing was performed on an Illumina HiSeq 4000 system.

Data analysis All software used for data analysis is described in detail in the Online Methods.

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.  We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data

Policy information about availability of data

All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:

‐ Accession codes, unique identifiers, or web links for publicly available datasets
‐ A list of figures that have associated raw data
‐ A description of any restrictions on data availability

The sequencing data used in this study have been deposited into the Genome Sequence Archive (GSA) database in BIG Data Center (http://gsa.big.ac.cn/index.jsp) under Accession Number PRJCA001691 (https://bigd.big.ac.cn/search?dbId=bioproject&q=PRJCA001691) and into the NCBI database under accession number  PRJNA608146. The previously reported sequence data were deposited into the NCBI database under accession number SRA: SRP045129 and deposited into the  Genome Sequence Archive (GSA) database in BIG Data Center under Accession Number PRJCA000205.


Field‐specific reporting

Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences

For a reference copy of the document with all sections, see nature.com/documents/nr‐reporting‐summary‐flat.pdf

Life sciences study design

All studies must disclose on these points even when the disclosure is negative.

Sample size Association analysis was performed with two panels of 424 and 809 soybean accessions.

Data exclusions No data were excluded.

Replication Association analysis was repeated at least twice. Expression and phenotyping experiments were repeated three times.

Randomization The samples in all the expression and phenotyping investigations were randomly sampled.

Blinding The investigators were blinded to group allocation during data collection.

Behavioural & social sciences study design

All studies must disclose on these points even when the disclosure is negative.

Study description Briefly describe the study type including whether data are quantitative, qualitative, or mixed‐methods (e.g. qualitative cross‐sectional,  quantitative experimental, mixed‐methods case study). 

Research sample State the research sample (e.g. Harvard university undergraduates, villagers in rural India) and provide relevant demographic information  (e.g. age, sex) and indicate whether the sample is representative. Provide a rationale for the study sample chosen. For studies involving  existing datasets, please describe the dataset and source.

Sampling strategy Describe the sampling procedure (e.g. random, snowball, stratified, convenience). Describe the statistical methods that were used to  predetermine sample size OR if no sample‐size calculation was performed, describe how sample sizes were chosen and provide a rationale  for why these sample sizes are sufficient. For qualitative data, please indicate whether data saturation was considered, and what criteria  were used to decide that no further sampling was needed.

Data collection Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper,  computer, eye tracker, video or audio equipment) whether anyone was present besides the participant(s) and the researcher, and whether  the researcher was blind to experimental condition and/or the study hypothesis during data collection.

Timing Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample cohort.

Data exclusions If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the rationale  behind them, indicating whether exclusion criteria were pre‐established.

Non‐participation State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no participants dropped out/declined participation.

Randomization If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if  allocation was not random, describe how covariates were controlled.

Ecological, evolutionary & environmental sciences study design

All studies must disclose on these points even when the disclosure is negative.

Study description Briefly describe the study. For quantitative data include treatment factors and interactions, design structure (e.g. factorial, nested,  hierarchical), nature and number of experimental units and replicates.

Research sample Describe the research sample (e.g. a group of tagged Passer domesticus, all Stenocereus thurberi within Organ Pipe Cactus National Monument), and provide a rationale for the sample choice. When relevant, describe the organism taxa, source, sex, age range and any manipulations. State what population the sample is meant to represent when applicable. For studies involving existing datasets, describe the data and its source.


nature research | reporting summary, October 2018

Sampling strategy Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample‐size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient.

Data collection Describe the data collection procedure, including who recorded the data and how.

Timing and spatial scale Indicate the start and stop dates of data collection, noting the frequency and periodicity of sampling and providing a rationale for these choices. If there is a gap between collection periods, state the dates for each sample cohort. Specify the spatial scale from which the data are taken.

Data exclusions If no data were excluded from the analyses, state so OR if data were excluded, describe the exclusions and the rationale behind them,  indicating whether exclusion criteria were pre‐established.

Reproducibility Describe the measures taken to verify the reproducibility of experimental findings. For each experiment, note whether any attempts to  repeat the experiment failed OR state that all attempts to repeat the experiment were successful.

Randomization Describe how samples/organisms/participants were allocated into groups. If allocation was not random, describe how covariates were  controlled. If this is not relevant to your study, explain why.

Blinding Describe the extent of blinding used during data acquisition and analysis. If blinding was not possible, describe why OR explain why  blinding was not relevant to your study.

Did the study involve fieldwork? Yes No

Field work, collection and transport

Field conditions Describe the study conditions for field work, providing relevant parameters (e.g. temperature, rainfall).

Location State the location of the sampling or experiment, providing relevant parameters (e.g. latitude and longitude, elevation, water  depth).

Access and import/export Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing  authority, the date of issue, and any identifying information).

Disturbance Describe any disturbance caused by the study and how it was minimized.

Reporting for specific materials, systems and methods

We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Materials & experimental systems

n/a Involved in the study

Antibodies

Eukaryotic cell lines

Palaeontology

Animals and other organisms

Human research participants

Clinical data

Methods

n/a Involved in the study

ChIP‐seq

Flow cytometry

MRI‐based neuroimaging

Antibodies

Antibodies used HA antibody

Validation The anti‐HA antibody (ab18181) was purchased from Abcam.

Eukaryotic cell lines

Policy information about cell lines

Cell line source(s) State the source of each cell line used.

Authentication Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated.


Mycoplasma contamination Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for  mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination.

Commonly misidentified lines (See ICLAC register) Name any commonly misidentified cell lines used in the study and provide a rationale for their use.

Palaeontology

Specimen provenance Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the issuing authority, the date of issue, and any identifying information).

Specimen deposition Indicate where the specimens have been deposited to permit free access by other researchers.

Dating methods If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.

Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.

Animals and other organisms

Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research

Laboratory animals For laboratory animals, report species, strain, sex and age OR state that the study did not involve laboratory animals.

Wild animals Provide details on animals observed in or captured in the field; report species, sex and age where possible. Describe how animals were caught and transported and what happened to captive animals after the study (if killed, explain why and describe method; if released, say where and when) OR state that the study did not involve wild animals.

Field‐collected samples For laboratory work with field‐collected samples, describe all relevant parameters such as housing, maintenance, temperature,  photoperiod and end‐of‐experiment protocol OR state that the study did not involve samples collected from the field.

Ethics oversight Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or  guidance was required and explain why not.

Note that full information on the approval of the study protocol must also be provided in the manuscript.

Human research participants

Policy information about studies involving human research participants

Population characteristics Describe the covariate‐relevant population characteristics of the human research participants (e.g. age, gender, genotypic  information, past and current diagnosis and treatment categories). If you filled out the behavioural & social sciences study design  questions and have nothing to add here, write "See above."

Recruitment Describe how participants were recruited. Outline any potential self‐selection bias or other biases that may be present and how these are likely to impact results.

Ethics oversight Identify the organization(s) that approved the study protocol.

Note that full information on the approval of the study protocol must also be provided in the manuscript.

Clinical data

Policy information about clinical studies

All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.

Clinical trial registration Provide the trial registration number from ClinicalTrials.gov or an equivalent agency.

Study protocol Note where the full trial protocol can be accessed OR if not available, explain why.

Data collection Describe the settings and locales of data collection, noting the time periods of recruitment and data collection.

Outcomes Describe how you pre‐defined primary and secondary outcome measures and how you assessed these measures.


ChIP‐seq

Data deposition

Confirm that both raw and final processed data have been deposited in a public database such as GEO.

Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links (May remain private before publication.) For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, provide a link to the deposited data.

Files in database submission Provide a list of all files available in the database submission.

Genome browser session (e.g. UCSC) Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to enable peer review. Write "no longer applicable" for "Final submission" documents.

Methodology

Replicates Describe the experimental replicates, specifying number, type and replicate agreement.

Sequencing depth Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of  reads and whether they were paired‐ or single‐end.

Antibodies Describe the antibodies used for the ChIP‐seq experiments; as applicable, provide supplier name, catalog number, clone  name, and lot number.

Peak calling parameters Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and index files used.

Data quality Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5‐fold  enrichment.

Software Describe the software used to collect and analyze the ChIP‐seq data. For custom code that has been deposited into a  community repository, provide accession details.

Flow Cytometry

Plots

Confirm that:

The axis labels state the marker and fluorochrome used (e.g. CD4‐FITC).

The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).  

All plots are contour plots with outliers or pseudocolor plots.

A numerical value for number of cells or percentage (with statistics) is provided.

Methodology

Sample preparation Describe the sample preparation, detailing the biological source of the cells and any tissue processing steps used.

Instrument Identify the instrument used for data collection, specifying make and model number.

Software Describe the software used to collect and analyze the flow cytometry data. For custom code that has been deposited into a  community repository, provide accession details.

Cell population abundance Describe the abundance of the relevant cell populations within post‐sort fractions, providing details on the purity of the samples  and how it was determined.

Gating strategy Describe the gating strategy used for all relevant experiments, specifying the preliminary FSC/SSC gates of the starting cell population, indicating where boundaries between "positive" and "negative" staining cell populations are defined.

Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.

Magnetic resonance imaging

Experimental design

Design type Indicate task or resting state; event‐related or block design.


Design specifications Specify the number of blocks, trials or experimental units per session and/or subject, and specify the length of each trial  or block (if trials are blocked) and interval between trials.

Behavioral performance measures State number and/or type of variables recorded (e.g. correct button press, response time) and what statistics were used to establish that the subjects were performing the task as expected (e.g. mean, range, and/or standard deviation across subjects).

Acquisition

Imaging type(s) Specify: functional, structural, diffusion, perfusion.

Field strength Specify in Tesla.

Sequence & imaging parameters Specify the pulse sequence type (gradient echo, spin echo, etc.), imaging type (EPI, spiral, etc.), field of view, matrix size, slice thickness, orientation and TE/TR/flip angle.

Area of acquisition State whether a whole brain scan was used OR define the area of acquisition, describing how the region was determined.

Diffusion MRI Used Not used

Preprocessing

Preprocessing software Provide detail on software version and revision number and on specific parameters (model/functions, brain extraction,  segmentation, smoothing kernel size, etc.).

Normalization If data were normalized/standardized, describe the approach(es): specify linear or non‐linear and define image types used for transformation OR indicate that data were not normalized and explain rationale for lack of normalization.

Normalization template Describe the template used for normalization/transformation, specifying subject space or group standardized space (e.g.  original Talairach, MNI305, ICBM152) OR indicate that the data were not normalized.

Noise and artifact removal Describe your procedure(s) for artifact and structured noise removal, specifying motion parameters, tissue signals and  physiological signals (heart rate, respiration).

Volume censoring Define your software and/or method and criteria for volume censoring, and state the extent of such censoring.

Statistical modeling & inference

Model type and settings Specify type (mass univariate, multivariate, RSA, predictive, etc.) and describe essential details of the model at the first and second levels (e.g. fixed, random or mixed effects; drift or auto‐correlation).

Effect(s) tested Define precise effect in terms of the task or stimulus conditions instead of psychological concepts and indicate whether ANOVA or factorial designs were used.

Specify type of analysis: Whole brain ROI‐based Both

Statistic type for inference (See Eklund et al. 2016) Specify voxel‐wise or cluster‐wise and report all relevant parameters for cluster‐wise methods.

Correction Describe the type of correction and how it is obtained for multiple comparisons (e.g. FWE, FDR, permutation or Monte  Carlo).

Models & analysis

n/a Involved in the study

Functional and/or effective connectivity

Graph analysis

Multivariate modeling or predictive analysis

Functional and/or effective connectivity Report the measures of dependence used and the model details (e.g. Pearson correlation, partial correlation, mutual information).

Graph analysis Report the dependent variable and connectivity measure, specifying weighted graph or binarized graph, subject‐ or group‐level, and the global and/or node summaries used (e.g. clustering coefficient, efficiency, etc.).

Multivariate modeling and predictive analysis Specify independent variables, features extraction and dimension reduction, model, training and evaluation  metrics.

This checklist template is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium  or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images  or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in  the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the  copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/