Post on 17-Jan-2016
Phenotype Curation
Susan R. McCouch
Department of Plant Breeding
Cornell University
QTL curation strategy for rice (Oryza sp.)
•Time
• Start with the current reports, then work through the backlog
•Traits
• Use commonly accepted trait terms to cluster QTL phenotypes, complementing the knowledge of breeders and geneticists
•Number
• Adopt a strategy that provides rapid processing of information on a large number of QTLs, followed by deeper curation.
Three level curation strategy
• Level-1: A version with core features
– Trait, map position, and citation.
– Develop tools for community curation and data mining
• Level-2: Deeper curation includes
– Use of controlled vocabularies (PO, TO, GO and GEO), defining the phenotypic assay and environment
– Genetic population structure and germplasms
– QTL analysis methodology, etc.
• Level-3: Populations and genotypes
– Invite authors to submit raw data
– Train researchers to curate their own data
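The three levels can be pictured as one record that fills in over time. The following is a minimal sketch only, not the Gramene schema: the class name, field names, and the rule used to infer the level are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QTLRecord:
    """Hypothetical record; field names are illustrative, not Gramene's."""

    # Level-1 core features: trait, map position, citation.
    trait: str
    linkage_group: str
    start_cm: float
    stop_cm: float
    citation: str
    # Level-2 details, added during deeper curation.
    ontology_terms: dict = field(default_factory=dict)  # keyed by vocabulary, e.g. "TO"
    population_type: Optional[str] = None  # e.g. "DH", "RIL", "BC", "F2"
    analysis_method: Optional[str] = None
    # Level-3: reference to author-submitted raw data, if any.
    raw_data_ref: Optional[str] = None

    def curation_level(self) -> int:
        """Infer the curation level from which optional fields are filled in."""
        if self.raw_data_ref is not None:
            return 3
        if self.ontology_terms or self.population_type or self.analysis_method:
            return 2
        return 1


# Invented example values, for illustration only.
record = QTLRecord("plant height", "1", 10.0, 25.0, "placeholder citation")
print(record.curation_level())
```

Keeping Level-2 and Level-3 fields optional lets a record be entered quickly with core features and deepened later without changing its shape.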
Demand for QTL curation
• Repeated requests from rice and other cereal research communities
• Thousands of QTLs reported in cereal crop literature
• Critical component of overall phenotype curation
• Continuity with previous versions of RiceGenes
• Critical links for breeders/geneticists looking for genes associated with agronomic traits
Comparison among cereals
• Build and share the data structure/schema required for useful phenotypic/QTL comparison across cereal species.
• Gramene is funded to curate QTL information for rice only, though it works with other cereal communities (e.g. maize and Triticeae).
• Contributions to a community-based phenotype consortium of model organism databases through:
– Newly funded NSF initiatives
– POC (www.plantontology.org)
– Gramene RCN
– Phenotype Ontology initiatives of OBO (http://obo.sf.net)
QTL data in Gramene
• 3843 QTLs
• Rice (Oryza sp.): 3475
• Maize: 327
• Wild Rice (Zizania sp.): 41
• Curated references: 152
• Rice (Oryza sp.): 143 (1999-2003)
• Maize: 8 (test cases on drought tolerance)
• Wild Rice (Zizania sp.): 1
• 57 QTL reference maps generated from different genetic populations such as DH, RIL, BC, F2 etc.
QTL traits clustered by trait categories
Trait category            Number of traits    Number of QTL
Abiotic stress            51                  589
Biotic stress             8                   256
Quality                   30                  200
Yield                     24                  773
Development               13                  314
Anatomy                   44                  662
Sterility or fertility    9                   38
Vigor                     14                  816
Biochemical               27                  195
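As a quick consistency check, the per-category QTL counts above can be summed and compared with the 3843 QTLs reported for the database. This is only a sketch of that arithmetic; the variable and function names are chosen here for illustration.

```python
# (trait category, number of traits, number of QTL), copied from the table above.
CATEGORIES = [
    ("Abiotic stress", 51, 589),
    ("Biotic stress", 8, 256),
    ("Quality", 30, 200),
    ("Yield", 24, 773),
    ("Development", 13, 314),
    ("Anatomy", 44, 662),
    ("Sterility or fertility", 9, 38),
    ("Vigor", 14, 816),
    ("Biochemical", 27, 195),
]

TOTAL_QTL_REPORTED = 3843  # total stated under "QTL data in Gramene"


def total_qtl(rows):
    """Sum the QTL counts across all trait categories."""
    return sum(n_qtl for _, _, n_qtl in rows)


def ranked_by_qtl(rows):
    """Return the categories ordered from most to fewest QTL."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(total_qtl(CATEGORIES) == TOTAL_QTL_REPORTED)  # True
print(ranked_by_qtl(CATEGORIES)[0][0])              # Vigor
```

The category counts add up to the reported total, so each QTL appears to be assigned to exactly one trait category.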
OPTION-1: Search your query
QTL database search
Trait name. Links to all the QTLs in Gramene database, detected for this trait
Number of QTLs listed in Gramene database, detected for the given trait
Sort by any column heading
Found 30 traits under category “Quality”. Displayed 25 entries per page.
View Next 25 entries or type a page number and hit Page to go to that page
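The sorting and paging behaviour described for the search results can be sketched as follows. This is an assumption-laden illustration, not the Gramene implementation: the function names, row layout, and placeholder trait names are all invented.

```python
import math


def sort_rows(rows, column, descending=False):
    """Return rows (a list of dicts) sorted by the chosen column heading."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def page_count(n_items, per_page=25):
    """Number of pages needed to show n_items at per_page entries each."""
    return math.ceil(n_items / per_page)


def get_page(rows, page, per_page=25):
    """Return the entries for a 1-based page number."""
    if page < 1 or page > max(1, page_count(len(rows), per_page)):
        raise ValueError(f"page {page} is out of range")
    start = (page - 1) * per_page
    return rows[start:start + per_page]


# Placeholder rows: 30 traits, matching the "Quality" example; names and counts are invented.
rows = [{"trait": f"trait_{i:02d}", "qtl_count": i} for i in range(1, 31)]

print(page_count(len(rows)))   # 2
print(len(get_page(rows, 2)))  # 5
```

With 30 traits and 25 entries per page, the second page holds the remaining 5 entries.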
OPTION-2: Browse by trait category
Browse QTL database
Trait assayed
Linkage group on which the QTL was detected. Links to all the QTLs listed in Gramene that map to this linkage group
Trait symbol. Links to all the QTLs detected for this trait listed in Gramene.
Trait category to which the trait belongs. Links to a trait browser displaying all the traits listed in Gramene belonging to this category.
QTL symbol assigned in the publication
Links to the Gramene citation for the published QTL reference.
Links to the QTL map on Gramene Comparative map viewer
QTL displayed on CMap
QTL always defined by linkage
• Population x marker set x phenotypic assay x statistical test of association
• Generally many phenotypes assayed for each population and multiple loci identified per phenotype
• Entry into the DB differs from that for mutants, but the information intersects with mutants (phenotypic assay, gene, allele)
• Entry is always via linkage to a set of markers
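The points above suggest that a QTL only has meaning within its experimental context and is anchored to markers. A minimal sketch of that idea follows, assuming hypothetical type and field names and invented example values; it is not the Gramene data model.

```python
from typing import NamedTuple


class QTLContext(NamedTuple):
    """The four factors that together define a QTL experiment (illustrative names)."""
    population: str
    marker_set: str
    phenotypic_assay: str
    statistical_test: str


class QTLEntry(NamedTuple):
    context: QTLContext
    trait: str
    linked_markers: tuple  # entry is always via linkage to a set of markers


def make_entry(context, trait, linked_markers):
    """Create an entry, refusing any QTL that is not linked to at least one marker."""
    markers = tuple(linked_markers)
    if not markers:
        raise ValueError("a QTL entry must be linked to at least one marker")
    return QTLEntry(context, trait, markers)


def group_by_context(entries):
    """Group entries by experiment: many phenotypes and loci can share one context."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.context, []).append(entry)
    return grouped


# Invented example values, for illustration only.
ctx = QTLContext("RIL population A", "marker set 1", "field assay", "interval mapping")
entries = [
    make_entry(ctx, "plant height", ["m1", "m2"]),
    make_entry(ctx, "root length", ["m7", "m8"]),
]
print(len(group_by_context(entries)[ctx]))  # 2
```

Using the full context as the grouping key reflects that several phenotypes, each with multiple loci, are usually reported for the same population.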
Level-2
• Large number of QTL references (200-300 papers in rice)
• Inconsistent reporting styles and formats require manual curation by experts
• Efforts to develop curation strategies using ontologies etc. are encouraged.
• Complicated data relationships and structure
• Prioritize based on data availability
• Prioritize based on trait
– Biotic and abiotic stress
Discussion points
• What linkage analysis tools should we make available to Gramene users, if any?
• How best to visualize QTLs on the comparative maps?
Updates to rice mutant database
1320 genes characterized by phenotype
• 424 genes fully annotated (562 references):
– Phenotypic description
– Associations to controlled vocabularies (TO, PO, GO)
– Map position
– Alleles, phenotypic study, germplasm, environment
– Sequence and gene product
– DBxref link to Oryzabase
• 896 genes with only basic info:
– Gene name and symbol
– Map position
– DBxref link to Oryzabase
Annotated mutant gene: an example
Linking phenotypes across genetic maps
Linking phenotypes across genetic and physical maps
Outreach and Acknowledgement
• Distributed Annotation of Mutants
– Toshiro Kinoshita - rice mutants - Hokkaido Univ. (Japan)
– HeeJong Koh - rice mutants - Seoul National Univ. (Korea)
• Distributed Annotation of QTL
– Jonaliza Lanceras - drought-related QTLs in Rice - Kasetsart Univ. (Thailand)
– Dr. Longxi Yu – drought-related QTLs in Maize
• Contributed Map Position Info to Gramene:
• Dr. H. W. Cai; Dr. Y. Fukuta; Dr. J. Leach;
• Dr. H. Leung; Dr. Z. Li; Dr. M. Maheswaran;
• Dr. D. Mackill; Dr. Adam Price; Dr. J. Xiao;
• Dr. M. Yano; Dr. Q. Zhang; Dr. K. Zheng, etc.