Gramene Scientific Advisory Board December 14, 2010


Page 1: Gramene  Scientific Advisory Board December 14, 2010

Gramene Scientific Advisory Board

December 14, 2010

1Gramene SAB 2010

Page 2: Gramene  Scientific Advisory Board December 14, 2010

Introduction of SAB Members

• David Marshall (SCRI)
• Paul Flicek (EBI)
• Michael Ashburner (Cambridge)
• Anna M McClung (USDA-ARS)
• Patricia Klein (Texas A&M)
• William Beavis (Iowa State)
• Tim Nelson (Yale)
• Georgia Davis (Missouri)

Page 3: Gramene  Scientific Advisory Board December 14, 2010

Introduction of Gramene

• Doreen Ware (CSHL, PI)
• Susan McCouch (Cornell, PI)
• Pankaj Jaiswal (OSU, PI)
• Ed Buckler (Cornell, PI)
• Vindhya Amarasinghe (OSU, Pathways)
• Karthikeyan Athikkattuvalasu (Cornell, Diversity, Phenotypes)
• Terry Casstevens (Cornell, Diversity)
• Charles Chen (Cornell, Diversity)
• Aaron Chuah (CSHL, Diversity)
• Genevieve DeClerck (Cornell, Diversity)
• Palitha Dharmawardhana (OSU, Pathways)
• Marcela Monaco (CSHL, Pathways)
• Will Spooner (CSHL, Genomes)
• Joshua Stein (CSHL, Genomes)
• Jim Thomason (CSHL, Germplasm, Website, Pathways, Genes)
• Sharon Wei (CSHL, Genomes)
• Ken Youens-Clark (CSHL, Project Manager, etc.)

Page 4: Gramene  Scientific Advisory Board December 14, 2010

Aim 1: Genomes


Doreen Ware, PI

Sharon Wei, Will Spooner, Ken Youens-Clark, Jim Thomason, Marcela Monaco, Josh Stein

(Total Full Time Equivalent [FTE] 3.5)

Note: Josh was hired at 25% FTE to replace Noel Yap of the Cornell group, who left the project

1.5 FTE available from Ware, Dvorak NSF collaborations

Page 5: Gramene  Scientific Advisory Board December 14, 2010

Suggestions From Last Year

• Add Brachypodium
  – Added in Release 29
• Add a basal plant, e.g. Selaginella
  – We chose Physcomitrella patens because it was better documented at the time (GB record and published)
  – Selaginella now has GB record and will be investigated for 2011
• Add a Solanaceae and/or Legume
  – We are adding tomato in 2011 and are looking into either soybean or Medicago
• Display RNAseq data
  – We now have the ability to display as DAS track (see maizesequence.org)
  – Need to investigate data sources

Page 6: Gramene  Scientific Advisory Board December 14, 2010

Highlights in 2010

• Genomes: 3 new; many updates
• Software: Ensembl 59 provides new visualizations
  – SNP view
  – SNP Mart
  – Multi-species view
  – Multi-sequence alignment
• New Analyses
  – Gene-centered synteny build
  – EPO multi-sequence alignment
  – Split-gene detection
• New Development
  – GERP Conservation (Sharon)
  – GWAS views (Aaron, NSF 2010 collaboration)
  – Tandem arrays (Josh, Will)

Page 7: Gramene  Scientific Advisory Board December 14, 2010

17 Genomes in Release 32

• Physcomitrella (moss): Basal land plant
• Updated assemblies of grapevine & poplar
• Updated annotations of Indica rice & Arabidopsis
• Updated assemblies & annotations of Oryza chr 3S projects

Page 8: Gramene  Scientific Advisory Board December 14, 2010

Genome Plans 2011:

Planning:
• Lycopersicon esculentum (tomato)
• Oryza glaberrima (African domesticated rice)
• Oryza brachyantha (wild rice)
• Aegilops tauschii (wheat D, NSF #0701916)

Investigating:
• Selaginella moellendorffii (basal vascular plant)
• Triticum aestivum (hexaploid wheat)
• Malus x domestica (apple)
• Glycine max (soybean) or Medicago

Page 9: Gramene  Scientific Advisory Board December 14, 2010

Collaborations Genomes

– NSF PGI #0638820 PI Wing end 2009 (wild rice OMAP)
– USDA ARS Grape end 2009
– NSF PGI PI Buckler end 2009
– NSF 2010 #0723510 PI Nordborg end 2011 (Arabidopsis thaliana, A. lyrata, Capsella)
– NSF #0701916 PGI PI Dvorak end 2011 (wheat)
– NSF PGI PI Wilson end 2010 (maize)
– NSF PGI PI #0723510 Scanlon end 2012 (maize)
– NSF PGI PI Springer to start this year (maize)
– NSF PGI PI Wing end 2011 (wild rice OGE)
– NSF PGI #1032105 PI McCombie end 2012 (wheat)
– EBI BBSRC Paul Kersey (travel for coordination participants)
– NSF PGI PI McCouch end 2014 (rice)
– NSF XXX iPlant Steve Goff

Page 10: Gramene  Scientific Advisory Board December 14, 2010

New Maps and Markers

New maps in last year:
• Sorghum genetic (Mace)
• Barley genetic (Close)
• Ae. tauschii genetic (Dvorak)
• Switchgrass genetic (Tobias)

Page 11: Gramene  Scientific Advisory Board December 14, 2010

More genomes in CMap


Added two more fully sequenced genomes to CMap with seq/seq comparisons based on orthology (build 32).

Page 12: Gramene  Scientific Advisory Board December 14, 2010

New SNP View

Shows functional consequences of polymorphism:
• Synonymous coding
• Non-synonymous coding
• Stop gain/loss
• Splice site
• UTR
• Intronic

New in Ensembl 56

• Rice: 160,000 SNPs x 21 varieties (incl. Nipponbare ref.) from OryzaSNP, MSU6
• Maize: 1.6 million SNPs x 27 NAM founder lines from Panzea, AGPv1
• Arabidopsis:
  – 2010 Project SNP Discovery: 637,522 SNPs x 21 ecotypes (incl. Col-0 ref.), TAIR9
  – 2010 Project 250K SNP chip genotypes v3.04, 214,000 SNPs x 1179 ecotypes, TAIR9
  – 1001 Genomes/WTCHG SNPs from dbSNP, 2.7 million SNPs, 17 ecotypes, TAIR9
• Grape: 71K SNPs (Myles et al.)

Page 13: Gramene  Scientific Advisory Board December 14, 2010


SNP BioMart

Filter on region, phenotype, strains, id, & consequence (e.g. introduced STOP codon), and other attributes

Available for rice japonica, rice indica, Arabidopsis & grape datasets

Configure output fields and format (XLS, CSV, TSV, or HTML)

If HTML, link to Variation, Gene, or Browser Pages

Page 14: Gramene  Scientific Advisory Board December 14, 2010

Whole Genome Alignments

Schwartz S et al., Genome Res., 2003;13(1):103-7
Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9

BLASTZ-CHAIN-NET between 20 pairs of species

Alignment (Release)
Oryza sativa Japonica      O.jap
Oryza sativa Indica        31     O.ind
Sorghum bicolor            31     -      S.bic
Brachypodium distachyon    31     31     -      B.dis
Arabidopsis thaliana       31     31     31     31     A.tha
Arabidopsis lyrata         31     -      -      -      31
Vitis vinifera             31     -      -      -      31
Populus trichocarpa        31     -      -      -      31
Oryza glaberrima 3s        31     -      -      -      -
Oryza minuta CC 3s         31     -      -      -      -
Oryza officinalis 3s       31     -      -      -      -
Oryza punctata 3s          31     -      -      -      -
Physcomitrella patens      32     -      -      -      32

New & improved alignment viewer (Ensembl 56)

Page 15: Gramene  Scientific Advisory Board December 14, 2010

Multispecies View

• Stack any number of genomes aligned to a common reference by BLASTZ

• Browse & zoom along any genome independently


Re-introduced in Ensembl 56

Page 16: Gramene  Scientific Advisory Board December 14, 2010

Automated Detection of Split Genes

Special class of “paralog” since Ensembl 58
• Contiguous split paralog: non-overlapping, nearby (<1 Mb), same strand
• Putative split paralog: non-overlapping, different regions (e.g. scaffolds)

Species                    Split Genes
Populus trichocarpa        1181
Sorghum bicolor            1087
Oryza sativa Japonica      916
Vitis vinifera             520
Oryza sativa Indica        365
Zea mays                   280
Arabidopsis lyrata         202
Arabidopsis thaliana       137
Brachypodium distachyon    101

Genome alignment confirms inconsistent annotation
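
The two split-paralog classes on this slide differ only in where the two gene models sit relative to each other. The sketch below encodes just those positional rules (non-overlapping, <1 Mb, same strand, same or different region). It is an illustration under assumptions: the `GeneModel` record and `classify_split_pair` function are hypothetical names, and the real Ensembl Compara detection also relies on gene-tree evidence that is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene record (not the Ensembl data model)."""
    region: str   # chromosome or scaffold name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: int   # +1 or -1


def classify_split_pair(a: GeneModel, b: GeneModel,
                        max_gap: int = 1_000_000) -> Optional[str]:
    """Apply the positional rules from the slide to a pair of paralogs.

    Returns "contiguous_split", "putative_split", or None when the pair
    matches neither class.
    """
    if a.region != b.region:
        # Different regions (e.g. separate scaffolds) cannot overlap.
        return "putative_split"
    if a.start <= b.end and b.start <= a.end:
        return None  # overlapping models are not split-gene candidates
    gap = max(a.start, b.start) - min(a.end, b.end)
    if a.strand == b.strand and gap < max_gap:
        return "contiguous_split"
    return None


left = GeneModel("chr1", 100, 500, 1)
right = GeneModel("chr1", 800, 1200, 1)
elsewhere = GeneModel("scaffold_7", 10, 400, -1)
print(classify_split_pair(left, right))      # contiguous_split
print(classify_split_pair(left, elsewhere))  # putative_split
```

The 1 Mb threshold is exposed as `max_gap` so that it can be tightened for compact genomes.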

Page 17: Gramene  Scientific Advisory Board December 14, 2010

Gene-Centered Synteny Build

2010: Implemented with automated pipeline runnables
• Release 31: monocots
• Release 32: dicots

Oryza sativa Japonica      O.jap
Brachypodium distachyon    YES    B.dis
Sorghum bicolor            YES    YES    S.bic
Arabidopsis thaliana       -      -      -      A.tha
Arabidopsis lyrata         -      -      -      YES    A.lyr
Vitis vinifera             -      -      -      YES    YES    V.vin
Populus trichocarpa        -      -      -      YES    YES    YES    P.tri

Pipeline: Compara orthologs → collinear mappings (DAGchainer) → “in-range” mappings near collinear anchors → map

Page 18: Gramene  Scientific Advisory Board December 14, 2010

Grape Reference Highlights Duplicated Regions in Arabidopsis and Poplar

• Polyploid and segmental duplications manifest as co-syntenic regions

• SyntenyView links to browser: Thus users can easily navigate between duplicated regions


Page 19: Gramene  Scientific Advisory Board December 14, 2010

EPO Multiple Alignment & Ancestor Reconstruction

• Gramene implementation in 2010
• Release 32: 8-way EPO alignment
  – Rice japonica, indica, Brachypodium, sorghum, Arabidopsis, A. lyrata, grape, poplar

Paten et al (2008) Genome Research 18:1814
Paten et al (2008) Genome Research 18:1829

Page 20: Gramene  Scientific Advisory Board December 14, 2010

2010 Genomes Development: Constrained Elements

• Genomic Evolutionary Rate Profiling (GERP): measures purifying selection
• Method testing using 4-way and 8-way EPO alignments as input with varying parameters
• Input tree generated from 1301 ortholog sets
• Planning release in 2011

Cooper et al (2005) Genome Research 15:901

Page 21: Gramene  Scientific Advisory Board December 14, 2010

2010 Genomes Development


Page 22: Gramene  Scientific Advisory Board December 14, 2010

Tandem Duplicate Detection

• Adjacent paralogs with no more than 2 intervening unrelated genes
• Increase gene dosage
• Diversifying selection
• Often species-specific

Species          Clusters   Genes   Largest   Function
Rice japonica    2519       7054    24        phytosulfokine receptor-like (LRR-kinase receptor)
Sorghum          2182       5927    19        Chalcone-stilbene synthase like
Maize            1871       4564    22        DUF1754 (domain of unknown function)
Arabidopsis      1738       4581    28        ECA1 gametogenesis related family

LRR-Kinase cluster in rice

LRR-Kinase species-specific expansions
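
The adjacency rule on this slide (paralogs separated by no more than 2 unrelated genes) can be sketched as a single pass over genes in chromosomal order. This is a minimal illustration, not the Gramene pipeline: the function name `tandem_clusters`, the `(gene_id, family_id)` input shape, and the use of a precomputed paralog family label are all assumptions made for the example.

```python
def tandem_clusters(genes, max_intervening=2):
    """Group same-family genes that lie close together on one chromosome.

    genes: list of (gene_id, family_id) tuples in chromosomal order;
           family_id is None for genes with no paralogs.
    A gene joins the cluster of the previous member of its family when at
    most `max_intervening` other genes lie between them.
    Returns clusters (lists of gene ids) with at least two members.
    """
    last_seen = {}  # family_id -> (index of last member, its cluster)
    clusters = []
    for idx, (gene_id, family) in enumerate(genes):
        if family is None:
            continue
        previous = last_seen.get(family)
        if previous is not None and idx - previous[0] - 1 <= max_intervening:
            cluster = previous[1]
            cluster.append(gene_id)
        else:
            cluster = [gene_id]
            clusters.append(cluster)
        last_seen[family] = (idx, cluster)
    return [c for c in clusters if len(c) >= 2]


chromosome = [
    ("g1", "LRR"), ("g2", "CHS"), ("g3", "LRR"),
    ("g4", None), ("g5", None), ("g6", None),
    ("g7", "LRR"), ("g8", "CHS"),
]
print(tandem_clusters(chromosome))  # [['g1', 'g3']]
```

In this toy input, g7 is separated from g3 by three unrelated genes, so it starts a new (singleton) group and is not reported.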

Page 23: Gramene  Scientific Advisory Board December 14, 2010

Collaboration with Ensembl Genomes

• Share conference calls
• Developers meeting (Hinxton, UK, Sept. 2010)
• Co-authored papers/posters
• Two releases
• Ensembl Developer’s Workshop

Page 24: Gramene  Scientific Advisory Board December 14, 2010

Website Improvements

• Home facelift: quick entry-points
• Migrated to Apache 2.0 in Release 31

Page 25: Gramene  Scientific Advisory Board December 14, 2010

REST Interfaces

New RESTful interface for site gives greater user control over data views and format

Page 26: Gramene  Scientific Advisory Board December 14, 2010

New Oryza Pages

• Highlights this genus with images, phylogeny, geographic origin, & traits of interest
• Entry points to browsers, germplasm, markers, & taxonomy ontology

Page 27: Gramene  Scientific Advisory Board December 14, 2010

Web Services

• Distributed Annotation Server (DAS) serving Ensembl genes as well as Gramene markers, sequences, and QTL
• Gramene Mart integration with Galaxy
• Public MySQL server
• Diversity data via Tassel and GDPC
• Subversion for code access

Page 28: Gramene  Scientific Advisory Board December 14, 2010

Browser Development 2011 Plans

• Communicate/distinguish gene-confidence information
  – 28% of MSU6 rice genes are annotated as “TE_related” and 17% are in poorly-conserved “hypothetical” class
  – 20% of Sorghum genes are “low-confidence” (TE, pseudogenes, etc.)
  – Color-code or display in separate tracks in browser
  – Color-code in gene-tree display
• List/Display detailed gene-level synteny information
  – Explicitly list syntenic genes from Gene Page
  – Indicate that a gene is syntenic to one or more genes of a different species within the browser (e.g. color-code or synteny track)
• List co-syntenic genes
  – 2 genes (in separate blocks) having synteny to a common gene in another species arose from a large scale duplication event (e.g. polyploidy or segmental).
• Tandem Array track
  – Indicate clusters of paralogous genes within browser
• [Challenges of low-depth or highly fragmented genomes, e.g. wheat & Physcomitrella]

Page 29: Gramene  Scientific Advisory Board December 14, 2010

2010 Ongoing Development Work

• miRNA pipeline runnable
  – Refine and automate steps in miRNA annotation
  – Vmatch alignment
  – mfold RNA secondary structure prediction
  – Filter based on secondary structure
• Gene-Build with RNAseq evidence data
  – First pilot experiments performed

Page 30: Gramene  Scientific Advisory Board December 14, 2010

Questions for the SAB?

• Nominate genomes
• New data types, e.g. RNAseq data available for current genomes that we may not be aware of
• Any physical aspects of web site needing improvement

Page 31: Gramene  Scientific Advisory Board December 14, 2010

Aim 2: Pathways

Pankaj Jaiswal, PI

Palitha Dharmawardhana, Jim Thomason, Vindhya Amarasinghe, Liya Ren,

AS Karthikeyan, Marcela Monaco

Note: Liya left the project this year and has been replaced by Marcela.


Page 32: Gramene  Scientific Advisory Board December 14, 2010

Aim#2 Plan (2009-2010 / Year-3)

• Continue curating Rice and Sorghum Pathways

• Release MaizeCyc and BrachyCyc

• Add all available microarray probesets to MarkerDb and allow OMICS viewer to validate

• Develop Reactome database (for rice)

• Update the gene database schema to structure the allele based annotations on function, phenotype and interactions.

• Maintain and Develop Ontologies


Page 33: Gramene  Scientific Advisory Board December 14, 2010


Added BrachyCyc, MaizeCyc

Updated Pathway tools twice to latest versions.

Updated the individual pathway databases twice to be consistent with the Pathway tools version

Rice Pathways curated by addition of hydroxycinnamic acid and serotonin biosynthetic pathways, updates to auxin biosynthesis, tryptophan biosynthesis. Addition of 80 transport reactions and 477 transporters

Page 34: Gramene  Scientific Advisory Board December 14, 2010

Suggestions from last SAB

Concerns about supporting three technologies: Cyc, Reactome, WikiPathways.

Suggested moving to Reactome and allowing the Cyc and WikiPathways databases to be populated by automated exports using BioPax.

Page 35: Gramene  Scientific Advisory Board December 14, 2010

Reactome Database Build

• Reactome:
  – Rice
    • Start with RiceCyc import and build on the existing Ensembl and Curated Genedb resources
  – Arabidopsis
    • After consulting with the Reactome project and the Arabidopsis Reactome group, this will become part of the renewal effort. Work will start with integrating it into the Reactome central database from its current location at JIC (www.arabidopsisreactome.org), followed by active curation.
    • Active curation will be primarily done in collaboration with Nick Provart’s group at Univ. of Toronto.
    • This is a new international collaboration
  – The plan is to integrate the plant-specific Reactome database instances into the Reactome central database, but provide a modified user interface for users.

Page 36: Gramene  Scientific Advisory Board December 14, 2010

Rice Reactome

• Initial build of the Rice Reactome started by importing the complete (curated and predicted) RiceCyc data in BioPax level-2 format.
• A test-v2 Rice Reactome is available from this link.
  – The Reactome tools, with some tweaking, successfully imported 375 pathways and their child reactions
  – Efforts are now under way to integrate the mappings to
    • ChEBI, Ligand and PubChem for compounds/metabolites
    • KEGG for EC enzymes
    • Uniprot
  – Drawing the network diagrams requires manual curation.
    • Priority is to draw networks for fully curated Rice Pathways by using the Reactome tools
  – Integrate predicted models of regulatory pathways for rice based on the reference pathway projections for cell cycle, transcription, translation etc.
  – Curate test case rice pathways
    • Organized a week long workshop attended by curators from Gramene and BAR-Univ. of Toronto (Nick Provart’s group)
    • Mentored by Reactome co-PI Peter D’Eustachio
    • A test case of ABA metabolism and signaling was curated, which contained both the molecular and genetic interaction datasets.

Page 37: Gramene  Scientific Advisory Board December 14, 2010

ABA metabolism and signaling pathway

Klingler et al. J. Exp. Bot. (2010) 61 (12): 3199-3210.

Reactome model: A prototype reaction network, ABA-mediated transcriptional regulation, was laid out using material from Nambara & Marion-Poll (2005 – PMID: 15862093) to supplement the pathways of ABA synthesis and catabolism available as RiceCyc templates, and the regulatory processes discussed by Xiong et al. (2002 – PMID: 11779861) (especially Figure 10) and Klingler et al. (2010 – PMID: 20522527)

Page 38: Gramene  Scientific Advisory Board December 14, 2010

Automated Cyc and WikiPathways builds

• Based on the SAB suggestions, progress has been made towards the goal of extending the annotation of pathway databases in Cyc and Wiki versions in an automated way.

• However, to take that approach we have to streamline the data workflow and structure the current curated gene database as a central repository/aggregator of the datasets needed to achieve this goal.

• The Curated Gene database schema was restructured to hold whole-genome-based annotations on genes and alleles and their associations to function, phenotype, germplasm, pathways, gene-to-gene interactions, gene products, and gene models, besides providing cross references to sequencing project objects (such as the IRGSP-RAP, MSU-OSA, and BGI gene models for rice, O. sativa) and published literature.

• Use aggregated datasets for automated Cyc builds using the standard Pathway Tools, and provide the BioPax and SBML dumps to the WikiPathways project for their users.

• Gramene’s focus will be pathway curation and annotation in Reactome and functional annotation in the gene database.

Page 39: Gramene  Scientific Advisory Board December 14, 2010

Outreach

• Curated rice specific pathways and compounds contributed to PlantCyc and MetaCyc projects on reference pathway databases.
• Organized Workshops
  – Community Gene Annotation Workshop at Plant Biology 2010 (July 2010)
    • Jointly organized with Plant Ontology (PO) Project.
    • Provided meeting support by way of website portal and onsite helping hands
    • Tool development (plant configurations of Phenote annotation tool and Ontologies) and funding provided by PO project.
    • Attended by about 35 researchers of which 12 were awarded travel support by PO.
  – Reactome workshop at CSHL, 25-29 October 2010
    • Attended by Gramene and BAR curators
    • Mentored by Reactome database (Peter D’Eustachio)
    • Hands on curation of a test case pathway.
    • Analysis of RiceCyc import and current Reactome Annotation tools.
    • Development of curation strategy and annotation guidelines.

Page 40: Gramene  Scientific Advisory Board December 14, 2010

Plans for 2010-2011

• Release Rice Reactome
• Release curated gene database in new avatar as aggregator of gene information
• Integrate microarray probeset mappings in OMICS validator for non-rice pathways
• Conduct the gene and pathway annotation outreach workshops.
• Develop test cases for upcoming Renewal and strategies for analyzing large-scale datasets generated by NextGen technologies on transcriptomics and metabolomics.
• Maintain the current Cyc based Pathway views; upgrade to v14.5 and later of Ptools

Page 41: Gramene  Scientific Advisory Board December 14, 2010

Pathway Collaborations

• Metacyc/BioCyc (Peter Karp)
• Reactome (Lincoln Stein, Peter D’Eustachio)
• Arabidopsis Reactome (Nick Provart, Henning Hermjakob)
• PlantCyc (Sue Rhee)
• SolCyc and Solanaceae Genome Network (Lukas Mueller)
• Phenote curation tool (Nomi Harris, Suzi Lewis)
• Ontologies (GO, PO, OBO)
• BrachyBase (Todd Mockler)
• Sorghum Biofuel and Bioenergy Project (John Mullet)
• MaizeSequence.org
• MaizeGDB
• Maize Pathways (Andrew Hanson)
• C3-C4 project (Tim Nelson, Tom Brutnell, Chris Myer, R. Bruskiewich)
• WikiPathways
• Expression data (Todd Mockler, Tim Nelson, Tom Brutnell)

Page 42: Gramene  Scientific Advisory Board December 14, 2010

Questions for SAB?

• Nominate Pathways
• Types of analysis users are interested in
• Potential collaborators (national and international)

Page 43: Gramene  Scientific Advisory Board December 14, 2010

Aim 3: Gramene Diversity Module

Susan McCouch & Edward Buckler, PIs

Terry Casstevens, Genevieve DeClerck, Charles Chen, AS Karthikeyan,

Jon Zhang, Qi Sun, Ken Youens-Clark.


Page 44: Gramene  Scientific Advisory Board December 14, 2010

Suggestions from last year

• Integration with key tools
  – We provide new SNP query tool, Web-launched Tassel, and downloads to work with Flapjack, in formats like Plink, HapMap, etc.
• How about genotype storage?
  – Implemented BLOBs to store SNPs

Page 45: Gramene  Scientific Advisory Board December 14, 2010

New Data Sets

• Arabidopsis
  – Atwell et al. Genotype, phenotype, association data. ~214,000 SNPs, 199 Germplasm, 107 Phenotypes.
• Rice
  – Zhao et al., PLoS, May 2010, "1536 Assay": 1311 SNPs x 395 varieties, mapped to MSU6.0
  – Gross B, et al., Mol Ecol., Aug 2010, SNP diversity study from PG
• Maize
  – dbSNP IDs and AGPv2 coordinate update for current dataset (1.6 million SNPs x 27 NAM lines)

Page 46: Gramene  Scientific Advisory Board December 14, 2010

Web Interface – SNP Query

Page 47: Gramene  Scientific Advisory Board December 14, 2010

Downloads

Page 48: Gramene  Scientific Advisory Board December 14, 2010

Tassel

Page 49: Gramene  Scientific Advisory Board December 14, 2010

GWAS Visualization


Page 50: Gramene  Scientific Advisory Board December 14, 2010

Tassel Development

• New data structure significantly improving memory efficiency
• Alignment viewer
• User-friendly “wizards”
• Progress monitoring with ability to cancel tasks
• Import/export Hapmap, Flapjack, Plink data formats
• Auto-loading and analysis execution from web site startup
• GLM and MLM:
  – GLM interface simplified.
  – Compression and faster P3D implemented for MLM resulting in reduced runtime.
  – Matrix Algebra library wrapper written to make switching to newer, faster libraries easier.
  – EJML Matrix Algebra library interface implemented.
• Tassel 3.0 Pipeline…
  – Automates complex loading/analysis pipelines
  – Doesn't need Java coding to create
  – Has simultaneously executing pipeline segments
  – Works from web site launch, command line, and GUI

Page 51: Gramene  Scientific Advisory Board December 14, 2010

Selection of candidate genes
- Experimental evidence (from other species, e.g. Arabidopsis)
- Ontology terms

Prior-candidate genes (Compara pipeline)
- Coordinates of the genes
- Functional implication or annotations

GWAS associations
- Associated SNP map positions
- p-values

Hapmap SNP information
- SNP positions
- Linkage disequilibrium estimates (r2)

Linkage block size calculations

The linkage block size for the ith prior candidate is given by Bi = 95% quantile {di1, di2, di3, …, dix}, where di1, di2, …, dix are the map distances from the SNP loci in the gene to other loci on the same chromosome that are in perfect LD (r2=1.0).

Enrichment score calculations

For the ith prior candidate gene, the enrichment score, Ei, is calculated as the weighted hypergeometric probability of observing gi significant associations in the linkage block Bi, given the number of SNPs xi located in the block and the total number of SNP loci Gt on the chromosome.

Functional implications

Functional implication of prior candidate genes by statistically significant overrepresentation of association signals.

Page 52: Gramene  Scientific Advisory Board December 14, 2010

Example: Days-to-silk flowering time associations of maize chromosome 8

- Maize first generation hapmap, 1.6 M SNPs across all chromosomes
  - 136,119 SNPs on chromosome 8
- Flowering time trait, Days-to-Silk, of maize: GWAS associations on chromosome 8
  - 144 associations (p-values < 1e-6)
- Curated Arabidopsis flowering time candidate genes
  - 274 genes in total
- Compara orthology of maize homologs to Arabidopsis flowering time candidates
  - 74 prior candidate genes
- Linkage disequilibrium estimates (r2) from 136,119 SNPs, filtered with MAF > 0.05
- Genetic distances calculated from each maize candidate gene to 144 GWAS associations
- Genetic distances of every pair of SNP loci in perfect LD (r2=1.0)

Linkage block size calculations

[Figure: empirical cumulative probability distribution of genetic distances estimated from the SNP loci that are in perfect LD. x-axis: genetic distance of SNP loci (0 to 0.8 Mb); y-axis: probability. The 95% quantile is marked.]

Linkage block size = 105,387 bp
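
The block-size rule used in this example (the 95% quantile of distances between loci in perfect LD) can be sketched in a few lines. This is a minimal illustration under assumptions: the slides do not say which quantile estimator was used, so the nearest-rank method is chosen here for simplicity, and `linkage_block_size` is a hypothetical name.

```python
def linkage_block_size(distances_bp, percent=95):
    """Nearest-rank quantile of map distances between loci in perfect LD.

    distances_bp: iterable of distances (bp) from a candidate gene's SNPs
                  to other loci on the same chromosome with r2 = 1.0.
    percent:      integer percentile to report (95 on the slides).
    """
    ordered = sorted(distances_bp)
    if not ordered:
        raise ValueError("at least one distance is required")
    # Integer arithmetic for ceil(n * percent / 100) avoids float rounding.
    rank = (len(ordered) * percent + 99) // 100
    return ordered[max(rank, 1) - 1]


# Toy distances: 1 kb, 2 kb, ..., 100 kb
toy = [k * 1000 for k in range(1, 101)]
print(linkage_block_size(toy))  # 95000
```

An interpolating estimator would give slightly different values on small samples; with the large number of perfect-LD pairs on a whole chromosome the choice matters little.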

Page 53: Gramene  Scientific Advisory Board December 14, 2010

Enrichment score calculations

Suppose GWAS identifies Mt SNPs significantly associated with flowering time variation among the Nt total SNPs on a given chromosome.

The enrichment score (Sei) determines the probability of getting gi significant GWAS associations, weighted by p-values, within a linkage block.

Enrichment score for ith gene: [equation not captured in this transcript], where

Mt: total number of significant GWAS SNPs on a given chromosome

Nt: total number of SNPs on a given chromosome

gi: significant GWAS SNPs in the defined window

xi: number of SNPs in the defined window

Sei: enrichment score of the ith maize flowering time candidate gene
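
Since the slide's own formula is not available, the sketch below shows only the unweighted core of such a score: the hypergeometric upper-tail probability of seeing at least gi significant SNPs among the xi SNPs in a window, given Mt significant SNPs among Nt on the chromosome. The p-value weighting mentioned on the slide is not reproduced, and the function name `hypergeom_tail` is an assumption for illustration (Python 3.8+ for `math.comb`).

```python
from math import comb


def hypergeom_tail(n_total, n_signif, n_window, n_window_signif):
    """P(X >= n_window_signif) for a hypergeometric draw.

    n_total:          Nt, SNPs on the chromosome
    n_signif:         Mt, significant GWAS SNPs on the chromosome
    n_window:         xi, SNPs inside the linkage block
    n_window_signif:  gi, significant GWAS SNPs inside the block
    """
    denominator = comb(n_total, n_window)
    upper = min(n_signif, n_window)
    numerator = sum(
        comb(n_signif, k) * comb(n_total - n_signif, n_window - k)
        for k in range(n_window_signif, upper + 1)
    )
    return numerator / denominator


# Toy numbers: 10 SNPs, 3 significant, window of 4 SNPs containing 2 hits.
print(round(hypergeom_tail(10, 3, 4, 2), 4))  # 0.3333
```

A small tail probability means the window holds more association signals than expected by chance, which is the overrepresentation the slides describe.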

Page 54: Gramene  Scientific Advisory Board December 14, 2010

Log10 of odds of maize flowering time prior candidate gene

FT maize homolog

AGL79 maize homolog

GI maize homolog

TOC1 maize homolog

rap2.7 AP2 maize homolog

Chromosome 2 Chromosome 3 Chromosome 8

LOD =2*

* Probability of null hypothesis is assessed by randomizing the association results with respect to the SNP positions, without changing the number and strength of association signals.

Page 55: Gramene  Scientific Advisory Board December 14, 2010

Plans - Rice

• Rice Diversity 44K chip: ~39,000 SNPs, 400 rice lines, phenotype data for 23 traits - Build 33

• Rice SNP Consortium 1M chip data - Build 34

• Curate key large GWAS results

Page 56: Gramene  Scientific Advisory Board December 14, 2010

Plans - Maize, Arabidopsis

• Maize Diversity/Panzea, 56 million SNPs x 104 maize lines (Build 33)

• Phenotypic data for an additional 10-20 traits (depending on publication acceptance rate)

• Additional data from Arabidopsis 2010 Project

• Curate key large GWAS results

Page 57: Gramene  Scientific Advisory Board December 14, 2010

Diversity Collaborations

• Rice:
  – McCouch (#0606461, #1026555)
  – Wing (#1026200)
  – Purugganan (#0701382)
  – Olsen (#0638820)
• Arabidopsis: Nordborg (#0723510)
• Maize: Buckler (#0820619)

Page 58: Gramene  Scientific Advisory Board December 14, 2010

Plans - Software

• Google Web Toolkit for association data viewer
• SNP Query - additional features
• TASSEL
  – Flapjack integration. Work with SCRI to create seamless connectivity between the two applications
  – Complete support for heterozygous data
  – Greater JUnit testing (regression testing)
  – Automated MLM/GLM association analysis
  – New graphical displays (i.e., Manhattan plot)
  – Improvements to kinship calculations, imputation function
• Functional implications from GWAS associations: develop web-based interface for statistical method

Page 59: Gramene  Scientific Advisory Board December 14, 2010

Plans – Comparative GWAS

• Develop web-based interface for comparative candidate gene enrichment system.

Page 60: Gramene  Scientific Advisory Board December 14, 2010

Diversity Questions for the SAB

• What should happen to diversity data in the renewal?
  – Large projects such as SeeD (CIMMYT), Wheat/Barley CAP, GRIN-Global will likely go to new standards
• What needs to be done to transition?

Page 61: Gramene  Scientific Advisory Board December 14, 2010

Aim 5: Outreach

Everyone


Page 62: Gramene  Scientific Advisory Board December 14, 2010


Page 63: Gramene  Scientific Advisory Board December 14, 2010

Tutorials

OpenHelix’s Gramene tutorial went live at the end of March 2010. It includes a self-run tutorial as well as PowerPoint slides, handouts, and exercises. As of Sept. 7, in the five months it has been available, the landing page has received 305 views, with 36 viewings of the tutorial.

Five new Gramene-produced tutorials such as this one on pathways.

Page 64: Gramene  Scientific Advisory Board December 14, 2010

Meetings and Presentations

– Presentations
  • PAG
  • Rice Technical Working Group
  • Maize conference
  • International Symposium on Integrative Bioinformatics
  • Evolution
  • ISMB
  • Genome Informatics
  • Agronomy, Crop and Soil Sciences Meeting
– ASPB curation workshop with hands-on exercises
– Other:
  • Gramene Retreat (CSHL, June 2010)
  • Plant Ensembl developers meeting (Hinxton, Sept. 2010)
  • Plant Reactome training workshop (CSHL, Oct. 2010)
  • Ken and Jim TA’d bioinformatics course (CSHL, Oct. 2010)

Page 65: Gramene  Scientific Advisory Board December 14, 2010

Letters of Support

• Wise/Dickerson, NSF-PGRP TRPGR: NextGen PLEXdb (0543441)
• Ana Caicedo (UMass) The evolutionary genomics of invasive weedy rice (0638820)
• Rod Wing CPGS Oryza Genome Evolution (1026200)
• Dick McCombie CPGS: Gene Discovery in Wheat (1032105)
• Carolyn Lawrence, NSF-PGRP GERP: Functional Structural Diversity Among Maize Haplotypes (0743804)
• Steven Briggs, TRPGR Discovery, revision, and validation of maize genes by proteogenomics (0924023)
• Matt Vaughn, Epigenetic Variation in Maize (0922095)

Page 66: Gramene  Scientific Advisory Board December 14, 2010

Publications

• “Gramene database in 2010: updates and extensions” (Youens-Clark, et al.) Nucleic Acids Research, 2010, 1–10 doi:10.1093/nar/gkq1148.
• “Fine Quantitative Trait Loci Mapping of Carbon and Nitrogen Metabolism Enzyme Activities and Seedling Biomass in the Intermated Maize IBM Mapping Population.” (Zhang, Chen, Buckler, et al.) Plant Physiology, in press.
• “Gramene database: a hub for comparative plant genomics.” (P Jaiswal). Methods Mol Biol. 2011;678:247-75. (invited book chapter)
• “Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration.” (Nelson et al.) BioData Min. 2010 Jun 4;3(1):3.

Coming Up:
• “Gramene GeneTrees: A comprehensive database of phylogenetic trees in plants and other model Eukaryotes” (Plant Phys)
• RiceCyc
• Diversity
• Genome sequence analysis

Page 67: Gramene  Scientific Advisory Board December 14, 2010

Plant Ensembl Collaboration

• Lead: Will
• EBI Participants: Paul Kersey, Paul Derwent, Dan Staines, Andy Yates
• Gramene Participants: Will Spooner, Doreen Ware, Aaron Chuah, Shiran Pasternak, Sharon Wei

Page 68: Gramene  Scientific Advisory Board December 14, 2010

Plant Reactome Curators Meeting

Pankaj Jaiswal and Marcela Monaco organized an intensive five-day meeting (October 25-29) at CSHL with Peter D'Eustachio of New York University to learn how to use the Reactome model and software to curate plant pathways.

Other participants included Vindhya Amarasinghe (OSU), Palitha Dharmawardhana (OSU), and Hardeep Nahal (Univ. of Toronto).


Page 69: Gramene  Scientific Advisory Board December 14, 2010

• Development work on visualizing annotations from DNA Subway within Gramene’s Ensembl views

• Contribution of reference genomes for high-throughput sequencing


Page 70: Gramene  Scientific Advisory Board December 14, 2010

Web Usage and Stats


Page 71: Gramene  Scientific Advisory Board December 14, 2010

Page Requests by Year per Month, 2001 - 2010

Page 72: Gramene  Scientific Advisory Board December 14, 2010

Explanation of drop in web usage


Prior to release 29, Gramene was experiencing problems from abusive spidering by web searches on our development site. As a consequence, all indexing was disabled in our “robots.txt” file. Through an error in the release process, this file was copied to the live server, thereby refusing access to search engines. This explains the severe drop in usage by casual users finding Gramene through Internet searches. The problem has been fixed, and usage appears to be climbing again.
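
A release-time check of the kind that would catch this mistake can be sketched with Python's standard `urllib.robotparser`. This is an illustration only: the helper name `blocks_all_crawlers`, the example URL, and the two sample file contents are assumptions, not Gramene's actual files or release tooling.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt: str,
                        site_url: str = "http://example.org/") -> bool:
    """Return True if this robots.txt denies generic crawlers the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", site_url)


# Hypothetical file contents for a development and a live server.
DEV_ROBOTS = "User-agent: *\nDisallow: /\n"   # hide the dev site from indexing
LIVE_ROBOTS = "User-agent: *\nDisallow:\n"    # empty Disallow permits everything

print(blocks_all_crawlers(DEV_ROBOTS))   # True
print(blocks_all_crawlers(LIVE_ROBOTS))  # False
```

Running such a check against the file staged for the live server, and failing the release when it returns True, would flag a development robots.txt before it reaches production.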

Page 73: Gramene  Scientific Advisory Board December 14, 2010

3-year Perspective


Page 74: Gramene  Scientific Advisory Board December 14, 2010

Top Countries - Visits% Nov 2009 – Nov 2010

Page 75: Gramene  Scientific Advisory Board December 14, 2010

Duration of Visit

Page 76: Gramene  Scientific Advisory Board December 14, 2010

Depth of Visit

Page 77: Gramene  Scientific Advisory Board December 14, 2010

Visitor Loyalty

Page 78: Gramene  Scientific Advisory Board December 14, 2010


Thanks, from Gramene

Page 79: Gramene  Scientific Advisory Board December 14, 2010

End
