Blast2GO presentation @ StatSeq COST workshop
description
Transcript of Blast2GO presentation @ StatSeq COST workshop
![Page 1: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/1.jpg)
Blast2GO presentation @ StatSeq COST workshop
Friday 25th January 2013, Royal Melbourne Hospital21nd -23rd April 2013, Helsinki, Finland
![Page 2: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/2.jpg)
Why Blast2GO
Functional characterization of novel sequence data
Adapted of high throughput needs of biological laboratories
Extracting knowledge about functioning of genomes
![Page 3: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/3.jpg)
Blast2GO Impact
![Page 4: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/4.jpg)
Outline
Concepts on Functional AnnotationThe Blast2GO annotation frameworkVisualization of functional dataPathway analysis with Blast2GO
![Page 5: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/5.jpg)
Concepts of Functional AnnotationWhat is functional annotation?
How to annotate a large dataset?
![Page 6: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/6.jpg)
The Gene Ontology Three branches:
Biological Process Molecular Function Cellular Component
Annotations are given to te most specific (low) level
True path rule: annotation at a given term implies annotation to all its parent terms
Annotation is given with an Evidence Code:o IDA: inferred by direct assayo TAS: traceable author statemento ISS: infered by sequence similarityo IEA: electronic annotationo ….
More general
More specific
![Page 7: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/7.jpg)
Functional assignment
Annotation
Empirical Transference
Molecular interactions
Gene/proteinexpression
Biochemicalassay
StructureComparison
Sequenceanalysis
Identificationof folds
Motif identification
Phylogeny
Literature reference
Sequencehomology
![Page 8: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/8.jpg)
Annotation by similarity: concerns
Level of homology (~ from 40-60% is possible) The overlap between hit and query, association function and structure The paralog problem: genes with similar sequences
might have different functional specifications The evidence for the original annotation Balance between quality and quantity: depends on the use
GO1, GO2, GO3, GO4
GO1, GO2, GO3, GO4QUERY
HIT
![Page 9: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/9.jpg)
The Blast2GO annotation framework
![Page 10: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/10.jpg)
cellularcomponent
biological process
Fasta
Application scheme
![Page 11: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/11.jpg)
Application scheme
cellularcomponent
biological process
Fasta
![Page 12: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/12.jpg)
Fasta
cellularcomponentbiological
process
Application scheme
![Page 13: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/13.jpg)
Basic annotation procedure
Sq1
BlastSq2
Sq3
Sq4
Sq1
Sq2
Sq3
Sq4
Hit1Hit2Hit3Hit4
Hit1Hit2Hit3Hit4
Hit1Hit2Hit3Hit4
Sq1
Sq2
Sq3
Sq4
Hit1Hit2Hit3Hit4
Hit1Hit2Hit3Hit4
Hit1Hit2Hit3Hit4
Hit1Hit2
go1,go2, go3go1,go3, go4go3,go5, go6,go8go1,go4
go6,go9, go8go1,go8go4,go1, go8,go9
go2go2,go4, go4go2,go5, go6go2,go4
Sq1
Sq2
Sq3
Sq4
go1,go2, go3go1,go3, go4go3,go5, go6,go8go1,go4
go6,go9, go8go1,go8go4,go1, go8,go9
go2go2,go4, go4go2,go5, go6go2,go4
Mapping
Hit1Hit2
Annotation
![Page 14: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/14.jpg)
Annotation Rule
- Let be GO1…n be candidate annotations for sequence S1, obtained from hits Hi…k
- We compute an annotation score AS for each GOi that depends on:- The similarity between sequence S1 and Hj
- The evidence code of GOi
- The existence of other neigboring GO candidates- The structure of the Gene Ontology
- We define an abritary annotation threshold (AT)- S1 is annotated with GOi if its ASGOi > AT
![Page 15: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/15.jpg)
Annotation Rule
Annotation Score
Quality of source annotation:IEA=0.7, IDA = 1, NR = 0.0, ...
Similarity Requirement
GO4
GO2GO1 GO3
Possibility of abstraction
True-Path-Rule
selectivity vs.
specificity
Cut-Off Value
new annotation
![Page 16: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/16.jpg)
Blast2GO annotation rule
- When I have a GO with ECw =1 and I do not allow abstraction (GOw = 0), then the Annotation Score = %similarity
- If the ECw < 1 my similarity requirement is higher to obtain the same Annotation Score
- If I allow abstraction GOw > 0, then with less similarity I can obtain the required Annotation Score at a parent node
![Page 17: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/17.jpg)
www.blast2go.com
![Page 18: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/18.jpg)
Start Blast2GO
![Page 19: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/19.jpg)
Blast2GO Application
MainSequence
Table
(1) Blast(2) Mapping
(3) Annotation
Graph visualisationApplication messages
Blast resultsApplication statistics
Any operationwill only affect to selected sequences!!!!
![Page 20: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/20.jpg)
Load sequences
![Page 21: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/21.jpg)
Input data (in FASTA format, AA or nt)
asdf
asdf
>my_favourite_species_seq1 | still unknowngtgatggaaaagaaaagttttgttatcgtcgacgcatatgggtttctttttcgcgcgtattatgcgctgcctggattaagcacctcatacaattttcctgtaggaggtgtatatggttttataaacatacttttgaaacatctctctttccacgatgcagattatttagttgtggtatttgattcggggtcgaaaaattttcgtcacactatgtattccgaatacaaaactaatcgccctaaagcaccagaggatctgtcactacaatgtgctccgctacgtgaggctgttgaagcgtttaatattgtaagtgaagaagtgcttaactacgaagcagacgacgtaatagctacactctgtacaaaatatgcatctagtaatgttggagtgagaatactgtcagcagataaggatttactacaactcctaaatgataatgttcaagtttacgaccctataaaaagcagatacctcaccaatgaatacgttttagaaaaatttggtgtttcatcagataagttgcatattgatacggttgcatcgagttataatgagaaaattattctcagctaagctgtacaccgtttattacacactcgaaaggccgttag>my_favourite_species_seq2 | no cluettgttagctaaaaaggaagactttcacacctttggtaatggtgttggctctgctggaacaggtggagttgtagtttctgcatccatgttgtctgcggatttttcaaatcttagagaagagatagcagcggttagtacggctggtgcagattggttacacattgatgtgatggatgggtgcttcgtccccagtttgactatgggtcctgtggtgatttccggcattaggaaatgtacaaatatgtttcttgatgtgcatttgatgattaatcgcccaggcgatcatctgaagagtgtggtagatgctggagctgataagatagagcacattcgcaagatgatagaggaaagctcatcaaccgcgaaaatcgctgttgatggtggtgtttcaacggataatgcccgggctgttatcgaggcaggtgcgaatatactcgttgttggaacggcgctgtttgctgctgacgatatgagtaaagttgtaagaactttaaaatcattttaa>my_favourite_species_seq3 | just sequencedgtgggactgctcatccctgtaggcagggtggctattttttgtgtaaaggcagtctttcatagtcttgtaccgccatactatctatggataactacaaagcagttttttgaggtgtggtttttctctcttcctatagtagcagttacatctttgtttacgggaggcgcgttagcccttcaggataccctcgtgggaagcgctaaagtatcagggtaatggagtttttactcctgcaagatgtaatagagggtctggtaaaagctgtatcgtttgggctggtaatttcgctagttgggtgttacaacgggtatcactgtgagataggcgcaaggggtgtaggaacagcgacaacaaaaacttcggtagcagcttctatgctcataattttgttaaactatataattactgttttttacgcgta>my_favourite_species_seq4 | we will see soon...atgtacgctgtatctctttcaaatttgcatgtctctttcaacaacaaggaggttttgaaaggtgttgacttggacatagcatggggggattccctggttatactgggagaatctggtagtggaaagtctgtactaacaaaggttgtattgggtctaatagtgccccaagagggaagtgttactgtagatggcaccaatattcttgagaataggcagggcatcaagaattttagtgttttgtttcaaaactgtgcgttatttgacagtcttacgatttgggaaaatgtagtattcaatttccgtaggaggcttcgtttagataaggataatgccaaggctttggctttacggggattggagcttgtgggattggacgccagtgtaatgaacgtgtatcctgtggagctatcaggcgggatgaaaaagcgcgtagctttggcaagagctattataggtagtcccaaaattctaattttggatgagccaacttcgggattggatcctataatgtcttcagtggt
![Page 22: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/22.jpg)
You email adressBLAST program (normally blastx)
Number of HITs (use <= 20)
Human readable seq. Descriptions via BDA
Recommended to save as XML
BLAST database (many options)E-Value (depends on the DB)
BLAST
![Page 23: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/23.jpg)
Use your own server
Set word size and filter
Filter by description
Parsing options for own databases
Minimum HSP length
Additional BLAST params
![Page 24: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/24.jpg)
BLAST Results
RED
![Page 25: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/25.jpg)
Blast Distribution Charts
Evaluate the similarity of your sequences with public DBs
![Page 26: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/26.jpg)
Single Sequence Menu
Single Sequence Menu
![Page 27: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/27.jpg)
Mapping Results
GREEN
![Page 28: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/28.jpg)
Annotation MenuBLAST based annotation
Validation and Annex
Other Annotation modes
![Page 29: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/29.jpg)
Annotation
Allows to set a minimum percentage of the HIT sequence which should be expand by the QUERY sequence
This helps to avoid the problem of cis-annotation
![Page 30: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/30.jpg)
Annotation Result
BLUE
![Page 31: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/31.jpg)
Annotation Charts
![Page 32: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/32.jpg)
Annotation Charts
Commonly, level 5 is the most abundant specificity level in the Gene Ontology
![Page 33: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/33.jpg)
Recovers implicit biological process and cellular
component GO terms based on molecular function annotations
Biological Process Cellular Component
acts inis involved in
Myhre et al, Bioinformatics 2006
Additional Annotation: ANNEX
Molecular Function
![Page 34: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/34.jpg)
Additional Annotation: InterProScan
Results are stored at your computer as XML files. You can upload them later
Once you have completed your InterProannotation, results can be transformed toGO terms and merged to Blast annotation
Runs InterProScan searches at the EBI through Blast2GO
![Page 35: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/35.jpg)
InterProScan Results
Column with InterProScan results
![Page 36: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/36.jpg)
Additional Annotation: GOSlim
GOSlim is a reduction of the Gene Ontology to a more reduced vocabulary → Helps to summarize information
After GOSlim transformation sequences get YELLOW
Different GOSlims available at Blast2GO
![Page 37: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/37.jpg)
Enzyme annotation and Kegg Maps GO Enzyme Codes KEGG maps
![Page 38: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/38.jpg)
Manual Curation
You can modify manually annotation of particularsequences
If you click in this box, curated sequences get purple
![Page 39: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/39.jpg)
Export Results
Saves the complete B2G project (heavy)
Export annotation results in different formats
![Page 40: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/40.jpg)
Export formats
C04018A02 glyoxalase i GO:0004462 F:lactoylglutathione lyase activityC04018C02 metallothionein-like protein GO:0046872 F:metal ion bindingC04018G02 protein phosphatase GO:0008287 C:protein serine/threonine phosphatase complex
C04013E10 response to water deprivation; regulation of transcription; multicellular organismal development; response to abscisic acid stimulus; nucleus; transcription factor activity; C04013A12 translation; ribosome; plastid; structural constituent of ribosome; C04013C12 galactose metabolic process; plastid; aldose 1-epimerase activity; carbohydrate binding;
By Seq
GeneSpring Format
C04018C10 4707,9409,6979,10200,5524,169C04018A12 16798,272,44248C04018C12 4869,12505,8233
GoStat
C04018C10 GO:0004707 mitogen-activated protein kinase 3C04018C10 EC:2.7.11.24C04018A12 GO:0016798 class iv chitinaseC04018A12 GO:0000272
.annot Also for import!
![Page 41: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/41.jpg)
More export formats
Sequence name Sequence desc. Sequence lengthHit desc. Hit ACC E-Value Similarity Score Alignment lengthPositivesC04018C10 mitogen-activated protein kinase 3 717 gi|122894104|gb|ABM67698.1|mitogen-activated protein kinase [Citrus sinensis]ABM67698 1.35E-123 99 445.28 222 221C04018E10 ---NA--- 706 gi|157356307|emb|CAO62459.1|unnamed protein product [Vitis vinifera]CAO62459 2.69E-036 83 155.22 119 99C04018G10 protein 620 gi|114153154|gb|ABI52743.1|10 kDa putative secreted protein [Argas monolakensis]ABI52743 7.47E-015 63 83.57 90 57C04018A12 class iv chitinase 715 gi|3608477|gb|AAC35981.1|chitinase CHI1 [Citrus sinensis]AAC35981 1.45E-061 78 239.2 171 134C04018C12 cysteine proteinase inhibitor 663 gi|8099682|gb|AAF72202.1|AF265551_1cysteine protease inhibitor [Manihot esculenta]AAF72202 9.33E-025 83 116.7 99 83C04018E12 protein phosphatase 2c 663 gi|46277128|gb|AAS86762.1|protein phosphatase 2C [Lycopersicon esculentum]AAS86762 2.76E-077 91 291.2 180 164C04018G12 alpha beta fold family protein 578 gi|147865769|emb|CAN83251.1|hypothetical protein [Vitis vinifera] >gi|157339464|emb|CAO44005.1| unnamed protein product [Vitis vinifera]CAN83251 1.67E-084 94 314.69 179 169C04018A02 glyoxalase i 600 gi|2213425|emb|CAB09799.1|hypothetical protein [Citrus x paradisi]CAB09799 2.16E-064 81 248.05 114 93C04018C02 metallothionein-like protein 625 gi|3308980|dbj|BAA31561.1|metallothionein-like protein [Citrus unshiu]BAA31561 2.23E-014 100 82.03 40 40
Seq. Name Seq. Description Seq. Length #Hits min. eValuemean Similarity#GOs GOs Enzyme Codes InterProScanC04018C12 cysteine proteinase inhibitor 663 20 25 80.00% 3 F:GO:0004869; C:GO:0012505; F:GO:0008233IPR000010; IPR018073; noIPRC04018E12 protein phosphatase 2c 663 20 77 85.00% 2 N:GO:0015071; F:GO:0003824 IPR001932; IPR014045; IPR015655; noIPRC04018G12 alpha beta fold family protein 578 20 84 79.00% 4 F:GO:0016787; C:GO:0005739; C:GO:0009507; P:GO:0006725noIPRC04018A02 glyoxalase i 600 20 64 74.00% 2 P:GO:0005975; F:GO:0004462EC:4.4.1.5 IPR004360; noIPRC04018C02 metallothionein-like protein 625 18 14 74.00% 1 F:GO:0046872 IPR000347C04018E02 haemolysin-iii related familyexpressed 612 20 32 72.00% 1 C:GO:0016020 noIPRC04018G02 protein phosphataseexpressed 645 20 97 81.00% 5 C:GO:0008287; N:GO:0015071; P:GO:0006470; C:GO:0009536; C:GO:0005739no IPS matchC04018C04 phosphoglycerate bisphosphoglycerate mutase family protein780 20 63 66.00% 2 P:GO:0008152; F:GO:0003824 IPR001345; IPR013078; noIPRC04018E04 polyubiquitin 707 20 115 99.00% 2 P:GO:0006464; C:GO:0005622 IPR000626; IPR019954; IPR019955; IPR019956; noIPRC04018G04 meiotic recombination 11 575 20 45 89.00% 21 C:GO:0019013; P:GO:0007126; F:GO:0004519; F:GO:0005509; F:GO:0004871; C:GO:0005739; F:GO:0030145; P:GO:0006302; P:GO:0045449; F:GO:0008289; P:GO:0042157; F:GO:0003677; P:GO:0006869; C:GO:0030089; P:GO:0007165; F:GO:0004527; P:GO:0015979; C:GO:0005576; F:GO:0005198; C:GO:0005634; P:GO:0006118IPR003701; IPR004843; noIPRC04018A06 late embryogenesis-abundant protein 648 20 43 68.00% 2 P:GO:0009737; P:GO:0009409 no IPS match
Export Sequence Table
Export BestHit Data
![Page 42: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/42.jpg)
Sequence Selection
Sequence Selection tool toobtain a selection based onannotation status
![Page 43: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/43.jpg)
Sequence Selection
By Name/Description By Function
![Page 44: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/44.jpg)
View Menu
Functions to switch between displaying IDs or descriptions for GO annotation or InterPro results
![Page 45: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/45.jpg)
Hands-on I
Annotation 10 seqs with Blast2GO
![Page 46: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/46.jpg)
VisualizationHow to understand the functional context of a
annotated dataset
![Page 47: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/47.jpg)
Each term has a number of sequences associated
Nodes can be coloured to indicate relevance
Each term is displayedaround its biological context
Node shape to differentiate betweendirect and indirect annotation
Combined Graph
![Page 48: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/48.jpg)
Different GO branches
Reduces nodes by numberof annotate sequences
Criterion for highlighting and filtering nodes
Node data to be displayed
Combined Graph
![Page 49: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/49.jpg)
Accumulated by GO term(Sequence Count)
31
4
5
1 3
1
Incomming information(Node Score)
31
2.4
2.5
1
1 3
Node information content
Σ seq(g)*α dist (g, g')
g desc(g')∈
![Page 50: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/50.jpg)
Compacting Graphs by GO-Slim
![Page 51: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/51.jpg)
Saving Options
Save as picture and as txt
![Page 52: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/52.jpg)
Graph Charts
![Page 53: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/53.jpg)
Graph Charts
• Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)
• Sequence Distribution/GO
as Bar-Chart
• Sequence Distribution/GO as Level-Pie (level selection)
![Page 54: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/54.jpg)
Analysis of a specific functionHow many sequences are annotated to the function “photosynthesis”?
Option 1: Find in the GO graph -> direct & indirect annotation Option 2: Find through the Select function. Two sub-options
Option 2.1. Direct annotation (use GO-ID or description) Option 2.2. Direct & indirect (use GO-ID and “include GO
parents”)
![Page 55: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/55.jpg)
Find a function on the graph
searchexport
Analysis of a specific function
![Page 56: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/56.jpg)
Exporting sequence table you see sequencesAnnotated to the function
Analysis of a specific function
![Page 57: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/57.jpg)
Select all sequences annotatedto this function and its descendents
Analysis of a specific function
![Page 58: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/58.jpg)
Locate these sequences
Analysis of a specific function
![Page 59: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/59.jpg)
Hands-on II
Summary statisticsVisualize & Search
![Page 60: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/60.jpg)
Pathway analysis with Blast2GOWhich cellular functions are important in my
experiment
![Page 61: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/61.jpg)
Biosynthesis 54% Biosynthesis 18%
Sporulation 18% Sporulation 18%
One Gene List (Responsive genes)
The other list (Non responsive genes)
Are this two groups of genes
carrying out different
biological roles?
Functional Enrichment Analysis
Are pathway frequencies different?
![Page 62: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/62.jpg)
Biosynthesis 54% Biosynthesis 18%
Sporulation 18% Sporulation 18%
95No biosynthesis26BiosynthesisBAGenes in group A have not
significantly to do with biosynthesis nor sporulation.
Fisher's Exact TestC
ontingency table
p-value for Biosynthesis = 0.0913
One Gene List (Responsive genes)
The other list (Non responsive genes)
![Page 63: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/63.jpg)
Multiple testing correctionWe do this for all GO term of our dataset!!!
Many tests => Many false positive => We need correction!
FDR control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses.
FWER control: The familywise error rate is the probability of making one or more false discoveries among all the hypotheses when performing multiple pairwise tests. (more conservative)
![Page 64: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/64.jpg)
Different types of comparisons
Compare two equivalentconditions (root vs leaves)
Remove Common Ids Test and Ref-Set are
interchangeable
Set 1 Set 2
Common IDs
Compare a subset against the total
Common ids removed from reference
Test and Ref-Set are NOT interchangeable
Test-Set
Ref-Set
Common IDs
Test-Set
Ref-Set
Common IDs
![Page 65: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/65.jpg)
FET in Blast2GO
Two-Tailed test not only identifies over but also under represented functions.
If no Ref-Set is chosen all annotations are used as reference
![Page 66: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/66.jpg)
FatiGO ResultsResult table with link out to sequence lists
![Page 67: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/67.jpg)
Most specific terms
Retains only the lowest, most specific enriched term per GO branch
![Page 68: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/68.jpg)
Enriched Graph View enriched terms data as DAG graphs!
reduce
=> To draw all nodes, set filter to 1
![Page 69: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/69.jpg)
Hands-on III
Enrichment Analysis
![Page 70: Blast2GO presentation @ StatSeq COST workshop](https://reader031.fdocuments.net/reader031/viewer/2022011717/56816664550346895dd9f513/html5/thumbnails/70.jpg)
Concluding Remarks
Blast2GO is a versatile tool for the annotation of sequence data
Blast2GO uses controlled vocabularies and a elaborated annotation rule to
generate GO labels
Visualization and data mining functions help to understand the functional
content of your dataset