Scripps bioinformatics seminar_day_2
![Page 1: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/1.jpg)
Day 2 of Computing on the shoulders of giants: how existing knowledge is represented and applied in bioinformatics
Benjamin Good
[email protected]
Professor of the Department of Molecular and Experimental Medicine
![Page 2: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/2.jpg)
Recap from Day 1
• Make things (articles, genes, antibodies, etc.) easier to find
• Answer questions
• Generate hypotheses
Controlled vocabularies (MeSH)
Ontologies (Gene Ontology)
Knowledge graphs on the Web: the SPARQL query language
Knowledge plus computation = inference, the ABC model
![Page 3: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/3.jpg)
Computing with knowledge
• Challenges with knowledge graphs
  • Too much data
    • ->> query, sort, visualize, interact
  • Not enough data
    • ->> mine for more..
• Goal for practical day: Go beyond PubMed!
  • gain hands-on experience using a knowledge graph
  • either with tools built for the purpose or with your own code…
![Page 4: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/4.jpg)
Assignment: knowledge graph to hypothesis
• Option 1: Coding
  • Implement and apply an ABC-model-style hypothesis-generating program (you can adapt the example provided)
  • Explain its logic, explain how you used it to generate a hypothesis, and explain the hypothesis (provide a visual)
• Option 2: Non-coding
  • Use one or more knowledge discovery applications (list provided) to define a new hypothesis
  • If you can't think of where to start, try to explain why Metformin may contribute to cancer survival
• Assignment deliverables: a document containing
  • the inputs you gave to your program or the online tool(s) you used
  • what was generated in response and the underlying logic
  • an image and text describing the results, especially any hypothesis you could derive
  • (for Option 1, also submit any code written or files generated as a tar or zip archive)
![Page 5: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/5.jpg)
Online tools for knowledge discovery
• http://knowledge.bio (* we make this one…)
• http://www.biograph.be (this is a good tool, but often breaks down)
• http://epiphanet.uth.tmc.edu (also on the flaky side, but can be good)
• https://skr3.nlm.nih.gov/SemMed/ (works okay, requires a (free) account)
• http://arrowsmith.psych.uic.edu (ugly interface, but good tool)
![Page 6: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/6.jpg)
Demos
• http://knowledge.bio
• http://www.biograph.be
• http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/start.cgi
![Page 7: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/7.jpg)
![Page 8: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/8.jpg)
Example question: repurposing all drugs
http://tinyurl.com/hwm9388
[Figure: query pattern. ?drug "interacts with" protein; protein "encoded by" gene; gene "genetic association" ?disease; candidate edge: ?drug "treats" ?disease (??)]
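The query pattern on this slide can be sketched in Python roughly as follows. This is an illustrative sketch, not the query behind the tinyurl link. The Wikidata property IDs (P129 "physically interacts with", P702 "encoded by", P2293 "genetic association", P2175 "medical condition treated") and the direction of each statement are assumptions about how Wikidata models these links, so check them on wikidata.org before relying on the results. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_repurposing_query(disease_qid=None, limit=100):
    """Build a SPARQL query for the drug -> protein -> gene -> disease pattern.

    disease_qid optionally pins ?disease to one Wikidata item (the "A" concept).
    """
    values = f"  VALUES ?disease {{ wd:{disease_qid} }}\n" if disease_qid else ""
    return (
        "SELECT DISTINCT ?drug ?drugLabel ?gene ?geneLabel ?disease ?diseaseLabel\n"
        "WHERE {\n"
        f"{values}"
        "  ?drug wdt:P129 ?protein .  # physically interacts with (assumed ID)\n"
        "  ?protein wdt:P702 ?gene .  # encoded by (assumed ID)\n"
        # The direction of 'genetic association' is an assumption, so match both.
        "  ?disease (wdt:P2293|^wdt:P2293) ?gene .\n"
        # Keep only drug-disease pairs with no recorded 'treats' statement.
        "  FILTER NOT EXISTS { ?drug wdt:P2175 ?disease . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query, endpoint=WIKIDATA_ENDPOINT):
    """Send the query over HTTP and return the list of result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "abc-model-class-exercise/0.1 (hypothetical)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_repurposing_query(limit=20))` needs network access. Without a disease ID the pattern matches all drugs and all diseases, so the query may be slow or time out; pinning `disease_qid` to a single item keeps it small.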
![Page 9: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/9.jpg)
Example program (feel free to follow or adapt to your interest)
• Example
  • Input = a disease (A)
  • Output = a ranked list of drugs (C) that might be used for treatment
  • Render the results of your workflow as a Cytoscape network that illustrates the reasoning behind the predictions
• Implementation
  • Python
  • Use a SPARQL endpoint such as http://query.wikidata.org
  • + identify and use another endpoint (e.g. EBI, UniProt)
  • ++ access PubMed articles and MeSH indexing
![Page 10: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/10.jpg)
Python setup
• pip install rdflib SPARQLWrapper pandas …
• Hopefully Jupyter is already installed; if not, install it: http://jupyter.readthedocs.io/en/latest/install.html
• Get the notebook from https://github.com/SuLab/sparql_to_pandas/blob/master/SPARQL_pandas.ipynb
• Go to the directory where you put the notebook
• Run it with
• >jupyter notebook
• It should be ready to run
![Page 11: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/11.jpg)
The notebook
• will run a basic search for disease-gene-drug connections in Wikidata
• will sort the results by the number of intervening genes
• will export the data to a tab-delimited file you can view in Excel or a text editor, or load into Cytoscape
• Your job:
  • Run it and extend it by one or more of:
    • adapting the query
    • changing the way the results are sorted
    • working with the output in Cytoscape to produce an informative visualization
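For orientation, the three notebook steps (collect results, sort by intervening genes, export tab-delimited) might look roughly like this using only the standard library. This is not the notebook's code, which uses pandas. It assumes the query returns variables named `disease`, `gene` and `drug` in the standard SPARQL JSON results layout; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def flatten_bindings(sparql_json):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in sparql_json["results"]["bindings"]
    ]


def rank_by_shared_genes(rows):
    """Group rows by (disease, drug) and sort by number of distinct genes."""
    genes = defaultdict(set)
    for row in rows:
        genes[(row["disease"], row["drug"])].add(row["gene"])
    ranked = [
        (disease, drug, len(found), sorted(found))
        for (disease, drug), found in genes.items()
    ]
    # Most intervening genes first; names break ties so output is stable.
    ranked.sort(key=lambda item: (-item[2], item[0], item[1]))
    return ranked


def to_tsv(ranked):
    """Serialize the ranked table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["disease", "drug", "n_genes", "genes"])
    for disease, drug, n_genes, gene_list in ranked:
        writer.writerow([disease, drug, n_genes, ";".join(gene_list)])
    return buffer.getvalue()
```

Changing the sort key in `rank_by_shared_genes` is one simple way to try a different ranking. For a network view in Cytoscape, an edge list with one row per drug-gene or gene-disease link is usually easier to import than this summary table.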
![Page 12: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/12.jpg)
Example output rendered in Cytoscape
![Page 13: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/13.jpg)
Other queries from Day 1 (slides 48-54)
• Drugs that target a cancer and impact a specific biological process
  • http://tinyurl.com/j222k6g
• Drugs that target a new disease that is linked, via a biological pathway with shared genes, to the disease the drug is now used to treat
  • http://tinyurl.com/gpfr9kj
![Page 14: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/14.jpg)
Possible inputs for adaptations
• Browse and examine wikidata.org to see what you might make use of
• e.g.
  • Type of physical interaction between gene and drug
  • Gene Ontology annotation (what evidence codes?)
  • Disease Ontology hierarchy
  • Drug characteristics
![Page 15: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/15.jpg)
Other possible knowledge sources
• SPARQL
  • UniProt http://sparql.uniprot.org
  • EBI SPARQL https://www.ebi.ac.uk/rdf/documentation/sparql-endpoints
  • Look for unique identifiers on genes and proteins that you can use to link Wikidata content to their content
• Text
  • Use the NCBI E-utils API to programmatically access PubMed articles and MeSH indexing http://www.ncbi.nlm.nih.gov/books/NBK25501/
  • Can be used to build co-occurrence networks of e.g. MeSH terms
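A MeSH co-occurrence network could be built along these lines. This is a sketch under assumptions: it builds an E-utils `efetch` URL for PubMed records in XML, reads the `DescriptorName` element of each `PubmedArticle`, and counts how often two descriptors are indexed on the same article. The function names are hypothetical, and downloading the XML is left to you (please respect NCBI's usage guidelines).

```python
import itertools
import urllib.parse
import xml.etree.ElementTree as ET
from collections import Counter

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(pmids):
    """Build an efetch URL that returns PubMed records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EUTILS_BASE + "efetch.fcgi?" + urllib.parse.urlencode(params)


def mesh_terms_per_article(pubmed_xml):
    """Return one set of MeSH descriptor names per article in the XML."""
    root = ET.fromstring(pubmed_xml)
    articles = []
    for article in root.iter("PubmedArticle"):
        terms = {d.text for d in article.iter("DescriptorName") if d.text}
        articles.append(terms)
    return articles


def cooccurrence(articles):
    """Count the articles in which each pair of MeSH terms appears together."""
    counts = Counter()
    for terms in articles:
        # Sorting makes (a, b) and (b, a) the same key.
        for pair in itertools.combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts
```

Each key in the resulting `Counter` is an edge between two MeSH terms and the count is its weight, so the pairs can be written out as a tab-delimited edge list for Cytoscape.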
![Page 16: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/16.jpg)
Good luck! Ask questions!
![Page 17: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/17.jpg)
ABC ranking algorithms
• Out of all C, which are most strongly related to A?
• Rank by N shared B concepts
  • c2: 4
  • c4: 3
  • c1: 1
  • c3: 1
  • c5: 1
  • c6: 1
• Next level: adjust to down-weight highly connected nodes
[Figure: network linking A through B concepts to C concepts c1 to c6]
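Both ranking ideas on this slide can be sketched in a few lines. The toy graph below is hypothetical: its edges were chosen only so that the shared-B counts match the numbers on the slide, since the actual edges in the figure are not recoverable. The down-weighting shown (each B contributes 1 / log(degree), in the style of the Adamic-Adar index) is one possible choice, not necessarily the one intended in the lecture.

```python
import math
from collections import defaultdict

# Hypothetical toy graph: A links to four B concepts, each B links to some C.
EXAMPLE_A_TO_B = {"b1", "b2", "b3", "b4"}
EXAMPLE_B_TO_C = {
    "b1": {"c1", "c2", "c4"},
    "b2": {"c2", "c3", "c4"},
    "b3": {"c2", "c4", "c5"},
    "b4": {"c2", "c6"},
}


def rank_by_shared_b(a_to_b, b_to_c):
    """Rank each C by how many B concepts connect it to A."""
    counts = defaultdict(int)
    for b in set(a_to_b):
        for c in b_to_c.get(b, ()):
            counts[c] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def rank_down_weighted(a_to_b, b_to_c):
    """Like rank_by_shared_b, but highly connected B nodes count for less."""
    scores = defaultdict(float)
    for b in set(a_to_b):
        targets = b_to_c.get(b, ())
        degree = len(targets) + 1  # C neighbours plus the edge back to A
        for c in targets:
            # degree >= 2 here, so the logarithm is always positive.
            scores[c] += 1.0 / math.log(degree)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

On the toy graph, `rank_by_shared_b(EXAMPLE_A_TO_B, EXAMPLE_B_TO_C)` puts c2 first with 4 shared B concepts and c4 second with 3. With down-weighting, c6 moves ahead of the other single-link concepts because its only B concept (b4) has fewer neighbours.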
![Page 18: Scripps bioinformatics seminar_day_2](https://reader035.fdocuments.net/reader035/viewer/2022062503/587b38041a28ab057d8b76f3/html5/thumbnails/18.jpg)
ABC ranking algorithms – advanced (require large networks to be useful)
• Average Minimum Weight (AMW) (Wren)
  • http://bioinformatics.oxfordjournals.org/content/20/3/389.full.pdf
• Linking Term Count with Average Minimum Weight (LTC-AMW) (Yetisgen-Yildiz and Pratt)
  • https://www.researchgate.net/publication/23759128_A_new_evaluation_methodology_for_literature-based_discovery_systems
• Predicate inter-dependence (Rastegar-Mojarad)
  • https://s3.amazonaws.com/uploads.hipchat.com/25885/154162/UaGvvQqbrhPBAWN/A%20new%20method.pdf
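As a rough illustration of the first two methods, the sketch below assumes that edge weights are already available. AMW is taken to be the average, over the B concepts linking A to a given C, of min(weight(A, B), weight(B, C)), and LTC-AMW is taken to rank by the number of linking B concepts with AMW as the tie-breaker. These are simplified readings; consult the cited papers for the exact definitions and for how the weights themselves are computed. The function names and input layout are hypothetical.

```python
from collections import defaultdict


def path_scores(w_ab, w_bc):
    """Compute (linking term count, AMW) for every C reachable from A.

    w_ab maps each B to the weight of its edge with A.
    w_bc maps (B, C) pairs to the weight of that edge.
    """
    minima = defaultdict(list)
    for (b, c), weight in w_bc.items():
        if b in w_ab:  # only B concepts that are linked to A count
            minima[c].append(min(w_ab[b], weight))
    return {c: (len(ms), sum(ms) / len(ms)) for c, ms in minima.items()}


def rank_amw(scores):
    """Rank C concepts by AMW alone, highest first."""
    return sorted(scores, key=lambda c: (-scores[c][1], c))


def rank_ltc_amw(scores):
    """Rank by linking term count, breaking ties with AMW."""
    return sorted(scores, key=lambda c: (-scores[c][0], -scores[c][1], c))
```

The two rankings can disagree: a C concept reached through one strong path can top the AMW list, while a C concept reached through several weaker paths tops LTC-AMW.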