Free Open-Source, Open-Platform System for Information Mash-Up and Exploration in Earth Science
Tawan Banchuen, Will Smart, Brandon Whitehead, Mark Gahegan, Sina Masoud-Ansari
Center for eResearch & School of Environment
The University of Auckland
Overview
1. Introduction and background to project
2. Application Development
– Software system for integrating, browsing and understanding large information bases
3. Demonstration / sample results
4. Conclusion
Components of knowledge computing
• Rich descriptions of resource meaning; Recommender systems; Finding analogous situations; Knowledge evaluation
• Ontology alignment tools; Filters and query tools for locating resources; Knowledge visualization tools (e.g. ConceptVista, CMap, ThinkBase)
• Workflow description; Metadata scraping; Ontology capture; Use-case capture; Tag clouds
• Ontologies, controlled vocabularies, taxonomies; Metadata; Knowledge bases; RDF/OWL/KIF
What is an Ontology?
• An ontology describes what we know or what is true, via a kind of logic
• An ontology can be as simple as a concept map showing terms used to describe a topic and the relationships between those terms
[Figure: concept map linking a Topic to its Terms]
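A concept map of this kind can be held as a list of (subject, relationship, object) triples. The sketch below is only an illustration: the topic, the terms, the relationship names and the helper functions `build_index` and `terms_for` are all invented here and do not come from the presentation.

```python
# Hypothetical concept map: every term and relationship below is invented
# for illustration and does not come from the presentation.
from collections import defaultdict

triples = [
    ("Earth science", "has_term", "Volcano"),
    ("Earth science", "has_term", "Magma"),
    ("Volcano", "erupts", "Magma"),
    ("Magma", "cools_into", "Igneous rock"),
]


def build_index(triples):
    """Map each subject to a list of (relationship, object) pairs."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def terms_for(topic, index):
    """Return the terms attached to a topic through 'has_term', sorted."""
    return sorted(obj for relation, obj in index[topic] if relation == "has_term")


index = build_index(triples)
print(terms_for("Earth science", index))  # terms used to describe the topic
print(index["Volcano"])                   # relationships leaving one term
```

The same triple shape is what RDF uses, so a map like this could later be moved into an RDF store without changing how it is thought about.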
The problem
• Knowledge leaks from organizations
– Some gets forgotten
– Some leaves with its container
– Some gets buried or lost in the infrastructure
• We are very poorly equipped to care for knowledge in computational infrastructure
– Can we ‘surface’ more of the knowledge implicitly held in unstructured documents?
– If so, can we put it to use effectively?
Complete conceptual neighborhood of a document
ConceptVista, Gahegan et al.
Reproducible, transparent science: composite research components (Carole Goble, UK eScience)
• Methods
• Lab Books
• Preprints
• Data
• Video
• Blogs
• Podcasts
• Codes
• Algorithms
• Models
• Presentations
• Ontologies
• Intermediate Results
• Related Articles
• Comments & Reviews
• Plans
Connections run both ways… an open, linked web of science across the same research components (Carole Goble, UK eScience)
Application Development
Software system for integrating, browsing and understanding large information bases
Alfred & SemDat Integration
• The application has the following basic module types:
Data Sources
• Geospatial Data - GeoServer & MapServer
• Ontological Data - Sesame
• Documents - webpages, PDFs, reports
Visualization
• Map
• Concept graph
• Concept tree
• Web browser
Analysis methods
• Visual exploration
• Relevance measurement
• Spatial and ontological queries
Single Sourcing
• Eclipse is used as the base
– Stable and industry-standard
– Enables advanced coordination between our modules and many available third-party modules
• The display modules provide a view on the dataset with rich interactivity
– A user can focus on the information they want.
• The query engine is the smarts
– Determines which information is relevant to the current selection
– Determines how that information should be displayed
Style queries mark up displayed information based on semantics:
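As a rough illustration of the two jobs given to the query engine (finding what is relevant to the current selection, then deciding how to display it), here is a minimal Python sketch. It is not the application's actual engine or its style-query syntax. The triples, the predicate names, the `STYLE_RULES` table and the functions `related_to`, `type_of` and `style_query` are all assumptions made up for this example, and "relevant" is simplified to "directly linked to the selection".

```python
# Hypothetical data: resources, predicates and styles are illustrative only.
TRIPLES = [
    ("report_12", "type", "Document"),
    ("well_A", "type", "Site"),
    ("basalt", "type", "Concept"),
    ("report_12", "mentions", "basalt"),
    ("report_12", "describes", "well_A"),
    ("report_40", "type", "Document"),
]

# Assumed style rules: semantic type -> how a display module should draw it.
STYLE_RULES = {
    "Document": {"shape": "rectangle", "color": "blue"},
    "Site": {"shape": "pin", "color": "red"},
    "Concept": {"shape": "ellipse", "color": "green"},
}
DEFAULT_STYLE = {"shape": "dot", "color": "grey"}


def related_to(selection, triples):
    """Resources one link away from the selection, ignoring 'type' links."""
    related = set()
    for subject, predicate, obj in triples:
        if predicate == "type":
            continue
        if subject == selection:
            related.add(obj)
        elif obj == selection:
            related.add(subject)
    return related


def type_of(resource, triples):
    """Return the declared semantic type of a resource, or None."""
    for subject, predicate, obj in triples:
        if subject == resource and predicate == "type":
            return obj
    return None


def style_query(selection, triples):
    """Map each related resource to a style chosen from its semantic type."""
    return {
        resource: STYLE_RULES.get(type_of(resource, triples), DEFAULT_STYLE)
        for resource in related_to(selection, triples)
    }


styled = style_query("report_12", TRIPLES)
for resource in sorted(styled):
    print(resource, styled[resource])
```

Keeping the style rules in a plain table, separate from the relevance step, mirrors the split between "which information" and "how displayed" described above.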
• Standards:
– Eclipse – Industry-standard base with standardized plug-in format
  • NeOn – Existing Eclipse application providing useful ontological plug-ins
  • uDig – Existing Eclipse application providing useful mapping and browser plug-ins
  • Open source
  • Open standard
  • Active communities
– OWL/RDF – Industry standard for representing ontologies
– SPARQL – Query language
– Jython/Python – Advanced styling and rendering of data
Geographic Context (Map View)
Analysts can gain insights from geographic relationships between cases
• Distance – possible physical/chemical interactions, team collaboration
• Clusters – successes and failures
• Patterns – successes restricted to a particular team
• Possible explanations/theories
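The distance and cluster relationships listed above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the map view's implementation: it uses the haversine great-circle formula with a mean Earth radius of 6371 km, groups cases by simple single-linkage (any two cases within `max_km` of each other share a group), and the case names and coordinates are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_cases(cases, max_km):
    """Group cases so that any pair within max_km ends up in the same group."""
    names = list(cases)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if haversine_km(cases[first], cases[second]) <= max_km:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical case locations as (latitude, longitude) in degrees.
CASES = {
    "case_1": (-36.85, 174.76),  # near Auckland
    "case_2": (-36.90, 174.80),  # a few km from case_1
    "case_3": (-41.29, 174.78),  # near Wellington
}

print(cluster_cases(CASES, max_km=50))
```

Single-linkage grouping is only one possible choice; it chains nearby cases together, which may or may not match what an analyst would call a cluster on the map.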
Drill Down to Related Document
Analysts can drill down to investigate an individual abstract/article for more details
We need far better information filters
Demonstration / sample results
Conclusions
• We are drowning in data / information / knowledge, yet are rewarded for producing more, not less
– Zero-sum game: if we are writing more, we must be reading less…
• Describing documents and other digital artifacts according to a variety of different facets holds considerable promise
– The semantic web is providing many ways to describe data collections
– We may not be able to capture what things mean directly, but we can provide some useful signifiers (clues)
• The traces that individuals leave behind can be very useful, both to themselves and to others
– And they are comparatively inexpensive to capture and analyse
• Trust: Researchers need commitments over data custodianship that they can rely on into the long term
– Not 4-year funding cycles for nationally significant datasets
Questions?
Tawan Banchuen, PhD
Lecturer at Auckland University
[email protected]
eresearch.auckland.ac.nz