Data R&D Issues for GTL
Data and Knowledge Systems
San Diego Supercomputer Center
University of California, San Diego
Bertram Ludäscher ([email protected], sdsc.edu)
Data R&D Issues for GTL
• GTL data management infrastructure
  - Service-oriented Data Grids for seamless data sharing (volume, distribution, access restrictions, …)
  - Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g. ontologies), and archival capabilities
• Data analysis and knowledge-enabling infrastructure
  - Analytical Pipelines (“Scientific Workflows”): rapid design and prototyping, handling of complex data & task semantics, large volume, sci. workflow as a first-class product, validation, execution, monitoring, sharing, archiving
  - How to go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?
  - New Model Management and Knowledge Representation Technologies:
    - Closing the gap between data management (DBMS’s, data grids), knowledge-based systems (desktop-oriented, rule-based systems), and analysis and modeling systems
    - Mapping between numerous formalisms at the syntactic, structural, and semantic level (terminological, process-semantics, …)
    - “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities … (compare: pnp transistors, boolean circuits, assembly language, high-level PLs, declarative QLs, …): abstraction & elaboration mechanisms
  - Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)
• Computational facilities
  - Use of high-end networked facilities a la TeraGrid
• Opportunities (and challenges!) in leveraging related efforts:
  - NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
  - Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, …, interoperable, open source tools
  - One size/standard fits all? Probably not: data-intensive vs computation-intensive vs “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)
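One way to picture the step from an abstract (conceptual) workflow to an execution plan is a dependency graph that is ordered and then bound to resources. The sketch below is only an illustration under assumptions: the step names, the resource names, and the `execution_plan` helper are all hypothetical and do not come from any GTL or data grid system.

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: each step maps to the steps it depends on.
abstract_workflow = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_model": {"align"},
    "archive_results": {"build_model"},
}

# Hypothetical binding of conceptual steps to grid resources (names invented).
resource_binding = {
    "fetch_sequences": "data-grid-node",
    "align": "compute-cluster",
    "build_model": "compute-cluster",
    "archive_results": "archive-node",
}

def execution_plan(workflow, binding):
    """Order steps so dependencies run first, then attach a resource to each."""
    order = TopologicalSorter(workflow).static_order()
    return [(step, binding[step]) for step in order]

plan = execution_plan(abstract_workflow, resource_binding)
for step, resource in plan:
    print(f"{step} -> {resource}")
```

A real planner would also have to deal with data volume, access restrictions, monitoring, and archiving, which this toy ordering ignores.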
Bonus Material (beyond 1 slide limit ;-) starts here …
Up & Down: Abstraction & Elaboration Mechanisms
Knowledge Mgmt
Information Mgmt
Data Management
How to punch through the technology barriers?
• Data Grids
• vs Digital Libraries
• vs DBMS’s
• vs Knowledge-Based Analysis & Modeling Systems
Biomedical Informatics Research Network
http://nbirn.net
Getting Formal: Source Contextualization & Ontology Refinement in Logic
Scientific Data Integration ... Questions to Queries ...
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry?
How does it relate to host rock structures?
Information Integration
[Figure: sources to be integrated: Geologic Map (Virginia); GeoChemical; GeoPhysical (gravity contours); GeoChronologic (Concordia); Foliation Map (structure DB)]
“Complex Multiple-Worlds” Mediation
[Figure: layers from domain knowledge to raw data: Knowledge Representation (ontologies, concept spaces); Database mediation; Data modeling; raw data]
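The “questions to queries” idea can be sketched as a mediator that expands a concept through domain knowledge and then joins sources with different schemas. Everything below is an assumption made for illustration: the tiny ontology, the two in-memory sources, their field names, and all rows are placeholders, not GEON data or a GEON API.

```python
# Hypothetical concept hierarchy (domain knowledge): concept -> direct subconcepts.
ONTOLOGY = {
    "pluton": ["A-type pluton", "other pluton"],
    "A-type pluton": [],
    "other pluton": [],
}

# Two made-up sources with different schemas; all rows are placeholders.
GEOLOGIC_MAP = [
    {"unit": "u1", "rock_class": "A-type pluton", "state": "VA"},
    {"unit": "u2", "rock_class": "other pluton", "state": "VA"},
    {"unit": "u3", "rock_class": "A-type pluton", "state": "NV"},
]
GEOCHRONOLOGY = [
    {"unit_id": "u1", "method": "U/Pb zircon", "age": "age-1"},
    {"unit_id": "u2", "method": "U/Pb zircon", "age": "age-2"},
]

def expand(concept):
    """Return the concept plus all of its subconcepts in the ontology."""
    terms = {concept}
    for child in ONTOLOGY.get(concept, []):
        terms |= expand(child)
    return terms

def mediate(concept, state, method):
    """Answer one question by filtering the map source and joining on the unit id."""
    terms = expand(concept)
    units = {r["unit"] for r in GEOLOGIC_MAP
             if r["rock_class"] in terms and r["state"] == state}
    return [(r["unit_id"], r["age"]) for r in GEOCHRONOLOGY
            if r["unit_id"] in units and r["method"] == method]

result = mediate("A-type pluton", "VA", "U/Pb zircon")
print(result)
```

Asking for the broader concept `"pluton"` returns both placeholder units in VA, which is the point of putting the ontology between the question and the source queries.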
GeoSciences Network
Geologic Map Integration: Geo & IT/CS meet
[Figure: geologic map integration example (Nevada). Labels: domain knowledge; Knowledge representation; AGE ONTOLOGY]
GEON Metamorphism Equation:
Geoscientists + Computer Scientists → Igneous Geoinformaticists (+/- Energy, +/- a few hundred million years)
Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas, ..
“Analysis & Modeling System” to design, execute, reproduce/refine scientific workflows in the ecology and biodiversity domains.
SEEK Project
Overview
[Figure: SEEK architecture. Labels: ASx, ASy, ASz; TS1, TS2; Semantic Mediation Engine; Data Binding; Query Processing; ECO2; Logic Rules ECO2-CL; Analytical Pipeline (AP)]
SMS: Semantic Mediation System
EcoGrid
EcoGrid provides unified access to Distributed Data Stores, Parameter Ontologies, & Stored Analyses, and runtime capabilities via the Execution Environment
Semantic Mediation System & Analysis and Modeling System use EcoGrid web services, enabling analytically driven data discovery and integration
SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities
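As a rough illustration of parameters with semantics, the sketch below checks whether adjacent steps of an analytical pipeline fit together according to a parameter ontology. It is a minimal sketch under assumptions: the concepts, the step names, and the `check_pipeline` helper are hypothetical and are not SEEK components.

```python
# Hypothetical parameter ontology: concept -> parent concept (None at the root).
PARENT = {
    "Measurement": None,
    "SpeciesCount": "Measurement",
    "Abundance": "Measurement",
}

def is_a(concept, ancestor):
    """True if concept equals ancestor or lies below it in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT[concept]
    return False

# Hypothetical analysis steps: (name, input concept, output concept).
PIPELINE = [
    ("load_counts", None, "SpeciesCount"),
    ("summarize", "Measurement", "Abundance"),
    ("plot_trend", "Abundance", None),
]

def check_pipeline(steps):
    """Return the (producer, consumer) pairs whose parameter semantics do not match."""
    mismatches = []
    for (prod, _, out), (cons, inp, _) in zip(steps, steps[1:]):
        if not is_a(out, inp):
            mismatches.append((prod, cons))
    return mismatches

print(check_pipeline(PIPELINE))
```

A step that outputs a general `Measurement` cannot feed a step that requires the more specific `SpeciesCount`, so that pairing would be reported as a mismatch.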
AM: Analysis and Modeling System
[Figure: SEEK overview, continued. Labels: ASr; Parameters w/ Semantics; Parameter Ontologies; WSDL; SRB; KNB; MC; Species; WrpDar; ...; Raw data sets wrapped for integration w/ EML, etc.; ECO2; TaxOn; EML; Execution Environment (SAS, MATLAB, FORTRAN, etc.); Library of Analysis Steps, Pipelines & Results; Invasive species over time; Example of “AP0”]