DiGIR 1 Distributed Databases and Applications John Wieczorek Museum of Vertebrate Zoology, UC Berkeley

DiGIR 1

Distributed Databases and Applications

John Wieczorek

Museum of Vertebrate Zoology, UC Berkeley

DiGIR 2

Distributed Databases
Multiple sources of data
…under local control,
…with concepts in common
…and a desire to deliver data as part of a community.

DiGIR 3

Distributed Databases
The Species Analyst (TSA)
The Integrated Taxonomic Information System (ITIS)
FishNet
The Mammal Networked Information System (MaNIS)
HerpNET
The Ornithological Information System (ORNIS)
…

DiGIR 4

Distributed Databases
European Natural History Specimen Information Network (ENHSIN)
Biological Collection Access for Europe (BioCASE)
Australia Virtual Herbarium (AVH)
Red Mundial de Información Sobre Biodiversidad, Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (REMIB, CONABIO)

DiGIR 5

Distributed Databases
Mountain and Plains Spatio-Temporal Database-Informatics (MaPSTeDI)
Ocean Biogeographic Information System (OBIS)
Pacific Basin Information Node, National Biological Information Infrastructure (PBIN, NBII)
Species Link, Centro de Referência em Informação Ambiental (Species Link, CRIA)
A Virtual Herbarium of the Chicago Region (vPlants)
Spatial Analysis of Local Vegetation Inventories Across Scales (SALVIAS)
…

DiGIR 6

Distributed Databases
Berkeley Natural History Museums (BNHM)
Association of Biological Collections, UC Davis
…

DiGIR 7

Distributed Databases
LifeMapper
Global Biodiversity Information Facility (GBIF)

DiGIR 8

Distributed vs. centralized
Multiple sources of data
…under local control,
…with concepts in common
…and a desire to deliver data as part of a community

DiGIR 9

Distributed vs. centralized

In other words, distribute the headache rather than have one central migraine.

DiGIR 10

DiGIR
Distributed Generic Information Retrieval

John Wieczorek, Stan Blum, Dave Vieglais, P.J. Schwartz

DiGIR 11

Project Rationale
To avoid multiple incongruous development efforts
To pool resources and create a community of experts
To solve the problem of scalability

DiGIR 12

Project Goals
To define a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet
To build a reference implementation of both provider and portal software using said protocol

DiGIR 13

Design Goals
To use open protocols and standards, such as HTTP and XML
To decouple the protocol, software and semantics
To make new data provider installations as easy as possible
To have open source development and GNU General Public Licensing

DiGIR 14

DiGIR Architecture
[Diagram: User Interface, Protocol, Portal Engine, Provider]

DiGIR 15

DiGIR Architecture
[Diagram: Provider]

DiGIR 16

DiGIR Architecture
[Diagram: Provider, Registry]

DiGIR 17

DiGIR Architecture
[Diagram: Portal Engine]

DiGIR 18

DiGIR Architecture
[Diagram: Portal Engine, Registry]

DiGIR 19

DiGIR Architecture
[Diagram: User Interface]

DiGIR 20

DiGIR Architecture
[Diagram: User Interface, Protocol, Portal Engine]

DiGIR 21

DiGIR Architecture
[Diagram: User Interface, Protocol, Portal Engine, Protocol, Provider]

DiGIR 22

DiGIR Architecture
[Diagram: User Interface, Protocol, Portal Engine, Protocol, Provider]

DiGIR 23

DiGIR Architecture
[Diagram: User Interface, Protocol, Portal Engine]

DiGIR 24

DiGIR Component Summary

DiGIR 25

DiGIR Protocol
Defines request and response message formats for communication between provider, portal engine, and user interfaces
  Metadata requests
  Search requests
  Inventory requests
Remains unfettered by the structure of the data it transfers
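
As a rough illustration of the three request types, the sketch below builds a request document with Python's standard library. Only the three request types come from the slide. The element names (`request`, `header`, `type`, `destination`, `filter`) and the example URL are assumptions made for this example and are not taken from the DiGIR schema.

```python
import xml.etree.ElementTree as ET

# Request types named on the slide; everything else here is hypothetical.
REQUEST_TYPES = ("metadata", "search", "inventory")


def build_request(request_type, destination, filter_text=None):
    """Build an illustrative request document (not the real DiGIR schema)."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type}")
    root = ET.Element("request")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "type").text = request_type
    ET.SubElement(header, "destination").text = destination
    # The payload is opaque to the envelope, mirroring the idea that the
    # protocol is independent of the structure of the data it transfers.
    if filter_text is not None:
        ET.SubElement(root, "filter").text = filter_text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_request("search", "http://provider.example.org/digir",
                        "Genus = Peromyscus"))
```

Keeping the envelope (header) separate from the payload (filter) is one simple way to let the same message format carry requests about any data structure.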

DiGIR 26

Portal Engine
The entry point for a “user”
Can query a registry for potential providers
Can determine, based on provider metadata, whether a provider should be queried
Can send requests to multiple providers
Communicates via protocol compliant messaging only

DiGIR 27

Portal Engine, continued
Assembles responses from providers
Returns packaged results to the “user”
Logs activity
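
The portal engine responsibilities listed on these two slides can be sketched as a small fan-out routine. This is a minimal sketch, not the DiGIR portal (which the deck says runs on Java and Tomcat): `StubProvider`, `portal_search`, and the dictionary shapes are hypothetical, and providers are in-process stubs rather than remote services reached through protocol messages.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("portal")


class StubProvider:
    """Stand-in for a remote provider; a real one would be reached over HTTP."""

    def __init__(self, name, concepts, records):
        self.name = name
        self.concepts = set(concepts)  # what the provider's metadata advertises
        self.records = records

    def metadata(self):
        return {"name": self.name, "concepts": self.concepts}

    def search(self, concept, value):
        return [r for r in self.records if r.get(concept) == value]


def portal_search(registry, concept, value):
    """Query every registered provider whose metadata lists the concept."""
    # Step 1: use provider metadata to decide which providers to query.
    targets = [p for p in registry if concept in p.metadata()["concepts"]]
    log.info("querying %d of %d providers", len(targets), len(registry))
    # Step 2: send the request to the selected providers concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        responses = list(
            pool.map(lambda p: (p.name, p.search(concept, value)), targets))
    # Step 3: assemble the responses into one packaged result.
    return {"concept": concept, "value": value,
            "results": {name: recs for name, recs in responses}}
```

Filtering on metadata before sending anything keeps providers that cannot answer a request from being contacted at all.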

DiGIR 28

Provider
Receives requests
Retrieves data from database
Sends results to requestor
Supplies metadata to describe data classification and availability
Logs requests
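
A provider's request cycle might look roughly like the following. This is an illustrative Python sketch, not the DiGIR provider software (which the deck says is PHP-based); the `specimen` table, its columns, the demo rows, and the request dictionaries are all assumptions made for the example.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("provider")


def make_demo_db():
    """Create an in-memory database with a hypothetical specimen table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (catalog_no TEXT, genus TEXT)")
    conn.executemany("INSERT INTO specimen VALUES (?, ?)",
                     [("demo-1", "Peromyscus"), ("demo-2", "Mus")])
    return conn


def handle_request(conn, request):
    """Dispatch a request dict to metadata or search handling and log it."""
    log.info("request received: %s", request)
    if request["type"] == "metadata":
        # Describe what the provider offers, without returning the records.
        count = conn.execute("SELECT COUNT(*) FROM specimen").fetchone()[0]
        return {"concepts": ["CatalogNumber", "Genus"], "record_count": count}
    if request["type"] == "search":
        rows = conn.execute(
            "SELECT catalog_no, genus FROM specimen "
            "WHERE genus = ? ORDER BY catalog_no",
            (request["genus"],)).fetchall()
        return {"records": [{"CatalogNumber": c, "Genus": g} for c, g in rows]}
    raise ValueError(f"unsupported request type: {request['type']}")
```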

DiGIR 29

Registry
Supports provider “advertising”
May be global and open
May be private
Need not be used at all
Example: Universal Description, Discovery and Integration (UDDI)
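
The registry's role, including the case where none is used, can be sketched with an in-memory stand-in. This is not UDDI and not a DiGIR interface: the `Registry` class, its methods, and the example URLs are hypothetical names chosen for illustration.

```python
class Registry:
    """In-memory stand-in for a registry service (hypothetical interface)."""

    def __init__(self):
        self._entries = {}

    def advertise(self, name, url, concepts):
        """A provider announces its access point and the concepts it serves."""
        self._entries[name] = {"url": url, "concepts": set(concepts)}

    def find(self, concept):
        """Return access points of providers advertising the given concept."""
        return sorted(e["url"] for e in self._entries.values()
                      if concept in e["concepts"])


def provider_urls(concept, registry=None, static_urls=()):
    """Use the registry when one is configured, else a fixed provider list."""
    if registry is None:
        return list(static_urls)
    return registry.find(concept)
```

Whether the registry is global or private only changes who may call `advertise` and `find`; a portal that uses no registry simply carries its provider list in its own configuration.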

DiGIR 30

User Interfaces
Must be able to assemble and send a request document to a portal
Must be able to receive and interpret a response document from the portal

This is where the real fun is!
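
Interpreting a response document could be as simple as flattening it into rows that a user interface can display. The sketch below assumes a made-up response layout (`response`, `provider`, `record` elements); the actual response schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical response document; the real response schema is not shown here.
SAMPLE_RESPONSE = """
<response>
  <provider name="A"><record><Genus>Mus</Genus><Species>musculus</Species></record></provider>
  <provider name="B"><record><Genus>Mus</Genus><Species>spretus</Species></record></provider>
</response>
"""


def rows_from_response(xml_text):
    """Flatten a response document into a list of dicts, one per record."""
    rows = []
    for provider in ET.fromstring(xml_text).findall("provider"):
        for record in provider.findall("record"):
            row = {"provider": provider.get("name")}
            row.update({field.tag: field.text for field in record})
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in rows_from_response(SAMPLE_RESPONSE):
        print(row)
```

Once records are plain rows, the same data can feed a table, a map, or a download, which is where interface design choices start to diverge.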

DiGIR 31

Example Network Configurations

DiGIR 32

BNHM Network Configuration

[Diagram. Labeled elements: PHMA Working Database; UCBG Working Database; UCJEPS Working Database; UCMP Working Databases (4); Essig Working Database; five Online Databases; DiGIR Provider; BNHM DiGIR Portal; BNHM Presentation Layer.]

DiGIR 33

MaNIS Network Configuration

[Diagram. Labeled elements: five Working Databases; five Online Databases; five DiGIR Providers; three MaNIS DiGIR Portals; three MaNIS Presentation Layers.]

DiGIR 34

MaNIS Network Configuration

[Diagram. Labeled elements: LACM MS Access Database; MVZ Sybase Database; TTU FoxPro Database; UWBM 4D-Mac Database; CAS SQL Server Database; four Online MS Access Databases; one Online SQL Server Database; five DiGIR Providers; three MaNIS DiGIR Portals; MVZ-MaNIS, LACM-MaNIS, and UWBM-MaNIS Presentation Layers.]

DiGIR 39

Other Network Configurations

[Diagram. Labeled elements: five Working Databases; four Online Databases; four DiGIR Providers; three DiGIR Portals.]

DiGIR 40

DiGing a little deeper

DiGIR 41

Provider Installation
Web server (Apache, IIS, etc.)
PHP: Hypertext Preprocessor (PHP)
Provider software (DiGIR)
  Configuration tool
  Testing scripts
  Provider scripts
Provider manual (DiGIR)

DiGIR 42

Provider Configuration Tool
Provider metadata
Resources
Database connection
Establishing table relationships
Concept to column (i.e., field, attribute) mapping
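
To make the concept-to-column idea concrete, the sketch below translates a request phrased in shared concepts into SQL against one institution's local schema. It is written in Python for illustration only; the concept names, table and column names, and the join are all hypothetical and do not reflect the configuration tool's actual output.

```python
# Hypothetical mapping from shared concepts to one institution's local columns.
CONCEPT_TO_COLUMN = {
    "Genus": "specimen.gen_name",
    "Species": "specimen.sp_name",
    "Country": "locality.country",
}

# Hypothetical table relationship declared once in the configuration.
FROM_CLAUSE = "specimen JOIN locality ON specimen.loc_id = locality.id"


def build_query(return_concepts, filters):
    """Translate a concept-level request into parameterized SQL."""
    select = ", ".join(
        f"{CONCEPT_TO_COLUMN[c]} AS {c}" for c in return_concepts)
    where = " AND ".join(f"{CONCEPT_TO_COLUMN[c]} = ?" for c in filters)
    sql = f"SELECT {select} FROM {FROM_CLAUSE}"
    if where:
        sql += f" WHERE {where}"
    # Filter values travel as parameters, never as part of the SQL text.
    return sql, list(filters.values())
```

Because the mapping lives in configuration, each institution keeps its own schema while every request and response uses the same concept names.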

DiGIR 43

Portal Configuration
Web server (Apache, IIS, etc.)
Sun Java 2 (JDK 1.4)
Tomcat (Apache)
Portal software (DiGIR)
Portal installation documentation (DiGIR)

DiGIR 44

Portal Installation
Engine configuration file (finding providers)
Presentation configuration file (defining the Information Domain)
Presentation customization
Engine start and stop scripts
Presentation start and stop scripts

DiGIR 45

Portal Demonstrations

DiGIR 46

DiGIR Project Information
The DiGIR project is a collaborative effort
DiGIR is currently established as an open source development project on SourceForge (https://sourceforge.net/projects/digir).
Further documentation is available on the DiGIR web site (http://digir.net).