
1

www.d4science.org

Building scientific Virtual Research Environments in D4Science

Paul Polydoras


University of Athens, Greece

2


Interoperability & VRE Definition

Interoperability: "The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units", ISO/IEC 2382-01, Information Technology Vocabulary, Fundamental Terms

VRE: a digital environment that supports researchers across disciplines in the process of creation, validation and exploitation of data, information and knowledge as individuals and as groups, supporting collaboration and fulfilling their needs in ICT resources.

Dimitris

Validate with Donatella and JJ and put reference

3


VRE: Interoperability elements on a domain-agnostic e-Infrastructure

Resource

Software

Data

Policies

4


Resources & Software

Resources (Virtualisation & resource model (representation and use))
• Resource Virtualization
• Common Resource Management Facilities
  • Lifecycle
  • Publication/Registration
  • Monitoring
  • Discovery
  • Employing / Querying

Software (empowering resources)
• Interface-based architecture
  • "Cooperate on interfaces, compete on implementations"
• Loosely coupled components
• Message exchange and handling
• Management
• Compliance with Open Standards

Dimitris

Essential to reduce the complexity of managing heterogeneous systems and to handle diverse resources in a unified way

Dimitris

An important motivation is the composition paradigm or building block approach, where a set of capabilities or functions is built or adapted as required, from a minimalist set of initial capabilities, to meet a need. No prior knowledge of this need is assumed. This provides the adaptability, flexibility and robustness to change that is required in the architecture.

5


Data & Policy interoperability

Data interoperability
• Schema agnostic data handling
  • Tolerant of structured, semi-structured and "unstructured" data
• (Meta) Data / Content interoperability
  • Brokerage
  • Data Transformation

Policy interoperability
• Conceptualization
• Interoperating policy enforcement mechanisms

Dimitris

Data can be considered "unstructured" from a software point of view when the software cannot explicitly interpret the data's internal structure.

6


D4Science Reference Model

Resource Model

Information Model

Policy Model

7


Resource Model

Dimitris

Promotes Resource Virtualization / Abstraction

Dimitris

The gCube system is a software system conceived to manage an infrastructure consisting of a set of heterogeneous entities. All such heterogeneous resources share some commonalities (gCubeResource):

• Each gCube resource has a unique identifier (ID);
• Each gCube resource has a type (Type) that discriminates and captures the role/semantics the resource is supposed to play;
• Each gCube resource has multiple scopes (Scopes) that characterise the contexts in which the resource is supposed to operate (VO/VRE);
• Each gCube resource has a profile (Profile) that captures the distinguishing features of the resource to support resource discovery and usage.

Two abstract classes characterise this domain, the SoftwareResource and the SystemResource. The former captures the resources forming the software managed by the gCube system; the latter captures the rest of the resources managed by the gCube system, e.g. the hosting nodes and the running services. This distinction is justified by one of the distinguishing features of gCube, i.e. the capability to dynamically deploy software components to produce new resources by relying on other existing resources. Thus, the software forming gCube becomes itself a resource managed through gCube.

As far as SoftwareResources are concerned, the following resource typologies exist:

• Service, a SoftwareResource delivering its expected functionality through a web-based interface. In a Service Oriented Architecture (SOA) it is a constituent unit of the system. gCube exploits the SOA paradigm and implements it by relying on the WSRF framework. Each service comprises a piece of software (ServiceLogic) implementing the service-specific business logic and zero or more SoftwareLibrary items acting as helper software implementing non-service-specific logic, i.e. pieces of software implementing general-purpose functions, e.g. XML parsing functions.
• SoftwareLibrary, a SoftwareResource delivering its expected functionality in a stand-alone manner via a programmatic API. It is important to model such a piece of code as a resource in order to promote reuse.
• ThirdPartyLibrary, a SoftwareLibrary delivering its expected functionality in a stand-alone manner via a programmatic API. This specialisation of SoftwareLibrary is due to the need to capture the peculiarity of such software at deployment time, i.e. the fact that such a piece of software has its own deployment procedures.

As far as SystemResources are concerned, the following resource typologies exist:

• gLiteResource, a SystemResource representing a gLite resource, i.e. a placeholder in the gCube infrastructure for a resource forming a gLite-based infrastructure. It is further specialised in gLiteService, gLiteCE and gLiteSE to capture the main types a gLiteResource can be.
• gHN, a SystemResource representing the hosting machine on which gCube can dynamically deploy a Service (along with all the needed SoftwareLibraries) to create a RunningInstance.
• RunningInstance, a SystemResource representing a Service deployed in a gHN. It is the runtime manifestation of a Service and consequently the runtime implementation of the expected Service facilities.
• ExternalRunningInstance, a RunningInstance representing an instance of a Service running outside the direct control and management of gCube, i.e. (re-)deployment of such a Service is not allowed since gCube does not manage the Service. An example of this kind of RunningInstance is an up-and-running Web Service, e.g. one of the services forming the G-POD application, whose facilities are needed in a VRE.
• ApplicationSpecificResource, a SystemResource representing a resource created and managed by a specific Service, e.g. a Collection managed by the CollectionService, or a TransformationProgram managed by a MetadataBroker Service.
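The resource typology described in these notes can be pictured as a small class hierarchy. The following is only an illustrative Python sketch, not the gCube implementation: the attribute names, the `deploy` method and the example identifiers are assumptions made for this example, and only a few of the typologies are shown.

```python
from dataclasses import dataclass, field


@dataclass
class GCubeResource:
    """Traits the notes attribute to every gCube resource."""
    id: str                                       # unique identifier (ID)
    type: str                                     # role/semantics (Type)
    scopes: set[str] = field(default_factory=set)           # VO/VRE contexts
    profile: dict[str, str] = field(default_factory=dict)   # discovery features


class SoftwareResource(GCubeResource):
    """Software managed by the system."""


class SystemResource(GCubeResource):
    """Everything else: hosting nodes, running services, ..."""


class SoftwareLibrary(SoftwareResource):
    """Stand-alone helper code exposed through a programmatic API."""


@dataclass
class Service(SoftwareResource):
    """Service logic plus zero or more helper libraries."""
    libraries: list[SoftwareLibrary] = field(default_factory=list)


class RunningInstance(SystemResource):
    """Runtime manifestation of a Service on a hosting node."""


class GHN(SystemResource):
    """Hosting node on which a Service can be deployed."""

    def deploy(self, service: Service, instance_id: str) -> RunningInstance:
        # Hypothetical helper: deploying a Service yields a new resource
        # that lives in the same scopes as the hosting node.
        return RunningInstance(
            id=instance_id,
            type="RunningInstance",
            scopes=set(self.scopes),
            profile={"service": service.id, "ghn": self.id},
        )


ghn = GHN(id="ghn-1", type="gHN", scopes={"/vo/vre-a"})
svc = Service(id="svc-1", type="Service",
              libraries=[SoftwareLibrary(id="lib-1", type="SoftwareLibrary")])
instance = ghn.deploy(svc, "ri-1")
```

The point of the sketch is that deployment produces a new resource (`RunningInstance`) from two existing ones, which is why software itself is modelled as a resource.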

8


Information Model

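[Figure: ER diagram of the Information Model. Entities: Info-Object (OID, Name, Type, content), Reference (Primary role, Secondary role, Position) and Property (Name, Value). Relationships: Info-Object has Property; Reference has Property; Info-Object references Info-Object with a Reference (from / to). Cardinality labels shown: (0,n), (1,1), (1,n).]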

Dimitris

The elementary constructs of the model are information objects (the nodes of the graph) and object references (the arcs), as depicted in the ER diagram.

• An Information Object (IO) represents an elementary information unit. It is uniquely identified by an Object Identifier (OID), is labelled with a name and a type, and is optionally annotated with a number of properties. These properties are simple key-type-value associations. Finally, it can be associated with raw content. The raw content of an object is content of any kind. The model hides the actual storage details of the content of an object, which can for instance be stored as a file in gLite or as a BLOB field in a database, or maintained in storage facilities not under the direct control of the Information Organization Services, e.g. as a file stored on a remote server and accessible through a protocol such as http, ftp or gridftp.
• An object reference "links" two Information Objects. Each object might (i) reference many other objects and (ii) be referenced by many objects (m-n relationship). A reference is directed. It is labelled with a type attribute, called the primary role; a secondary role, which may optionally further specify the function of the primary role; and a position attribute, which makes it possible to build ordered graph structures. It can also be associated with a number of other properties.

The information-object model introduced above is exposed to higher-level Information Organization Services (Storage/Content Management Services). The generality of this simple information model makes it possible to build complex data structures. The services within the Information Organization stack build on top of this model to offer an organized, high-level view of content. This is done by attaching specialised semantics to the labels used to annotate Information Objects and references.
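The graph model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the gCube storage API: the class names, the `InfoGraph` container, the `targets` helper and the role label `is-member-of` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InfoObject:
    oid: str                      # Object Identifier
    name: str
    type: str
    # property name -> (type, value): the "key-type-value" associations
    properties: dict = field(default_factory=dict)
    content: Optional[bytes] = None   # raw content; storage details are hidden


@dataclass
class Reference:
    source: str                   # OID of the referencing object
    target: str                   # OID of the referenced object
    primary_role: str
    secondary_role: Optional[str] = None
    position: int = 0             # enables ordered graph structures
    properties: dict = field(default_factory=dict)


class InfoGraph:
    """Hypothetical in-memory container for objects (nodes) and references (arcs)."""

    def __init__(self):
        self.objects = {}
        self.references = []

    def add(self, obj: InfoObject) -> InfoObject:
        self.objects[obj.oid] = obj
        return obj

    def link(self, source, target, primary_role, secondary_role=None, position=0):
        if source not in self.objects or target not in self.objects:
            raise KeyError("both ends of a reference must exist")
        ref = Reference(source, target, primary_role, secondary_role, position)
        self.references.append(ref)
        return ref

    def targets(self, source, primary_role):
        """Objects referenced by `source` under a role, ordered by position."""
        refs = [r for r in self.references
                if r.source == source and r.primary_role == primary_role]
        return [self.objects[r.target] for r in sorted(refs, key=lambda r: r.position)]


graph = InfoGraph()
graph.add(InfoObject("c1", "Reports", "collection"))
graph.add(InfoObject("d1", "Report A", "document", {"lang": ("string", "en")}))
graph.add(InfoObject("d2", "Report B", "document"))
graph.link("c1", "d2", "is-member-of", position=2)
graph.link("c1", "d1", "is-member-of", position=1)
ordered = [o.name for o in graph.targets("c1", "is-member-of")]
```

A higher-level service would give meaning to labels such as `collection` or `is-member-of`; the model itself treats them as opaque strings.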

9


VO Model

Dimitris

[gk1] Perhaps remove this, if you have nothing to support it.

Dimitris

In gCube, the Virtual Organization (VO) concept is used to define authorization policies in the infrastructure. A Virtual Organization is a dynamic pool of distributed resources shared by dynamic sets of users from one or more real organizations. Resource Providers (RP) usually make resources available to other parties under certain sharing rules. Users are allowed to use resources under the Resource Provider's conditions and in compliance with a set of VO policies.

Following this approach, in the gCube VO model a policy is defined as a permission for a user to perform an operation on a specific resource. In the model, resources are uniquely identified through a resource id and must belong to a resource type. Each resource type is associated with a set of logical operations. These operations can be performed on resources of that type in the model. It is worth noting that operations in the VO model are just identifiers used to define logical operations that can be performed on resources (e.g. read, modify, delete). They do not necessarily identify methods exposed by the resource implementation (e.g. get, put). Logical operations describe what a resource exposes and are mapped, at the resource side, to the methods provided by the resource implementation.

The gCube VO model also leverages the concept of role to decouple the association between users and permissions. Furthermore, roles are organized in hierarchies, which offers a natural way to capture organizational lines of authority and responsibility. Role hierarchies are not constrained to be trees; each role can have several ancestors, with the only constraint that cycles are not allowed in the structure.
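The policy model in these notes (typed resources, logical operations, roles arranged in an acyclic hierarchy) can be sketched as follows. This is a hedged illustration, not the gCube authorization code: the `VOModel` class and its methods are hypothetical, and the sketch assumes that a role inherits the permissions granted to its ancestors, a direction of inheritance the notes do not spell out.

```python
class VOModel:
    """Toy model: a policy is a permission to perform an operation on a resource."""

    def __init__(self):
        self.operations = {}    # resource type -> set of logical operations
        self.resources = {}     # resource id -> resource type
        self.parents = {}       # role -> set of direct ancestor roles
        self.permissions = {}   # role -> set of (operation, resource id)
        self.user_roles = {}    # user -> set of roles

    def _ancestors(self, role):
        """All roles reachable by following parent links (a DAG, not a tree)."""
        seen, stack = set(), [role]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_parent(self, role, parent):
        # Several ancestors are fine; cycles are the only forbidden shape.
        if role == parent or role in self._ancestors(parent):
            raise ValueError("role hierarchy must stay acyclic")
        self.parents.setdefault(role, set()).add(parent)

    def grant(self, role, operation, resource_id):
        resource_type = self.resources[resource_id]
        if operation not in self.operations[resource_type]:
            raise ValueError(f"{operation!r} is not defined for {resource_type!r}")
        self.permissions.setdefault(role, set()).add((operation, resource_id))

    def is_allowed(self, user, operation, resource_id):
        for role in self.user_roles.get(user, ()):
            for candidate in {role} | self._ancestors(role):
                if (operation, resource_id) in self.permissions.get(candidate, ()):
                    return True
        return False


vo = VOModel()
vo.operations["Collection"] = {"read", "modify", "delete"}  # logical operations
vo.resources["coll-1"] = "Collection"
vo.add_parent("editor", "reader")        # assumption: editor inherits from reader
vo.grant("reader", "read", "coll-1")
vo.grant("editor", "modify", "coll-1")
vo.user_roles["alice"] = {"editor"}
vo.user_roles["bob"] = {"reader"}
```

Note that `read` and `modify` are only identifiers here; mapping them to concrete methods such as get or put is left to the resource side, as the notes describe.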

10


Focus on Data Interoperability in gCube

Makes extensive use of XML data

Provides a dynamically constructed, highly distributed data processing pipeline
• "Players" are all service resources

Provides the mechanisms for interoperable services exchanging data
• Data transfer not effectively captured by WS-*
• ResultSets

Is data agnostic in the core:
• Metadata Management, Indexing and storage do not pose restrictions on payload
• Exposes extensive configurability for handling domain-specific requirements

Offers services for converting data and content among interoperating parties:
• The Metadata Broker
• The gCube Data Transformation Service

Dimitris

Transforms documents from one schema to another. It is also equipped with some inference mechanisms, e.g. if it can transform schema A -> schema B and B -> C, then it can also transform A -> C.
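The inference in this note amounts to finding a path in a graph whose nodes are schemas and whose arcs are the available transformations. A minimal Python sketch of that idea follows; it is not the Metadata Broker's actual algorithm, and `find_chain` and the schema names are assumptions made for illustration.

```python
from collections import deque


def find_chain(transformations, source, target):
    """Return the shortest list of schemas leading from source to target, or None.

    `transformations` is an iterable of (from_schema, to_schema) pairs,
    one per directly available transformation program.
    """
    graph = {}
    for src, dst in transformations:
        graph.setdefault(src, []).append(dst)

    # Breadth-first search: the first path that reaches the target is shortest.
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


available = [("A", "B"), ("B", "C")]
chain = find_chain(available, "A", "C")
```

Here `chain` lists the schemas to pass through, so a broker could apply A -> B and then B -> C; no chain exists in the opposite direction because the transformations are directed.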

Dimitris

Same as the Metadata Broker, but on multimedia objects, e.g. jpeg -> png, or wav -> mp3. The Metadata Broker now lives under the gDTS umbrella.

Dimitris

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
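Since OAI-PMH requests are plain HTTP requests carrying a `verb` argument, a harvester's request construction can be sketched briefly. The snippet below is an illustration only: `build_request` is a hypothetical helper and `http://example.org/oai` is a placeholder base URL, not a D4Science endpoint. The six verb names and the `metadataPrefix` argument come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
OAI_PMH_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Build the URL of an OAI-PMH GET request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# A Service Provider asking a Data Provider for Dublin Core records.
url = build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")
```

The response to such a request is an XML document, which the harvester would then parse; that step is omitted here.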

11


Focus on Data Interoperability in gCube: On the road

Supports multiple protocols for importing / exporting data
• Soon OAI-PMH compliant

WS-DAI compliance
• gCube concepts directly map to WS-DAI ones

OAI-ORE compliance
• gCube Information model directly matches the ORE model

Advanced data interoperability techniques, based on ontologies and inference

Dimitris

Database Access and Integration Services WG (DAIS-WG)

Group Description

Research and development activities relating to the grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organisation, authorisation, etc., for numerous applications. The group is developing standards for grid data services, focusing principally on providing consistent access to existing, autonomously managed databases from web services. By focusing on services, the intention is to ease application development through the provision of composable components. The group does not seek to develop new data storage systems, but rather to make such systems more readily usable individually or collectively within a grid framework.

Group Focus & Scope

The group has been working on the development of a family of data access and integration specifications. The WS-DAI specification defines data model independent properties and operations that are shared by interfaces to different kinds of data resource. These properties are then extended and the templates instantiated by realisations, i.e. data model specific data access services. To date, the group has focused on realisations for accessing relational (WS-DAIR) and XML (WS-DAIX) data resources. The specifications for WS-DAI, WS-DAIR and WS-DAIX have all been submitted to the GGF Recommendations track.

Dimitris

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information, including the increasingly popular social networks of "web 2.0".

12


D4Science System Reference Architecture

Dimitris

In words, there are three (3) layers:

• gCube run-time environment is the set of subsystems equipping each gCube-empowered machine and forming the platform for the hosting and operation of the rest of the system constituents. It provides an application framework that allows gCube services to abstract over functionality lower in the web services stack (WSRF, WS Notification, WS Addressing, etc.) and to build on top of advanced features for the management of state, scope, events, security, configuration, fault, service lifetime, and publication and discovery.
• gCube Infrastructure Enabling Services is the set of subsystems constituting the backbone of the gCube system and responsible for implementing (i) the operation of an e-Infrastructure supporting resource sharing and (ii) the definition and operation of Virtual Research Environments. This second (middle) tier represents a higher level of virtualization and logical abstraction. The virtualization and abstraction are directed toward defining a wide variety of capabilities that can be utilized individually or composed as appropriate to provide the infrastructure required to support higher-level applications or "user" domain processes.
• gCube Application Services is the set of subsystems implementing facilities for (i) storage, organisation, description and annotation of information in a VRE (Information Organisation Services), (ii) retrieval of information in the context of a VRE (Information Retrieval Services) and (iii) provision of VO and VRE users with an interface for accessing such an e-Infrastructure.

It is worth going into more detail on the relationship among the three tiers. The service-oriented nature of D4S implies that virtualized resources that are represented as services are peers to other services in the architecture (for example services in the middle and top tiers). The peer relationship implies that service interaction can be initiated by any service in the architecture. Furthermore, the services in the second tier need to use and manage the virtualizations (resources) in the bottom tier to deliver the capabilities that an individual service (or collection/composition of services) is to provide.

All of these tiers need to interoperate and work synergistically to deliver the required quality of service (QoS). Since it is the QoS of the entire system, including the application tier (or at the very least the services participating in the specific user scenario), that determines the user experience, this is designated as the "Macro Quality of Service".

13


Sum up

D4Science/gCube provides machinery for:
• Operating an e-Infrastructure that supports Resource Sharing
• Design, Creation, Management for Virtual Research Environments

Its targeted execution environment / scope dictates an inherently interoperable philosophy / architecture

Every aspect (both model-based and systemic) is designed to be interoperable in principle

14


Thank you!
