XML Metadata Services
Transcript of XML Metadata Services
1
XML Metadata Services
SKG06 http://www.culturegrid.net/SKG2006/
Guilin, China, November 3, 2006
Mehmet S. Aktas, Sangyoon Oh, Geoffrey C. Fox and Marlon Pierce
Presented by Geoffrey Fox: Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://www.infomall.org
2
Different Metadata Systems
There are many WS-* specifications addressing metadata, defined broadly:
• WS-MetadataExchange
• WS-RF
• UDDI
• WS-ManagementCatalog
• WS-Context
• ASAP
• WBEM
• WS-GAF
And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker.
And of course representations, including RDF and OWL.
Further, there is system metadata (such as UDDI for core services) and there are metadata catalogs for each application domain, such as WFS (Web Feature Service) for GIS (Geographical Information Systems).
They have different scope and different QoS trade-offs, e.g. Distributed Hash Tables (Chord) to achieve scalability in large-scale networks.
3
Different Trade-offs
It has never been clear how a poor lonely service is meant to know where to look up metadata, and whether that metadata is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange).
We identified two very distinct QoS trade-offs:
1) Large-scale, relatively static metadata, as in a (UDDI) catalog of all the world's services.
2) Small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration:
• Fault-tolerance and the ability to support dynamic changes with few-millisecond delay
• But only a modest number of involved services (up to 1000's in a session)
• Need Session, NOT Service/Resource, metadata, so don't use WS-RF
4
Hybrid WS-Context Service Architecture and Prototype
5
WS-Context compliant XML Metadata Services
We designed and built WS-Context compliant XML Metadata Services supporting distributed or central paradigms. This service supports the extensive metadata requirements of rich interacting systems, such as:
• correlating activities of widely distributed services, e.g. workflow-style GIS Service Oriented Architectures
• optimizing Grid/Web Service messaging performance, e.g. mobile computing environments
• managing dynamic events, especially in multimedia collaboration, e.g. collaborative Grid/Web service applications
• providing information to enable session failure recovery capabilities
6
Context as Service Metadata
We define all metadata (static, semi-static, dynamic) relevant to a service as "Context". Context can be associated with a single service, a session (service activity), or both.
Context can be independent of any interaction: slowly varying, quasi-static context. Ex: the type or endpoint of a service, which is unlikely to change.
Context can be generated as the result of service interactions: dynamic, frequently updated context information associated with an activity or session. Ex: a session-id, or the URI of the coordinator of a workflow session.
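The static/dynamic distinction above can be sketched as a tiny data model. This is purely illustrative: the class, enum, and method names are assumptions for exposition, not the actual WS-Context schema.

```java
// Illustrative "Context" record: metadata keyed by a context id, scoped to a
// service, a session, or both, and flagged as quasi-static or dynamic.
import java.util.HashMap;
import java.util.Map;

public class Context {
    public enum Scope { SERVICE, SESSION, BOTH }

    private final String contextId;   // e.g. a session-id or service key
    private final Scope scope;
    private final boolean dynamic;    // true for interaction-generated context
    private final Map<String, String> values = new HashMap<>();

    public Context(String contextId, Scope scope, boolean dynamic) {
        this.contextId = contextId;
        this.scope = scope;
        this.dynamic = dynamic;
    }

    public void put(String key, String value) { values.put(key, value); }
    public String get(String key) { return values.get(key); }
    public String getContextId() { return contextId; }
    public Scope getScope() { return scope; }
    public boolean isDynamic() { return dynamic; }
}
```

For example, a service endpoint would be stored as a quasi-static, service-scoped context, while the URI of a workflow coordinator would be a dynamic, session-scoped context.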
7
Hybrid XML Metadata Services –> WS-Context + extended UDDI
We combine the functionalities of these two services, WS-Context AND extended UDDI, in one hybrid service to manage Context (service metadata):
• WS-Context controlling a workflow
• (Extended) UDDI supporting semantic service discovery
This approach enables uniform query capabilities on the service metadata catalog.
http://www.opengrids.org/wscontext/index.html
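One way to picture the "uniform query" idea is a single front end that routes each metadata query either to the session-state store (WS-Context) or to the registry (extended UDDI), depending on the kind of metadata requested. The interface, method names, and the `session:` key prefix below are illustrative assumptions, not the actual hybrid service API.

```java
// Hypothetical sketch: one query entry point dispatching to two backends.
import java.util.Optional;

interface MetadataBackend {
    Optional<String> query(String key);
}

public class HybridMetadataService {
    private final MetadataBackend wsContext;    // dynamic, session-scoped metadata
    private final MetadataBackend extendedUddi; // static, service-scoped metadata

    public HybridMetadataService(MetadataBackend wsContext, MetadataBackend extendedUddi) {
        this.wsContext = wsContext;
        this.extendedUddi = extendedUddi;
    }

    // Session keys go to WS-Context; everything else goes to extended UDDI.
    public Optional<String> query(String key) {
        return key.startsWith("session:") ? wsContext.query(key)
                                          : extendedUddi.query(key);
    }
}
```

The client sees one catalog; which store answers is an internal detail of the hybrid service.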
8
Distributed Hybrid WS-Context XML Metadata Services
[Architecture figure: WSDL clients connect over HTTP(S) to the Extended UDDI Service and to replicated Hybrid-WSContext Services (Replica Server-1, Replica Server-2, … Replica Server-N). Each service is backed by its own database via JDBC, and the replicas communicate as publishers/subscribers through a Topic-Based Publish-Subscribe Messaging System.]
Note that all Replica Servers are identical in their capabilities. This figure illustrates the system from the perspective of one Replica Server.
9
Key Features
Publish-Subscribe exploited to support replicated storage, e.g.:
• Initial storage of context
• Updates to keep copies consistent
• Access to context
Use of a JavaSpaces cache running in memory on each WS-Context node:
• Naturally supports "Get Context by name" requests
• Backed up every ~30 milliseconds to a MySQL database
If a query can be satisfied by the JavaSpaces cache, it can be satisfied in < 1 ms plus the few milliseconds of Web service overhead.
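The in-memory cache with periodic database backup described above is essentially a write-behind cache. A minimal sketch of that pattern follows; the class name, the map-based "durable store" standing in for the MySQL/JDBC layer, and the exact flush mechanics are assumptions for illustration.

```java
// Write-behind cache sketch: reads are served from memory; a background task
// flushes dirty entries to durable storage on a short interval (~30 ms above).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WriteBehindContextCache implements AutoCloseable {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> dirty = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public WriteBehindContextCache(Map<String, String> durableStore, long flushMillis) {
        flusher.scheduleAtFixedRate(() -> {
            // Conditional remove keeps writes that arrive mid-flush for the next round.
            dirty.forEach((k, v) -> { durableStore.put(k, v); dirty.remove(k, v); });
        }, flushMillis, flushMillis, TimeUnit.MILLISECONDS);
    }

    public void put(String contextId, String value) {
        cache.put(contextId, value);   // visible to readers immediately
        dirty.put(contextId, value);   // queued for the next backup cycle
    }

    // "Get Context by name": served from memory, no database round trip.
    public String get(String contextId) { return cache.get(contextId); }

    @Override public void close() { flusher.shutdownNow(); }
}
```

The trade-off is the one the slides quantify: sub-millisecond reads, at the cost of up to one flush interval of unbacked-up state.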
10
TupleSpaces-Based Caching Strategies
TupleSpaces is a communication paradigm:
• asynchronous communication
• pioneered by David Gelernter
• first described in the Linda project in 1982 at Yale
• communication units are tuples: data structures consisting of one or more typed fields
• e.g. a tuple: ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id", and b) a Java object, Context.
The Hybrid WS-Context Service employs/extends TupleSpaces:
• all memory accesses; overhead is negligible (less than 1 ms for inquiries)
• data sharing: mutually exclusive access to tuples
• associative lookup: content-based search, appropriate for key-based caching
• temporal and spatial uncoupling of the communicating parties
• backup at frequent time intervals for fault-tolerance
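The associative (content-based) lookup described above can be sketched without any Jini/JavaSpaces infrastructure: a template tuple matches stored tuples field by field, with null acting as a wildcard. This is a toy model of the Linda semantics, not the JavaSpaces API the prototype actually uses.

```java
// Minimal tuple space with Linda-style associative matching.
import java.util.ArrayList;
import java.util.List;

public class MiniTupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    // Store a tuple, e.g. write("context_id", someContextObject).
    public synchronized void write(Object... tuple) {
        tuples.add(tuple);
    }

    // Content-based lookup: null fields in the template match anything.
    public synchronized Object[] read(Object... template) {
        for (Object[] t : tuples) {
            if (matches(t, template)) return t;
        }
        return null;
    }

    private static boolean matches(Object[] tuple, Object[] template) {
        if (tuple.length != template.length) return false;
        for (int i = 0; i < tuple.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) return false;
        }
        return true;
    }
}
```

Key-based caching falls out naturally: a template with the key filled in and the value left as a wildcard retrieves the cached entry.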
11
Managing Context: UDDI vs. WS-Context
Purpose
• UDDI: standard way of publishing and discovering generic Web Service information
• WS-Context: standard way of maintaining distributed session state information
Metadata characteristics
• UDDI: interaction-independent, rarely-changing, small-size
• WS-Context: interaction-dependent, highly dynamic, small-size
Types of typical queries
• UDDI: high degree of complexity in inquiry arguments, to improve the selectivity and increase the precision of the search results
• WS-Context: simplicity in inquiry arguments; mostly key-based retrieval queries; selectivity of queries is one
Scalability
• UDDI: whole Grid; UDDI is a domain-independent service for generic service metadata
• WS-Context: sub-Grids; a modest number of interacting Web Services participating in an activity
Desired features
• UDDI: better expressive power for service metadata (e.g., RDF-enabled UDDI Registries), up-to-date service entries (e.g., leasing-capable UDDI Registries), domain-specific capabilities (e.g., geospatial query capabilities), persistent storage
• WS-Context: notification (members of an activity should be notified of the distributed state information), synchronous callback (loose coupling of services), high performance, light-weight storage
12
A general performance evaluation of the most recent implementation of the Hybrid WS-Context Service
13
Prototype Evaluation - I
Performance Experiment: We investigate the practical usefulness of the system by exploring the following research questions:
• What is the baseline performance of the hybrid WS-Context Service implementation for given standard operations?
• What is the effect of network latency on the baseline performance of the system?
• How does the performance compare with previous metadata management solutions?
14
PERFORMANCE TEST
[Test setup figure: each test uses a single-threaded client (1 user / 1000 transactions) exchanging WSDL-described messages with a server.
Test-1: Dummy Server (baseline client/server overhead).
Test-2: Hybrid-WSContext inquiry/publication without database access (Publishing/Querying Module, JDBC Handler, Expeditor).
Test-3: Hybrid-WSContext inquiry/publication with database access.
Test-4: Extended UDDI inquiry/publication against the Extended UDDI Server Engine.]
15
The experimental study indicates that the proposed system provides performance for standard operations comparable to existing metadata management services.
TESTBED: Cluster node configuration
Processor: Intel® Xeon™ CPU (2.40 GHz)
RAM: 2 GB total
Network Bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)
[Chart: Round Trip Time for Inquiry Requests — average response time (msec) per request for Test-1: Dummy service; Test-2: WS-Context inquiry with memory access; Test-3: WS-Context inquiry with database access; Test-4: UDDI inquiry.]
Metadata Services — Avg. latency for inquiries
hybrid WS-Context: 8.41 ms
extended UDDI: 17.5 ms
JUDDI: 40 ms
UDDI-MT: 20.37 ms
JWSD: 18.99 ms
Test 2 minus Test 1 is the JavaSpaces overhead.
16
Prototype Evaluation - II
Scalability Experiment: We investigate the scalability of the system by finding answers to the following research questions:
• What is the performance degradation of the system for standard operations under increasing message sizes?
• What is the performance degradation of the system for standard operations under increasing message rates?
• What is the scalability gain (both in numbers and in performance) of moving from a centralized system to a distributed system under the same workload?
17
SCALABILITY TEST-1
[Test setup figure:
TEST-1: Hybrid-WSContext inquiry/publication with increasing message sizes — a single-threaded WS-Context client (1 user / 100 transactions) against the Hybrid FTHPIS-WSContext Service (Publishing/Querying Module, JDBC Handler, Expeditor).
TEST-2: Hybrid-WSContext inquiry/publication with increasing message rates (# of messages per second) — 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads, against a thread-pooled Hybrid-WSContext Service over HTTP(S).]
18
[Chart: average round trip time (milliseconds, 0 to 30) vs. context payload size (KB, 0.1 to 100), for Tinquiry = T(RTT) and Tpublication = T(RTT).]
The results indicate that the cost of inquiry and publication operations remains the same as the context's payload size increases from 100 Bytes up to 10 KBytes. We also see that hybrid WS-Context shows better performance than the OGSA-DAI approach, although the latter technology is more powerful.
TESTBED: Cluster node configuration for hybrid WS-Context tests
Processor: Intel® Xeon™ CPU (2.40 GHz)
RAM: 2 GB total
Network Bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)
Metadata Services — Avg. latency for inquiries for 64 KByte data retrieval
hybrid WS-Context: 14.55 ms
OGSA-DAI WSRF 2.1: 232 ms
=> OGSA-DAI results are from http://www.ogsadai.org.uk/documentation/scenarios/performance. Both the OGSA-DAI and WS-Context testing cases were conducted on a tightly coupled network.
19
The results indicate that the proposed system can scale up to 940 simultaneous querying clients or 222 simultaneous publishing clients, with each client sending one query per second, for small-size context payloads with 30-millisecond fault tolerance. Multi-core hosts will improve performance dramatically.
TESTBED: Cluster node configuration
Processor: Intel® Xeon™ CPU (2.40 GHz)
RAM: 2 GB total
Network Bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)
[Chart: average round trip time (ms, 0 to 90) vs. message rate (messages per second, 0 to 1000), for inquiry and publication message rates.]
20
Axis2 Performance on Multicore Machines
[Chart: round trip time (ms, 0 to 70) vs. messages per second (0 to 3500). Legend: Grid Farm; Sun Fire - 6 Cores; Sun Fire - 8 Cores; HP xw9300; Dell Intel Xeon — configurations include 2 chips × 2 cores/chip, 2 chips × 1 core/chip, 1 chip × 8 cores/chip, and 1 chip × 6 cores/chip, with Xeon and Opteron processors.]
4 cores give 3000 messages per second: about one message per millisecond per core for the Opteron, and one message per 2 ms for a Sun Niagara core.
21
DISTRIBUTION TEST
We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
• 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads firing messages over HTTP(S) to randomly selected servers.
• Numbered rectangles correspond to an N-node FTHPIS system with various Publish-Subscribe topologies (this does NOT affect performance).
• 5 different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload.
• In each testing case, the same volume of data is evenly distributed among the nodes.
[Figure: FTHPIS configurations with 1 to 5 nodes (node-1 … node-5), each a thread-pooled server exposing WSDL.]
22
The results indicate that the scalability of the metadata store can be increased by moving from a centralized service to a distributed system.
TESTBED: Cluster node configuration
Processor: Intel® Xeon™ CPU (2.40 GHz)
RAM: 2 GB total
Network Bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)
[Chart: message rate (msg/second, 900 to 1300) vs. number of nodes (1 to 5) for the Hybrid WS-Context inquiry operation.]
# of nodes | message rate | mean ± error (ms) | stdev (ms)
1 | 940 | 47.05 ± 0.24 | 33.52
2 | 1005 | 40.76 ± 0.43 | 38.22
3 | 1082 | 38.58 ± 0.45 | 34.93
4 | 1148 | 36.28 ± 0.42 | 32.24
5 | 1221 | 34.13 ± 0.40 | 30.76
The caching algorithm is non-optimal, as it does the database access BEFORE Publish-Subscribe. Reversing this choice should lead to throughput linear in the number of nodes. Pub-Sub overhead is ~2 ms.
23
Prototype Evaluation - III
Fault Tolerance Experiment: We investigate the empirical cost of fault-tolerance by finding answers to the following research questions:
• What is the cost of fault-tolerance in terms of execution time of standard operations on a tight cluster?
• How does the cost of fault-tolerance change when the replica servers are separated by significant network distances?
24
FAULT-TOLERANCE TEST
Test-1. LAN experiment: all nodes (node-1 … node-5) and the client are located on a tightly coupled local area network.
Test-2. WAN experiment: nodes are located on a loosely coupled wide area network.
Node locations (Test-2):
node-1: Indianapolis, IN
node-2: Tallahassee, FL
node-3: Austin, TX
node-4: San Diego, CA
node-5: Bloomington, IN, CGL
client: Bloomington, IN, CGL
Link latencies:
link-1: 0.83 ms
link-2: 11.3 ms
link-3: 15.3 ms
link-4: 31.4 ms
25
FAULT-TOLERANCE EXPERIMENT TEST BED
Summary of machine configurations:
gf6.ucs.indiana.edu (Bloomington, IN, USA): Intel® Xeon™ CPU (2.40 GHz), 2 GB RAM, GNU/Linux (kernel release 2.4.22), Java 2 SE (1.4.2-beta-b19)
complexity.ucs.indiana.edu (Indianapolis, IN, USA): Sun-Fire-880, sun4u sparc SUNW, 16 GB RAM, SunOS 5.9, Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
lonestar.tacc.utexas.edu (Austin, TX, USA): Intel(R) Xeon(TM) CPU 3.20 GHz, 4 GB RAM, GNU/Linux (kernel release 2.6.9), Java 2 SE (1.4.2-beta-b19)
tg-login.sdsc.teragrid.org (San Diego, CA, USA): GenuineIntel IA-64 Itanium 2, 4 processors, 8 GB RAM, GNU/Linux, Java 2 SE (1.4.2-beta-b19)
vlab2.scs.fsu.edu (Tallahassee, FL, USA): Dual Core AMD Opteron(tm) Processor 270, 2 GB RAM, GNU/Linux (kernel release 2.6.16), Java 2 SE (1.4.2-beta-b19)
26
FAULT-TOLERANCE TEST RESULTS
[Chart: time (msec, 0 to 18) vs. number of replicas (1 to 5) for:
Test 1 - LAN testing case - publication;
Test 2 - WAN testing case - publication;
Test 3 - inquiry operation (request granted locally with memory access);
Test 4 - inquiry operation (request granted locally with database access).]
The results point out the inevitable trade-off between fault-tolerance (degree of replication, i.e. high availability of data) and performance: the lower the level of fault-tolerance, the higher the performance for publication operations.
These results also indicate that a high degree of replication can be achieved (by utilizing an asynchronous communication model such as the publish-subscribe paradigm) without increasing the cost of fault-tolerance.
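The asynchronous replication path can be sketched as a toy topic-based publish-subscribe fan-out: an update is published once to a topic, and every replica server applies it to its own store. Class and method names are illustrative assumptions; the real system uses an external messaging broker, not in-process callbacks.

```java
// Toy topic: one publish fans out a context update to all subscribed replicas.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ContextTopic {
    private final List<Consumer<String[]>> subscribers = new ArrayList<>();

    // Each replica registers a callback that applies updates to its local store.
    public void subscribe(Consumer<String[]> replica) {
        subscribers.add(replica);
    }

    // Publishing costs the sender one send regardless of the replica count;
    // the replicas apply the update independently (asynchronously in practice).
    public void publish(String contextId, String value) {
        String[] update = { contextId, value };
        for (Consumer<String[]> replica : subscribers) {
            replica.accept(update);
        }
    }
}
```

Because the publisher does not wait for each replica in turn, raising the degree of replication need not raise the publication cost, which is the effect the results above describe.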
27
An Application Case Scenario and an application-specific performance evaluation of the Hybrid WS-Context Service
28
Application – Context Store usage in communication of mobile Web Services
Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports:
• streaming messages,
• separation of message contents, and
• usage of a context store.
http://www.opengrids.org/hhfr/index.html
We use the WS-Context service as a context store for the redundant parts of SOAP messages:
• redundant data is the static XML fragments encoded in every SOAP message
• redundant metadata is stored as context associated with the service conversation
The empirical results show that we gain 83% in message size and, on average, 41% in transit time by using the WS-Context service.
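The optimization amounts to storing the unchanging SOAP fragment once and sending only the dynamic fields plus a context reference afterwards. A minimal sketch follows; the class, the `${body}` placeholder convention, and the wire format are assumptions for illustration, not the HHFR protocol.

```java
// Sketch: store the static message template once, then send compact messages.
import java.util.HashMap;
import java.util.Map;

public class ContextStoreMessaging {
    private final Map<String, String> contextStore = new HashMap<>();

    // Negotiation phase: store the redundant XML fragment once, under a context id.
    public String saveTemplate(String contextId, String staticXml) {
        contextStore.put(contextId, staticXml);
        return contextId;
    }

    // Streaming phase: the wire message carries only the dynamic payload + id.
    public String buildWireMessage(String contextId, String dynamicPayload) {
        return contextId + "|" + dynamicPayload;
    }

    // Receiver reassembles the full message from the stored template.
    public String reassemble(String wireMessage) {
        int sep = wireMessage.indexOf('|');
        String contextId = wireMessage.substring(0, sep);
        String dynamicPayload = wireMessage.substring(sep + 1);
        return contextStore.get(contextId).replace("${body}", dynamicPayload);
    }
}
```

The size saving grows with the ratio of static template to dynamic payload, which is why the measured gains (83% of message size) are largest for messages dominated by repeated headers.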
29
Optimizing Grid/Web Service Messaging Performance
[Figure: an HHFR Endpoint (Mobile) and an HHFR Endpoint (Conventional) first negotiate over SOAP (HHFR scheme, representation, headers, stream info), then exchange a stream of messages in the preferred representation. The Context-Store is used to Save Context (setContents) and Retrieve Context (getContents).]
The performance and efficiency of Web Services can be greatly increased in conversational and streaming message exchanges by removing the redundant parts of the SOAP message.
30
Performance with and without Context-store
Summary of the Round Trip Time (TRTT):
Message size | Without Context-store: Avg ± error (Stddev) | With Context-store: Avg ± error (Stddev)
Medium: 513 bytes (sec) | 2.76 ± 0.034 (0.187) | 1.75 ± 0.040 (0.217)
Large: 2.61 KB (sec) | 5.20 ± 0.158 (0.867) | 2.81 ± 0.098 (0.538)
Experiments ran over HHFR; optimized messages were exchanged over HHFR after saving the redundant/unchanging parts to the Context-store.
Savings on average: 83% of message size, 41% of transit time.
31
System Parameters
Taccess: time to access a Context-store (i.e. save a context to, or retrieve a context from, the Context-store) from a mobile client
TRTT: round trip time to exchange a message through an HHFR channel
N: number of simultaneous streams, summed over ALL mobile clients
Twsctx: time to process the setContext operation
Taxis: time consumed by Axis processing
Ttrans: transmission time through the network
Tstream: stream length
32
Context-store: System Parameters
[Figure: the Mobile Client (Endpoint B) accesses the Context-store (Information Service) with Taccess = Taxis + Twsctx + Ttrans, and exchanges messages with the Service Provider (Endpoint A) over the high-performance channel of HHFR with round trip time TRTT (Client → Network → Axis → WS-CTX → Network → Client transit).]
33
Summary of Taxis and Twsctx measurements
Taccess = Twsctx + Taxis + Ttrans
Data binding overhead at the Web Service container is the dominant factor in message processing.
[Chart: time (msec, 0 to 500) vs. size of context (KB, 1.4 to 2.0), for Twsctx and Taxis + Twsctx.]
34
Performance Model and Measurements
Chhfr = n·thhfr + Oa + Ob
Csoap = n·tsoap
Breakeven point: nbe·thhfr + Oa + Ob = nbe·tsoap
Oa(WS) is roughly 20 milliseconds
Average ± error (sec) | Stddev (sec)
Context-store Access (Oa): 4.127 ± 0.042 | 0.516
Negotiation (Ob): 5.133 ± 0.036 | 0.825
Oa: overhead for accessing the Context-store Service
Ob: overhead for negotiation
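Solving the breakeven equation above for nbe gives nbe = (Oa + Ob) / (tsoap − thhfr): the number of messages after which the stream's one-time negotiation and context-store overheads pay for themselves. A small helper makes the arithmetic concrete:

```java
// Breakeven point of the HHFR cost model:
//   n_be * t_hhfr + O_a + O_b = n_be * t_soap
//   =>  n_be = (O_a + O_b) / (t_soap - t_hhfr), rounded up to whole messages.
public class BreakevenModel {
    public static long breakeven(double tHhfr, double tSoap, double oA, double oB) {
        if (tSoap <= tHhfr) {
            throw new IllegalArgumentException(
                "HHFR never pays off unless it is faster per message");
        }
        return (long) Math.ceil((oA + oB) / (tSoap - tHhfr));
    }
}
```

As an illustrative combination of the numbers in these slides (not a measurement from the talk): with Oa ≈ 4.127 s, Ob ≈ 5.133 s, and the large-message times tsoap ≈ 5.20 s and thhfr ≈ 2.81 s, the stream breaks even after 4 messages.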
35
String Concatenation
Measure the total time to process a stream. Independent variables:
• number of messages per stream
• size of the message
[Chart: time for finishing the message stream (sec, 0 to 140) vs. number of messages per stream (0 to 35), for HHFR: 16 strings per message and SOAP: 16 strings per message; the breakeven point nbe is marked.]