Adaptive Parallelization of Queries over Dependent Web Service Calls
-
Upload
sabesan-manivasakan -
Category
Documents
-
view
331 -
download
2
Transcript of Adaptive Parallelization of Queries over Dependent Web Service Calls
![Page 1: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/1.jpg)
11
Adaptive Parallelization of Queries over Dependent Web Service Calls
Manivasakan Sabesan and Tore Risch
Uppsala Database Laboratory
Dept. of Information Technology
Uppsala University
Sweden
![Page 2: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/2.jpg)
2
Outline
Research Area
Queries
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclusion & Future work
![Page 3: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/3.jpg)
3
WSMED System (Web Service MEDiator)
WSMED
OWF1
WSDL metadata 1
WS Operation 1
WS Operation n
WS Operation 1
WS Operation m
WS1 WSn
WSDL metadata n
SOAP call
Import metadata
SQL Query
OWFn
Automatically generated Operation Wrapper Function(OWF) makes web services queryable.
Metastore
1 33
1
2
![Page 4: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/4.jpg)
4
Queries calling data providing web services have a similar pattern :- dependent calls.
Web service calls incur high-latency and high message setup
cost
A naïve implementation of an application making these calls sequentially is time consuming
A challenge here is to develop methods to speed up such queries with dependent web service calls
Research Problems
WS1 WS2 WS3 WSn
![Page 5: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/5.jpg)
5
Dependent joinf(x-,y+) Λ g(y-, z+)
• Predicate f binds y for some input value x and passes each y to the predicate g that returns the bindings of z as result.
• Predicates f and g represent calls to parameterized sub queries (plan functions) - execution plans calling data providing web service operations.
• Input parameters are annotated with ‘-‘ and outputs with ‘+’.
• Our solution for the research problem is to parallelize dependent join in an efficient way.
![Page 6: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/6.jpg)
6
Research Area
Queries
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclusion & Future work
![Page 7: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/7.jpg)
7
Query1
select gl.City , gl.TypeIdfrom GetAllStates gs, GetPlacesWithin gp, GetPlaceList glwhere gs.state=gp.state and gp.distance=15.0 and gp.placeTypeToFind='City' and gp.place='Atlanta' and gl.placeName=gp.ToPlace+' ,'+gp.ToState and gl.MaxItems=100 and gl.imagePresence='true'
Finds information about places located within 15 km from each city whose name starts with ’Atlanta‘ in all US states.
• Invokes 300 web service calls
• Returns a stream of 360 tuples
![Page 8: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/8.jpg)
8
Query 2
Find the zip code and state of the place ‘USAF Academy’.
select gp.ToState, gp.zipfrom GetAllStates gs, GetInfoByState gi, getzipcode gc,
GetPlacesInside gpwhere gs.State=gi.USState and
gi.GetInfoByStateResult=gc.zipstr and gc.zipcode=gp.zip and gp.ToPlace=´USAF Academy´
• Invokes more than 5000 web service calls
![Page 9: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/9.jpg)
9
Research Area
Queries
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclussion & Future work
![Page 10: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/10.jpg)
10
Query Processing in WSMED
Parallel query plan
SQL queryCalculus
Generator
Parallel pipeliner
Plan function generator
Central plan creator
Plan splitter
Phase 1
Phase 2
![Page 11: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/11.jpg)
11
Central plan - Phase1
Query1(pl,st) :- GetAllStates() and GetPlacesWithin(‘Atlanta’,_,15.0,’City’) and
GetPlaceList(_, 100,’true’)
Calculus expression
γGetPlacesWithin(‘Atlanta’, st1, 15.0, ‘City’)
<pl, st>
γGetPlaceList (str, 100, ‘true’)
γGetAllStates()
<st1 >
<city , st2 >
γconcat(city,’, ‘, st2)
<str >
Algebra expression
![Page 12: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/12.jpg)
12
Plan Splitting and Plan Function Generation - Phase2
<city, state2>
γGetPlacesWithin(Atlanta’, st1, 15.0, ‘City’)
γconcat(city,’, ‘, state2)
<str >
PF1
PF1(Charstring st1) → Stream of Charstring str
γGetPlaceList(str,100,’true’)
<pl, st>
PF2
PF2(Charstring str) → Stream of <Charstring pl, Charstring st>
![Page 13: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/13.jpg)
13
WSMED Process Tree
qi- query process (i=0,1,......n)
Level 2
q0
q1
q3 q4
q2 PF1
GetAllStates
PF2q5 q8q7q6
Coordinator
Level 1
Query1
![Page 14: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/14.jpg)
14
Make Parallel Pipeline
<str>
<st1>
FF_APPLYLP(PF2, 3,str)
<pl, st>
γGetAllStates()
FF_ APPLYP(PF1, 2, st1)
Manually set fanouts on both levels
![Page 15: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/15.jpg)
15
FF_APPLYP(Function PF, Integer fo, Stream pstream) → Stream result• fo – fanout , values are manually set
• PF – plan function
• pstream – stream of parameter values pi
• result – stream of results ri
q3
q4q5
PFPF
PFp1
p2
p3
First Finished Apply in Parallel (FF_APPLYP)
FF_APPLYP
r1r2
r3
p4
p5
p6
![Page 16: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/16.jpg)
16
Research Area
Queries
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclusion & Future work
![Page 17: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/17.jpg)
17
Experimental Setup• Flat tree :- The fanout vector has f2=0 ({f1,0}) in which case
both OWFs are combined into the same plan function executed at the same level.
• Heterogeneous fanout:- fanout vector {f1,f2}, f1≠f2
• Homogeneous fanout :- fanouts are equal, i.e. f1 = f2
• Queries were run on a computer with a 3 GHz single processor
Intel Pentium 4 with 2.5GB RAM
• Total number of query processes N needed to execute the
parallel queries will be N= f1 + f1 * f2. • Experiments investigate the optimum tree topology for up to 60
query processes
f1=2
f2=4
f1=2
f2=2
f1=5
![Page 18: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/18.jpg)
18
Observation –Query1
• Lowest execution time region is achieved within the range 50 - 60 sec with the fanout vector {5,4}.
• Fastest execution time 56.4 sec outperformed with the speed-up of 4.3 the central plan (244.8 sec).
![Page 19: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/19.jpg)
19
Observation- Query2
1 2 4 6 20 30 40 600
30200400600800
100012001400160018002000220024002600
Execution time
f1
f2
2400-2600
2200-2400
2000-2200
1800-2000
1600-1800
1400-1600
1200-1400
• Best execution time for Query2 is achieved within the range of 1200- 1400 sec with the fanout vector {4,3}
• Fastest execution time, 1203 sec surpassed with the speed-up of 2 compared with the naïve case (2412.9 sec)
![Page 20: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/20.jpg)
20
Observations of Preliminary Experiments
• Best execution time for both queries is achieved close to, but not exactly for, homogenous balanced trees.
–Query1: f1=5, f2=4
–Query2: f1=4, f2=3
• Need an adaptive query process arrangement at runtime
![Page 21: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/21.jpg)
21
Research Area
Queries
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclusion & Future work
![Page 22: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/22.jpg)
22
Adaptive First Finished Apply in Parallel (AFF_APPLYP)
The AFF_APPLYP operator adapts the process plan at run time and starts with a binary tree.
AFF_APPLYP (Function PF, Stream pstream) → Stream result
• PF – plan function
• pstream - stream of parameter values pi
• result- stream of results ri
It replaces FF_APPLYP by eliminating the manual fanout parameter
![Page 23: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/23.jpg)
Functionalities of AFF_APPLYP
1. AFF_APPLYP initially forms a binary process tree by always setting fanout to 2 - init stage.
23
q0
q1
q3 q4
q2
q6q5
Coordinator
Level 1
Level 2
![Page 24: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/24.jpg)
..........2. A monitoring cycle for a non-leaf query process is defined when number of received end-of-call messages equal to number of children.
2.1 After the first monitoring cycle AFF_APPLYP adds p new child processes - an add stage.
3. When an added node has several levels of children, the init stages of AFF_APPLYP s in the children will produce a binary sub–tree.
q0
q1
q3 q4
q2
q5
Coordinator
Level 1 q7
q9q8q10Level 2 q6 q11
![Page 25: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/25.jpg)
25
......
4. AFF_APPLYP records per monitoring cycle i the average time ti to produce an incoming tuple from the children.
4.1 If ti decreases more than a threshold (25%) the add stage is rerun.
4.2 If ti increases we either stop or run a drop stage that drops one child and its children.
q0
q1
q3 q4
q2
q5
Coordinator
Level 1
q12q10Level 2 q6 q11
![Page 26: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/26.jpg)
26
Adaptive Results- Query1
Query1
0.00
20.00
40.00
60.00
80.00
100.00
Process selection
Exe
cuti
on
tim
e
Best FF_APPLYP fo1=5 fo2=4 p=1, no drop stage, fo1=3 fo2=3
p=1, drop stage, fo1=2 fo2=3 p=2, no drop stage, fo1=4 fo2=5
p=2, drop stage, fo1=3 fo2=3 p=3, no drop stage, fo1=5 fo2=3.4
p=3, drop stage, fo1=4 fo2=3.25 p=4, no drop stage, fo1=6 fo2=8.7
p=4, drop stage, fo1=5 fo2=4.2 p=5, no drop stage, fo1=7 fo2=7.5
p=4, drop stage, fo1=6 fo2=7.8
![Page 27: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/27.jpg)
27
Adaptive Results –Query2
Query2
0.000100.000200.000300.000400.000500.000600.000700.000800.000900.000
1000.0001100.0001200.0001300.0001400.000
Process selection
Exe
cuti
on
tim
e(se
c)
Best FF_APPLYP fo1=4 fo2=3 p=1, no drop stage, fo1=3 fo2=2.3
p=1, drop stage, fo1=3 fo2=2 p=2, no drop stage, fo1=4 fo2=2.5
p=2, drop stage, fo1=4 fo2=2.25 p=3, no drop stage, fo1=5 fo2=2.6
p=3, drop stage, fo1=5 fo2=2.4
![Page 28: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/28.jpg)
28
Observations with AFF_APPLYP
• For Query1 the execution time with p=4 and no drop stage comes close to the execution time of the best manually specified process tree
• For Query2 the execution with p=2 and no drop stage is the closest one.
• In both cases that execution time with p=2 and no drop stage is reasonably close to the execution time of the best manually specified process tree (Query1 80% , Query2 96 % )
• Dropping stages make insignificant changes in the execution time.
![Page 29: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/29.jpg)
29
Research Area
WSMED System
Query Parallelization & FF_APPLYP
Experimental setup
AFF_APPLYP
Conclusion & Future work
![Page 30: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/30.jpg)
Related work
• Similar to WSMS (U.Srivastava, J.Widom, K.Munagala, and R.Motwani, Query
Optimization over Web Services, VLDB 2006) WSMED also invoke parallel web service calls. In contrast, WSMED supports automated adaptive parallelization.
• In contrast to WSQ/DSQ(R.Goldman, and J.Widom, WSQ/DSQ: a practical
approach for combined querying of databases and the Web, SIGMOD 2000) ,WSMED produces non-materialized adaptive parallel plans based on parameter streams.
• Runtime optimization techniques (A. Gounaris, et al., Robust runtime
optimization of data transfer in queries over Web Services, ICDE 2008 ) investigate adaptation of buffer sizes in web service calls, not dealing with adaptive parallelism on web service calls.
30
![Page 31: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/31.jpg)
Conclusion• An algorithm implemented to transform central plan into
parallel plan by introducing FF_APPLYP.
• FF_APPLYP, with manually set fanouts, is defined to parallelize calls to plan functions, encapsulates web service operations, partitioned for different parameter values.
• AFF_APPLYP starts with a binary tree and then each non-leaf process locally adapts the process sub-trees by adding and removing children , based on the flow of result stream without any static cost model.
• The AFF_APPLYP obtained performance close to the best manually specified process tree with FF_APPLYP by automatically adapting the process tree. 31
![Page 32: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/32.jpg)
32
Future .....
• Generalize the strategy for queries mixed with dependent and independent web service calls, as well bushy trees.
• Investigate different process arrangement strategies with the algebra operators.
• Setup a benchmark to simulate the parallel invocation of web services.
• Find a model for dynamically changing the fanout vector during the runtime.
![Page 33: Adaptive Parallelization of Queries over Dependent Web Service Calls](https://reader030.fdocuments.net/reader030/viewer/2022032419/55a2bb0b1a28ab495f8b46d9/html5/thumbnails/33.jpg)
Thank you for your attention
?
33