Overview: Cloud Computing and Workflow Research in NGSP Group
Transcript of Overview: Cloud Computing and Workflow Research in NGSP Group
Dr. Xiao Liu
Sessional Lecturer, Research Fellow
Centre of SUCCESS
Swinburne University of Technology
Melbourne, Australia
Overview: Cloud Computing and Workflow Research in NGSP Group
Xiao Liu, Cloud Computing and Workflow Research in NGSP Group, Friday, April 21, 2023
Outline
SUCCESS Centre and NGSP Group
Background: Cloud Computing and Workflow
Research Topics
  Performance Management in Scientific Workflows
  Data Management in Scientific Cloud Workflows
  Security and Privacy Protection in the Cloud
  Data Reliability Assurance in the Cloud
  SwinDeW-C Cloud Workflow System
Future Work and Conclusions
The Centre of SUCCESS
SUCCESS: Swinburne University Centre for Computing and Engineering Software Systems
SUCCESS is the No. 1 Software Engineering Centre in Australia
SUCCESS is one of the 7 Tier 1 Centres at Swinburne University of Technology (Times World Ranking: 351-400)
The ambition of the Centre is to become the top centre for software research in the Southern Hemisphere within the next five years.
To achieve world-renowned software innovation and engineering with a balanced theoretical, applied, industry and education impact across the Centre
SUCCESS Research Focus Areas
Knowledge and Data Intensive Systems
Nature of Software
Next Generation Software Platforms
SE Education and IBL/RBL
Software Analysis and Testing
Software R&D Group
http://www.swinburne.edu.au/ict/success/research-expertise/
NGSP (Small) Group Overview
This group conducts research into cloud computing and workflow technologies for complex software systems and services.
Members:
Leader: Prof Yun Yang (PC Member for ICSE 07/08, FSE 09, ICSE 10/11/12)
Researchers: A/Prof Jinjun Chen (UTS), Dr Xiao Liu (Postdoc), Dr Dong Yuan (Postdoc), Gaofeng Zhang, Wenhao Li, Dahai Cao, Xuyun Zhang, Chang Liu, Jofry Hadi SUTANTO
Others: Prof John Grundy, Prof Chengfei Liu
Visitors: Prof Lee Osterweil, Prof Lori Clarke, Prof Ivan Stojmenovic, Prof Paola Inverardi, Prof Amit Sheth, Prof Wil van der Aalst, Prof Hai Zhuge
R&D Projects – Grants
Primary projects:
  (Cloud) workflow technology: ARC LP0990393 (Y Yang, R Kotagiri, J Chen, C Liu)
  Cloud computing: ARC DP110101340 (Y Yang, J Chen, J Grundy)
Secondary project:
  Management control systems for effective information sharing and security in government organisations: ARC LP110100228 (S Cugenasen, Y Yang)
R&D Projects – Overview
SwinDeW workflow family including SwinDeW-C:
  Architectures / Models (D Cao)
  Scheduling / Data and service management (D Yuan, X Liu)
  Verification / Exception handling (X Liu)
Cloud computing:
  Data management (D Yuan, X Liu, W Li)
  Privacy and Security (G Zhang, X Zhang, C Liu)
Some Recent ERA A* Ranked Publications
J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011.
X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011.
D. Yuan, Y. Yang, X. Liu and J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems. Journal of Parallel and Distributed Computing, 71:316-332, 2011.
J. Chen and Y. Yang, Localising Temporal Constraints in Scientific Workflows. Journal of Computer and System Sciences, Elsevier, 76(6):464-474, Sept. 2010.
G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, published online, Dec. 2011.
Background: Cloud Computing
What is cloud computing?
R. Buyya: "A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualised computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers."
I. Foster: "Cloud computing is a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualised, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet."
UC Berkeley: Cloud computing is utility computing plus SaaS.
Why Cloud Computing
Data explosion
  TB (10^12), PB (10^15), exabyte (EB, 10^18), zettabyte (ZB, 10^21), yottabyte (YB, 10^24)
  The total amount of global data in 2010: 1.2 ZB
  Data processed by Google every day in 2009: 24 PB
  Every day: Facebook 10 TB, Twitter 7 TB, YouTube 4.5 TB
Moore's law vs. data explosion speed
Buzzwords: data storage, data processing, parallel, distributed, virtualisation, commodity machines, energy consumption, data centres, utility computing, software (everything) as a service
Benefits of Clouds
No upfront infrastructure investment: no procuring hardware, setup, hosting, power, etc.
On-demand access: lease what you need, when you need it
Efficient resource allocation: globally shared infrastructure
Nice pricing: based on usage, QoS, supply and demand, loyalty, ...
Application acceleration: parallelism for large-scale data analysis
Highly available, scalable, and energy efficient
Supports creation of 3rd-party services and seamless offering: builds on infrastructure and follows a similar business model as the Cloud
Successful Stories
Google
Animoto: 750,000 sign-ups in three days, 25,000 accesses in one hour, 10 times the capacity required, Amazon
NY Times: articles from 1851 to 1980, accomplished in 24 hours at a cost of only US$240
Facebook, Salesforce CRM, IBM Research Compute Cloud, ...
Cloud Computing Classification
Cloud Services
  IaaS: infrastructure as a service, Amazon S3, EC2
  PaaS: platform as a service, Google App Engine
  SaaS: software as a service, Salesforce.com
Cloud Types
  Public/Internet Clouds
  Private/Enterprise Clouds
  Hybrid/Mixed Clouds
Example (PaaS): Hadoop Project
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop provides a reliable shared storage and analysis system
  Storage provided by HDFS: a distributed file system that provides high-throughput access to application data
  Analysis provided by MapReduce: a software framework for distributed processing of large data sets on compute clusters
Hadoop for Yahoo! search
Hadoop: The Definitive Guide (by Tom White)
http://hadoop.apache.org/
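The MapReduce programming model mentioned above can be illustrated with a minimal in-process sketch. This is plain Python rather than the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real Hadoop job would distribute the map and reduce steps across a cluster instead of running them in one process.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["cloud workflow cloud", "workflow system"]))
```

The point of the model is that map_phase and reduce_phase are independent per document and per key, which is what allows a framework to run them in parallel on many machines.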
Cloud in Australia
Gartner estimated the global demand for cloud computing at $46 billion in 2009, rising to $150 billion by 2013.
The Australian Government's business operations: ICT costs around $4.3 billion p.a.
Australian Government ICT Sustainability Plan 2010 – 2015: an energy efficient technology for the Australian Government Data Centre Strategy.
The Department of Finance and Deregulation estimated that costs of $1 billion could be avoided by developing a data centre strategy for the next 15 years.
Australian Taxation Office (ATO), Department of Immigration and Citizenship (DIAC), and Australian Maritime Safety Authority (AMSA): proof of concept, initiatives.
The Australian Academy of Technological Sciences and Engineering (ATSE): opportunities and challenges for government, universities and business.
Westpac, Telstra, MYOB, Commonwealth Bank, Australian and New Zealand Banking Group and SAP: initiatives to support the migration and running of their business applications in the cloud.
Cloud in China
The national Twelfth Five-Year Plan
http://www.chinacloud.cn/
http://www.china-cloud.com/
http://www.cloudcomputing-china.cn/
Background: Workflow
Workflow: the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules.
A Workflow Management System is a system that provides procedural automation of a business process by managing the sequence of work activities and by managing the required resources (people, data & applications) associated with the various activity steps.
-- [Workflow Management Coalition]
Why Workflow
Originated from office automation
Business process management, business agility
Business process analysis, re-design
Separation of workflow management system from software applications
  Just like the separation of database management system from software applications
Software component reuse, Web services
Programming by scripting the composition of software components
Workflow Applications
Office automation, review and approve process
Business process management systems, ERP systems
Machine shops, job shops and flow shops
Flight booking, insurance claim, tax refund, ...
Scientific workflows
IBM WebSphere Workflow
Microsoft Windows Workflow Foundation
  http://wm.microsoft.com/ms/msdn/netframework/introwf.wmv
Workflow Reference Model
Example: Pulsar Searching Workflow
Astrophysics: pulsar searching
Pulsars: the collapsed cores of stars that were once more massive than 6-10 times the mass of the Sun (http://astronomy.swin.edu.au/cosmos/P/Pulsar)
Parkes Radio Telescope (http://www.parkes.atnf.csiro.au/)
Swinburne Astrophysics group (http://astronomy.swinburne.edu.au/) has been conducting pulsar searching surveys (http://astronomy.swin.edu.au/pulsar/) based on the observation data from Parkes Radio Telescope.
A typical scientific workflow which involves a large number of data and computation intensive activities. For a single searching process, the average data volume (not including the raw stream data from the telescope) is over 4 terabytes and the average execution time is about 23 hours on the Swinburne high performance supercomputing facility (http://astronomy.swinburne.edu.au/supercomputing/).
[Figure: left, image of the Crab Nebula taken with the Palomar telescope; right, a close-up of the Crab Pulsar from the Hubble Space Telescope. Credit: Jeff Hester and Paul Scowen (Arizona State University) and NASA]
Pulsar Searching Workflow
[Figure: pulsar searching workflow. Phases: Data Collection, Data Pre-processing, Candidate Searching, Decision Making. Activities shown: Collect data, Transfer Data, Extract Beam, De-disperse (1200), De-disperse (2400), De-disperse (3600), Accelerate, FFT Seek, FFA Seek, Pulse Seek, Get Candidates, Eliminate candidates, Fold to XML, Make Decision. Temporal constraints shown: U(SW)=24 hours, U(SW1)=15.25 hours, U(SW2)=5.75 hours. Activity durations shown range from 10 minutes to 13 hours. Name shown on the slide: Dr. Willem van Straten]
Research Topics
Performance Management in Scientific Workflows
Dr. Xiao Liu
http://www.ict.swin.edu.au/personal/xliu/
Workflow QoS
QoS dimensions: time, cost, fidelity, reliability, security, ...
QoS of cloud services
Workflow QoS: the overall QoS for a collection of cloud services, which is not simply the sum of the individual QoS values
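Why workflow QoS is not a simple sum can be seen from a small sketch. The aggregation rules used here are assumptions commonly applied in workflow QoS modelling, not something the slides specify: in a sequence, time and cost add up and reliability multiplies; in a parallel block, time is the maximum over the branches. The QoS class and the sequence and parallel functions are hypothetical names.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class QoS:
    time: float         # execution time, e.g. in hours
    cost: float         # monetary cost, e.g. in dollars
    reliability: float  # probability of successful execution, in [0, 1]


def sequence(parts):
    """Aggregate services executed one after another (assumed rules)."""
    parts = list(parts)
    return QoS(
        time=sum(p.time for p in parts),
        cost=sum(p.cost for p in parts),
        reliability=prod(p.reliability for p in parts),
    )


def parallel(parts):
    """Aggregate services executed concurrently: the slowest branch sets the time."""
    parts = list(parts)
    return QoS(
        time=max(p.time for p in parts),
        cost=sum(p.cost for p in parts),
        reliability=prod(p.reliability for p in parts),
    )


if __name__ == "__main__":
    a, b = QoS(2.0, 1.0, 0.9), QoS(3.0, 2.0, 0.8)
    print(sequence([a, b]))
    print(parallel([a, b]))
```

Only cost is additive in both structures; time and reliability depend on how the services are composed.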
Temporal QoS
System performance
  Response time
  Throughput
Temporal constraints
  Global constraints: deadlines
  Local constraints: milestones, individual activity durations
Satisfactory temporal QoS
  High performance: fast response, high throughput
  On-time completion: low temporal violation rate
Problem Analysis
Setting temporal constraints
  Coarse-grained and fine-grained temporal constraints
  Prerequisite: effective forecasting of activity durations
Monitoring temporal consistency state
  Monitor workflow execution state
  Detect potential temporal violations
Temporal violation handling
  Where to conduct violation handling
  What strategies to use
Ultimate Goal
Achieving on-time completion
Measurements:
  Temporal correctness
  Cost effectiveness
Temporal Consistency Model
Temporal correctness: workflow execution towards the satisfaction of temporal constraints
The temporal consistency model defines the system running state at a specific workflow activity point (i.e. temporal checkpoint) against specific temporal constraints
Basic elements: real workflow running time (before and including the activity point), estimated running time for uncompleted workflow (after the checkpoint), temporal constraints
Probability Based Temporal Consistency Model
Time attributes for workflow activity a_i:
  Maximum activity duration: D(a_i)
  Mean activity duration: M(a_i)
  Minimum activity duration: d(a_i)
  Runtime activity duration: R(a_i)
3-sigma rule: for a normal distribution R(a_i) ~ N(μ, σ²), the value falls within (μ-3σ, μ+3σ) with probability 99.73%
D(a_i) = μ+3σ, M(a_i) = μ, d(a_i) = μ-3σ
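The attributes above can be sketched numerically. The code below estimates μ and σ from historical duration samples and derives D, M and d with the 3-sigma rule. The second function is an illustration under additional assumptions that the slides do not state: activities run in sequence and their durations are independent, so the total duration is again normally distributed. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def activity_bounds(samples):
    """Derive D(a_i), M(a_i), d(a_i) from duration samples via the 3-sigma rule."""
    mu, sigma = mean(samples), stdev(samples)
    return {"D": mu + 3 * sigma, "M": mu, "d": mu - 3 * sigma}


def on_time_probability(activities, upper_bound):
    """Probability that a sequence of activities finishes within upper_bound.

    activities: list of (mu, sigma) pairs with sigma > 0.
    Assumes sequential execution and independent normal durations.
    """
    total_mu = sum(mu for mu, _ in activities)
    total_sigma = sqrt(sum(sigma ** 2 for _, sigma in activities))
    return NormalDist(total_mu, total_sigma).cdf(upper_bound)


if __name__ == "__main__":
    print(activity_bounds([9.0, 10.0, 11.0]))
    print(on_time_probability([(10.0, 1.0), (20.0, 2.0)], upper_bound=33.0))
```

Setting the upper bound to the sum of the mean durations gives a probability of 0.5, which illustrates why a constraint equal to the expected total duration is violated about half of the time under these assumptions.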
Type of Temporal Constraints
  Upper bound temporal constraint, U(W)
  Lower bound temporal constraint, L(W)
  Fixed-time temporal constraint, F(W)
Relationship
  Upper bound, lower bound: symmetric
  Upper bound, fixed-time: special case
Choice
  Upper bound/lower bound constraint for workflow build-time
  Fixed-time constraint for workflow runtime
Temporal Framework
Component 1: Temporal Constraint Setting
  Forecasting workflow activity durations
  Setting coarse-grained temporal constraints
  Setting fine-grained temporal constraints
Component 2: Temporal Consistency Monitoring
  Temporal checkpoint selection
  Temporal verification
Component 3: Temporal Violation Handling
  Temporal violation handling point selection
  Temporal violation handling
Component 1: Temporal Constraint Setting
Forecasting Activity Durations
Statistical time-series pattern based forecasting strategies
Selected Publications:
X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, Y. Yang, A Novel Statistical Time-Series Pattern based Interval Forecasting Strategy for Activity Durations in Workflow Systems, Journal of Systems and Software (JSS), vol. 84, no. 3, Pages 354-376, March 2011.
X. Liu, J. Chen, K. Liu and Y. Yang, Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns, Proc. of 4th IEEE International Conference on e-Science (e-Science08), pages 23-30, Indianapolis, USA, Dec. 2008.
Setting Temporal Constraints
Probability based temporal consistency model
Time analysis based on Stochastic Petri Nets
Selected Publications:
X. Liu, Z. Ni, J. Chen, Y. Yang, A Probabilistic Strategy for Temporal Constraint Management in Scientific Workflow Systems, Concurrency and Computation: Practice and Experience (CCPE), Wiley, 23(16):1893-1919, Nov. 2011 .
X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Lecture Notes in Computer Science, Vol. 5240, pages 180-195, Milan, Italy, Sept. 2008.
Component 2: Temporal Consistency Monitoring
Temporal Consistency Monitoring
Minimum (Probability) Time Redundancy based Checkpoint Selection Strategy
Temporal Dependency based Checkpoint Selection Strategy
Selected Publications:
X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011.
J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011
Component 3: Temporal Violation Handling
Violation Handling
Violation handling point selection
(Probability) time deficit allocation
Workflow local rescheduling strategy – ACO, GA, PSO
Selected Publications:
X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen and Y. Yang, A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems, Journal of Systems and Software, vol. 84, no. 3, pp. 492-509, 2011
X. Liu, Y. Yang, Y. Jiang and J. Chen, Do We Need to Handle Every Temporal Violation in Scientific Workflow Systems, submitted to ACM Transactions on Software Engineering and Methodology
Experiment Results on Temporal Violation Rates
Cost Analysis
Yearly Cost and Time Reduction
Yearly cost reduction for the pulsar searching workflow
Yearly time reduction for the pulsar searching workflow
Research Topics
Data Management in Scientific Cloud Workflows
Dr. Dong Yuan, Dr. Xiao Liu
http://www.ict.swin.edu.au/personal/dyuan/
Data Management in Cloud Computing
Scientific applications in cloud computing
  Computation and data intensive applications
  Massive computation and storage resources
  Pay-as-you-go model
Computation and storage trade-off
  Some datasets should be stored (storage cost)
  Some datasets can be regenerated (computation cost)
Data placement
Data Dependency Graph (DDG)
A classification of the application data: original data and generated data
Data provenance: a kind of metadata that records how data are generated
[Figure: an example DDG with datasets d1 to d8]
Attributes of a Dataset in DDG
A dataset d_i in DDG has the attributes: <x_i, y_i, f_i, v_i, provSet_i, CostR_i>
x_i ($) denotes the generation cost of dataset d_i from its direct predecessors.
y_i ($/t) denotes the cost of storing dataset d_i in the system per time unit.
f_i (Boolean) is a flag, which denotes whether dataset d_i is stored or deleted in the system.
v_i (Hz) denotes the usage frequency, which indicates how often d_i is used.
provSet_i denotes the set of stored provenances that are needed when regenerating dataset d_i.
CostR_i ($/t) is d_i's cost rate, which means the average cost per time unit of d_i in the system.
Cost = Computation + Storage
  Computation: total cost of computation resources
  Storage: total cost of storage resources
$$genCost(d_i) = x_i + \sum_{\{d_k \mid d_j \in provSet_i \wedge d_j \to d_k \to d_i\}} x_k$$
$$CostR_i = \begin{cases} y_i, & f_i = 1 \text{ (stored)} \\ genCost(d_i) \cdot v_i, & f_i = 0 \text{ (deleted)} \end{cases}$$
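The two formulas can be sketched for the simplest case of a linear DDG, where each dataset has exactly one direct predecessor. In that case provSet_i reduces to the nearest stored predecessor, and genCost(d_i) is x_i plus the generation costs of the deleted datasets in between. The Dataset class and function names are hypothetical, and the restriction to a linear DDG is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    x: float      # generation cost from the direct predecessor ($)
    y: float      # storage cost per time unit ($/t)
    v: float      # usage frequency (uses per time unit)
    stored: bool  # the flag f_i: True = stored, False = deleted


def gen_cost(ddg, i):
    """genCost(d_i) in a linear DDG: x_i plus x_k of deleted predecessors
    back to (but excluding) the nearest stored one."""
    cost = ddg[i].x
    k = i - 1
    while k >= 0 and not ddg[k].stored:
        cost += ddg[k].x
        k -= 1
    return cost


def cost_rate(ddg, i):
    """CostR_i: storage cost if stored, else regeneration cost times usage frequency."""
    d = ddg[i]
    return d.y if d.stored else gen_cost(ddg, i) * d.v


if __name__ == "__main__":
    ddg = [
        Dataset(x=10, y=1, v=0.5, stored=True),
        Dataset(x=20, y=5, v=0.1, stored=False),
        Dataset(x=30, y=3, v=0.2, stored=False),
    ]
    print([cost_rate(ddg, i) for i in range(len(ddg))])
```

Note that the regeneration cost of the last dataset includes x of the deleted dataset before it, which is what makes deleting long chains of datasets increasingly expensive.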
Cost Model of Datasets Storage in the Cloud
Total cost rate for storing datasets in a DDG:
$$TCR_S = \Big(\sum_{d_i \in DDG} CostR_i\Big)_S$$
S is the storage strategy of the DDG
This cost model also represents the trade-off between computation and storage in the cloud
For a DDG with n datasets, there are 2^n different storage strategies
Example: a linear DDG d1 → d2 → d3 with attributes (x1, y1, v1), (x2, y2, v2), (x3, y3, v3)
  S1: f1=1, f2=0, f3=0: $TCR_{S_1} = y_1 + x_2 v_2 + (x_2 + x_3) v_3$
  S2: f1=0, f2=0, f3=1: $TCR_{S_2} = x_1 v_1 + (x_1 + x_2) v_2 + y_3$
  ...
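To make the 2^n strategy space concrete, the sketch below enumerates every storage strategy of a small linear DDG and evaluates its total cost rate. This brute-force search is only an illustration of the cost model, not the CTT-SP algorithm; the function names and the numeric values are hypothetical.

```python
from itertools import product


def total_cost_rate(x, y, v, stored):
    """TCR of a linear DDG under one storage strategy (stored[i] is f_i)."""
    tcr = 0.0
    regen = 0.0  # accumulated generation cost since the last stored dataset
    for xi, yi, vi, fi in zip(x, y, v, stored):
        if fi:
            tcr += yi          # stored: pay the storage cost rate
            regen = 0.0
        else:
            regen += xi        # deleted: regeneration starts at the last stored dataset
            tcr += regen * vi
    return tcr


def brute_force_mcss(x, y, v):
    """Try all 2^n strategies and return (minimum TCR, strategy)."""
    candidates = product((False, True), repeat=len(x))
    return min(((total_cost_rate(x, y, v, s), s) for s in candidates),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    x, y, v = [10, 20, 30], [1, 5, 3], [0.5, 0.1, 0.2]
    print(total_cost_rate(x, y, v, (True, False, False)))  # strategy S1
    print(total_cost_rate(x, y, v, (False, False, True)))  # strategy S2
    print(brute_force_mcss(x, y, v))
```

With these sample values S1 evaluates to y1 + x2·v2 + (x2+x3)·v3 and S2 to x1·v1 + (x1+x2)·v2 + y3, matching the two expressions above. The exponential number of candidates is the reason a polynomial-time algorithm is needed for realistic DDGs.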
Minimum cost benchmark
What is the minimum cost benchmark?
  The minimum cost for storing and regenerating datasets in the cloud
  The best trade-off between computation and storage in the cloud
  We need to find the Minimum Cost Storage Strategy (MCSS) for the application datasets
Significance of the minimum cost benchmark
  Due to the pay-as-you-go model, cost-effectiveness is very important to users for deploying their applications in the cloud
  The minimum cost benchmark is for users to evaluate the cost-effectiveness of their storage strategies.
Static On-Demand Minimum Cost Benchmarking
The static benchmarking is provided as an on-demand service for users.
Whenever a benchmarking request comes, the corresponding algorithms are triggered to calculate the minimum cost benchmark, which is a one-time-only computation.
This approach is suitable for situations where only occasional benchmarking is requested.
CTT-SP algorithm
  A novel algorithm designed to find the MCSS of a DDG with polynomial time complexity
  CTT-SP: Cost Transitive Tournament Shortest Path
Linear CTT-SP Algorithm
CTT-SP algorithm for linear DDG
Essences of the algorithm:
  Construct a Cost Transitive Tournament (CTT) based on the DDG
  In the CTT, every path (from the start to the end) represents a storage strategy of the DDG.
  The paths have a one-to-one mapping to the storage strategies.
[Figure: a linear DDG d1 → d2 → d3 and its CTT, which adds a start dataset ds and an end dataset de]
Set weights to the edges in CTT
We denote the weight of the edge from d_i to d_j as ω<d_i, d_j>, which is defined as "the sum of cost rates of d_j and the datasets between d_i and d_j, supposing that only d_i and d_j are stored and the rest of datasets between d_i and d_j are all deleted".
Formally:
$$\omega\langle d_i, d_j\rangle = CostR_j + \sum_{\{d_k \mid d_k \in DDG \wedge d_i \to d_k \to d_j\}} CostR_k = y_j + \sum_{\{d_k \mid d_k \in DDG \wedge d_i \to d_k \to d_j\}} \big(genCost(d_k) \cdot v_k\big)$$
The length of each path equals the TCR (Total Cost Rate) of the corresponding storage strategy.
Find the shortest path from ds to de in the CTT
The MCSS S_min is to store the datasets that P_min<ds, de> traverses.
The minimum cost benchmark is
$$TCR_{min} = \Big(\sum_{d_i \in DDG} CostR_i\Big)_{S_{min}}$$
Example: for the linear DDG d1 → d2 → d3 with attributes (x1, y1, v1), (x2, y2, v2), (x3, y3, v3), the CTT edge weights are
  ds → d1: y1
  d1 → d2: y2
  d2 → d3: y3
  d3 → de: 0
  ds → d2: x1v1 + y2
  d1 → d3: x2v2 + y3
  d2 → de: x3v3
  ds → d3: x1v1 + (x1+x2)v2 + y3
  d1 → de: x2v2 + (x2+x3)v3
  ds → de: x1v1 + (x1+x2)v2 + (x1+x2+x3)v3
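The linear case can be sketched in a few lines. The code below builds the edge weights exactly as in the definition (only the two endpoints of an edge are stored, everything between them is deleted) and finds the shortest path with dynamic programming over the topologically ordered CTT. It is an illustrative sketch, not the authors' implementation: function names are hypothetical, node 0 stands for ds and node n+1 for de, and the edge weights are recomputed naively rather than incrementally.

```python
def edge_weight(x, y, v, i, j):
    """Weight of the CTT edge from node i to node j (0 = ds, n+1 = de, i < j).

    Datasets strictly between i and j are deleted; d_j itself is stored.
    """
    n = len(x)
    weight = 0.0
    regen = 0.0
    for k in range(i + 1, j):          # deleted datasets d_{i+1} .. d_{j-1}
        regen += x[k - 1]              # genCost(d_k), starting from d_i
        weight += regen * v[k - 1]
    if j <= n:                         # de is not a real dataset, so no storage cost
        weight += y[j - 1]
    return weight


def linear_ctt_sp(x, y, v):
    """Return (minimum TCR, sorted 1-based indices of the datasets to store)."""
    n = len(x)
    dist = [float("inf")] * (n + 2)
    prev = [None] * (n + 2)
    dist[0] = 0.0
    for j in range(1, n + 2):          # nodes are already in topological order
        for i in range(j):
            candidate = dist[i] + edge_weight(x, y, v, i, j)
            if candidate < dist[j]:
                dist[j], prev[j] = candidate, i
    stored = []
    node = prev[n + 1]
    while node:                        # walk back until ds (node 0) is reached
        stored.append(node)
        node = prev[node]
    return dist[n + 1], sorted(stored)


if __name__ == "__main__":
    print(linear_ctt_sp([10, 20, 30], [1, 5, 3], [0.5, 0.1, 0.2]))
```

Because every edge goes from an earlier to a later dataset, the CTT is acyclic and a single forward pass is enough; a general shortest-path algorithm such as Dijkstra's would give the same result.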
General CTT-SP Algorithm
Take the simple DDG below as an example (with a block).
For a general DDG, we select one branch from the first dataset to the last dataset as the main branch (e.g. {d1, d2, d5, d6, d7, d8}) to construct the CTT.
We denote the rest of the datasets as sub-branches (e.g. {d3, d4}).
[Figure: a DDG with datasets d1 to d8 that contains a block]
The general CTT-SP algorithm is a recursive algorithm.
For the sub-branches, given different stored predecessors and successors, the MCSS would be different, hence it cannot be calculated at the beginning.
In the general CTT-SP algorithm, we recursively call the algorithm on the sub-branches and dynamically add the cost rates to the edges in the CTT of the main branch.
[Figure: the CTT of the main branch (ds, d1, d2, d5, d6, d7, d8, de) together with the sub-branch {d3, d4}]
Dynamic on-the-fly Minimum Cost Benchmarking
The benchmarking service is delivered on the fly to instantly respond to benchmarking requests.
By saving and utilising the pre-calculated results, whenever the application cost changes in the cloud, we can dynamically calculate the new minimum cost and keep the benchmark updated.
This approach is suitable for situations where more frequent benchmarking is requested at runtime.
Partitioned Solution Space (PSS)
  PSS saves all the possible MCSSs of a DDG segment.
  For a DDG segment, given particular stored predecessors and successors, we can quickly locate the corresponding MCSS from the PSS.
PSS for a DDG_LS (Linear DDG Segment)
A DDG_LS has different MCSSs according to its preceding and succeeding datasets' storage statuses.
CTT for a DDG_LS: different selections of the start and end datasets (ds and de) may lead to different MCSSs for the segment.
[Figure: a linear DDG segment between a start dataset ds and an end dataset de, with deleted preceding datasets before the segment and deleted succeeding datasets after it]
PSS for a DDG_LS
Partition of the solution space
We assume that S_{i,j} and S_{i',j'} are two MCSSs in the solution space with SCR_{i,j} < SCR_{i',j'}. The border of S_{i,j} and S_{i',j'} in the solution space is where, given particular X and V, the TCRs of storing the DDG_LS with S_{i,j} and S_{i',j'} are equal:
$$TCR_{i,j} = TCR_{i',j'}$$
$$X \sum_{k=1}^{i-1} v_k + SCR_{i,j} + V \sum_{k=j+1}^{n_l} x_k = X \sum_{k=1}^{i'-1} v_k + SCR_{i',j'} + V \sum_{k=j'+1}^{n_l} x_k$$
Hence we have
$$\Big(\sum_{k=1}^{i'-1} v_k - \sum_{k=1}^{i-1} v_k\Big) X + \Big(\sum_{k=j'+1}^{n_l} x_k - \sum_{k=j+1}^{n_l} x_k\Big) V + SCR_{i',j'} - SCR_{i,j} = 0$$
Hence, the border of S_{i,j} and S_{i',j'} in the solution space is a straight line.
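PSS for a DDG_LS
If we assume d_i → d_{i'} → d_j → d_{j'}, the equation can be further simplified to
$$\Big(\sum_{k=i}^{i'-1} v_k\Big) X - \Big(\sum_{k=j+1}^{j'} x_k\Big) V + SCR_{i',j'} - SCR_{i,j} = 0$$
The figure below demonstrates the partition of the solution space.
[Figure: a DDG_LS with datasets d_i, d_i', d_j, d_j', and the X-V solution space in which the partition line L<S_i,j, S_i',j'> separates the region where TCR_i,j < TCR_i',j' (S_i,j) from the region where TCR_i,j > TCR_i',j' (S_i',j')]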
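The partition line can be sketched as follows. The code assumes the linear form TCR_{i,j} = X·Σ_{k<i} v_k + SCR_{i,j} + V·Σ_{k>j} x_k, and it treats SCR_{i,j} as the part of the segment's cost rate that depends on neither X nor V; that reading of SCR is an assumption, since the slides do not define it. Indices are 1-based as on the slides, and the function names and sample numbers are hypothetical.

```python
def tcr(x, v, scr, i, j, X, V):
    """TCR of a segment strategy whose first and last stored datasets are d_i and d_j."""
    return X * sum(v[:i - 1]) + scr + V * sum(x[j:])


def partition_line(x, v, i, j, scr_ij, i2, j2, scr_i2j2):
    """Coefficients (a, b, c) such that a*X + b*V + c = TCR_{i',j'} - TCR_{i,j}.

    The partition line between the two strategies is a*X + b*V + c = 0.
    """
    a = sum(v[:i2 - 1]) - sum(v[:i - 1])
    b = sum(x[j2:]) - sum(x[j:])
    c = scr_i2j2 - scr_ij
    return a, b, c


def cheaper_strategy(line, X, V):
    """Return 'first' if S_{i,j} is cheaper at (X, V), otherwise 'second'."""
    a, b, c = line
    return "first" if a * X + b * V + c > 0 else "second"


if __name__ == "__main__":
    x, v = [1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]
    line = partition_line(x, v, 1, 3, 1.0, 2, 4, 0.5)
    print(line, cheaper_strategy(line, X=3.0, V=0.2))
```

Locating the MCSS for a given (X, V) then only requires evaluating the sign of a few linear expressions, which is why the lookup in a PSS is fast.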
We can calculate the partition lines of all the potential MCSSs in the solution space, which form the PSS.
With PSS, given any X and V, we can quickly locate the corresponding MCSS for the DDG_LS.
[Figure: a DDG_LS and its PSS, in which partition lines divide the X-V solution space into regions for the MCSSs S1 to S5]
Dynamic on-the-fly Minimum Cost Benchmarking
PSS based benchmarking approach (key ideas)
  Merge the PSSs of the DDG_LSs to derive the PSS of the whole DDG, from which the minimum cost benchmark can be obtained.
  Save all the calculated PSSs along this process in a hierarchy.
  Whenever the application cost changes, we can quickly derive the new minimum cost benchmark from the saved PSSs.
  Hence, we can dynamically keep the minimum cost benchmark updated, so that benchmarking requests can be responded to instantly on the fly.
Saving PSSs
We save all the PSSs of a DDG in a hierarchy
The level number indicates the number of DDG_LSs merged in the PSS at that level.
The link between two PSSs at Levels i and i+1 in the hierarchy means the corresponding DDG segment of the PSS at Level i+1 contains the DDG segment of the PSS at Level i.
[Figure: a DDG with three sub linear segments (DDG_LS1, DDG_LS2, DDG_LS3) and its PSS hierarchy, with PSS1, PSS2 and PSS3 at Level 1, PSS12 and PSS13 among the PSSs at Level 2, and PSS123 at Level 3]
Cost-Effective Storage Strategies
Cost Rate based Storage Strategy
  The strategy directly compares generation cost rate and storage cost rate for every dataset to decide its storage status.
  The strategy can guarantee that the stored datasets in the system are all necessary.
  The strategy can dynamically check whether the re-generated datasets need to be stored, and if so, adjust the storage strategy accordingly.
  This strategy is highly efficient with fairly reasonable cost effectiveness.
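The comparison at the heart of the cost rate based strategy can be sketched for a linear DDG. A dataset is kept when its generation cost rate, genCost(d_i)·v_i, exceeds its storage cost rate y_i. Processing the datasets once in generation order, and using the decisions already made for the predecessors to compute genCost, are assumptions of this sketch; the function name is hypothetical.

```python
def cost_rate_based_strategy(x, y, v):
    """Decide the storage status of each dataset in a linear DDG.

    Returns a list of booleans: True = store, False = delete.
    """
    stored = []
    regen = 0.0  # generation cost accumulated over deleted predecessors
    for xi, yi, vi in zip(x, y, v):
        gen_cost = regen + xi
        keep = gen_cost * vi > yi   # generation cost rate vs storage cost rate
        stored.append(keep)
        regen = 0.0 if keep else gen_cost
    return stored


if __name__ == "__main__":
    print(cost_rate_based_strategy([10, 20, 30], [1, 5, 3], [0.5, 0.1, 0.2]))
```

Each decision is local and takes constant time, which is consistent with the strategy being highly efficient, while the result is not guaranteed to be the minimum cost storage strategy.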
Local-Optimisation based Storage Strategy
  The strategy divides the DDG with a large number of application datasets into small linear segments (DDG_LS).
  The strategy utilises the linear CTT-SP algorithm to find the MCSS of every segment, hence achieving local optimisation.
  This strategy is highly cost-effective with very reasonable runtime efficiency.
[Figure: a DDG divided at partitioning point datasets into Linear DDG1, Linear DDG2, Linear DDG3 and Linear DDG4]
Pulsar Searching Application Case Study
Analysing ONE PIECE of the observation data generates six datasets.
We directly utilise the on-demand benchmarking approach. The MCSS stores d2, d4 and d6 and deletes d1, d3 and d5. The minimum cost benchmark is $0.51 per day.
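[Figure: the pulsar searching DDG. Raw beam data is processed into six datasets:]
d1 Extracted & compressed beam: size 20 GB, generation time 27 mins
d2 De-dispersion files: size 90 GB, generation time 790 mins
d3 Accelerated de-dispersion files: size 90 GB, generation time 300 mins
d4 Seek results files: size 16 MB, generation time 80 mins
d5 Candidate list: size 1 KB, generation time <1 min
d6 XML files: size 25 KB, generation time 245 mins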
PSSs merging process
There are two phases in the execution: 1) Files Preparation and 2) Seeking Candidates. Two DDG_LSs are generated correspondingly.
Usage frequency: d2: 1 per 4 days; d1, d3, d4, d5, d6: 1 per 10 days.
[Figure: the dataset chain d1 to d6 split into DDG_LS1 and DDG_LS2, with PSS1, PSS2 and the merged PSS plotted against axes X and V (axis units shown: 1/h and $).]
PSS1 (from DDG_LS1)
MCSS: stored datasets
S1: d2
S2: d1, d2
S3: d3
S4: d1, d3
Partition lines:
L<S1,S2>: 0.0042X - 0.0038 = 0
L<S1,S3>: 0.01X - 0.5V + 0.0115 = 0
L<S1,S4>: 0.0042X + 0.5V - 0.0149 = 0
L<S2,S4>: 0.5V - 0.0111 = 0
L<S3,S4>: 0.0142X - 0.0034 = 0
Values marked in the plot: 0.0230, (0.2394, 0.0277), 0.9048, (0.9048, 0.0222).
PSS2 (from DDG_LS2)
Only one MCSS in this PSS, i.e. storing d4 and d6. Hence, there is no partition line.
Merged PSS
MCSS: stored datasets
S1: d2, d4, d6
S2: d1, d2, d4, d6
Partition lines:
L<S1,S2>: 0.0042X - 0.0038 = 0
Value marked in the plot: 0.9048.
Pulsar Searching Application Case Study
Storage status of each dataset under each strategy. Datasets, in order: Extracted beam; De-dispersion files; Accelerated de-dispersion files; Seek results; Pulsar candidates; XML files.
1) Store no datasets: Deleted, Deleted, Deleted, Deleted, Deleted, Deleted
2) Store all datasets: Stored, Stored, Stored, Stored, Stored, Stored
3) Generation cost based strategy: Deleted, Stored, Stored, Deleted, Deleted, Stored
4) Usage based strategy: Deleted, Stored, Deleted, Deleted, Deleted, Deleted
5) Cost rate based strategy: Deleted, Stored (deleted initially), Deleted, Stored, Deleted, Stored
6) Local-optimisation based strategy: Deleted, Stored, Deleted, Stored, Deleted, Stored
7) Minimum cost benchmark: Deleted, Stored, Deleted, Stored, Deleted, Stored
Data Placement
Compute near big data!
In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. A data manager must intelligently select the data centres in which these data will reside, by considering:
The dependencies between datasets
The movement of large datasets
The fixed locations of some data
A matrix based k-means clustering strategy
Build-time: to group the existing datasets into k data centres based on data dependencies
Step 1: Set up and cluster the dependency matrix
Step 2: Partition and distribute datasets
Runtime: to dynamically cluster newly generated datasets to the most appropriate data centres based on dependencies
Step 1: Data pre-allocation by the clustering algorithm
Step 2: Adjust data placement among data centres when new workflows are deployed or some data centres become overloaded
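The dependency matrix and the runtime pre-allocation step can be illustrated with a minimal sketch. It does not reproduce the clustering algorithm itself. It assumes that the dependency between two datasets is the number of tasks that use both, and that a new dataset goes to the data centre whose resident datasets it depends on most. All names and example values are hypothetical.

```python
def dependency(tasks_a, tasks_b):
    # Assumed measure: the number of tasks that use both datasets.
    return len(set(tasks_a) & set(tasks_b))


def dependency_matrix(used_by):
    """Return the dataset names and the symmetric matrix of pairwise dependencies."""
    names = list(used_by)
    matrix = [[dependency(used_by[a], used_by[b]) for b in names] for a in names]
    return names, matrix


def preallocate(new_tasks, used_by, placement):
    """Pick the data centre whose resident datasets share the most tasks with the new dataset."""
    score = {}
    for name, centre in placement.items():
        score[centre] = score.get(centre, 0) + dependency(new_tasks, used_by[name])
    return max(score, key=score.get)


# Hypothetical workload: which tasks use which existing dataset, and where each is placed.
used_by = {
    "d1": {"t1", "t2"},
    "d2": {"t2", "t3"},
    "d3": {"t4"},
    "d4": {"t4", "t5"},
}
placement = {"d1": "dc1", "d2": "dc1", "d3": "dc2", "d4": "dc2"}

names, matrix = dependency_matrix(used_by)
chosen = preallocate({"t3", "t4", "t5"}, used_by, placement)
```

Here the new dataset shares one task with the datasets in dc1 and three with those in dc2, so it is pre-allocated to dc2. Capacity limits and datasets with fixed locations are left out of the sketch.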
Gaofeng Zhang
Security and Privacy Protection in the Cloud
Research Topics
Background
Data Security vs. Data Privacy
Privacy in cloud computing
Massive data is stored and processed in an open cloud environment
Customers cannot control what happens inside the cloud
The severity of privacy risk in cloud computing
One specific privacy risk in cloud computing
Indirectly private information (collective information)
Normal service processes and functions (no disruption)
The approach: noise obfuscation for privacy protection
Privacy Protection in Cloud
Roles in the view of privacy in a regular IT system: Privacy owner, Privacy user and Privacy theft
[Figure: Privacy owner, Privacy user and Privacy theft]
Keep safe between Privacy owner and Privacy user!
Privacy Protection in Cloud Microsoft’s View on Cloud Ecosystem
Powerful, Green and Smart Cloud—IBM
Privacy Protection in Cloud
Roles in the view of privacy in the Cloud: Privacy owner, privacy user and privacy theft
[Figure: Privacy owner, Privacy user and Privacy theft]
Virtualisation disables the “keeping safe between Privacy owner and Privacy user!”
Noise Obfuscation (1)
Background
Massive data is stored and processed in open cloud environments. Customers cannot control what happens inside the cloud.
Main idea: “Dilute” real private information with noise information. Not noise signal!
[Figure: Real Information combined with Noise Information gives Final Information]
Noise Obfuscation (2)
A motivating example:
One customer, who often travels to one city in Australia, such as ‘Sydney’, regularly checks the weather report from a weather service in a cloud environment before departure. The frequent appearance of service requests for the ‘Sydney’ weather report can reveal the private fact that the customer usually goes to ‘Sydney’. But if a system helps the customer inject other requests, such as ‘Perth’ or ‘Darwin’, into the ‘Sydney’ queue, the service provider cannot distinguish which requests are real and which are ‘noise’, as it just sees requests of a similar style. All of these requests are answered, yet together they do not reveal the location privacy of the customer. In such cases, privacy can be protected by noise obfuscation in general.
From ‘data’ privacy to ‘process’ privacy!
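The motivating example can be sketched in a few lines. This is only an illustration of the idea, not one of the group's noise generation strategies: it assumes noise cities are drawn uniformly at random from a pool that excludes the real city, and the function name and the `noise_per_real` parameter are hypothetical.

```python
import random


def obfuscate(real_requests, noise_pool, noise_per_real, rng):
    """Mix real service requests with randomly chosen noise requests."""
    real = set(real_requests)
    # Assumption: noise never repeats a real request value.
    candidates = [city for city in noise_pool if city not in real]
    queue = list(real_requests)
    for _ in range(noise_per_real * len(real_requests)):
        queue.append(rng.choice(candidates))
    # Shuffle so the position in the queue does not give the real requests away.
    rng.shuffle(queue)
    return queue


real = ["Sydney"] * 5
pool = ["Sydney", "Perth", "Darwin"]
queue = obfuscate(real, pool, noise_per_real=2, rng=random.Random(0))
```

With two noise requests per real request, the service provider sees 15 requests of which only 5 are for ‘Sydney’. How well this hides the real destination depends on how the noise is distributed, which is what the noise generation strategies address.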
Research Topics
Noise Generation
Historical probability based noise generation strategy
Time-series pattern based noise generation strategy
Association probability based noise generation strategy
……
Noise Utilisation
Trust model and injection strategy for noise obfuscation
……
Noise Cooperation Mechanism
Privacy protection framework under noise obfuscation
Wenhao [email protected]
Cost-Effective Data Reliability Assurance in the Cloud
Research Topics
Background
The growth of Cloud data:
It is estimated that by 2015 the data stored in the Cloud will reach 0.8 ZB, while more data are stored or processed temporarily in their journey. (IDC)
The size of Cloud applications is also expanding.
Challenge: How to reduce the data storage cost of using Cloud storage services without sacrificing data reliability assurance.
Research issues
Data reliability modeling in the Cloud
Replication-based cost-effective data reliability management approaches
Data loss detection and data recovery
Replication-based Approaches
Incremental replication strategy: CIR (Cost-effective Incremental Replication)
The generation of replicas follows an incremental pattern, in which a replica is created only when the current replicas cannot provide sufficient data reliability assurance to meet the user's requirement.
Data reliability management mechanism based on proactive replica checking: PRCR (Proactive Replica Checking for Reliability)
Depending on its data reliability requirement, each file has no more than two replicas stored in the Cloud.
A replica checking process is proactively conducted to detect data loss and recover replicas.
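The incremental pattern can be illustrated with a small sketch. The slides do not state the reliability model, so this sketch assumes that replicas fail independently at a constant failure rate. The function names and numeric values are hypothetical and this is not the CIR implementation.

```python
import math


def reliability(replicas, failure_rate, duration_years):
    """Probability that at least one replica survives the storage duration.

    Assumption: independent replicas with a constant (exponential) failure rate.
    """
    p_loss_single = 1.0 - math.exp(-failure_rate * duration_years)
    return 1.0 - p_loss_single ** replicas


def replicas_needed(required, failure_rate, duration_years, max_replicas=10):
    # Add replicas one at a time until the reliability requirement is met.
    for k in range(1, max_replicas + 1):
        if reliability(k, failure_rate, duration_years) >= required:
            return k
    raise ValueError("requirement not reachable within max_replicas")


# Hypothetical numbers: 5% yearly failure rate per replica, 99% reliability required.
short_term = replicas_needed(0.99, failure_rate=0.05, duration_years=0.1)
long_term = replicas_needed(0.99, failure_rate=0.05, duration_years=1.0)
```

Under these assumptions one replica is enough for the short storage duration, and a second replica is needed only once the data is kept for a full year.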
CIR can reduce current Cloud storage cost by up to 2/3, especially for data with a short storage duration and a low data reliability requirement.
PRCR can reduce current Cloud storage cost by 1/3 to 2/3, especially when the amount of data is large.
Dahai [email protected]
Cloud Workflow System Design and Development
Research Topics
SwinCloud – Cloud Computing Testbed
[Figure: SwinCloud, built on Swinburne Computing Facilities: the Astrophysics Supercomputer (GT4, SuSE Linux), Swinburne CS3 (GT4, CentOS Linux) and Swinburne ESR (GT4, CentOS Linux). It provides a VMware cloud simulation environment and data centres with Hadoop.]
Prototype: SwinDeW-C Cloud Workflow System
[Figure: SwinDeW-C architecture in four layers: Application Layer, Platform Layer, Unified Resource Layer and Fabric Layer. Elements shown: workflow execution of activities; SwinDeW-C peers and SwinDeW-C coordinator peers running on VMs; SwinCloud; the SwinDeW-G grid computing infrastructure, with Swinburne CS3 (SwinDeW-G, GT4, CentOS Linux), Swinburne ESR (SwinDeW-G, GT4, CentOS Linux), the Astrophysics Supercomputer (SwinDeW-G, GT4, SuSE Linux), Beihang CROWN (SwinDeW-G, CROWN, Linux), UK, VPAC, Hong Kong and PfC; and commercial cloud infrastructure, with Amazon, Google and Microsoft data centres.]
New Progress
Successfully deployed on the Amazon Cloud
Eucalyptus: the cloud infrastructure platform
Call for paper and call for workshop
2012 International Conference on Cloud and Green Computing, Nov. 1-3, 2012, Xiangtan, Hunan, China
http://kpnm.hnust.cn/confs/cgc2012/
Important Dates:
Workshop Proposal: Ongoing as received
Submission Deadline: June 30, 2012
Authors Notification: July 30, 2012
Final Manuscript Due: August 10, 2012
Registration Due: August 18, 2012
End - Q&A Thanks for your attention!