Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: •...

56
Analysis of Data-Centric Workflows Victor Vianu UC San Diego & INRIA-Saclay

Transcript of Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: •...

Page 1: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Analysis of Data-Centric Workflows

Victor Vianu UC San Diego & INRIA-Saclay

Page 2: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Workflows centered around data

Increasingly common:

•  E-commerce •  Supply chain management •  Case management in health-care •  Scientific workflows •  Business processes

….

Complex logic ! need for analysis tools

Cautionary tale: healthcare.gov !

Page 3: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Analysis tasks

•  Automatic verification of desirable properties “no product can be shipped before being paid” “if a product is paid, it is eventually delivered or the customer is refunded” •  Comparison, customization, optimization of workflows “is my workflow equivalent to yours ?” •  Runtime analysis “what can I infer from my local view of the workflow?” “what should I do next?”

Page 4: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Long collaboration with INRIA

•  Automatic Verification PODS 2008, ICDT 2009, TODS 2009, ICDT 2011, BPM 2011, TODS 2012 •  Comparison, customization, optimization of workflows

ICDT 2011, TODS 2012 •  Runtime analysis for collaborative workflows

PODS 2013

Page 5: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

What is a data-centric workflow?

•  States (data) •  Events (with data) •  Transitions

state

state

state

state event

event

event

Page 6: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

DB transition system

....

Infinite-state system

Page 7: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Specifications of data-centric workflows

Two frameworks used in our work: •  Active XML (INRIA Gemo): XML with embedded functions events: function calls

•  Business Artifacts (IBM Yorktown) relational database, evolving “artifact” tuple events: pre/post conditions

Page 8: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

AXML workflow by example

Mail-Order-Center

Catalog !GetOrder

Page 9: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog

Customer

Order

Product

“Joe” “ipod80G”

!Bill !Credit check

!Deliver

AXML workflow by example

Page 10: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

!Bill !Credit check

!Deliver

Invoice

Customer

“Joe”

Amount

“$399”

guard: product available?

AXML workflow by example

Page 11: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

!Credit check

!Deliver Payment

“VISA” “$399”

AXML workflow by example

Page 12: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

Payment !Credit check

!Deliver

“VISA” “$399”

Request

Customer

“Joe”

Amount

“$399”

guard: paid correct amount ?

AXML workflow by example

Page 13: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

Payment !Deliver

“VISA” “$399”

Rating

“Good”

AXML workflow by example

Page 14: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

Payment Rating !Deliver

“VISA” “$399”

“Good”

guard: rating good or excellent ?

AXML workflow by example

Page 15: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

Payment Rating !Deliver

“VISA” “$399”

“Good”

Delivery

Customer

“Joe”

Product

“ipod80G”

guard: rating good or excellent ?

AXML workflow by example

Page 16: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Mail-Order-Center

Catalog Order

Customer Product

“Joe” “ipod80G”

Payment Rating Delivered

“VISA” “$399”

“Good”

AXML workflow by example

Page 17: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

IBM’s Business Artifacts

Tuple artifacts: •  evolving tuples of data values •  fixed underlying database •  services (pre/post conditions on tuple and a global relational database)

Page 18: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Example Status   Prod_id   Ship_type   Coupon   Amount  

owed  Amount  paid  

Amount  refunded  

Single  ar*fact  record:  

PRODUCTS(id,  price,  availability,  weight)  COUPONS(code,  type,  value,  min_value,  free_shiptype)  SHIPPING(type,  cost,  max_weight)  OFFERS(prod_id,  discounted_price,  ac*ve)  

Underlying  database:  

Page 19: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Services:  

Services, by example

Status   Edit  shiptype  

Product  id   23  

Ship  type  

Coupon  

…  

Current  snapshot:  

Every  precondi*on  is  evaluated  on  the  current  snapshot  

Status   Edit  coupon  

Product  id   23  

Ship  type   Fedex  2-­‐day  

Coupon  

Successor  snapshot:  

Status   Edit  shiptype  

Product  id   23  

Ship  type  

Coupon  

Edit  coupon  

23  

Fedex  2-­‐day  

processing  

23  

Fedex  2-­‐day  

$50  coupon  

…  

Run  (infinite  sequence  of  successive  ar*facts):  

DB  

Modify  product  

pre   post  

Choose    shiptype  

pre   post  

Service  n  pre   post  

One  is  chosen  non-­‐determinis*cally  

Page 20: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Specifying temporal properties of runs LTL + statements about data

Linear-time temporal logic

always ( p ! eventually q)

Page 21: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

LTL(L) propositions ! formulas in a db language

L p q

...

ϕp ϕq

Page 22: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

LTL(FO) property:

“every paid product is eventually delivered”

always ( p ! eventually q)

delivered (x) ∃y (pay(x,y) ∧ price(x,y))

∀x

Page 23: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Variant for AXML workflows

always (p ! eventually q) ∀ X

Tp(X) Tq(X)

tree patterns with free variables X

Page 24: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

always(p ! eventually q) ∀ X

Order

Product

X

Delivered

Mail-Order-Center Tq(X) Mail-Order-Center

Order Customer Product

X Y Z

Product

Price

Y Z

Mail-Order-Center

Order

Payment

X

Product

Price

X Y

Name

Tp(X)

Y

“every paid product is eventually delivered”

LTL(XPath)

Page 25: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

VERIFIER  

ϕ  A  

YES   NO,  counterexample  

The verification problem

Page 26: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Results on verification

•  Business artifacts: UCSD + INRIA + IBM •  Active XML: UCSD + INRIA

Restrictions that guarantee decidability

Boundary of decidability, complexity

Page 27: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Results on verification

•  Tuple artifacts: UCSD + INRIA + IBM •  Active XML: UCSD + INRIA

Restrictions that guarantee decidability

Efficient implementation: WAVE verifier

Symbolic model checking: marriage of database and CAV techniques

Page 28: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Long collaboration with INRIA

•  Automatic Verification PODS 2008, ICDT 2009, TODS 2009, ICDT 2011, BPM 2011, TODS 2012 •  Comparison, customization, optimization of workflows

ICDT 2011, TODS 2012 •  Runtime analysis for collaborative workflows

PODS 2013

Page 29: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

How to compare different workflows?

Building a beehive Building an ant nest

Page 30: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Use abstraction!

Building a beehive Building an ant nest

bee

hive

ant

nest

insect

dwelling

Page 31: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Use abstraction!

Building a beehive Building an ant nest

•  Map to a common abstraction •  Define a notion of simulation

Page 32: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Use abstraction!

Building a beehive Building an ant nest

Allows to compare very different workflows

Page 33: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

AXML vs Artifacts

•  AXML workflows can simulate Artifacts with respect to appropriate abstractions •  Artifacts cannot simulate AXML

Page 34: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Long collaboration with INRIA

•  Automatic Verification PODS 2008, ICDT 2009, TODS 2009, ICDT 2011, BPM 2011, TODS 2012 •  Comparison, customization, optimization of workflows

ICDT 2011, TODS 2012 •  Runtime analysis for collaborative workflows

PODS 2013

Page 35: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Collaborative workflows are pervasive

Business processes, health care, scientific workflows, government …

Page 36: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

tasks & honey tasks & honey tasks & honey tasks & honey actions & data actions & data actions & data actions & data

Author PC chair PC member Referee

Some possible actions Author: submit, respond, final, copyright

PC chair: assign, discuss, decide, notify, remind PC member: enter review, invite referee, discuss

Page 37: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

virtual database

Author PC chair PC member Referee

The collaborative workflow model

Simple peer views

tasks & honey tasks & honey tasks & honey tasks & honey actions & data actions & data actions & data actions & data

Page 38: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Submitted Id title author abstract

Submitted Id title author abstract

Author

Submitted Id title abstract

PC member

PC chair

σ Author = self

Page 39: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Peer actions

conjunction of literals in the view sequence of insertions/deletions into the view

Update :- Condition

PC Chair Assign(Id, PCmember), Status(Id, assigned) :- Submitted(Id, title, author, abstract), ¬ Status(Id, assigned) PC(PCmember), ¬ Conflict(author, PCmember)

Page 40: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

action

virtual database

Author PC chair PC member Referee

tasks & honey tasks & honey tasks & honey tasks & honey actions & data actions & data actions & data actions & data

Page 41: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

action

virtual database

Author PC chair PC member Referee

tasks & honey tasks & honey tasks & honey tasks & honey actions & data actions & data actions & data actions & data

Page 42: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Focus: runtime peer reasoning about global run

based on local observations

•  Monitoring events

•  Diagnosing anomalous behavior

•  Competitive advantage

Has my paper been accepted?

A paper was assigned to a PC member with a COI: could the assignment have been made by the PC chair, or was it necessarily made by a track chair?

Is my own submission as a PC member still under discussion?

Page 43: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Runtime peer reasoning about global run

based on local observations

Trace ρ

Infinitely many compatible global runs

Page 44: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Specifying properties of global runs PLTL-FO

PLTL: past linear-time propositional temporal logic

PLTL-FO:

PLTL with propositions interpreted as FO statements

Page 45: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Specifying properties of global runs PLTL-FO

•  My paper (submission 12) has not yet been accepted nor rejected:

always -1 ¬ ∃ review (Accept(12, review) ∨ Reject(12, review))

Page 46: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Specifying properties of global runs PLTL-FO

•  My paper (submission 12) has not yet been accepted nor rejected:

•  Semantics of a formula φ: relative to a trace ρ

Cert(φ) : φ holds in every global run compatible with ρ

Poss(φ) : φ holds in some global run compatible with ρ

always -1 ¬ ∃ review (Accept(12, review) ∨ Reject(12, review))

Page 47: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Given a workflow specification W, a trace ρ and an PLTL-FO property φ of global runs

evaluate Cert(φ) and Poss(φ)

Goal

Theorem: Cert(φ) and Poss(φ) can be evaluated in PSPACE with respect to φ and ρ.

Page 48: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Proof idea

Trace ρ

Infinitely many compatible global runs

Symbolic transition system

Page 49: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Proof idea

Trace ρ

symbolic instances variables + constraints

Page 50: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Proof idea

Trace ρ

Transitions among symbolic instances each computed in PSPACE

Page 51: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Proof idea

Trace ρ

Finite transition system on which Poss(φ) can be evaluated

by a nondeterministic PSPACE search

Cert(φ) = ¬ Poss(¬ φ)

Page 52: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Other results

•  Incremental evaluation •  Pre-emptive monitoring

φ is not known a priori

Can be done for classes of properties sharing a temporal type: underlying propositional PLTL formula

Example: always-1 (p à sometime -1 q)

FO FO

Page 53: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Other results

•  Incremental evaluation •  Pre-emptive monitoring •  Using knowledge about global run in workflow specifications

PC member As long as I don’t know that my paper is accepted,

I will reject every other paper

Reject(ID) :- paper(ID, author), author ≠ self, ¬ cert (“my paper is accepted”)

epistemic atom

Page 54: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Summary

•  Models for data-centric workflows •  Automatic verification •  Framework for comparing workflows •  Runtime analysis for collaborative workflows

Better tools for the design and analysis of data-centric workflows

Page 55: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Summary

•  Models for data-centric workflows •  Automatic verification •  Framework for comparing workflows •  Runtime analysis for collaborative workflows

Exciting area, much more to be done!

Page 56: Analysis of Data-Centric Workflows...Workflows centered around data Increasingly common: • E-commerce • Supply chain management • Case management in health-care • Scientific

Thank you !

Merci !