![Page 1: Q UERY -B ASED D ATA P RICING Paraschos Koutris Prasang Upadhyaya Magdalena Balazinska Bill Howe Dan Suciu University of Washington PODS 2012.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649e9f5503460f94ba24ee/html5/thumbnails/1.jpg)
QUERY-BASED DATA PRICING
Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu
University of Washington, PODS 2012
MOTIVATION
• Data is increasingly sold and bought on the web
• Websites that sell data:
– AggData [www.aggdata.com]
– Xignite (financial data) [www.xignite.com]
– Gnip (social media) [www.gnip.com]
• Data marketplace services:
– Windows Azure Marketplace (100+ datasets) [datamarket.azure.com]
– Infochimps (15,000 datasets) [www.infochimps.com]
Query-based pricing customized for buyers
CURRENT PRICING (1)
• A fixed price for the whole dataset or for a specific set of views
• Example: CustomLists
– USA Business Database for $399
– Email addresses for $299
– Businesses in WA for $199
• Limitations:
– Restaurants in WA?
– Businesses in cities with population >100,000?
CURRENT PRICING (2)
• API Subscriptions (Azure Marketplace, Infochimps)
– Allow queries over the data
– Pay by number of transactions (page of results)
ISSUES WITH PRICING
• Buyers today need to buy a superset of the data they are interested in
• Sellers can’t easily anticipate all possible queries that buyers might ask
• Solution: we need a more flexible pricing scheme, parameterized by queries
OUTLINE
1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections
THE PRICING FRAMEWORK
• The seller defines price points (view-price pairs): S = { (V1,p1), (V2,p2), … }
• A buyer can buy any query Q
• The system will compute price_S^D(Q)
[Figure: Seller (price points V1,p1; V2,p2; …), Buyer (asks for Q(D)), Pricing System + Database D (returns price_S^D(Q))]
INSTANCE-BASED DETERMINACY
Definition. V = V1,…,Vk determine Q given D, denoted D ⊢ V ↠ Q, if: for all D’, if V(D) = V(D’), then Q(D) = Q(D’)
Intuitively, “V1,…,Vk determine Q” means that Q(D) can be computed from V1(D),…,Vk(D) alone, without accessing the database instance D
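To make the definition concrete, here is a minimal brute-force sketch in Python. It assumes a tiny finite universe of possible tuples, so that every candidate instance D’ can be enumerated, and it models views and queries as plain Python functions. The names `determines`, `select`, `UNIVERSE` and the two-shape, two-color data are illustrative assumptions, not part of the talk.

```python
from itertools import chain, combinations, product

def all_instances(universe):
    """Every subset of the universe of possible tuples is a candidate instance D'."""
    items = list(universe)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def determines(views, query, d, universe):
    """Check D |- V ->> Q: every D' that agrees with D on all views agrees on Q."""
    target_views = [v(d) for v in views]
    target_answer = query(d)
    return all(
        query(d2) == target_answer
        for d2 in all_instances(universe)
        if [v(d2) for v in views] == target_views
    )

def select(column, value):
    """Selection view: all tuples whose given column equals the given value."""
    return lambda d: frozenset(t for t in d if t[column] == value)

# Hypothetical two-column relation S(shape, color) over small known domains.
SHAPES, COLORS = ["Swan", "Fish"], ["White", "Yellow"]
UNIVERSE = set(product(SHAPES, COLORS))

whole = lambda d: frozenset(d)                            # Q returns all of S
has_white = lambda d: any(c == "White" for _, c in d)     # Boolean query

D = frozenset({("Swan", "White"), ("Fish", "White")})
swan, fish = select(0, "Swan"), select(0, "Fish")

print(determines([swan, fish], whole, D, UNIVERSE))         # True
print(determines([swan], whole, D, UNIVERSE))               # False
print(determines([swan], has_white, D, UNIVERSE))           # True
print(determines([swan], has_white, frozenset(), UNIVERSE)) # False
```

The last two calls show why the notion is instance-based: the same view answers the Boolean query on one instance (the swan view already contains a white origami) but not on the empty instance, where a white fish could still be hiding.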
ARBITRAGE-FREE
Suppose V determines Q and price^D(Q) > price^D(V). Then, we can
1. buy V(D) for price^D(V)
2. compute Q(D) from V(D)
3. now we have answered Q at some price p < price^D(Q)
Axiom 1. Given D, the pricing function price^D(Q) is arbitrage-free if for all views V1,…,Vk and query Q where D ⊢ V1,…,Vk ↠ Q: price^D(Q) ≤ price^D(V1) + … + price^D(Vk)
DISCOUNT-FREE
• The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database
• A pricing function is discount-free if it is maximal
Axiom 2. The pricing function price^D(Q) should not offer any additional discounts beyond the explicit price points defined by the seller.
EXAMPLE: ORIGAMI DATABASE
Database S:

| Shape | Color | Picture |
|---|---|---|
| Swan | White | … |
| Swan | Yellow | … |
| Dragon | Yellow | … |
| Car | Yellow | … |
| Fish | White | … |

Price points:

| View | Price |
|---|---|
| V1(x,y,z) :- S(x,y,z), x=‘Swan’ | $2 |
| V2(x,y,z) :- S(x,y,z), x=‘Dragon’ | $2 |
| V3(x,y,z) :- S(x,y,z), x=‘Car’ | $2 |
| V4(x,y,z) :- S(x,y,z), x=‘Fish’ | $2 |
| W1(x,y,z) :- S(x,y,z), y=‘White’ | $3 |
| W2(x,y,z) :- S(x,y,z), y=‘Yellow’ | $3 |
| W3(x,y,z) :- S(x,y,z), y=‘Red’ | $3 |

For example, V2 gets all dragon origami for $2, and W3 gets all red origami for $3.

What is the price of the entire database? Q(x,y,z) :- S(x,y,z)

Exhausts the active domain:
• V1, V2, V3, V4 determine Q: price(Q) ≤ $8
• W1, W2, W3 determine Q: price(Q) ≤ $9

Therefore price(Q) = $8
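The $8 result can be reproduced with a small enumeration, sketched here in Python. The sketch relies on one assumption that goes beyond the slide: for selection views on a single relation with known column domains, a set of views determines the whole relation exactly when every (shape, color) combination is covered by at least one purchased selection. The names `PRICE_POINTS`, `determines_whole_relation` and `price_of_whole_relation` are illustrative.

```python
from itertools import combinations, product

SHAPES = ["Swan", "Dragon", "Car", "Fish"]
COLORS = ["White", "Yellow", "Red"]

# Price points from the slide: $2 per shape selection, $3 per color selection.
PRICE_POINTS = {("Shape", s): 2 for s in SHAPES}
PRICE_POINTS.update({("Color", c): 3 for c in COLORS})

def determines_whole_relation(views):
    """Assumed test: every (shape, color) cell is covered by some purchased selection."""
    shapes = {value for column, value in views if column == "Shape"}
    colors = {value for column, value in views if column == "Color"}
    return all(s in shapes or c in colors for s, c in product(SHAPES, COLORS))

def price_of_whole_relation():
    """Cheapest subset of price points that determines Q(x,y,z) :- S(x,y,z)."""
    points = list(PRICE_POINTS)
    best = None
    for size in range(len(points) + 1):
        for subset in combinations(points, size):
            if determines_whole_relation(subset):
                cost = sum(PRICE_POINTS[p] for p in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best

cost, chosen = price_of_whole_relation()
print(cost)  # 8
```

The enumeration is exponential in the number of price points, which is fine for seven views and serves only to illustrate the definition.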
EXAMPLE: ORIGAMI DATABASE
What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

R:

| Shape | Instructions |
|---|---|
| Swan | fold, cut, fold… |
| Dragon | cut, fold, cut,… |

S:

| Shape | Color | Picture |
|---|---|---|
| Swan | White | … |
| Swan | Yellow | … |
| Dragon | Yellow | … |
| Car | Yellow | … |
| Fish | White | … |

T:

| Color | PaperSpecs |
|---|---|
| White | 15g/100, $10 |
| Black | 20g/100, $15 |

Price points shown in the figure: p(σshape)=$99, p(σcolor)=$50, p(σcolor)=$5, p(σshape)=$2
OUTLINE
1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections
THE QUERY PRICING FORMULA
Given:
1. Price points S = {(V1,p1),…,(Vk,pk)}
2. Database instance D
3. Query Q
Compute: price_S^D(Q)
Properties: (a) arbitrage-free, (b) discount-free, (c) price_S^D(Vi) = pi
If such a pricing function exists, we say that the price points are consistent
Method:
• Consider all subsets of V = {V1,…,Vk} that determine Q
• Let C be the subset with the minimum price, Σi pi, for Vi in C
• Define p^D(Q) = Σi pi
Theorem.
(a) The price points are consistent iff p^D(Vi) = pi for any price point i=1,…,k
(b) price_S^D(Q) = p^D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
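The three steps of the method can be collapsed into a single formula. The set-builder form below is a restatement in the slide's own symbols, and the second line applies it to the origami database, where the two minimal determining subsets are {V1,…,V4} and {W1, W2, W3}.

```latex
p^{D}(Q) \;=\; \min\Big\{ \sum_{V_i \in C} p_i \;:\; C \subseteq \{V_1,\dots,V_k\},\ D \vdash C \twoheadrightarrow Q \Big\}

p^{D}(Q_{\text{origami}}) \;=\; \min\{\, 2+2+2+2,\; 3+3+3 \,\} \;=\; 8
```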
DISCUSSION
• If the result of Q1 is always a subset of Q2, should Q1 be priced less than Q2? No!
• Example:
– V(x,y) :- Fortune500(x,y)
– Q(x,y) :- Fortune500(x,y), StrongBuyRec(x)
– price(Q) >> price(V)
• We ignore computation costs in our framework
– Cost of computing query Q
– Q(D)=f(V(D)), but f can be hard to compute
OUTLINE
1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections
DETERMINACY
Definition. [Instance-independent] V determines Q, denoted as V ↠ Q, if: for all D, D’, if V(D) = V(D’), then Q(D) = Q(D’)
[Nash, Segoufin, Vianu ‘07]
V ↠ Q iff there exists a function f such that Q(D) = f(V(D)) for all D
iff for every D, we have that D ⊢ V ↠ Q
Definition. [Instance-dependent] V determines Q given D, denoted as D ⊢ V ↠ Q, if: for all D’, if V(D’) = V(D), then Q(D) = Q(D’)
COMPLEXITY OF DETERMINACY
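| | V, Q are UCQ | V, Q are CQ |
|---|---|---|
| Instance-independent: V ↠ Q | Undecidable [NSV ’07] | ? |
| Instance-dependent: D ⊢ V ↠ Q (data complexity) | coNP-complete [this paper] | coNP-complete [this paper] |
| Instance-dependent: D ⊢ V ↠ Q (combined complexity) | Π^p_2 [this paper] | Π^p_2 [this paper] |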
Open Question: is the bound on the combined complexity tight?
COMPLEXITY OF PRICING
Corollary. Deciding whether price_S^D(Q) ≤ k is:
• Combined complexity [input S, D]: Σ^p_2
• Data complexity [input D]: coNP-hard
Proposition. Pricing is at least as hard as determinacy
How do we deal with the hardness of computation?
OUTLINE
1. The Pricing Framework
2. The Pricing Formula
3. The Complexity of Pricing
4. Dichotomy and Algorithms for Selections
RESTRICTING PRICE POINTS TO SELECTIONS
• A seller can specify only the prices of selection queries of the form σ_{R.X=a}: prices on columns
• The domain of each column is finite and known to buyers and sellers
• Price points on selections are how prices are set in most cases today
DICHOTOMY THEOREM
Theorem. Assuming selection views only, for any Conjunctive Query w/o self-joins Q, one of the following holds (data complexity):
(a) computing price_S^D(Q) is in PTIME
(b) checking whether price_S^D(Q) ≤ k is NP-complete
• PTIME:
– Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) [Chains]
– Q(x1,…,xk) :- R1(x1,x2),…,Rk(xk,x1) [Cycles]
• NP-complete:
– Q(x) :- R(x,y) [Projections]
– Q(x,y,z) :- R(x,y,z), S(x), T(y), U(z)
ALGORITHM FOR PTIME CASES
• The algorithm uses a reduction to maximum flow
• Edges of finite capacity represent price points
• A set of edges of finite cost is a cut iff they determine the query
• Example:
– Chain query Q(x,y) :- R(x), S(x,y), T(y)

R:

| X |
|---|
| a1 |
| a2 |

S:

| X | Y |
|---|---|
| a1 | b1 |
| a2 | b2 |
| a2 | b2 |
| a3 | b2 |
| a4 | b1 |

T:

| Y |
|---|
| b1 |
| b3 |

Dom(X) = {a1,a2,a3,a4}
Dom(Y) = {b1,b2,b3}
FLOW GRAPH
[Figure: flow graph for the chain query example; node labels a1,…,a4 and b1,…,b3 for R, S, and T, shown next to the example relations R, S, T]
A set of edges of finite cost is a cut iff they determine the query
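The figure itself did not survive in this transcript, so the Python sketch below is a reconstruction under stated assumptions rather than the authors' exact graph. It assumes that (1) each selection view is one finite-capacity edge, (2) for every answer (a,b) of Q the views σ_{R.X=a} and σ_{T.Y=b} must be bought outright, and (3) every remaining pair (a,b) contributes a source-to-sink path whose finite edges are the views that could certify one of its atoms. The unit prices and the names `chain_price`, `max_flow` and `UNIT` are hypothetical; the slide gives no prices for this example.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap[u][v] holds residual capacities and is mutated."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow equals the min cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        if push == INF:
            return INF  # a path with no finite edge cannot be cut
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

def chain_price(R, S, T, dom_x, dom_y, price):
    """Assumed construction for Q(x,y) :- R(x), S(x,y), T(y) with selection views."""
    answers = {(a, b) for (a, b) in S if a in R and b in T}
    # Views that every determining set must contain (assumption 2 above).
    forced = {("R", a) for a, _ in answers} | {("T", b) for _, b in answers}
    cap = defaultdict(lambda: defaultdict(float))
    for a in dom_x:
        # A tuple R(a) that is present cannot certify a non-answer: bypass its edge.
        cap["s"][("x_in", a)] += INF if a in R else price[("R", a)]
        cap[("x_in", a)][("x_out", a)] += price[("S.X", a)]
    for b in dom_y:
        cap[("y_in", b)][("y_out", b)] += price[("S.Y", b)]
        cap[("y_out", b)]["t"] += INF if b in T else price[("T", b)]
    for a in dom_x:
        for b in dom_y:
            if (a, b) in S and (a, b) not in answers:
                cap[("x_in", a)][("y_out", b)] += INF  # skip both S edges
            else:
                cap[("x_out", a)][("y_in", b)] += INF  # pass through both S edges
    return sum(price[v] for v in forced) + max_flow(cap, "s", "t")

# Instance from the slide; the duplicate row (a2,b2) collapses in a set.
DOM_X, DOM_Y = ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3"]
R = {"a1", "a2"}
S = {("a1", "b1"), ("a2", "b2"), ("a3", "b2"), ("a4", "b1")}
T = {"b1", "b3"}
UNIT = defaultdict(lambda: 1)  # hypothetical: every selection view costs $1

print(chain_price(R, S, T, DOM_X, DOM_Y, UNIT))
```

With unit prices, the cheapest determining set under these assumptions consists of σ_{R.X=a1} and σ_{T.Y=b1} (forced by the single answer (a1,b1)) plus a minimum cut of four edges, for example σ_{R.X=a4}, σ_{T.Y=b2}, σ_{S.Y=b1} and σ_{S.Y=b3}.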
CONCLUSIONS
• Summary:
– The seller sets prices for some views, while the system computes the price of any query
– Interesting application of query determinacy
– Complexity: dichotomy for CQs w/o self-joins
• Future Work:
– Pricing in the presence of updates
– How do we overcome pricing for intractable queries?
– Connection of pricing and privacy
Thank you!