Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj...

31
Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image: National Geographic

Transcript of Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj...

Page 1: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

ConclaveSecure Multi-Party Computation on Big Data

1Nikolaj Volgushev Malte Schwarzkopf Ben Getchell

Andrei Lapets Mayank Varia Azer Bestavros

Image: National Geographic

Page 2: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Pre-presentation disclaimers/ confessions

• Less focus on developing new MPC protocols

• Instead focus on integrating MPC into big data analytics domain where the current roadblocks are:

- Accessibility (data analysts are not MPC experts)

- Scalability (large data volumes)

• Semi-honest adversaries throughout! 😈

2

Page 3: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

🏛#

$🚕

&🚗How concentrated is the market for

hired vehicles in NYC?

&🚙3

Is there healthy competition?

A dangerous monopoly forming?

Page 4: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Secure MPC 🏛#

$🚕

&🚙

🔒

4

A solution: Secure MPC

&🚗

🔒

🔒

🔒

🔒

🔒

How concentrated is the market for hired vehicles in NYC?

📊

🔒

🔒

🔒

🔒?!?

Page 5: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

5

PROJECT

SUM

a 11 3a 12 4c 23 1

a 1b 2b 0a 3

JOIN

a 11a 12c 23

a 4b 2

a 11 a 4a 12 a 2

DECLARE TABLE trips (start_lat int, start_lon int, … );SELECT SUM(mkt_share*mkt_share) AS hhi, SUM(trips.price) AS … WHERE start_lat = … AND …;

Page 6: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Secure MPC 🏛#

$

&

🚕

🚗

&🚙

🔒

175M annual trips!

6

Page 7: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Does MPC scale? — Agg: SUM

7

bette

r

N.B.log-scale!

Page 8: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Does MPC scale? — Join

8

bette

r

Page 9: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Our TODO list

• Make MPC more accessible to data analysts

• Make MPC scale for common analytics queries

9

Page 10: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Conclave

10

Key insight:

For most queries, only some of the work musthappen under MPC.

Automatically rewrite query to combine scalable local computation and MPC.

Page 11: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Contributions

11

1. MPC query compilation: relational query compiler that optimizes for efficient MPC.

2. Automated analysis to minimize MPC use while maintaining required guarantees.

3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.

4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.

Page 12: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

$🚕 &🚗 &🚙

12

Conclavecompiler

Conclavecompiler

Conclavecompiler

DECLARE TABLE trips (start_lat int, start_lon int, … );SELECT SUM(mkt_share*mkt_share) AS hhi, SUM(trips.price) AS … WHERE start_lat = … AND …;

Conclavedispatcher

Conclavedispatcher

Conclavedispatcher

Secure MPC 🔒

Page 13: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

# compute the Herfindahl-Hirschman Index (HHI)rev = taxi_data.project(["companyID", "price"]) .sum("local_rev", group=[“companyID”], over="price") .project([0, "local_rev"]) market_size = rev.sum(“total_rev", over=“local_rev") share = rev.join(market_size, left=[“companyID"], right=[“companyID"]) .divide("m_share", "local_rev", by="total_rev") hhi = share.multiply(share, "ms_squared", "m_share") .sum(“hhi", on="ms_squared") hhi.writeToCSV()

13

Relational query specificationParties’ data treated as single relation

Page 14: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

import conclave as cc pA, pB, pC = cc.Party(mpc.a.com), […], cc.Party(mpc.c.org) schema = [Column("companyID", cc.INTEGER), … Column("price", cc.INTEGER)] # 3 parties each contribute inputs with the same schemataxi_data = cc.defineTable(schema, at=[pA, pB, pC])

# compute the Herfindahl-Hirschman Index (HHI) rev = taxi_data.project(["companyID", "price"]) .sum("local_rev", group=[“companyID”], over="price") .project([0, "local_rev"]) market_size = rev.sum(“total_rev", over=“local_rev") share = rev.join(market_size, left=[“companyID"], right=[“companyID"]) .divide("m_share", "local_rev", by="total_rev") hhi = share.multiply(share, "ms_squared", "m_share") .sum(“hhi", on="ms_squared") hhi.writeToCSV(to=[pA])

14

Relational query specification

Page 15: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Contributions

15

1. MPC query compilation: relational query compiler that optimizes for efficient MPC.

2. Automated analysis to minimize MPC use while maintaining required guarantees.

3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.

4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.

Page 16: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

16

🔒

$🚕 &🚗 &🚙

🏛#

SUM🔒

MULTIPLY🔒

DIVIDE🔒

JOIN🔒

PROJECT🔒

CONCATENATE🔒

SUM🔒

DIVIDE by 10k🔒

PROJECT🔒

CONCATENATE🔒

proj(concat(a, b)) = concat(proj(a), proj(b))

Page 17: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

17

🔒

$🚕 &🚗 &🚙

🏛#

SUM🔒

MULTIPLY🔒

DIVIDE🔒

JOIN🔒

PROJECT🔒

CONCATENATE🔒SUM🔒

DIVIDE by 10k🔒

PROJECT🔒 PROJECT🔒

PROJECT🔒

proj(concat(a, b)) = concat(proj(a), proj(b))

Page 18: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

18

🔒

$🚕 &🚗 &🚙

🏛#

SUM🔒

MULTIPLY🔒

DIVIDE🔒

JOIN🔒

PROJECT

CONCATENATE🔒SUM🔒

DIVIDE by 10k🔒

PROJECT PROJECT

PROJECT🔒

sum(concat(a, b)) = sum(concat(sum(a), sum(b)))

Page 19: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

19

🔒

$🚕 &🚗 &🚙

🏛#

SUM🔒

MULTIPLY🔒

DIVIDE🔒JOIN🔒

PROJECT

CONCATENATE🔒

SUM🔒

DIVIDE by 10k🔒

PROJECT PROJECT

PROJECT🔒

SUMSUM SUM

Page 20: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Contributions

20

1. MPC query compilation: relational query compiler that optimizes for efficient MPC.

2. Automated analyses to determine which parts of a query must run under MPC.

3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.

4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.

Page 21: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

21

🏛+ &💳

$💳(ssn, zip) (ssn, assets) (ssn, assets)

Page 22: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

JOIN ON ssn🔒

import conclave as cc pA, pB, pC = cc.Party(“mpc.ftc.gov"), cc.Party(“mpc.a.com"), \ cc.Party(“mpc.b.cash”)demo_schema = [Column(“ssn", cc.INTEGER), Column(“zip”, cc.INTEGER)] demographics = cc.defineTable(demo_schema, at=pA) # credit card companies trust the regulator to compute on SSNs bank_schema = [Column("ssn", cc.INTEGER, trust=[pA]), Column("assets", cc.INTEGER)]scores-1 = cc.defineTable(bank_schema, at=[pB])…

22

Regulator trusted with SSN columns

🏛+ &💳

$💳(ssn, zip) (ssn, assets) (ssn, assets)

CONCATENATE🔒

(ssn)

…🔒

🔒 HYBRID JOIN 🔒

Page 23: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

23

Hybrid operators• Two roles in hybrid operator scenario:

- Semi-trusted party (STP) may learn a specific column in the clear; does not collude with other parties

- Untrusted parties may not learn anything

• Goal:

- Outsource expensive sub-steps to STP for local processing

- Without leaking information to untrusted parties

23

Page 24: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

24

Complexity (Oblivious)

Complexity (Hybrid)

Bottle-neck operation (Oblivious)

Bottle-neck operation (Hybrid)

Join O(n2)comparisons

O(n+m log (n+m))multiplications

(where m is size of result)

Pair-wise comparison

between all rows

Batched oblivious array access

Aggregation O(n log2 n)comparisons

O(n log n)multiplications Oblivious sort Oblivious shuffle

Page 25: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Evaluation1. How does Conclave scale to increasingly large inputs?

2. How much does automatic MPC frontier placement reduce query runtime?

3. What impact do hybrid operators have on query runtime?

• Three parties3 VM Spark cluster + Sharemind endpoint at each

• Two queries 1. Taxi market concentration: up to 1.3B trip records2. Credit card regulation: up to 100k SSNs

25

Page 26: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

26

Taxi market concentration querybe

tter

Five orders of magnitude!

Page 27: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

27

Credit card regulation querybe

tter

Four orders of magnitude!

Page 28: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Related work

• Mixed-mode MPC: Wysteria [S&P 2014] — custom DSL

• Query rewriting for MPC

• SMCQL [VLDB 2017]: binary public/private columns, no hybrid operators

• Opaque [NSDI 2017]: computation under SGX, focus on reducing oblivious shuffles

28

Page 29: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Summary• Conclave is a query compiler for efficient MPC on “big

data”

• Automatically shrinks MPC step to be as small as possible

• New hybrid MPC-cleartext protocols speed up operators

• Scales up to 5 orders of magnitude better than pure MPC

https://github.com/multiparty/conclave

29

Page 30: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

Conclave Implementation

• Relational front-end

• Rewrite rules on intermediate DAG of operators

• Back-ends generate code

- Cleartext: Spark, sequential Python

- MPC: Sharemind, Obliv-C (partial support)

• ~5,000 lines of Python

30

Page 31: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:

31

Hybrid MPC-cleartext operator impact

Join Aggregation