Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization:...

26
Secure Parallel Computation on National-Scale Volumes of Data Sahar Mazloom, Phi Hung Le, Samuel Ranellucci, S. Dov Gordon

Transcript of Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization:...

Page 1: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Secure Parallel Computationon National-Scale Volumes of DataSahar Mazloom, Phi Hung Le, Samuel Ranellucci, S. Dov Gordon

Page 2: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

User

Learning on User Data

User

Data Model

Machine Learning Engine

User Data

2

Page 3: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

3

“Computation on Encrypted Data”

Goal: Creating methods for parties to jointly compute a function over their inputs while keeping those inputs private.

Secure Multi-Party Computation:

Page 4: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Example of a ML algorithm(Sparse) Matrix Factorization:• De-composition of a sparse

matrix of Ratings (R), into Users’ (U), and Items’ (V) matrices.

Gradient Descend:• An optimization algorithm

used in many machine learning algorithms.

4

R VU ✕⟹

Items

Use

rs

items

Use

rs

d

d

Matrix Factorization for Recommendation Systems

Page 5: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Matrix Factorization using Gradient Descentfor Movie Recommendation

5

R VU ✕⟹

Movies

Use

rs

Movie ProfilesUse

r Pr

ofile

s

d

d

rij ui

vj

𝑟!" ≈ 𝑢! , 𝑣"

Objective function: 𝐿 = min∑(!,")&' 𝑟!," − 𝑢! , 𝑣"( + 𝜇∑!&) 𝑢! ( + 𝜆∑"&* 𝑣"

(

User gradient: 𝛿+! = −2∑"&* 𝑣" 𝑟!" − 𝑢! , 𝑣" + 2𝜇𝑢!Movie gradient: 𝛿," = −2∑!&) 𝑢! 𝑟!" − 𝑢! , 𝑣" + 2𝜆𝑣"

Page 6: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Distributed Graph Parallel Computation

Non-secure Frameworks:MapReduce, GraphLab, PowerGraph [Gonzalez et al. 2012]

Supported algorithms: Matrix Factorization, Histogram, PageRank, Markov Random Field Parameter Learning, Name Entity Resolution, …

Users Movies

6

Page 7: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

PowerGraph:Think as a

Vertex!Move computation

to data

GAS model of Operation

7

Page 8: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Secure Graph Parallel Computation

• GraphSC [Nayak et al. SP’15]

• Use Oblivious Sort to hide node degree and edge structure

Users Movies

Complexity:𝑂( 𝐸 + 𝑉 𝐥𝐨𝐠𝟐 𝐸 + 𝑉 )

V vertices, E edges

Running time: 6K users, 4K movies, 1 M Ratings => 13 Hrs

8Threat model: Honest-But-Curious Adversary

Page 9: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Primary Question [Mazloom, Gordon CCS’18]

Can we make secure computation algorithms faster

if we allow something small to be learned?

And prove the leakage is Differentially Private!

9

Page 10: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Differentially-Oblivious Graph Parallel Computation

• OblivGraph [Mazloom, Gordon CCS’18]

• Noisy node degree by adding dummy edges• No. of dummy edges determined by DP parameters• Use Oblivious Sort to hide the edge structure

Users Movies

Shuffle

10V vertices, E’ all edges

Complexity:𝑂( 𝐸4 + 𝛼 𝑉 𝐥𝐨𝐠 𝐸′ + 𝛼 𝑉 ), 𝛼: O !"# $%!"#&

'

Page 11: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

OblivGraph [Mazloom, Gordon CCS’18]

input

Alice Bob

users

input input

11

Page 12: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

12

OblivShuffle

OblivGather

OblivApply

OblivScatter

Alice Bob

OblivGraph [Mazloom, Gordon CCS’18]

Running time: 6K users, 4K movies, 1 M Ratings => 2 Hrs

Threat model: Honest-But-Curious Adversary

Page 13: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Current Question

Can we make these differentially private secure

computation algorithms even faster?

13

Page 14: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Can we do better?

q Low communication MPC [Gordon et al. Asiacrypt’18]

q Differentially Private Leakage in Secure Computation [Mazloom, Gordon CCS’18]

q Graph Parallel Computation => Constructing an MPC protocol that can

14

MF on 20 million inputs < 6 mins (MovieLens dataset)Histograms on 300 million inputs in only 4 mins (Counting users in each zip code)

Running time: 6K users, 4K movies, 1 M Ratings => 25 Sec

Page 15: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Key playing factors

ü Using 4 computation servers instead of 2 ü Linear Oblivious Shuffle instead of Quasi-Linear OblivShuffleü Fixed-Point Arithmetic Computation instead of Boolean Circuitü Secure against one malicious adversary

15

Page 16: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Challenge 1

16

o The party that access the data should NOTlearn the shuffling pattern

Merging these construction ==> Opportunities and Challenges

ü Partition the tasks between 4 parties:Group 1: Shuffle the dataGroup 2: Access the data

Solution

Page 17: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Challenge 2

17

o Secure against active adversary

ü Verifying the result of each operation to detect cheating behaviorSolution

Page 18: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Alice Bob

OblivShuffle

Verify Gather

Verify Scatter

Charlotte David

Verify Shuffle

OblivGather

OblivApply

OblivScatter18

Malicious-secure 4-party Secure Parallel Computation

OblivApplyCross-Check

Improved by 230X

Improved by 880X

Page 19: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Challenge 3

19

o Fixed-Point Arithmetic Computation

ü Truncation and handling the rounding errorSolution

Page 20: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Cross–Check Verification after Apply phase

21

Alice & Bob compute on some masked values and truncate

Charlotte & David compute on some masked values and truncate

If their results don’t match?

The verification may Fail if data is a decimal value, and it’s NOT because

of malicious behavior!

Abort

inspired by [Gordon et al. 2018]

Page 21: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Implementation Results

22

Implemented in C++, run experiments on AWS

Multiple benchmark algorithms, including Matrix Factorization and Histogram

4 computation servers, 32 cores each, 10 Gbps network

Input size and privacy parameters for different experiments

Page 22: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Run Time on National-Scale Histogram Problem

23

Run time (s) for computing Histogram problem on different input sizes (LAN)Counting people in each zip code

Page 23: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Run Time Large-Scale MF Problem

24

Run time (s) for computing Matrix Factorization problem on real-world dataset, MovieLens on different input sizes for Movie Recommendation

Page 24: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Run Time Comparison with previous works

25

Run time comparison on this work vs. OblivGraph vs. GraphSC. Single iteration of Matrix Factorization on real- world dataset, MovieLenswith 6K users ranked 4K movies with 1M ratings

Page 25: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Summarize

Goal: Learning on large-scale data with security and privacy

• Secure MPC for Privacy Preserving Machine Learning• Secure against one malicious corruption• Leverage Differential Privacy to improve efficiency

26

Page 26: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’

Thanks!Q&A

27

Secure Parallel Computationon National-Scale Volumes of DataSahar Mazloom, Phi Hung Le, Samuel Ranellucci, S. Dov Gordon

[email protected] is publicly available!