Privacy-preserving Data Mining and Machine Learning
![Page 1: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/1.jpg)
Privacy-preserving Data Mining and Machine Learning
Tokyo Institute of Technology, Jun Sakuma
![Page 2: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/2.jpg)
Agenda
Basic concepts of PPDM
  Secure multiparty computation, trusted third party
  Randomized approach, cryptographic approach
PPDM: Cryptographic approach
  Definition
  Proof methodology
  Building blocks
Studies from our group
  Privacy-preserving k-means clustering in P2P (PAKDD2008)
  Privacy-preserving reinforcement learning (ICML2008)
Future direction of PPDM/ML
![Page 3: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/3.jpg)
Basics of PPDM
![Page 4: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/4.jpg)
Privacy-preserving Data Mining
[Figure: Alice holds XA and Bob holds XB; the datasets themselves are not exchanged]
Share mining results only!
Alice holds a private dataset XA
Bob holds a private dataset XB
Privacy-preserving data mining (PPDM) problem:
  Neither party wishes to share its dataset
  They wish to execute a specific data mining task over the joint dataset XA ∪ XB
  At the end, they wish to share only the mining results
![Page 5: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/5.jpg)
Scenario 1: Identification of epidemic source
[Figure: a hospital and an airline each hold records about the same individuals]
Outbreak of epidemic
  Hospitals wish to identify the source of the epidemic geographically
  Clustering of patients' medical records with geographical movement
Integration of databases is difficult due to...
  privacy preservation
  legal constraints
How can we do clustering without sharing private information?
![Page 6: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/6.jpg)
Scenario 2: Bankruptcy Prediction
[Figure: Bank A and Bank B each hold transactions of the same enterprises; goal: bankruptcy prediction]
Enterprises hold accounts at several banks
Banks wish to predict the probability of bankruptcy from their joint transactions
Integration of transactions is difficult due to...
  confidentiality
  legal constraints
How can banks predict bankruptcy without sharing confidential information?
![Page 7: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/7.jpg)
Trusted Third Party
[Figure: Alice (private input xA, output yA) and Bob (private input xB, output yB) each communicate with Trent]
Introduce a trusted third party (TTP): Trent
  Trent processes any specified computation
  Trent is always faithful to the specified protocol
![Page 8: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/8.jpg)
Trusted Third Party
TTP is good, but it requires setting up a trusted authority
Question: Can we do the computation in a standard setting?
  No authority
  Regular network (e.g., TCP/IP)
![Page 9: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/9.jpg)
Secure Multiparty Computation
Yao's secure two-party computation [Yao86]
Assumption: parties are semi-honest
  Do not deviate from the protocol
  Attempt to learn extra information from the message transcripts
Yao's protocol
  Any computation can be made private
  Complexity is polynomially bounded by the size of the circuit that evaluates the specified function
![Page 10: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/10.jpg)
Secure Multiparty Computation
Is Yao the final answer?
Yao is good but too costly, particularly with
  Large input (xA, xB)
  Large circuit (complex computation)

| Task | Elapsed exec time (sec) in LAN | Elapsed exec time (sec) in WAN |
|---|---|---|
| Bit-and operation | 0.41 | 2.57 |
| Billionaires (comparison of two 32-bit integers) | 1.25 | 4.01 |
| Key database search (search for an item among 16 items with a 6-bit key) | 0.49 | 3.38 |
| Median (find the median from 10 16-bit integers) | 7.09 | 16.63 |

Malkhi, D. et al., Fairplay - a secure two-party computation system, Proc. of the 13th USENIX Security Symposium, 287-302, 2004
![Page 11: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/11.jpg)
Two approaches for PPDM
Randomization approach [Agrawal et al. SIGMOD2000]
  Add random perturbation to original data
  Apply data mining algorithm
  Remove the perturbation effect (maximum likelihood)
Cryptographic approach [Lindell et al. CRYPTO2000]
  Composition of security protocols, including Yao
  Mostly, Yao is used for a small portion of the entire computation
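The three steps of the randomization approach can be illustrated with a small sketch. This is a simplified stand-in, not the reconstruction procedure of [Agrawal et al. SIGMOD2000]: instead of maximum-likelihood reconstruction of the full distribution, it only corrects the mean and variance, assuming zero-mean Gaussian noise with a publicly known standard deviation. The function names `perturb` and `estimate_mean_and_variance` are hypothetical.

```python
import random
import statistics


def perturb(values, noise_std, rng):
    """Each data owner adds independent zero-mean Gaussian noise before release."""
    return [v + rng.gauss(0.0, noise_std) for v in values]


def estimate_mean_and_variance(perturbed, noise_std):
    """Estimate statistics of the original data from the perturbed release.

    Assumption: noise is independent of the data, has mean 0 and known
    variance noise_std**2, so E[w] = E[x] and Var[w] = Var[x] + noise_std**2.
    """
    mean = statistics.fmean(perturbed)
    variance = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean, variance


if __name__ == "__main__":
    rng = random.Random(0)
    original = [rng.uniform(20, 60) for _ in range(20000)]
    released = perturb(original, noise_std=10.0, rng=rng)
    print(estimate_mean_and_variance(released, noise_std=10.0))
```

The result is approximate, which matches the "approx." accuracy entry for the randomization approach in the comparison that follows.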
![Page 12: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/12.jpg)
Comparison of approaches
| | accuracy | comp. cost | authority | generality | privacy |
|---|---|---|---|---|---|
| TTP | perfect | small | required | general | provably secure |
| SMC (Yao) | perfect | large | not required | general | provably secure |
| Random | approx. | small | not required | limited | statistically secure |
| Crypto | perfect | medium | not required | limited | provably secure |
![Page 13: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/13.jpg)
Cryptographic approach
![Page 14: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/14.jpg)
Privacy-preservation in distributed computation
[Figure: Alice (private input xA, output yA) and Bob (private input xB, output yB) run the protocol; Alice receives messages MB]
MB: all message transcripts Alice received
![Page 15: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/15.jpg)
Privacy-preservation in distributed computation
[Figure: Alice (private input xA, output yA) and Bob (private input xB, output yB) run the protocol; Alice receives messages MB; a simulator of Bob with input xA, yA, yB produces simulated messages MB']
MB: all message transcripts Alice received
MB': messages generated by simulator S given only Alice's input and the protocol output
![Page 16: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/16.jpg)
Privacy-preservation in distributed computation
[Figure: Alice (private input xA, output yA) and Bob (private input xB, output yB) run the protocol; are the messages MB and the simulated messages MB' from the simulator of Bob (input xA, yA, yB) indistinguishable?]
MB: all message transcripts Alice received
MB': messages generated by simulator S given only Alice's input and the protocol output
Privacy-preservation
  The protocol does not reveal unnecessary information of Bob if MB and MB' are indistinguishable
![Page 17: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/17.jpg)
Security protocols as building blocks
  Secret sharing
  Oblivious transfer
  Secure function evaluation (= Yao's protocol)
  Homomorphic public-key cryptosystem
  Oblivious polynomial evaluation
  Secure set intersection, etc.
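As an illustration of the first building block, here is a minimal sketch of additive secret sharing. The slides do not specify a sharing scheme, so the additive scheme, the modulus `MODULUS`, and the function names are all assumptions made for this example.

```python
import secrets

# Hypothetical public modulus; any agreed modulus larger than the secrets works.
MODULUS = 2**61 - 1


def share(secret, n_parties, modulus=MODULUS):
    """Split `secret` into n_parties random shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recover the secret by summing all shares."""
    return sum(shares) % modulus


def add_shares(shares_a, shares_b, modulus=MODULUS):
    """Each party adds its two shares locally; the result shares the sum."""
    return [(a + b) % modulus for a, b in zip(shares_a, shares_b)]


if __name__ == "__main__":
    s1, s2 = share(10, 2), share(32, 2)
    print(reconstruct(add_shares(s1, s2)))
```

Any proper subset of the shares is uniformly random, which is why "random shares" of a value, as used in the later examples, reveal nothing on their own.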
![Page 18: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/18.jpg)
Secure Function Evaluation
[Figure: Alice's input xA and Bob's input xB enter the evaluation of f by SFE; Alice's output is yA and Bob's output is yB]
Secure function evaluation (including Yao's protocol)
  Function: for any f, SFE enables private evaluation of (yA, yB) = f(xA, xB)
How does SFE work? (intuitively)
  Convert function f to a circuit
  Use secure computation of "and" and "or" (garbled circuit)
Example
  Private comparison: which is greater, xA or xB?
  Private matching: is element xA included in list xB?
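The garbled-circuit idea can be sketched for a single "and" gate. This is a toy illustration under stated assumptions, not Yao's full protocol or the Fairplay implementation: wire labels are random 16-byte strings, rows are masked with SHA-256 of the two input labels, a zero tag marks the correct row, and the oblivious transfer that would deliver the evaluator's input label is omitted. All names are hypothetical.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = b"\x00" * 16  # zero padding that lets the evaluator recognise the right row


def _pad(label_a, label_b):
    """Derive a 32-byte one-time pad from the two input wire labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(left, right):
    return bytes(x ^ y for x, y in zip(left, right))


def garble_and_gate():
    """Garbler: pick two random labels per wire and encrypt the AND truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row encodes which inputs
    return labels, table


def evaluate(table, label_a, label_b):
    """Evaluator: holds one label per input wire and can open exactly one row."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


def decode(labels, out_label):
    """Garbler maps the output label back to the output bit."""
    return labels["out"].index(out_label)
```

The evaluator sees only random-looking labels, so it learns the output label without learning the input bits; a full circuit chains such gates, feeding output labels into later gates.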
![Page 19: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/19.jpg)
Homomorphic Public-key Cryptosystem
Let m ∊ ZN be a message and r ∊ ZN be a random number
Let (pk, sk) be a pair of public and secret keys. Then,
  Encryption: c = Enc_pk(m; r)
  Decryption: m = Dec_sk(c)
Let m0, m1, r1, r2 ∊ ZN
Homomorphic cryptosystem allows:
  Addition of encrypted values: Enc_pk(m0; r1) · Enc_pk(m1; r2) = Enc_pk(m0 + m1; r1 r2)
  Multiplication of an encrypted value and a plain value: Enc_pk(m0; r1)^m1 = Enc_pk(m0 m1; r1^m1)
e.g., Paillier cryptosystem [Pai00]
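The two homomorphic operations can be tried out with a toy Paillier implementation. This is a minimal sketch for illustration only: the two fixed Mersenne primes are far too small for any security, the generator is fixed to g = N + 1, and the function names are hypothetical rather than taken from any library.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two fixed Mersenne primes (insecure key size)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of L(g^lam mod n^2) when g = n + 1
    return n, (n, lam, mu)  # public key n, secret key (n, lam, mu)


def encrypt(n, m):
    """Enc(m; r) = (1 + n)^m * r^n mod n^2 with a fresh random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + n * (m % n)) * pow(r, n, n2) % n2


def decrypt(sk, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def mul_plain(n, c, k):
    """Raising a ciphertext to a plain value k multiplies the plaintext by k."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    pk, sk = keygen()
    print(decrypt(sk, add(pk, encrypt(pk, 20), encrypt(pk, 22))))
    print(decrypt(sk, mul_plain(pk, encrypt(pk, 7), 6)))
```

Neither operation needs the secret key, so a party holding only the public key can compute on values it cannot read.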
![Page 20: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/20.jpg)
Example: private computation of ax+y
Bob has y, a
Alice has x
Problem: compute random shares of ax+y
![Page 21: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/21.jpg)
Example: private computation of ax+y
Bob has y, a
Alice has x
Problem: compute random shares of ax+y
Key pair
![Page 22: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/22.jpg)
Example: private computation of ax+y
Bob has y, a
Alice has x
Problem: compute random shares of ax+y
Key pair
Public key
![Page 23: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/23.jpg)
Example: private computation of ax+y
Bob has y, a
Alice has x
Problem: compute random shares of ax+y
Key pair
Public key
Generate a random number
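The equations of this example did not survive in the transcript, so the following is a sketch of one standard way to obtain random shares of ax+y with an additively homomorphic cryptosystem, assumed (not confirmed) to match the slide labels: Alice generates the key pair and sends the public key with Enc(x), Bob generates a random number rB and returns Enc(ax + y − rB), and Alice decrypts it as her share rA. The toy Paillier code and all names are hypothetical and insecure.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from fixed Mersenne primes (illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + n * (m % n)) * pow(r, n, n2) % n2


def decrypt(sk, c):
    n, lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def private_ax_plus_y(x, a, y):
    """Return (r_a, r_b, n) with r_a + r_b = a*x + y (mod n)."""
    # Alice: generates the key pair, sends the public key and Enc(x) to Bob.
    n, sk = keygen()
    c_x = encrypt(n, x)

    # Bob: generates a random number r_b and computes, without decrypting,
    # Enc(x)^a * Enc(y - r_b) = Enc(a*x + y - r_b); he sends it to Alice.
    r_b = secrets.randbelow(n)
    c_share = pow(c_x, a, n * n) * encrypt(n, y - r_b) % (n * n)

    # Alice: decrypts her share; it is masked by r_b, so it reveals nothing
    # about a or y on its own.
    r_a = decrypt(sk, c_share)
    return r_a, r_b, n


if __name__ == "__main__":
    r_a, r_b, n = private_ax_plus_y(x=3, a=5, y=7)
    print((r_a + r_b) % n)
```

Neither share alone determines ax+y; only their sum modulo N does, which is what allows the shares to be fed into a later step such as a comparison by SFE.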
![Page 24: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/24.jpg)
![Page 25: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/25.jpg)
![Page 26: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/26.jpg)
![Page 27: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/27.jpg)
![Page 28: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/28.jpg)
Example: private comparison of scalar product
Parties: Bob, Alice
Problem: which is greater?
![Page 29: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/29.jpg)
Example: private comparison of scalar product
Parties: Bob, Alice
Problem: which is greater?
Key pair
Public key
Generate a random number rB
rA and rB are random shares of
Comparison by SFE on rA, rB
Outputs: void; comparison result
![Page 30: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/30.jpg)
PPDM for k-means
![Page 31: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/31.jpg)
Book shop & music store example
[Figure: customers deposit data XA at the bookshop (A) and XB at the music store (B)]
Users deposit their personal data with organizations
Based on customers' personal records:
  The bookshop offers a recommendation: "Customers buy this book with…"
  The music store offers a recommendation: "Customers buy this music with…"
Can we learn "Customers who buy book A love music B" without sharing our private information?
![Page 32: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/32.jpg)
Book shop & music store example
[Figure: PPDM between the bookshop (A, holding XA) and the music store (B, holding XB), each with its customers]
Server-centric privacy-preservation
  If the book shop and the music store reach an agreement on PPDM, it would be possible
  If not, collaboration between the book shop and the music store may not happen
We cannot exploit our private information by ourselves
How can we privately manage our data by ourselves?
![Page 33: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/33.jpg)
User-centric Privacy Preservation
[Figure: bookshop (A, holding XA) and music store (B, holding XB); PPDM runs among the users who agree with PPDM]
User-centric privacy-preservation
  Run the protocol among users who agree with PPDM
The challenge
  Scalability: the number of parties would be hundreds or more
Our solution: peer-to-peer network
![Page 34: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/34.jpg)
P2P k-means clustering
Alternate iterations of...
1. Update cluster center to the mean vector (global)
2. Update cluster label to the nearest cluster center (local)
[Figure: data points in the (x1, x2) plane]
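The two alternating steps can be written down in their ordinary, non-private form; the protocols on the following slides replace each step with a private counterpart. This sketch assumes one data point per node and Euclidean distance, and the function names are hypothetical.

```python
def update_centers(points, labels, centers):
    """Step 1 (global): move each cluster center to the mean of its members."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, z in zip(points, labels):
        counts[z] += 1
        for d in range(dim):
            sums[z][d] += x[d]
    new_centers = []
    for c in range(k):
        if counts[c] == 0:
            new_centers.append(list(centers[c]))  # keep an empty cluster's center
        else:
            new_centers.append([s / counts[c] for s in sums[c]])
    return new_centers


def update_labels(points, centers):
    """Step 2 (local): each node picks the nearest cluster center."""
    def squared_distance(x, mu):
        return sum((a - b) ** 2 for a, b in zip(x, mu))

    return [
        min(range(len(centers)), key=lambda c: squared_distance(x, centers[c]))
        for x in points
    ]


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [10, 10], [10, 11]]
    centers = [[0, 0], [10, 10]]
    for _ in range(3):
        labels = update_labels(points, centers)
        centers = update_centers(points, labels, centers)
    print(labels, centers)
```

Step 1 needs information from all nodes (a global average), while step 2 needs only a node's own point and the centers, which is why the two steps are handled by different protocols.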
![Page 35: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/35.jpg)
Problem statement
Node Pj has a datapoint xj and cluster label zij
Problem 1:
  Compute cluster center without sharing xj and zij among nodes
  Use gossip-based aggregation and homomorphic cryptography
Problem 2:
  Compute cluster label without sharing xj and zij among nodes
  Use Yao (private comparison) and homomorphic cryptography
![Page 36: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/36.jpg)
Gossip-based aggregation [Kowalczyk05]
Asynchronous averaging without central server
Node Pj owns data xj
Initialization
Contact a node chosen uniformly at random
Update the local estimate as the average of the two estimates
Convergence:
[Figure: nodes 1 to 8 arranged in a ring; nodes 1 and 7 both update to (μ1+μ7)/2, that is μ1 ← (μ1+μ7)/2 and μ7 ← (μ1+μ7)/2; nodes 4 and 8 both update to (μ4+μ8)/2, that is μ4 ← (μ4+μ8)/2 and μ8 ← (μ4+μ8)/2]
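The pairwise averaging rule can be simulated in a few lines. This is a plain, non-cryptographic sketch: it assumes each node starts from its own value, that one "cycle" means every node initiates one exchange, and that the partner is drawn uniformly from the other nodes. The function name `gossip_average` is hypothetical.

```python
import random


def gossip_average(values, cycles, rng):
    """Simulate gossip-based averaging; returns each node's final estimate."""
    estimates = [float(v) for v in values]  # assumed initialization: own value
    n = len(estimates)
    for _ in range(cycles):
        for i in range(n):
            # Contact a node chosen uniformly at random among the others.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both nodes replace their estimate by the average of the two.
            average = (estimates[i] + estimates[j]) / 2
            estimates[i] = estimates[j] = average
    return estimates


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(gossip_average(data, cycles=40, rng=random.Random(1)))
```

Each exchange leaves the sum of all estimates unchanged while pulling two estimates together, so every node's estimate approaches the global mean without any central server.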
![Page 37: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/37.jpg)
Privacy-preserving Gossip-based aggregation
Server generates a key pair and distributes the public key
Node Pj owns data xj
Initialization
Contact a node chosen uniformly at random
[Figure: server S holds the key pair (pk, sk) and sends pk to nodes 1 to 8 arranged in a ring]
![Page 38: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/38.jpg)
Cryptographic Gossip-based aggregation
Update in regular gossip-based aggregation
![Page 39: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/39.jpg)
Cryptographic Gossip-based aggregation
Update in gossip-based aggregation
Update without division
(converges to )
![Page 40: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/40.jpg)
Cryptographic Gossip-based aggregation
Update
Asynchronous update without division
tj: the number of message exchanges
Convergence to
![Page 41: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/41.jpg)
Cryptographic Gossip-based aggregation
Update
Asynchronous update without division
Cryptographic extension
Convergence:
![Page 42: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/42.jpg)
Cryptographic Gossip-based aggregation
Update
Asynchronous update without division
Cryptographic extension
Cluster centers can be estimated privately!
![Page 43: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/43.jpg)
Private Cluster Label Determination
Parties: node j; server with key pair (pk, sk)
Problem: which is greater
rA and rB are random shares of
Comparison by SFE on rA, rB
Outputs: void; the nearest cluster center
![Page 44: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/44.jpg)
The bottleneck of label determination
All nodes need to run the protocol with the server
[Figure: nodes 1 to 8 all connect to server S. Bottleneck!]
![Page 45: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/45.jpg)
Distributed private update of cluster center
Nodes Rj and Bj are called a "pair"
Enc(μi) are propagated through the binary tree such that Enc(μi) of Bj can be decrypted only by Rj's secret key
[Figure: binary tree with S and B0 at the root, R1, R2 with B1, B2 on the next level, and R3 to R6 with B3 to B6 at the leaves; each Bj holds Enc(μi)]
![Page 46: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/46.jpg)
Distributed private update of cluster center
Private comparison is processed between the "paired" red node Rj and blue node Bj
[Figure: binary tree with S and B1 at the root, R1, R2 with B2, B3 on the next level, and R3 to R6 with B4 to B7 at the leaves; each blue node holds Enc(μi) and runs a private comparison with its paired node]
![Page 47: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/47.jpg)
Privacy-preserving P2P k-means clustering
Computational Complexity

| | w/o decentralization | with decentralization |
|---|---|---|
| Private cluster center update | O(dkT) | --- |
| Private cluster label update | O(dkn) | O(dk log n) |

Total complexity per iteration: O(dkT + dk log n)
T: maximum cycle of gossip-based aggregation
n: number of nodes
![Page 48: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/48.jpg)
Experiments (private cluster center update)
Computational time
  # of nodes n=10, ..., 1,000,000
  Dimension d=1
  Maximum cycle of gossip-based aggregation T=20, 40
  Paillier cryptosystem [Pai00]
  Key length 512 bit, 1024 bit
![Page 49: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/49.jpg)
Experiments (private cluster label update)
Computational time
  # of nodes n=10, ..., 1,000,000
  Dimension d=2, 4, 8
  # of clusters k=2
  Paillier cryptosystem [Pai00]
  Key length 512 bit, 1024 bit
![Page 50: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/50.jpg)
Experiments (k-means)
Computational time per one step of k-means

| | # of dim | # of clusters | # of nodes | Time (sec), cluster center | Time (sec), cluster label |
|---|---|---|---|---|---|
| Small scale | 2 | 2 | 1,000 | 180 | 660 |
| Large scale | 4 | 4 | 1,000,000 | 740 | 9,100 |

Small scale: about 13 (min) per one step
Large scale: about 2.7 (hour) per one step
Not very efficient, but finishes in practical time
![Page 51: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/51.jpg)
Privacy-preserving Reinforcement Learning
![Page 52: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/52.jpg)
Distributed Reinforcement Learning
[Figure: Alice receives state sA and reward rA from the environment and takes action aA; Bob receives state sB and reward rB and takes action aB; their perceptions are (sA, rA, aA) and (sB, rB, aB)]
Existing approaches for distributed reinforcement learning
  Distributed Value Function [Schneider99]
  Policy gradient approach [Peshkin00][Moallemi03][Bagnell05]
The main focus
  Manage huge state-action space
  Limit the communication
![Page 53: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/53.jpg)
DRL from private perceptions
[Figure: Alice receives state sA and reward rA from the environment and takes action aA; Bob receives state sB and reward rB and takes action aB; their perceptions are (sA, rA, aA) and (sB, rB, aB)]
Agents have sufficient computation resources and communication bandwidth
Their perceptions are desired to be kept private
  Alice does not wish to reveal (sA, rA, aA) to Bob
  Bob does not wish to reveal (sB, rB, aB) to Alice, either
In the end, Alice and Bob wish to learn the optimal policy collaboratively
![Page 54: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/54.jpg)
Motivating application: Load balancing
[Figure: two factories, each with order, production, and shipment; jobs are redirected to the other factory when heavily loaded]
Load balancing among competing factories
  Obtain a reward by processing a job, but
  Factories may need to redirect jobs to the other factory when heavily loaded
  Large penalty for overflow
  Small penalty for redirection
When should factories redirect jobs to the other factory?
![Page 55: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/55.jpg)
Motivating application: Load balancing
[Figure: two factories, each with order, production, and shipment; jobs are redirected to the other factory when heavily loaded]
If two factories are competing…
  The frequency of orders and the speed of production are private (private model)
  The backlog is private (private state observation)
  The profit is private (private reward)
Privacy-preserving Reinforcement Learning
  States, actions, and rewards are not shared
  But the learned policy is shared in the end
![Page 56: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/56.jpg)
Are existing RLs privacy-preserving?
Centralized RL (CRL): a leader agent learns
Distributed RL (DRL) [Schneider99][Ng05]: each distributed agent shares partial observation and learns
Independent DRL (IDRL): each agent learns independently

| | Optimality | Privacy |
|---|---|---|
| CRL | optimal | disclosed |
| DRL | medium | partly disclosed |
| IDRL | bad | preserved |
| PPRL | optimal | preserved |

Target: achievement of privacy preservation without sacrificing the optimality
![Page 57: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/57.jpg)
Step 3: Private Update of Q-values
×K, ×L
Encryption
The agent can update c(s,a) without knowledge of Q(s,a)!
![Page 58: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/58.jpg)
Experiment: Load balancing among factories
[Figure: two factories; a job is assigned w.p. pin and processed w.p. pout; example state sA=5, sB=2 with actions aA="redirect", aB="no redirect"]
Setting
  State space: sA, sB ∊ {0,1,…,5}
  Action space: aA, aB ∊ {redirect, no redirect}
  Reward:
    Cost for backlog: rA = 50 − (sA)^2
    Cost for redirection: rA = rA − 2
    Cost for overflow: rA = 0
  Reward rB is set similarly
  System reward: rt = rtA + rtB
[Plot legend: Regular RL/PPRL, DRL (rewards are shared), IDRL (no sharing)]
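The reward setting above translates directly into a small function. This is a sketch only: the slide does not say how overflow is detected, so it is passed in as a flag, and the names `factory_reward` and `system_reward` are hypothetical.

```python
def factory_reward(backlog, redirected, overflowed):
    """Reward of one factory for one step, following the setting above.

    backlog:    state s in {0, ..., 5}
    redirected: True if the action was "redirect"
    overflowed: True if the factory overflowed (assumed to be given)
    """
    if overflowed:
        return 0  # cost for overflow: r = 0
    reward = 50 - backlog ** 2  # cost for backlog: r = 50 - s^2
    if redirected:
        reward -= 2  # cost for redirection: r = r - 2
    return reward


def system_reward(reward_a, reward_b):
    """System reward is the sum of both factories' rewards."""
    return reward_a + reward_b


if __name__ == "__main__":
    r_a = factory_reward(backlog=5, redirected=True, overflowed=False)
    r_b = factory_reward(backlog=2, redirected=False, overflowed=False)
    print(r_a, r_b, system_reward(r_a, r_b))
```

With the example state of the figure (sA=5 with "redirect", sB=2 with "no redirect"), the redirection penalty of 2 is small compared with the loss of the whole reward on overflow.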
![Page 59: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/59.jpg)
Experiment: Load balancing among factories
[Figure: two factories; a job is assigned w.p. pin and processed w.p. pout; example state sA=5, sB=2 with actions aA="redirect", aB="no redirect"]
Comparison

| | detail | comp (sec)* | profit | privacy |
|---|---|---|---|---|
| CRL | All information shared | 5.11 | 90.0 | Disclosed all |
| DRL | Rewards are shared | 5.24 | 87.4 | Partially disclosed |
| IDRL | No sharing | 5.81 | 84.2 | Perfect |
| PPRL | Security protocol | 8.85×10^5 | 90.0 | Perfect |
| RL/SFE | SFE [Yao86] | >7.00×10^7 | 90.0 | Perfect |

* Java 1.5.0 + Fairplay, 1.2 GHz Core Solo
![Page 60: Privacy-preserving Data Mining and Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022052412/5589de59d8b42a7e778b45db/html5/thumbnails/60.jpg)
Future direction
Many privacy-preserving versions of data mining and machine learning methods have been presented
  Classifier: decision tree, naïve Bayes, support vector machine, k-nearest neighbor
  Clustering: k-means, EM for mixture models, DBSCAN
  Machine learning: linear regression, Bayesian network construction (K2), belief propagation, boosting, reinforcement learning
Next step
  Connection with personalized services
    To increase the degree of personalization, more sensitive information will be required
  Connection with network/graph mining
    Links in a network are intrinsically private
    E.g., personal / business network, communication graph
  Connection with ubiquitous appliances and environment: mining with…
    Cell phone + geographical movement
    RFID attached to personal items
Application: planned in the Information Grand Voyage Project