Privately Evaluating Decision Trees and Random Forests · Decision Tree Evaluation on ECG Data...
Privately Evaluating Decision
Trees and Random Forests
David J. Wu, Tony Feng, Michael Naehrig, and Kristin Lauter
July, 2016
Machine Learning as a Service
Big Data + Machine Learning = New Applications
[Figure: a patient profile and symptoms are sent to the service; a recommended treatment plan is returned]
Machine Learning as a Service
Big Data + Machine Learning = New Risks
An adversary that compromises the cloud service learns the patient profile
A malicious client might recover information about the model
Our Work: Decision Trees
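• Nonlinear models for regression or classification
• Consist of a series of decision variables (tests on the feature vector)
• Evaluation corresponds to tree traversal
Input: feature vector $[x_1, \ldots, x_n]$
[Figure: example decision tree. Internal nodes (decision nodes) test a feature against a threshold, e.g. $x_i \le 5$ vs. $x_i > 5$ and $x_j \le 2$ vs. $x_j > 2$, with branches labeled Y/N; leaf nodes hold the outputs]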
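The tree traversal described on this slide can be sketched in a few lines of Python. This is a plain, non-private evaluation meant only as an illustration: the feature indices, the leaf labels, and the convention that a satisfied test follows the `if_true` branch are assumptions; only the thresholds 5 and 2 come from the slide's example.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: str  # output of the model (hypothetical labels below)


@dataclass
class Node:
    feature: int      # index into the feature vector
    threshold: float  # the test is x[feature] <= threshold
    if_true: "Tree"   # subtree taken when the test holds
    if_false: "Tree"  # subtree taken otherwise


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, applying one threshold test per level."""
    while isinstance(tree, Node):
        tree = tree.if_true if x[tree.feature] <= tree.threshold else tree.if_false
    return tree.value


# Assumed shape: the root compares x[0] with 5, its right child compares x[1] with 2.
EXAMPLE = Node(0, 5, Leaf("A"), Node(1, 2, Leaf("B"), Leaf("C")))
```

Calling `evaluate(EXAMPLE, [7, 1])` follows the root's `if_false` branch and then the second node's `if_true` branch.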
Fully Private Decision Tree Evaluation
Client input: feature vector $x$
Client learns only $T(x)$ and minimal information about $T$ (e.g., a bound on the size of the tree)
Server input: decision tree $T$
Server learns nothing about $x$
Focus on model evaluation –
assume server already has model
Comparison of Approaches
[Figure, not drawn to scale: bandwidth vs. computation for generic methods based on Yao 2PC [Yao82, LP09], generic methods based on SWHE [BPTG15], and our protocol]
Slightly more computation, but much smaller bandwidth
Protocol Building Blocks: Comparisons
The protocol must compare components of the client's feature vector with the thresholds at the decision nodes (e.g., $x_i \le 5$ and $x_j \le 2$ in the example tree)
Comparison Protocol [DGK07, BPTG15]
Client input: $x$
Server input: $y$
Client learns $\mathbf{1}\{x < y\}$
Server learns nothing
Private Decision Tree Evaluation
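Suppose the client knows the decision bits $b_1$, $b_2$ and the structure of the tree
Then the client can compute the index of the outcome
[Figure: decision tree with decision nodes $b_1$, $b_2$ (edges labeled $b_k = 0$ and $b_k = 1$) and leaves $z_1$, $z_2$, $z_3$]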
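As a small sketch of this step, the Python below computes the leaf index from the decision bits and the tree structure. The encoding of the structure as nested tuples and the particular shape of the example (which node sits at the root, which leaf hangs where) are assumptions made for illustration; the slide only states that two decision bits and three leaves are involved.

```python
from typing import Dict, Tuple, Union

# A structure is either a leaf index (int) or a triple
# (decision_node_id, child_if_bit_is_0, child_if_bit_is_1).
Structure = Union[int, Tuple[int, "Structure", "Structure"]]

# Assumed shape: b_1 at the root; b_1 = 0 leads to leaf 1, b_1 = 1 leads to
# node b_2, whose two outcomes are leaves 2 and 3.
EXAMPLE_STRUCTURE: Structure = (1, 1, (2, 2, 3))


def leaf_index(structure: Structure, bits: Dict[int, int]) -> int:
    """Follow the decision bits from the root until a leaf index is reached."""
    while not isinstance(structure, int):
        node_id, child0, child1 = structure
        structure = child1 if bits[node_id] else child0
    return structure
```

Note that the client needs every bit it might consult plus the full structure, which is exactly the leak the later slides set out to remove.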
Private Decision Tree Evaluation
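Suppose the client knows the index of the outcome
The problem reduces to oblivious transfer: treat the leaves as a database for which the client knows the index
[Figure: the same tree with decision nodes $b_1$, $b_2$ and leaves $z_1$, $z_2$, $z_3$]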
Oblivious Transfer (OT) [Kil88, NP99, NP01]
Client input: index $i$
Server input: database $z_1, \ldots, z_n$
Oblivious transfer protocol
Client learns $z_i$
Server learns nothing
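To make the OT interface concrete, here is a toy 1-out-of-n OT in Python. It follows the style of Chou and Orlandi's Diffie-Hellman-based "simplest OT", which is a swapped-in construction and not necessarily the [Kil88, NP99, NP01] protocols cited on the slide. The group parameters, function names, and key derivation are all assumptions chosen for brevity; the parameters are far too weak for real use, and the sketch only targets semi-honest parties.

```python
import hashlib
import secrets
from typing import List, Tuple

# Toy parameters (assumption): the Mersenne prime 2^127 - 1 and base 3.
# Unvetted and much too small for real security; they only show the message flow.
P = 2**127 - 1
G = 3


def _kdf(point: int, length: int) -> bytes:
    """Derive a one-time pad of the requested length from a group element."""
    return hashlib.shake_256(point.to_bytes(16, "big")).digest(length)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def sender_setup() -> Tuple[int, int]:
    """Server: pick a secret a and publish A = G^a."""
    a = secrets.randbelow(P - 2) + 1
    return a, pow(G, a, P)


def receiver_choose(A: int, index: int) -> Tuple[int, int]:
    """Client: hide the index inside B = A^index * G^b; keep A^b as the key."""
    b = secrets.randbelow(P - 2) + 1
    B = (pow(A, index, P) * pow(G, b, P)) % P
    return B, pow(A, b, P)


def sender_encrypt(a: int, A: int, B: int, database: List[bytes]) -> List[bytes]:
    """Server: encrypt entry j under the key derived from (B / A^j)^a."""
    ciphertexts = []
    for j, message in enumerate(database):
        point = pow(B * pow(pow(A, j, P), -1, P) % P, a, P)
        ciphertexts.append(_xor(message, _kdf(point, len(message))))
    return ciphertexts


def receiver_decrypt(ciphertexts: List[bytes], index: int, key_point: int) -> bytes:
    """Client: only the ciphertext at the chosen index decrypts correctly."""
    chosen = ciphertexts[index]
    return _xor(chosen, _kdf(key_point, len(chosen)))
```

Correctness rests on `(B / A^index)^a = G^(a*b) = A^b`. Entries should have equal length so that ciphertext sizes reveal nothing about the database. Modular inverses via `pow(x, -1, P)` require Python 3.8 or later.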
[Figure: the leaves $z_1$, $z_2$, $z_3$ become the OT database]
Private Decision Tree Evaluation
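1. Client obtains $b_1$, $b_2$ using the comparison protocol
2. Client uses OT to retrieve the classification value
[Figure: decision tree with decision nodes $b_1$, $b_2$ (edges labeled $b_k = 0$ and $b_k = 1$) and leaves $z_1$, $z_2$, $z_3$]
Problem: Requires the client to learn/know the structure of the tree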
Hiding the Structure
1. Padding: Insert “dummy” nodes to obtain a complete tree
[Figure: the example tree before and after padding with a dummy decision node]
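A minimal sketch of the padding step, assuming a simple recursive tree representation: any leaf that sits above the target depth is replaced by a dummy decision node whose two subtrees both lead to that same leaf, so the result is a complete tree that computes the same function. The class names and the dummy test values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    value: str


@dataclass
class Node:
    feature: int
    threshold: int
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]

# Arbitrary test for dummy nodes (assumption): both outcomes reach the same leaf,
# so the result of the comparison does not matter.
DUMMY_FEATURE, DUMMY_THRESHOLD = 0, 0


def depth(tree: Tree) -> int:
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


def pad(tree: Tree, target_depth: int) -> Tree:
    """Return an equivalent complete tree in which every leaf is at target_depth."""
    if isinstance(tree, Leaf):
        if target_depth == 0:
            return tree
        child = pad(tree, target_depth - 1)
        return Node(DUMMY_FEATURE, DUMMY_THRESHOLD, child, child)
    if target_depth == 0:
        raise ValueError("target_depth is smaller than the depth of the tree")
    return Node(tree.feature, tree.threshold,
                pad(tree.left, target_depth - 1),
                pad(tree.right, target_depth - 1))


def leaves(tree: Tree) -> List[str]:
    """Leaf values from left to right."""
    if isinstance(tree, Leaf):
        return [tree.value]
    return leaves(tree.left) + leaves(tree.right)
```

Padding a tree of depth d yields 2^d leaves, so the size of the padded tree reveals only a bound on the original.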
Hiding the Structure
2. Randomization: Randomly flip decision variables: $b_k' := 1 - b_k$
[Figure: the padded tree before and after one decision variable is flipped; the flipped node's edges are relabeled $b_k' = 0$ and $b_k' = 1$, and its two subtrees trade places]
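The effect of the randomization step can be checked with a short Python sketch: each decision node draws a random bit, and when the bit is 1 the node's two subtrees are swapped and its decision variable is flipped. Traversing the randomized tree with the flipped bits reaches the same leaf value as traversing the original tree with the original bits. The nested-tuple encoding and function names are assumptions, and the sketch works on cleartext bits, whereas in the protocol the flipped bits would come out of the comparison protocol.

```python
import random
from typing import Dict, Tuple, Union

# A tree is either a leaf value (str) or a triple
# (node_id, child_if_bit_is_0, child_if_bit_is_1) with unique node ids.
Tree = Union[str, Tuple[int, "Tree", "Tree"]]


def randomize(tree: Tree, rng: random.Random) -> Tuple[Tree, Dict[int, int]]:
    """Return the permuted tree and the flip bit s_k chosen for each node k."""
    if not isinstance(tree, tuple):
        return tree, {}
    node_id, child0, child1 = tree
    child0, flips0 = randomize(child0, rng)
    child1, flips1 = randomize(child1, rng)
    s = rng.randrange(2)
    if s:
        child0, child1 = child1, child0  # flipping b_k swaps the subtrees
    return (node_id, child0, child1), {**flips0, **flips1, node_id: s}


def traverse(tree: Tree, bits: Dict[int, int]) -> str:
    while isinstance(tree, tuple):
        node_id, child0, child1 = tree
        tree = child1 if bits[node_id] else child0
    return tree


def flip_bits(bits: Dict[int, int], flips: Dict[int, int]) -> Dict[int, int]:
    """b'_k = 1 - b_k when s_k = 1, and b'_k = b_k otherwise."""
    return {k: bits[k] ^ flips[k] for k in bits}
```

Because the client sees only the flipped bits, a bit by itself no longer tells it which way the original comparison went.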
Private Decision Tree Evaluation
1. Server: Pad and permute the decision tree
2. Server & Client: Run the comparison protocol to compute the decision bits $b_k$ in the permuted tree
3. Client: Compute the index $i$ of the leaf node
4. Client & Server: Engage in OT to obtain $z_i$
Theorem. This protocol is secure against semi-honest adversaries.
Further Extensions
Evaluating random forests without revealing individual classifications
Ensuring security against malicious adversaries
See paper for details!
Experiments
Implemented private decision tree + random forest protocol
Benchmarks taken between a laptop client and an EC2 server
Decision Tree Evaluation on ECG Data
| Protocol | Security Level | Client Computation (s) | Server Computation (s) | Bandwidth (KB) |
|---|---|---|---|---|
| [BFK+09] | 80 | 2.609 | 6.260 | 112.2 |
| [BPGT14] | 80 | 2.297 | 1.723 | 3555 |
| Generic 2PC (Estimated) | 128 | - | - | ≥ 180.5 |
| This work | 128 | 0.091 | 0.188 | 101.9 |

Experimental Parameters:
• Data Dimension: 6
• Depth of Decision Tree: 4
• Number of Comparisons: 6
10x faster than previous
protocols
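The speedup claim can be checked against the table with a little arithmetic. The Python below is only a worked calculation using the timings copied from the table above; the variable and function names are made up for the example.

```python
from typing import Dict, Tuple

# (client seconds, server seconds), copied from the table above.
PREVIOUS: Dict[str, Tuple[float, float]] = {
    "[BFK+09]": (2.609, 6.260),
    "[BPGT14]": (2.297, 1.723),
}
THIS_WORK: Tuple[float, float] = (0.091, 0.188)


def speedups(previous: Dict[str, Tuple[float, float]],
             ours: Tuple[float, float]) -> Dict[str, Tuple[float, float]]:
    """Ratio of each earlier protocol's client and server time to ours."""
    return {
        name: (client / ours[0], server / ours[1])
        for name, (client, server) in previous.items()
    }


for name, (client_x, server_x) in speedups(PREVIOUS, THIS_WORK).items():
    print(f"{name}: client {client_x:.1f}x, server {server_x:.1f}x")
```

The client-side ratios come out above 25x for both earlier protocols, while the smallest ratio is the server-side comparison with [BPGT14] at a little over 9x, so "10x" reads as a round figure. The earlier protocols were also measured at the 80-bit security level, versus 128 bits for this work.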
Performance for Complete Decision Trees
Conclusions
Simple protocols for decision tree evaluation in the semi-honest (and malicious) setting
Semi-honest (and malicious-secure) decision tree protocols provide new computation/communication tradeoffs
Thanks!
http://eprint.iacr.org/2015/386