Sequence-Aware Privacy Preserving Data-Leak Detection
Fall, 2011 - Privacy&Security - Virginia Tech – Computer Science
Sequence-Aware Privacy-Preserving Data-Leak Detection
Xiaokui Shu, 11/29/2011
Content
• Applications of Privacy-Preserving Data-Leak Detection (PDLD)
• Challenges and our schema
• Sequence-aware PDLD (SPDLD)
• Implementation & evaluation
Application :: Outsourced Security Service
Service Provider
• Professional solution
• Value-added service provider (VASP)
• Semi-honest: honest, but curious
Customer
• Zero knowledge required
• Better business concentration
[Diagram labels: Outsourced Security Service; Internet; Guest; Customer’s network]
Application :: Introspective Security Service
Sensitive Data Owner
• Knowledge of all sensitive data
• Distribute sensitive data fingerprints to endpoints
DLD Endpoint
• Inside or outside the intranet
• Being monitored to be data-leak-free
[Diagram labels: Internet; Intranet Endpoint; Sensitive Data Owner; VPN Endpoint; Normal Endpoint]
Challenges
• Accuracy: decrease both false positives and false negatives
• Privacy: minimize the DLD executor’s knowledge of the sensitive data
• Efficiency: real-time processing of the traffic on PCs as well as at network gateways
• Robustness: the ability to handle modified leaked data, or variants
Challenges
Sensitive Data
• A computer, called a [[router]], is provided with an interface to each network. It forwards [[packet (information technology)|packets]] back and forth between them.<ref>RFC 1812
Filtered pkt payload (HTML display)
• A computer, called a [[router]], is provided with an interface to each network. It forwards [[packet (information technology)|packets]] back and forth between them.&lt;ref&gt;RFC 1812
Filtered pkt payload (WordPress posting)
• A+computer%2C+called+a+%5B%5Brouter%5D%5D%2C+is+provided+with+an+interface+to+each+network.+It+forwards+%5B%5Bpacket+%28information+technology%29%7Cpackets%5D%5D+back+and+forth+between+them.%26lt%3Bref%26gt%3BRFC+1812
SPDLD Schema
• Robustness: extract local features to represent the sensitive data
• Accuracy: take into account features of the sensitive data as well as the relationships among features
• Privacy: hash/fingerprinting values, samples
• Efficiency: sample both sensitive data and network traffic to improve performance
SPDLD :: Whole View
[Diagram: the Data Owner's Sensitive Data and the DLD Executor's Network Traffic each pass through Shingling, Fingerprinting, and Sampling; the resulting Fingerprint Tape I and Fingerprint Tape II are the inputs to Alignment.]
SPDLD :: Basic Alignment w/o Sampling
between them.<ref>RFC 1812
[“bet”, “etw”, “twe”, “wee”, “een”, “en ”, “n t”, “ th”, “the”, “hem”…]
[78722441, 3408810406, 241089130, 1961653472, 4238137974, 2383179562, 2158327725, 813110136, 3271865588, 1833769119 ...]
between+them.%26lt%3Bref%26gt%3BRFC+1812
[“bet”, “etw”, “twe”, “wee”, “een”, “en+”, “n+t”, “+th”, “the”, “hem”…]
[78722441, 3408810406, 241089130, 1961653472, 4238137974, 434026146, 1666297446, 1008925849, 3271865588, 1833769119 ...]
[Diagram: both fingerprint lists are aligned to produce the alignment result.]
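The shingling and fingerprinting steps on this slide can be sketched roughly as follows. This is an illustration only: `shingles` and `fingerprints` are hypothetical helper names, and CRC32 from Python's `zlib` stands in for Rabin's fingerprint, so the numbers will not match the ones shown above.

```python
import zlib

def shingles(data: bytes, k: int = 3):
    """Return every k-byte sliding substring of data, in order."""
    return [data[i:i + k] for i in range(len(data) - k + 1)]

def fingerprints(data: bytes, k: int = 3):
    """Fingerprint each shingle. CRC32 is a stand-in, not Rabin's scheme."""
    return [zlib.crc32(s) for s in shingles(data, k)]

sensitive = b"between them.<ref>RFC 1812"
traffic = b"between+them.%26lt%3Bref%26gt%3BRFC+1812"

fp_s = fingerprints(sensitive)
fp_t = fingerprints(traffic)

# Positions among the first ten shingles whose fingerprints agree.
same = [i for i in range(10) if fp_s[i] == fp_t[i]]
print(shingles(sensitive)[:10])
print(same)
```

Only the shingles that touch a re-encoded byte (the space that became `+`) change, which is why local features tolerate this kind of modification.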
SPDLD :: Flow Sampling Requirement
ABCDEFGHIJKLMNOPQ
CDEFGHIJKLMNOPQRS
…FJM…
…FJM…
No matter where we start, we should always get the same sample for an identical segment.
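One way to meet this requirement is to pick the minimum of every sliding window, so that selection depends only on local content and not on the starting offset. The sketch below is a hedged illustration under that assumption: `window_minima` and `sample` are hypothetical names, the window width of 4 is arbitrary, and CRC32 of each byte stands in for a real fingerprint.

```python
import zlib

def window_minima(items: bytes, w: int):
    """Return indices of items that are the minimum of some w-wide window."""
    fps = [zlib.crc32(bytes([x])) for x in items]  # stand-in fingerprints
    picked = set()
    for start in range(len(fps) - w + 1):
        window = fps[start:start + w]
        picked.add(start + window.index(min(window)))
    return sorted(picked)

def sample(items: bytes, w: int = 4) -> bytes:
    """Keep only the items selected by window_minima, in order."""
    return bytes(items[i] for i in window_minima(items, w))

stream_a = b"ABCDEFGHIJKLMNOPQ"
stream_b = b"CDEFGHIJKLMNOPQRS"
common = b"CDEFGHIJKLMNOPQ"  # the segment both streams share

core = sample(common)
print(sample(stream_a), sample(stream_b), core)
```

Every window that lies inside the shared segment is identical in both streams, so whatever is sampled from the segment alone also appears in the samples of both streams; only the windows near the segment boundaries can add extra picks.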
SPDLD :: Punching FingerprintTape
[Diagram: a sliding window over the fingerprint flow (FP Flow) produces the FingerprintTape (FPTape). Legend: fingerprint; minimum fingerprint in the window; sliding window; punched fingerprint in FingerprintTape; quasi-gap encoded in FingerprintTape.]
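The punching step might look like the sketch below. The slides do not give the tape's encoding, so the `("fp", value)` / `("gap", length)` tuples are an assumed representation, `punch_tape` is a hypothetical name, and a single minimum per window is used for simplicity.

```python
def punch_tape(fps, w):
    """Keep window minima; encode each run of dropped fingerprints as a gap length."""
    keep = set()
    for start in range(len(fps) - w + 1):
        window = fps[start:start + w]
        keep.add(start + window.index(min(window)))

    tape, gap = [], 0
    for i, fp in enumerate(fps):
        if i in keep:
            if gap:                      # close the pending quasi-gap
                tape.append(("gap", gap))
                gap = 0
            tape.append(("fp", fp))      # punched fingerprint
        else:
            gap += 1                     # dropped fingerprint extends the gap
    if gap:
        tape.append(("gap", gap))
    return tape

flow = [52, 17, 88, 40, 23, 91, 64, 35, 70, 12]
tape = punch_tape(flow, w=4)
print(tape)
```

Recording the gap lengths keeps the relative positions of the punched fingerprints, which is the information a sequence-aware comparison needs and a plain set of samples would lose.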
SPDLD :: Advanced FingerprintTape
• Quasi-gap encoding/decoding
• Start flags bound for each FingerprintTape
• Start position recorded
SPDLD :: Alignment
• Needleman-Wunsch algorithm
• Dynamic programming
• Gap penalty
• Unit comparison function replaced to expand quasi-gaps
• Implementation optimized for Python using a 1D array and multiple iterators
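As a rough illustration of Needleman-Wunsch with a single 1D array, the sketch below computes only the global alignment score using one rolling row. It is not the author's optimized implementation: `nw_score` is a hypothetical name, units are compared by plain equality (no quasi-gap expansion), and the default scores are taken from the parameters slide (Match: 12, Mismatch: -1, Gap: -4).

```python
def nw_score(a, b, match=12, mismatch=-1, gap=-4):
    """Global alignment score of sequences a and b using one rolling row."""
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    row = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        diag = row[0]              # cell (i-1, 0)
        row[0] = i * gap           # cell (i, 0)
        for j in range(1, len(b) + 1):
            up = row[j]            # cell (i-1, j), read before overwriting
            sub = match if a[i - 1] == b[j - 1] else mismatch
            row[j] = max(diag + sub,        # align a[i-1] with b[j-1]
                         up + gap,          # gap in b
                         row[j - 1] + gap)  # gap in a
            diag = up              # becomes cell (i-1, j) for the next column
    return row[-1]

print(nw_score([1, 2, 3], [1, 2, 3]))
print(nw_score([1, 2, 3], [1, 3]))
```

Keeping one row instead of the full matrix reduces memory to the length of the shorter input, at the cost of not being able to trace back the alignment itself.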
Implementation & Evaluation
• Implementation environment: Python 2.7
• Sensitive data: one paragraph from the source of the TCP/IP Wikipedia page
• Leaked network traffic: whole source of the TCP/IP Wikipedia page
  - MediaWiki & WordPress
Implementation & Evaluation
Parameters of the system
• 3-byte shingles
• 64-bit Rabin’s fingerprint
• Window size: 100
• Number of minima: 5
• Unit score in alignment: Match: 12, Mismatch: -1, Gap: -4
Implementation & Evaluation :: Speed
• My optimization of the Needleman–Wunsch algorithm runs 2.5 times as fast as the naive (my previous) implementation
• Comparison of set intersection, basic alignment, and FingerprintTape

                   MediaWiki   WordPress
set intersection       0.090       0.200
basic alignment      142.390     218.368
FingerprintTape        4.839       7.077
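For context on the set-intersection baseline in the table, the sketch below shows a minimal version of it. The score definition (fraction of sensitive fingerprints found in the packet) and the name `set_overlap` are assumptions for illustration; the slides do not define the measure.

```python
def set_overlap(sensitive_fps, packet_fps):
    """Fraction of distinct sensitive fingerprints that also occur in the packet."""
    s = set(sensitive_fps)
    return len(s & set(packet_fps)) / len(s)

sens = [11, 22, 33, 44]
in_order = [5, 11, 22, 33, 44, 9]   # sensitive fingerprints appear in sequence
shuffled = [44, 9, 22, 5, 11, 33]   # same fingerprints, scattered and reordered

print(set_overlap(sens, in_order))
print(set_overlap(sens, shuffled))
```

Both packets score the same because a set discards order, which is cheap to compute but cannot tell a contiguous leak from coincidental scattered matches; the alignment-based methods trade speed for that distinction.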
Implementation & Evaluation :: Accuracy
[Chart: MediaWiki. Sensitivity (of a packet), 0% to 100%, plotted against packet # (1 to 22) for set intersection and FingerprintTape.]
Implementation & Evaluation :: Accuracy
[Chart: WordPress. Sensitivity (of a packet), 0% to 60%, plotted against packet # (1 to 25) for set intersection and FingerprintTape.]
Thank you!
Background :: Shingling & Fingerprinting
shingling
hashing
Background :: Automaton-based RE Matching
Evolution of pattern matching in NIDS
• Boyer–Moore
• Aho–Corasick: multi-pattern search
• Regular expression support
• Automata: DFA, NFA, D2FA, CD2FA
Background :: List Alignment
• Needleman-Wunsch
• Dialign
SPDLD :: Shingling & Fingerprinting
SENSITIVE INFO
shingling → shingles:
“SENSITIV”, “ENSITIVE”, “NSITIVE ”, “SITIVE I”, “ITIVE IN”, “TIVE INF”, “IVE INFO”
fingerprinting → fingerprints:
658955, 452785, 123587, 754812, 458763, 885621, 645853
Sequence-Aware PP-DLD
set
list
flow