Sequence-Aware Privacy Preserving Data-Leak Detection

Fall, 2011 - Privacy&Security - Virginia Tech – Computer Science

Xiaokui Shu, 11/29/2011

Transcript of Sequence-Aware Privacy Preserving Data-Leak Detection

Page 1: Sequence-Aware Privacy Preserving Data-Leak Detection

Sequence-Aware Privacy Preserving Data-Leak Detection

Xiaokui Shu, 11/29/2011

Page 2: Sequence-Aware Privacy Preserving Data-Leak Detection

Content

• Applications of Privacy Preserving Data-Leak Detection (PDLD)
• Challenges and our schema
• Sequence-aware PDLD (SPDLD)
• Implementation & evaluation

Page 3: Sequence-Aware Privacy Preserving Data-Leak Detection

Application :: Outsourced Security Service

Service Provider
• Professional solution
• Value-added service provider (VASP)
• Semi-honest: honest, but curious

Customer
• Zero knowledge required
• Better business concentration

[Diagram labels: Outsourced Security Service; Internet; Guest; Customer’s network]

Page 4: Sequence-Aware Privacy Preserving Data-Leak Detection

Application :: Introspective Security Service

Sensitive Data Owner
• Knowledge of all sensitive data
• Distribute sensitive data fingerprints to endpoints

DLD Endpoint
• Inside or outside the intranet
• Being monitored to be data-leak-free

[Diagram labels: Internet; Intranet Endpoint; Sensitive Data Owner; VPN Endpoint; Normal Endpoint]

Page 5: Sequence-Aware Privacy Preserving Data-Leak Detection

Challenges

• Accuracy: decrease both false positives and false negatives
• Privacy: minimize the DLD executor’s knowledge of the sensitive data
• Efficiency: real-time processing of the traffic in PCs as well as through network gateways
• Robustness: the ability to handle modified leaked data, or variants

Page 6: Sequence-Aware Privacy Preserving Data-Leak Detection

Challenges

Sensitive Data
• A computer, called a [[router]], is provided with an interface to each network. It forwards [[packet (information technology)|packets]] back and forth between them.<ref>RFC 1812

Filtered pkt payload (HTML display)
• A computer, called a [[router]], is provided with an interface to each network. It forwards [[packet (information technology)|packets]] back and forth between them.&lt;ref&gt;RFC 1812

Filtered pkt payload (WordPress posting)
• A+computer%2C+called+a+%5B%5Brouter%5D%5D%2C+is+provided+with+an+interface+to+each+network.+It+forwards+%5B%5Bpacket+%28information+technology%29%7Cpackets%5D%5D+back+and+forth+between+them.%26lt%3Bref%26gt%3BRFC+1812
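
The two payload variants on this slide have the shape one would get from HTML-escaping the text and then URL-encoding it for a form post. The sketch below reproduces that shape with Python's standard library for the tail of the example. It is only an illustration: the exact transformations applied by MediaWiki or WordPress are not stated on the slide, and the variable names are made up here.

```python
import html
from urllib.parse import quote_plus

# Tail of the sensitive paragraph shown on the slide.
sensitive = "between them.<ref>RFC 1812"

# HTML display: markup characters become entities.
html_payload = html.escape(sensitive)

# Form posting (assumed): the escaped text is additionally URL-encoded.
form_payload = quote_plus(html_payload)

print(html_payload)  # between them.&lt;ref&gt;RFC 1812
print(form_payload)  # between+them.%26lt%3Bref%26gt%3BRFC+1812
```

A byte-for-byte comparison against the original paragraph fails on both variants, which is the robustness challenge this slide illustrates.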

Page 7: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD Schema

• Robustness: extract local features to represent the sensitive data
• Accuracy: take into account features of the sensitive data as well as the relationship among features
• Privacy: hash/fingerprinting values, samples
• Efficiency: sample both sensitive data and network traffic to improve performance

Page 8: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Whole View

[Diagram: two parallel pipelines of Shingling, Fingerprinting, and Sampling, one applied to Sensitive Data and one to Network Traffic, producing Fingerprint Tape I and Fingerprint Tape II, which are compared by Alignment. Roles shown: Data Owner, DLD Executor.]

Page 9: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Basic Alignment w/o Sampling

Sensitive data: between them.<ref>RFC 1812
Shingles: [“bet”, “etw”, “twe”, “wee”, “een”, “en ”, “n t”, “ th”, “the”, “hem”…]
Fingerprints: [78722441, 3408810406, 241089130, 1961653472, 4238137974, 2383179562, 2158327725, 813110136, 3271865588, 1833769119 ...]

Network traffic: between+them.%26lt%3Bref%26gt%3BRFC+1812
Shingles: [“bet”, “etw”, “twe”, “wee”, “een”, “en+”, “n+t”, “+th”, “the”, “hem”…]
Fingerprints: [78722441, 3408810406, 241089130, 1961653472, 4238137974, 434026146, 1666297446, 1008925849, 3271865588, 1833769119...]

Alignment Result
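
A minimal sketch of the shingling and fingerprinting step behind this slide, assuming 3-byte shingles. CRC32 from `zlib` is used purely as a stand-in hash, so the values will not match the fingerprints printed on the slide; the function names are hypothetical.

```python
import zlib

def shingles(text, k=3):
    """Return the overlapping k-byte shingles of text, in order."""
    data = text.encode("utf-8")
    return [data[i:i + k] for i in range(len(data) - k + 1)]

def fingerprints(text, k=3):
    """Hash every shingle (CRC32 is a stand-in, not Rabin's fingerprint)."""
    return [zlib.crc32(s) for s in shingles(text, k)]

plain = fingerprints("between them.")
posted = fingerprints("between+them.")

# 0-based positions where the two fingerprint lists disagree.
differing = [i for i, (a, b) in enumerate(zip(plain, posted)) if a != b]
print(differing)  # [5, 6, 7]
```

Only the three shingles that cover the replaced space change, matching the pattern on the slide, where the sixth to eighth fingerprints differ and the rest line up.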

Page 10: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Flow Sampling Requirement

ABCDEFGHIJKLMNOPQ → …FJM…
CDEFGHIJKLMNOPQRS → …FJM…

No matter where we start, we should always have the same sample for an identical segment.
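
The requirement can be made concrete by contrasting two sampling rules on the two flows from this slide. The sketch below is an illustration under assumptions: CRC32 stands in for the fingerprint function, and the value-based rule (keep fingerprints divisible by `p`) is just one simple rule that satisfies the requirement, not necessarily the one used in SPDLD.

```python
import zlib

def fingerprints(data, k=3):
    """CRC32 of every k-byte shingle (stand-in fingerprint function)."""
    return [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]

def sample_every_nth(fps, n=4):
    """Position-based sampling: the result depends on where the flow starts."""
    return fps[::n]

def sample_by_value(fps, p=4):
    """Value-based sampling: a fingerprint is kept or dropped on its own merit."""
    return [f for f in fps if f % p == 0]

shared = b"CDEFGHIJKLMNOPQ"
flow_a = fingerprints(b"AB" + shared)   # ABCDEFGHIJKLMNOPQ
flow_b = fingerprints(shared + b"RS")   # CDEFGHIJKLMNOPQRS

# Position-based samples of the two flows have nothing in common.
print(set(sample_every_nth(flow_a)) & set(sample_every_nth(flow_b)))

# Value-based samples agree on the shared segment.
segment_sample = sample_by_value(fingerprints(shared))
print(sample_by_value(flow_b)[:len(segment_sample)] == segment_sample)
```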

Page 11: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Punching FingerprintTape

[Diagram: FP Flow to FPTape. Legend: fingerprint; sliding window; minimum fingerprint in the window; punched fingerprint in FingerprintTape; quasi-gap encoded in FingerprintTape.]
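
A rough sketch of the punching idea named on this slide: slide a window over the fingerprint flow, keep the minimum of each window, and remember how many fingerprints were dropped in between. The slide does not define the quasi-gap encoding, so representing a gap as a plain count is an assumption, as are the function name and the choice of one minimum per window.

```python
def punch(fps, window=3):
    """Keep every fingerprint that is the minimum of some sliding window.

    Returns (gap, fingerprint) pairs, where gap counts the fingerprints
    dropped since the previously kept one (assumed gap representation).
    """
    if not fps:
        return []
    keep = set()
    for start in range(max(1, len(fps) - window + 1)):
        chunk = fps[start:start + window]
        lowest = min(chunk)
        # Rightmost minimum, so ties are resolved the same way in every flow.
        offset = max(i for i, f in enumerate(chunk) if f == lowest)
        keep.add(start + offset)

    tape, last = [], -1
    for pos in sorted(keep):
        tape.append((pos - last - 1, fps[pos]))
        last = pos
    return tape

flow = [5, 3, 8, 1, 9, 7, 2, 6]
print(punch(flow))      # [(1, 3), (1, 1), (2, 2)]
print(punch(flow[2:]))  # [(1, 1), (2, 2)]
```

Starting two fingerprints later still punches the same fingerprints, with the same gap between them, for the part of the flow both runs cover.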

Page 12: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Advanced FingerprintTape

• Quasi-gap encoding/decoding
• Start flags bound for each FingerprintTape
• Start position recorded

Page 13: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Alignment

• Needleman-Wunsch algorithm: dynamic programming, gap penalty
• Unit comparison function replaced to expand quasi-gap
• Implementation optimized for Python using 1D array and multiple iterators
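
For reference, a minimal sketch of the standard Needleman-Wunsch score computation that keeps a single 1D row of the dynamic-programming table. It uses the unit scores listed on the parameters slide (match 12, mismatch -1, gap -4) as defaults. It does not include the quasi-gap expansion or the iterator-based optimization mentioned above, and it is not the author's implementation.

```python
def nw_score(a, b, match=12, mismatch=-1, gap=-4):
    """Global alignment score of sequences a and b using one DP row."""
    # row[j] is the best score for aligning the processed prefix of a with b[:j].
    row = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        diag = row[0]            # value of cell (i-1, 0)
        row[0] = i * gap
        for j, y in enumerate(b, start=1):
            up = row[j]          # value of cell (i-1, j)
            step = match if x == y else mismatch
            row[j] = max(diag + step, up + gap, row[j - 1] + gap)
            diag = up
    return row[-1]

print(nw_score([1, 2, 3], [1, 2, 3]))  # 36: three matches
print(nw_score([1, 2, 3], [1, 3]))     # 20: two matches and one gap
```

Keeping one row instead of the full table reduces memory from the product of the two lengths to the length of the shorter dimension, at the cost of not being able to trace back the alignment itself.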

Page 14: Sequence-Aware Privacy Preserving Data-Leak Detection

Implementation & Evaluation

• Implementation environment: Python 2.7
• Sensitive data: one paragraph from the source of the TCP/IP Wikipedia page
• Leaked network traffic: whole source of the TCP/IP Wikipedia page (MediaWiki & WordPress)

Page 15: Sequence-Aware Privacy Preserving Data-Leak Detection

Implementation & Evaluation

Parameters of the system
• 3-byte shingles
• 64-bit Rabin’s fingerprint
• Window size: 100
• Number of minima: 5
• Unit score in alignment: Match: 12, Mismatch: -1, Gap: -4

Page 16: Sequence-Aware Privacy Preserving Data-Leak Detection

Implementation & Evaluation :: Speed

My optimized implementation of the Needleman–Wunsch algorithm runs 2.5 times as fast as the naive (my previous) implementation.

Comparison of set intersection, basic alignment, and FingerprintTape:

                   MediaWiki   WordPress
set intersection       0.090       0.200
basic alignment      142.390     218.368
FingerprintTape        4.839       7.077
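
The relative cost of the two alignment variants follows directly from the figures above. A small calculation, with the numbers copied from the slide and no assumption made about their unit:

```python
# Timings copied from the comparison above (unit as reported on the slide).
timings = {
    "set intersection": {"MediaWiki": 0.090, "WordPress": 0.200},
    "basic alignment": {"MediaWiki": 142.390, "WordPress": 218.368},
    "FingerprintTape": {"MediaWiki": 4.839, "WordPress": 7.077},
}

# Ratio of basic alignment time to FingerprintTape time per platform.
speedup = {
    platform: timings["basic alignment"][platform] / timings["FingerprintTape"][platform]
    for platform in ("MediaWiki", "WordPress")
}

for platform, ratio in speedup.items():
    print(f"{platform}: FingerprintTape is about {ratio:.1f}x faster than basic alignment")
```

On these figures FingerprintTape is roughly 29 to 31 times faster than basic alignment, while set intersection remains far cheaper than either.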

Page 17: Sequence-Aware Privacy Preserving Data-Leak Detection

Implementation & Evaluation :: Accuracy

[Chart: MediaWiki. Sensitivity (of a packet), 0% to 100%, against Packet # (1 to 22), for Set intersection and FingerprintTape.]

Page 18: Sequence-Aware Privacy Preserving Data-Leak Detection

Implementation & Evaluation :: Accuracy

[Chart: WordPress. Sensitivity (of a packet), 0% to 60%, against Packet # (1 to 25), for Set intersection and FingerprintTape.]

Page 19: Sequence-Aware Privacy Preserving Data-Leak Detection

Thank you!

Page 20: Sequence-Aware Privacy Preserving Data-Leak Detection

Background :: Shingling & Fingerprinting

[Diagram labels: shingling; hashing]

Page 21: Sequence-Aware Privacy Preserving Data-Leak Detection

Background :: Automata-based RE Matching

Evolution of pattern matching in NIDS
• Boyer–Moore
• Aho–Corasick: multi-pattern search
• Regular expression support
• Automata: DFA, NFA, D2FA, CD2FA

Page 22: Sequence-Aware Privacy Preserving Data-Leak Detection

Background :: List Alignment

• Needleman-Wunsch
• Dialign

Page 23: Sequence-Aware Privacy Preserving Data-Leak Detection

SPDLD :: Shingling & Fingerprinting

Sensitive data: SENSITIVE INFO

Shingles (after shingling): “SENSITIV”, “ENSITIVE”, “NSITIVE ”, “SITIVE I”, “ITIVE IN”, “TIVE INF”, “IVE INFO”

Fingerprints (after fingerprinting): 658955, 452785, 123587, 754812, 458763, 885621, 645853

Page 24: Sequence-Aware Privacy Preserving Data-Leak Detection

Sequence-Aware PP-DLD

[Diagram labels: set; list; flow]
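
The difference between treating fingerprints as a set and as an ordered list can be illustrated with a toy example. The fingerprint values below are made up, and the longest common subsequence is used only as a simple stand-in for an order-aware comparison; it is not the alignment used in SPDLD.

```python
def set_overlap(sensitive, traffic):
    """Fraction of distinct sensitive fingerprints seen anywhere in the traffic."""
    s = set(sensitive)
    return len(s & set(traffic)) / len(s)

def lcs_length(a, b):
    """Length of the longest common subsequence (order-preserving match)."""
    row = [0] * (len(b) + 1)
    for x in a:
        diag = 0
        for j, y in enumerate(b, start=1):
            up = row[j]
            row[j] = diag + 1 if x == y else max(up, row[j - 1])
            diag = up
    return row[-1]

sensitive = [11, 22, 33, 44, 55]            # hypothetical fingerprints
in_order = [99, 11, 22, 33, 44, 55, 98]     # leak with the order preserved
shuffled = [55, 99, 33, 11, 98, 44, 22]     # same fingerprints, scattered

print(set_overlap(sensitive, in_order), set_overlap(sensitive, shuffled))  # 1.0 1.0
print(lcs_length(sensitive, in_order), lcs_length(sensitive, shuffled))    # 5 2
```

The set view reports a full match in both cases, while the order-aware view separates a real contiguous leak from traffic that merely happens to contain the same fingerprints.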