Inference Problem Privacy Preserving Data Mining.
Inference Problem
Privacy Preserving Data Mining
CSCE 522 - Farkas, Lecture 19
Readings and Assignments
I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24

Indirect Information Flow Channels
• Covert channels
• Inference channels

Communication Channels
• Overt channel: designed into a system and documented in the user's manual
• Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.

Covert Channel
• Timing channel: based on system times
• Storage channel: communication that is not time related
• Each kind can be turned into the other
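
As a rough illustration of a storage channel, the toy simulation below lets a sender leak bits through shared state that was never meant to carry messages. The shared set, the resource name "slot", and all function names are assumptions made for this sketch; a real channel would use something like a file lock or a quota.

```python
# Toy simulation of a covert storage channel. All names are hypothetical.
# The shared state is a set of reserved resource names visible to both parties.

def send_bit(shared: set, bit: int, name: str = "slot") -> None:
    """Sender: reserve the name to signal 1, release it to signal 0."""
    if bit:
        shared.add(name)
    else:
        shared.discard(name)


def receive_bit(shared: set, name: str = "slot") -> int:
    """Receiver: observe whether the name is taken and decode one bit."""
    return 1 if name in shared else 0


def transmit(bits):
    """Send the bits one at a time through the shared state."""
    shared = set()
    received = []
    for bit in bits:
        send_bit(shared, bit)
        received.append(receive_bit(shared))
    return received


secret = [1, 0, 1, 1, 0]
leaked = transmit(secret)
```

No message is ever passed explicitly; the receiver only observes whether a resource is in use.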

Inference Channels
Non-sensitive information + Meta-data = Sensitive information

Inference Channels
• Statistical Database Inferences
• General Purpose Database Inferences

Statistical Databases
• Goal: provide aggregate information about groups of individuals
  E.g., average grade point of students
• Security risk: specific information about a particular individual
  E.g., grade point of student John Smith
• Meta-data:
  • Working knowledge about the attributes
  • Supplementary knowledge (not stored in database)

Types of Statistics
• Macro-statistics: collections of related statistics presented in 2-dimensional tables

  Sex\Year   1997   1998   Sum
  Female        4      1     5
  Male          6     13    19
  Sum          10     14    24

• Micro-statistics: individual data records used for statistics after identifying information is removed

  Sex   Course     GPA   Year
  F     CSCE 590   3.5   2000
  M     CSCE 590   3.0   2000
  F     CSCE 790   4.0   2001

Statistical Compromise
• Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8)
• Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)

Methods of Attacks and Protection
Small/Large Query Set Attack
• C: characteristic formula that identifies groups of individuals
• If C identifies a single individual I, e.g., count(C) = 1:
  • Find out existence of property
    If count(C and D) = 1, then I has property D
    If count(C and D) = 0, then I does not have D
  OR
  • Find value of property
    sum(C, D) gives the value of D for I
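
The attack can be sketched on the three micro-statistics rows shown earlier in the lecture. The helper names `count` and `total` and the particular formulas C and D are assumptions chosen for illustration.

```python
# Micro-data rows taken from the lecture's micro-statistics table.
records = [
    {"sex": "F", "course": "CSCE 590", "gpa": 3.5, "year": 2000},
    {"sex": "M", "course": "CSCE 590", "gpa": 3.0, "year": 2000},
    {"sex": "F", "course": "CSCE 790", "gpa": 4.0, "year": 2001},
]


def count(pred):
    """count(C): number of records satisfying the characteristic formula."""
    return sum(1 for r in records if pred(r))


def total(pred, attr):
    """sum(C, attr): sum of an attribute over the query set of C."""
    return sum(r[attr] for r in records if pred(r))


# Hypothetical formulas: C isolates one individual, D is the property tested.
C = lambda r: r["sex"] == "F" and r["course"] == "CSCE 790"
D = lambda r: r["year"] == 2001

is_single = count(C) == 1                             # C identifies one person
has_property = count(lambda r: C(r) and D(r)) == 1    # existence of property D
exact_gpa = total(C, "gpa")                           # value of the attribute
```

Because the query set has size one, an aggregate over it is simply the individual's own value.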

Small/Large Query Set Attack cont.
• Protection from small/large query set attack: query-set-size control
• A query q(C) is permitted only if
  N - n ≥ |C| ≥ n, where n ≥ 0 is a parameter of the database and N is the total number of records in the database
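
A minimal sketch of the size check, assuming the database can compute the query-set size before answering. The function names and the sample GPA values are hypothetical. The upper bound matters because the complement of a very large query set is a very small one.

```python
def size_permitted(query_set_size: int, total_records: int, n: int) -> bool:
    """Return True when n <= |C| <= N - n."""
    return n <= query_set_size <= total_records - n


def guarded_count(records, predicate, n):
    """Answer count(C) only if the query set passes the size check."""
    size = sum(1 for record in records if predicate(record))
    if not size_permitted(size, len(records), n):
        raise PermissionError("query set size outside [n, N - n]")
    return size


grades = [3.5, 3.0, 4.0, 2.7, 3.8, 3.1]  # hypothetical GPA values
allowed = guarded_count(grades, lambda g: g >= 3.5, n=2)  # three records match
```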

Tracker attack
[Figure: sets C1 and C2 with regions labeled C and Tracker]
• q(C) is disallowed
• C = C1 and C2
• Tracker: T = C1 and ~C2
• q(C) = q(C1) - q(T)

Tracker attack
[Figure: sets C1, C2, and D with regions labeled Tracker, C, and C and D]
• q(C and D) is disallowed
• C = C1 and C2
• Tracker: T = C1 and ~C2
• q(C and D) = q(T or (C and D)) - q(T)
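
Both tracker identities can be checked on a small example. The records, the parameter n = 2, and the choice of C1, C2, and D below are all assumptions; the point is only that every query actually issued passes the size check while the forbidden answers are still recovered.

```python
# Hypothetical records: (sex, dept, salary). The target is the only female in CS.
RECORDS = [
    ("F", "CS", 70), ("F", "EE", 50), ("F", "ME", 55),
    ("M", "CS", 60), ("M", "CS", 65), ("M", "EE", 52),
    ("M", "ME", 58), ("M", "EE", 61),
]
N_PARAM = 2  # the size-control parameter n


def query_set(pred):
    """Return matching records, or refuse if the set size is not in [n, N - n]."""
    rows = [r for r in RECORDS if pred(r)]
    if not (N_PARAM <= len(rows) <= len(RECORDS) - N_PARAM):
        raise PermissionError("query set size not permitted")
    return rows


def q_count(pred):
    return len(query_set(pred))


def q_sum(pred):
    return sum(salary for _, _, salary in query_set(pred))


c1 = lambda r: r[0] == "F"
c2 = lambda r: r[1] == "CS"
c = lambda r: c1(r) and c2(r)        # identifies one person, so q(C) is refused
t = lambda r: c1(r) and not c2(r)    # the tracker T = C1 and ~C2
d = lambda r: r[2] >= 65             # property D to test

# q(C) = q(C1) - q(T)
target_salary = q_sum(c1) - q_sum(t)

# q(C and D) = q(T or (C and D)) - q(T)
target_has_d = q_count(lambda r: t(r) or (c(r) and d(r))) - q_count(t)
```

Calling `q_count(c)` directly raises `PermissionError`, yet the two differences above reveal the same information.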

Query overlap attack
[Figure: overlapping query sets C1 and C2 over the individuals John, Kathy, Max, Fred, Eve, Paul, and Mitch]
• q(John) = q(C1) - q(C2)
• Protection: query-overlap control
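
The sketch below assumes that C1 and C2 differ only in John, which is what the formula q(John) = q(C1) - q(C2) requires; the exact set membership and the salary values are not given in the lecture and are made up here. The `OverlapControl` class is one possible reading of query-overlap control: refuse a query whose set shares too many members with an already answered one.

```python
# Hypothetical attribute values for the individuals named in the figure.
salaries = {"John": 52, "Kathy": 61, "Max": 48, "Fred": 57,
            "Eve": 66, "Paul": 50, "Mitch": 59}

# Assumed membership: the two query sets differ only in John.
C1 = {"John", "Kathy", "Max", "Fred"}
C2 = {"Kathy", "Max", "Fred"}


def q(query_set):
    """Sum query over a query set."""
    return sum(salaries[name] for name in query_set)


john_value = q(C1) - q(C2)


class OverlapControl:
    """Refuse queries that overlap an earlier query set in too many members."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []

    def query(self, query_set):
        for earlier in self.answered:
            if len(query_set & earlier) > self.max_overlap:
                raise PermissionError("too much overlap with an earlier query")
        self.answered.append(frozenset(query_set))
        return q(query_set)


guard = OverlapControl(max_overlap=1)
first_answer = guard.query(C1)   # answered; a later query on C2 is refused
```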

Insertion/Deletion Attack
• Observing changes over time
  q1 = q(C)
  insert(i)
  q2 = q(C)
  q(i) = q2 - q1
• Protection: insertion/deletion performed as pairs
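
A small sketch of the attack and of the pairing defense, assuming a sum query over a characteristic C that every stored record satisfies. The class `StatDB` and all values are hypothetical.

```python
class StatDB:
    """Tiny statistical database that only answers a sum query (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def q(self):
        """q(C) for a fixed characteristic C that every stored record satisfies."""
        return sum(self.values)

    def insert(self, value):
        self.values.append(value)

    def insert_pair(self, first, second):
        """Protection sketch: records enter the database only in pairs."""
        self.values.extend([first, second])


db = StatDB([50, 60, 70])
q1 = db.q()
db.insert(85)             # record of individual i
q2 = db.q()
leaked = q2 - q1          # equals the inserted value

paired = StatDB([50, 60, 70])
before = paired.q()
paired.insert_pair(85, 40)
blurred = paired.q() - before   # only the sum of the pair is visible
```

With paired updates the observer learns a combined value rather than a single record, which is a partial rather than exact disclosure.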

Statistical Inference Theory
• Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)

Inferences in General-Purpose Databases
• Queries based on sensitive data
• Inference via database constraints
• Inferences via updates

Queries based on sensitive data
• Sensitive information is used in the selection condition but not returned to the user.
• Example: Salary: secret, Name: public
  Π_Name σ_Salary=$25,000
• Protection: apply the query to database views at different security levels
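
The leak and the view-based protection can be sketched as follows. The employee rows are hypothetical (the 25,000 salary is invented to match the example query), and modeling the public view by nulling out the secret column is an assumption of this sketch.

```python
# Hypothetical relation: Name is public, Salary is secret.
employees = [
    {"name": "Black", "salary": 38000},
    {"name": "Red", "salary": 42000},
    {"name": "Green", "salary": 25000},
]


def select_names(rows, salary):
    """Project Name over the rows selected by Salary = salary."""
    return [r["name"] for r in rows if r["salary"] == salary]


def public_view(rows):
    """View for a public user: the secret Salary column is replaced by None."""
    return [{"name": r["name"], "salary": None} for r in rows]


# Evaluated on the base relation, the answer reveals who earns 25,000.
leak = select_names(employees, 25000)

# Evaluated on the public view, the selection cannot use the secret values.
safe = select_names(public_view(employees), 25000)
```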

Database Constraints
• Integrity constraints
• Database dependencies
• Key integrity

Integrity Constraints
• C = A + B
• A = public, C = public, and B = secret
• B can be calculated from A and C, i.e., secret information can be calculated from public data

Database Dependencies
Metadata:
• Functional dependencies
• Multi-valued dependencies
• Join dependencies
• etc.

Functional Dependency
• FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
• Example: FD: Rank → Salary
• Secret information: Name and Salary together
  • Query1: Name and Rank
  • Query2: Rank and Salary
  • Combine the answers for Query1 and Query2 to reveal Name and Salary together
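
Combining the two answers is an ordinary join on Rank, which is lossless here because of the dependency Rank → Salary. The rank names and the pairing of names with ranks are hypothetical values for illustration.

```python
# Answers a user could legitimately obtain (hypothetical values).
query1 = [("Black", "Analyst"), ("Red", "Manager")]   # Name, Rank
query2 = [("Analyst", 38000), ("Manager", 42000)]     # Rank, Salary

# Rank -> Salary guarantees each rank maps to exactly one salary,
# so a plain dictionary lookup performs the join.
salary_by_rank = dict(query2)
name_salary = {name: salary_by_rank[rank] for name, rank in query1}
```

Neither query returns Name and Salary together, but the join reconstructs exactly that secret association.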

Key integrity
• Every tuple in the relation has a unique key
• Users at different levels see different versions of the database
• Users might attempt to update data that is not visible to them

Example
Secret View
  Name (key)   Salary     Address
  Black  P     38,000 P   Columbia S
  Red    S     42,000 S   Irmo     S
Public View
  Name (key)   Salary     Address
  Black  P     38,000 P   Null     P

Updates
Public user:
  Name (key)   Salary     Address
  Black  P     38,000 P   Null     P
1. Update Black’s address to Orlando
2. Add new tuple: (Red, 22,000, Manassas)
If
• Refuse update: covert channel
• Allow update:
  • Overwrite high data: may be incorrect
  • Create new tuple (polyinstantiation): which data is correct? Violates key constraints

Updates
Secret user:
  Name (key)   Salary     Address
  Black  P     38,000 P   Columbia S
  Red    S     42,000 S   Irmo     S
1. Update Black’s salary to 45,000
If
• Refuse update: denial of service
• Allow update:
  • Overwrite low data: covert channel
  • Create new tuple (polyinstantiation): which data is correct? Violates key constraints
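
Polyinstantiation can be sketched by extending the key with the security level, so that the same name may appear once per level. This is a simplification: the lecture labels individual attribute values, while the sketch below labels whole tuples, and the class and method names are assumptions.

```python
LEVELS = {"P": 0, "S": 1}  # public < secret


class MLSRelation:
    """Relation keyed by (name, level) so one name may exist at several levels."""

    def __init__(self):
        self.tuples = {}

    def upsert(self, user_level, name, salary, address):
        # A write never touches tuples at other levels; it creates or
        # replaces only the tuple at the writer's own level.
        self.tuples[(name, user_level)] = (salary, address)

    def view(self, user_level):
        """Return the tuples at or below the reader's level."""
        limit = LEVELS[user_level]
        return {key: row for key, row in self.tuples.items()
                if LEVELS[key[1]] <= limit}


rel = MLSRelation()
rel.upsert("S", "Red", 42000, "Irmo")        # existing secret tuple
rel.upsert("P", "Red", 22000, "Manassas")    # public user's insert is accepted

public_rows = rel.view("P")
secret_rows = rel.view("S")
```

The public insert is neither refused nor allowed to overwrite high data, so no covert channel opens, but the secret user now sees two tuples for Red and must decide which one is correct.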

Inference Problem
• No general technique is available to solve the problem
• Need assurance of protection
• Hard to incorporate outside knowledge

Web Evolution
• Past: Human usage
  Static Web pages (HTML, XML)
• Present: Human & Automated usage
  Semantic Web, WS, SOA
• Future: Mobile Computing

Web Data Security
• Access Control Models
• Heterogeneous Data: XML, Stream, Text
• Limitations:
  • Syntax-based
  • No association protection
  • Limited handling of updates
  • No data or application semantics
  • No inference control

Secure XML Views - Example
<medicalFiles> UC
  <countyRec> S
    <patient> S
      <name>John Smith</name> UC
      <phone>111-2222</phone> S
    </patient>
    <physician>Jim Dale</physician> UC
  </countyRec>
  <milBaseRec> TS
    <patient> S
      <name>Harry Green</name> UC
      <phone>333-4444</phone> S
    </patient>
    <physician>Joe White</physician> UC
    <milTag>MT78</milTag> TS
  </milBaseRec>
</medicalFiles>
[Figure: tree of the document above. Label: View over UC data]

Secure XML Views - Example
<medicalFiles>
  <countyRec>
    <patient>
      <name>John Smith</name>
    </patient>
    <physician>Jim Dale</physician>
  </countyRec>
  <milBaseRec>
    <patient>
      <name>Harry Green</name>
    </patient>
    <physician>Joe White</physician>
  </milBaseRec>
</medicalFiles>
[Figure: tree of the document above. Label: View over UC data]

Secure XML Views - Example
<medicalFiles>
  <tag01>
    <tag02>
      <name>John Smith</name>
    </tag02>
    <physician>Jim Dale</physician>
  </tag01>
  <tag03>
    <tag02>
      <name>Harry Green</name>
    </tag02>
    <physician>Joe White</physician>
  </tag03>
</medicalFiles>
[Figure: document tree. Label: View over UC data]

Secure XML Views - Example
<medicalFiles> UC
  <countyRec> S
    <patient> S
      <name>John Smith</name> UC
    </patient>
    <physician>Jim Dale</physician> UC
  </countyRec>
  <milBaseRec> TS
    <patient> S
      <name>Harry Green</name> UC
    </patient>
    <physician>Joe White</physician> UC
  </milBaseRec>
</medicalFiles>
[Figure: tree of the document above. Label: View over UC data]

Secure XML Views - Example
<medicalFiles>
  <name>John Smith</name>
  <physician>Jim Dale</physician>
  <name>Harry Green</name>
  <physician>Joe White</physician>
</medicalFiles>
[Figure: tree of the document above. Label: View over UC data]
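
The flattened view in the last example can be produced mechanically. The sketch below is one way to do it with Python's standard `xml.etree.ElementTree`; storing each label in a `level` attribute is an assumption of this sketch (the slides write the label after the tag), and the function handles only the UC clearance shown in the examples.

```python
import xml.etree.ElementTree as ET

# The labeled document from the first example, with labels as attributes.
LABELED = """
<medicalFiles level="UC">
  <countyRec level="S">
    <patient level="S">
      <name level="UC">John Smith</name>
      <phone level="S">111-2222</phone>
    </patient>
    <physician level="UC">Jim Dale</physician>
  </countyRec>
  <milBaseRec level="TS">
    <patient level="S">
      <name level="UC">Harry Green</name>
      <phone level="S">333-4444</phone>
    </patient>
    <physician level="UC">Joe White</physician>
    <milTag level="TS">MT78</milTag>
  </milBaseRec>
</medicalFiles>
"""


def flat_view(xml_text, clearance="UC"):
    """Copy every element labeled with the clearance directly under the root."""
    root = ET.fromstring(xml_text.strip())
    view = ET.Element(root.tag)
    for elem in root.iter():            # document order, root first
        if elem is root:
            continue
        if elem.get("level") == clearance:
            child = ET.SubElement(view, elem.tag)
            child.text = (elem.text or "").strip()
    return ET.tostring(view, encoding="unicode")


uc_view = flat_view(LABELED)
```

Dropping the S and TS ancestors hides tags such as milBaseRec, at the price of losing the grouping of each name with its physician.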

The Inference Problem
• General Purpose Database:
  Non-confidential data + Metadata → Undesired Inferences
• Semantic Web:
  Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences

Correlated Inference
[Figure: address-fort (Public), district-basin (Public), place-base, place-water source, water source-base (Confidential)]
Ontology:
  Object[].
  waterSource :: Object
  basin :: waterSource
  place :: Object
  district :: place
  address :: place
  base :: Object
  fort :: base

Inference Control
[Figure: Organizational Data (Confidential, Public, Misinfo), Access Control, Attacker, Web Data, Ontology, Data Integration and Inferences]

Inference Control
[Figure: Organizational Data (Confidential, Public, Misinfo)]
ACCESS and INFERENCE CONTROL POLICY
• Logic-based inference detection
• Exact and partial disclosure
• Data and metadata protection
• Heterogeneous data manipulation
• Metadata discovery

Data Mining and Privacy
• Statistical inference:
  • K-anonymity
  • Correlation
• General inference:
  • Pattern metadata
  • Biased learning
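
K-anonymity is only named in the lecture. As a minimal sketch under the usual definition (every combination of quasi-identifier values must be shared by at least k records), a check could look like this; the rows and the choice of quasi-identifiers are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers)
                     for row in rows)
    return all(size >= k for size in groups.values())


# Hypothetical released micro-data; sex and year act as quasi-identifiers.
rows = [
    {"sex": "F", "year": 2000, "gpa": 3.5},
    {"sex": "F", "year": 2000, "gpa": 3.9},
    {"sex": "M", "year": 2000, "gpa": 3.0},
    {"sex": "M", "year": 2000, "gpa": 3.2},
]

two_anonymous = is_k_anonymous(rows, ["sex", "year"], k=2)
three_anonymous = is_k_anonymous(rows, ["sex", "year"], k=3)
```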
Future
Next Class
Midterm exam