Background on security
Definition of security
1. Attacker’s knowledge/capability
  – The attacker observes only a set of encrypted values: ciphertext-only attack (COA)
    • Suitable for most real-life applications
  – The attacker can generate the encrypted value of any plaintext of his choice: chosen-plaintext attack (CPA)
    • Baseline for public-key cryptosystems: the attacker can use the public key to generate as many ciphertexts as he wants
2. Attacker’s goal
  – To derive information about the plaintext, where any information counts: semantic security
    • E.g., learning that one’s salary is > 50k/month, without the exact value, may already be a security concern
  – (A malicious data mining service provider) To return a wrong answer to the user: integrity
Some facts
• There isn’t really a formal method to prove security against COA
  – People prefer provable security
• There is always a brute-force attack w.r.t. CPA
  – Try all the keys and find the one that matches all plaintext-ciphertext pairs
  – Security under CPA means the attack is a (proven) hard problem
Views from crypto
• We do not know what the attacker knows
  – Better prepare for the worst
    • Require provable semantic security under a strong attack model (at least CPA)
Semantic security
• Definition: no information about the plaintext (except its size) is leaked to the attacker
• A proven equivalent definition: indistinguishability (IND)
  – Given two encrypted values, the attacker cannot distinguish them
• Remark: semantic security under CPA is often written as IND-CPA
Security game
• IND-XXX can be modeled as a game
  – The attacker generates two messages m0 and m1 and sends them to the key owner
  – The key owner randomly chooses one message mi and encrypts it, c = E(mi)
  – With c, the attacker guesses which plain message c corresponds to
  – Secure if Pr(guess correct) <= 0.5 + ε
    • Where ε is a negligible value, often of the form 1/x^k
      – Note: x is a constant, k is the key length
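The game above can be sketched as a small test harness. This is only an illustration under stated assumptions: `encrypt_otp` (a fresh one-time pad per call), `random_guess_adversary`, and `win_rate` are hypothetical names that do not come from the slides, and the random-guessing adversary is just the 0.5 baseline that any scheme trivially allows.

```python
import random
import secrets


def encrypt_otp(message: bytes) -> bytes:
    """Toy randomized encryption: XOR with a fresh random pad on every call."""
    pad = secrets.token_bytes(len(message))
    return bytes(m ^ p for m, p in zip(message, pad))


def random_guess_adversary(m0: bytes, m1: bytes, c: bytes, rng: random.Random) -> int:
    """Baseline attacker that ignores c and guesses i at random."""
    return rng.randrange(2)


def play_ind_game(encrypt, adversary, m0, m1, rng) -> bool:
    """One round: the key owner picks i, encrypts mi, the attacker guesses i."""
    i = rng.randrange(2)
    c = encrypt((m0, m1)[i])
    return adversary(m0, m1, c, rng) == i


def win_rate(encrypt, adversary, m0, m1, trials=10_000, seed=0) -> float:
    """Empirical estimate of Pr(guess correct) over many independent rounds."""
    rng = random.Random(seed)
    wins = sum(play_ind_game(encrypt, adversary, m0, m1, rng) for _ in range(trials))
    return wins / trials


rate = win_rate(encrypt_otp, random_guess_adversary, b"attack", b"defend")
print(f"estimated Pr(guess correct) = {rate:.3f}")
```

A scheme is considered secure in this game when no efficient adversary plugged into `win_rate` does noticeably better than this baseline.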
Security vs performance
• In general (but not proven), a more secure scheme is more expensive
• Fact 1: Non-deterministic encryption is required for semantic security
  – Deterministic encryption
    • E(x1) = E(x2) iff x1 = x2
    • One-to-one mapping
    • Onto function most of the time
  – Simple attack
    • The attacker generates g0 = E(m0), g1 = E(m1)
    • If gi = c, answer i
    • Pr(guess correct) = 100%
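The simple attack can be sketched as follows. The cipher here is a deliberately naive stand-in (XOR with a fixed key), and `KEY`, `encrypt_det`, and `cpa_adversary` are hypothetical names chosen for the illustration; the only property that matters is that encryption is deterministic and that the attacker can obtain encryptions of chosen plaintexts.

```python
import random

KEY = bytes.fromhex("5a17c3e9")  # hypothetical fixed secret key of the toy cipher


def encrypt_det(message: bytes) -> bytes:
    """Deterministic toy cipher: the same plaintext always gives the same ciphertext."""
    return bytes(m ^ k for m, k in zip(message, KEY))


def cpa_adversary(m0: bytes, m1: bytes, c: bytes, oracle) -> int:
    """Chosen-plaintext attacker: encrypt both candidates and compare with c."""
    g0, g1 = oracle(m0), oracle(m1)
    if g0 == c:
        return 0
    if g1 == c:
        return 1
    raise ValueError("challenge matches neither candidate")


def attack_success_rate(trials: int = 1000, seed: int = 1) -> float:
    """Play the IND game repeatedly against the deterministic cipher."""
    rng = random.Random(seed)
    m0, m1 = b"low!", b"high"
    wins = 0
    for _ in range(trials):
        i = rng.randrange(2)               # key owner's secret choice
        c = encrypt_det((m0, m1)[i])       # challenge ciphertext
        wins += cpa_adversary(m0, m1, c, encrypt_det) == i
    return wins / trials


success = attack_success_rate()
print(success)
```

Because `encrypt_det` has no randomness, the comparison never fails, which is the Pr(guess correct) = 100% case.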
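[Figure: deterministic encryption as a one-to-one mapping from plaintexts 1, 2, 3 to ciphertexts among a, b, c, d, and back to 1, 2, 3]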
Security vs performance
• Non-deterministic encryption
  – One-to-many mapping
• Problem:
  – Ciphertext is longer
  – Storage cost and processing cost are thus higher
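[Figure: non-deterministic encryption as a one-to-many mapping from plaintexts 1, 2, 3 to ciphertexts a to h, and back to 1, 2, 3]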
Example
• RSA is a deterministic function
  – Public key: <e, n>, private key: <d, n>
  – E(x) = x^e mod n
  – D(y) = y^d mod n
• RSA is not semantically secure
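A minimal sketch of textbook RSA with toy parameters makes the determinism visible. The primes p = 61, q = 53 and the exponent e = 17 are illustrative assumptions, far too small for real use; `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Toy textbook RSA (no padding). Parameters are illustrative only.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e


def E(x: int) -> int:
    """E(x) = x^e mod n"""
    return pow(x, e, n)


def D(y: int) -> int:
    """D(y) = y^d mod n"""
    return pow(y, d, n)


x = 65
first, second = E(x), E(x)
print(first == second)       # same plaintext, same ciphertext
print(D(first) == x)         # decryption recovers the plaintext
```

Since encrypting the same x twice gives the same value, anyone holding the public key can run the simple attack from Fact 1.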
RSA with padding
• When the industry refers to RSA, it actually means RSA with padding
  – The padding scheme is optimal asymmetric encryption padding (OAEP)
  – Proven IND-CCA2 (a strong security definition)
• Example of a simpler padding
  – Encryption:
    • Input: m
    • Generate random r
    • Let c = r xor m
    • Ciphertext: c||E(r)
  – Decryption:
    • y = c||E(r)
    • Recover r from D(E(r))
    • Decrypted message: m = c xor r
This padding doubles the size of an encrypted value
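The simpler padding can be sketched on top of toy textbook RSA. Everything specific here is an assumption for illustration: the tiny modulus, the 11-bit message space (so that r stays below the modulus), the tuple used in place of the concatenation c||E(r), and the optional `r` argument that exists only to make the demo repeatable.

```python
import secrets

# Toy RSA parameters (illustrative only, far too small for real use).
P, Q = 61, 53
N = P * Q
E_EXP = 17
D_EXP = pow(E_EXP, -1, (P - 1) * (Q - 1))   # Python 3.8+
BITS = 11                                    # 2**11 = 2048 < N, so r always fits


def rsa_enc(x: int) -> int:
    return pow(x, E_EXP, N)


def rsa_dec(y: int) -> int:
    return pow(y, D_EXP, N)


def encrypt(m: int, r=None):
    """Return (c, E(r)) with c = r xor m; r is fresh randomness unless supplied."""
    if r is None:
        r = secrets.randbits(BITS)
    return (r ^ m, rsa_enc(r))


def decrypt(ciphertext) -> int:
    """Recover r from D(E(r)), then m = c xor r."""
    c, enc_r = ciphertext
    return c ^ rsa_dec(enc_r)


m = 1234
y1 = encrypt(m, r=7)
y2 = encrypt(m, r=900)
print(y1 != y2, decrypt(y1) == m, decrypt(y2) == m)
```

The ciphertext carries two components for one message, which is where the doubling in size comes from.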
Secure database (SDB) problem
[Figure: the Data Owner (DO) and the Service provider (SP) exchange queries and answers; the DB resides at the SP]
• Database should be encrypted
• Compute query on encrypted data
• Return an encrypted answer
(In)-feasibility of IND in SDB problem
• Security game:
  – The attacker generates two queries q0 and q1 and sends them to the DO
  – The DO randomly chooses one query and executes it with the SP
  – The (encrypted) result r is observed by the attacker
  – With r, the attacker guesses which query r corresponds to
Attacker’s strategy
• Pick q0 = “SELECT count(*)”
• Pick q1 = “SELECT *”
• If r is just an encrypted value, it is q0
• If r is a table, it is q1
• To prevent the above attack, query results must at least be indistinguishable by size: each query result has size at least Ω(n), where n is the number of tuples
• Decryption cost for the DO is then Ω(n), no better than computing the query with a linear scan
• Selection processing requires the SP to observe whether an encrypted tuple satisfies the query condition or not
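The size-based strategy, and the effect of padding every result to n items, can be sketched as below. Ciphertexts are modelled as opaque random tokens; `N_TUPLES`, `run_query`, and `size_adversary` are hypothetical names, and the model deliberately ignores everything except result size.

```python
import random
import secrets

N_TUPLES = 5  # hypothetical table size n


def opaque_ciphertext() -> str:
    """Stand-in for an encrypted value: reveals nothing except that it exists."""
    return secrets.token_hex(8)


def run_query(i: int, pad_to=None) -> list:
    """q0 = SELECT count(*) returns 1 item, q1 = SELECT * returns n items."""
    size = 1 if i == 0 else N_TUPLES
    result = [opaque_ciphertext() for _ in range(size)]
    if pad_to is not None:
        result += [opaque_ciphertext() for _ in range(pad_to - len(result))]
    return result


def size_adversary(result: list) -> int:
    """Guess q0 if the result is a single value, q1 otherwise."""
    return 0 if len(result) == 1 else 1


def win_rate(pad_to=None, trials: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        i = rng.randrange(2)
        wins += size_adversary(run_query(i, pad_to)) == i
    return wins / trials


unpadded = win_rate()
padded = win_rate(pad_to=N_TUPLES)
print(unpadded, padded)
```

Padding removes the size signal, but every answer now has n items, so the DO pays the Ω(n) decryption cost on each query.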
Remark: Fully homomorphic encryption with IND-CPA in SDB
Discussion paper: Shiyuan Wang, Divyakant Agrawal, and Amr El Abbadi. Is homomorphic encryption the holy grail for database queries on encrypted data? Technical report, Department of Computer Science, UCSB, 2012.
• Cannot jump to an encrypted address
• All operations expressible as circuits (AND, OR, NOT) can be supported
• All inputs and outputs are encrypted
Implication of knowing the result of a branch operation
[Figure: an unknown process either jumps to a or jumps to b. The two branch targets hold plain data 10, 20, 21, 22, 23 and plain data 24, 27, 28, 29, 40. The attacker's knowledge of the plaintext comes from CPA.]
Attack:
• Pick a = 50, b = 7
• Receive E(c)
• Attacker answer: c = a
Re-writing the query may help
• if (x > 10) { y = 20; } else { y = 100; } can be rewritten without a branch:
r = cmp_grt(x, 10) // returns 1 if x > 10, 0 otherwise
y = 100 - 80 * r // y = 20 when r = 1, y = 100 when r = 0
Cannot solve all problems!
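A quick plaintext check of the rewrite: the arithmetic form has to give y = 20 when r = 1 and y = 100 when r = 0, which is y = 100 - 80 * r. In this sketch `cmp_grt` is modelled on plain integers as an assumption; in the encrypted setting it would be evaluated on ciphertexts without revealing r.

```python
def cmp_grt(x: int, threshold: int) -> int:
    """Plaintext model of the comparison: 1 if x > threshold, 0 otherwise."""
    return 1 if x > threshold else 0


def with_branch(x: int) -> int:
    """Original form: the branch taken depends on x."""
    if x > 10:
        return 20
    return 100


def branch_free(x: int) -> int:
    """Rewritten form: the same instructions run for every x."""
    r = cmp_grt(x, 10)
    return 100 - 80 * r


agree = all(with_branch(x) == branch_free(x) for x in range(-50, 51))
print(agree)
```

The rewrite works because both outcomes are constants that can be blended arithmetically; a branch that decides which tuples to touch cannot be removed this way.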
Leakage of knowing branch result in practice
• Assume now that we allow the SP to observe the branch (i.e., comparison) results. What kind of information is leaked?
  – Locality of data
Result of cmp(Y, E(q1)): E(t1), E(t3), E(t5), E(t7), E(t10)
Result of cmp(Y, E(q2)): E(t1), E(t3), E(t9), E(t13)
Derived knowledge – COA:
1. q2 q1
2. q2 t1[Y], t3[Y] q1
3. t5[Y] t1[Y], t3[Y] t9[Y]
So, we just protect the exact values in our scheme. And the use of an index may make sense.
Another way to prove IND (in SMC)
• Proof by simulation
• Background
  – Each party receives several messages from the other party
  – Can they use this information to learn anything about the other party?
[Figure: secure sum. Alice: secret x = 3. Bob: secret y = 7. Result: x+y = 10]
Simulation
• Say Bob is the attacker now
• Is there any difference in the messages Bob receives if Alice provides a different input?
  – Indistinguishable
Secure Sum
Alice: secret x = 3. Bob: secret y = 7. Result: x+y = 10
Public parameter: n = 100
1. Alice generates r1 = 70 and sends m1 = r1 + x mod n = 73
2. Bob generates r2 = 50 and sends m2 = r2 + y + m1 mod n = 30
3. Alice keeps m2 - r1 as her share: secret a = 60
4. Bob keeps r2 as his share: secret b = 50
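The numbers on this slide can be reproduced with a short sketch. The function name `secure_sum_shares` and the optional `r1`, `r2` arguments are illustrative assumptions. The slide does not say how the two shares are recombined; from the definitions, a = m2 - r1 = r2 + x + y and b = r2, so (a - b) mod n = (x + y) mod n, which is what the last line checks.

```python
import secrets


def secure_sum_shares(x: int, y: int, n: int, r1=None, r2=None):
    """Run the two-message exchange and return (m1, m2, a, b)."""
    r1 = secrets.randbelow(n) if r1 is None else r1   # Alice's mask
    r2 = secrets.randbelow(n) if r2 is None else r2   # Bob's mask
    m1 = (r1 + x) % n            # Alice -> Bob
    m2 = (r2 + y + m1) % n       # Bob -> Alice
    a = (m2 - r1) % n            # Alice's share
    b = r2                       # Bob's share
    return m1, m2, a, b


# Values from the slide: x = 3, y = 7, n = 100, r1 = 70, r2 = 50.
m1, m2, a, b = secure_sum_shares(3, 7, 100, r1=70, r2=50)
print(m1, m2, a, b)
print((a - b) % 100 == (3 + 7) % 100)
```

Neither share alone determines x + y, since each is offset by the random r2.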
Bob’s view
Bob: secret y = 7. Result: x+y = 10. Public parameter: n = 100
Received: m1 = r1 + x mod n = 73
Simulation:
• For any value of x, generate r1’ = m1 - x mod n
• The message m1 can then be generated
Simulation succeeds. This protocol is secure w.r.t. IND.
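The simulation argument can be checked exhaustively for the slide's parameters. This sketch assumes Alice's input ranges over 0..n-1 and uses the hypothetical helper `simulate_r1`; it shows that the observed m1 = 73 is consistent with every candidate input, so m1 alone tells Bob nothing about x.

```python
n = 100
m1_observed = 73          # the message Bob actually received


def simulate_r1(x_candidate: int) -> int:
    """Mask r1' that would make x_candidate produce the observed message."""
    return (m1_observed - x_candidate) % n


# For every possible input of Alice there is a valid mask explaining m1.
explanations = {x_c: simulate_r1(x_c) for x_c in range(n)}
all_consistent = all((r + x_c) % n == m1_observed for x_c, r in explanations.items())

print(all_consistent)
print(explanations[3])    # the mask matching Alice's real input x = 3
```

Each candidate input is explained by exactly one mask, so with r1 uniform over 0..n-1 all inputs are equally likely given m1.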
An insecure example
Key agreement protocol
Public parameters: p, g
Bob observed: YA, XB
How to derive XA?
Note: XA must be one specific value such that YA = g^XA mod p
Simulation fails.
Note: This protocol is not designed to protect a party’s input from the other party
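Why the simulation fails can be seen on toy parameters. The values p = 23, g = 5 (a generator modulo 23) and XA = 6 are assumptions chosen for illustration; the point is that the observed YA pins down a single exponent, unlike the secure-sum message, which is consistent with every input.

```python
# Toy key-agreement parameters (illustrative only).
p, g = 23, 5
XA = 6                         # Alice's secret exponent
YA = pow(g, XA, p)             # what Bob observes

# Exponents in 1..p-1 that are consistent with the observed YA.
candidates = [x for x in range(1, p) if pow(g, x, p) == YA]
print(candidates)
```

With realistic parameter sizes the exponent is still uniquely determined; it is only computationally hard to find, so a simulator cannot freely choose Alice's input.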
Relaxed security definition
• Also the approach of our paper
• Bounded leakage of protocols
  – Can be proven by simulation
• Used a lot by Chris Clifton from Purdue University
  – Jaideep Vaidya and Chris Clifton, Secure Set Intersection Cardinality with Application to Association Rule Mining, JCS 13(4), 2005.
  – Jaideep Vaidya and Chris Clifton, Privacy-Preserving K-Means Clustering over Vertically Partitioned Data, SIGKDD, 2003.
  – Murat Kantarcioglu and Chris Clifton, Privacy Preserving Data Mining of Association Rules on Horizontally Partitioned Data, TKDE 16(9), 2004.
Proof of relaxed definition
• Attacker’s knowledge
  – Its own input
  – Messages in the protocol
  – Leaked knowledge
• If the above is enough to simulate the execution of the protocol, there is no other information leak
• Then, argue that the leaked knowledge is not very harmful