Recovering Data in Presence of Malicious Errors Atri Rudra University at Buffalo, SUNY.


Recovering Data in Presence of Malicious Errors

Atri Rudra, University at Buffalo, SUNY


The setup

[Diagram: x is encoded to C(x); the channel delivers y = C(x) + error; the decoder outputs x, or gives up]

The mapping C is an error-correcting code, or just a code.
Encoding: x → C(x). Decoding: y → x. C(x) is a codeword.


Codes are useful!

Cellphones, satellite broadcast, deep-space communication, Internet,
CDs/DVDs, RAID, ECC memory, paper bar-codes.


Redundancy vs. Error-correction

Repetition code: repeat every bit, say, 100 times. Good error-correcting properties, but too much redundancy.

Parity code: add a parity bit. Minimum amount of redundancy, but bad error-correcting properties: two errors go completely undetected.

Example: 1 1 1 0 0 | 1 and 1 0 0 0 0 | 1 carry the same parity bit, so two bit flips go undetected.

Neither of these codes is satisfactory.
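The contrast between the two codes can be checked in a few lines. A minimal sketch (illustrative, not from the talk): 5-fold repetition instead of 100-fold, with majority-vote decoding, against a single parity bit.

```python
# Illustrative sketch: repetition coding vs. a single parity bit.
def repetition_encode(bits, times=5):
    return [b for b in bits for _ in range(times)]

def repetition_decode(received, times=5):
    # Majority vote within each block of `times` copies.
    return [int(sum(received[i:i + times]) > times // 2)
            for i in range(0, len(received), times)]

def parity_encode(bits):
    return bits + [sum(bits) % 2]

def parity_check(word):
    return sum(word) % 2 == 0  # True means "no error detected"

msg = [1, 0, 1, 1]
rep = repetition_encode(msg)
rep[0] ^= 1; rep[7] ^= 1               # two errors in distinct blocks
assert repetition_decode(rep) == msg   # repetition corrects both

par = parity_encode(msg)
par[0] ^= 1; par[1] ^= 1               # two errors
assert parity_check(par)               # parity fails to even detect them
```

The rate tells the other half of the story: repetition uses n = 5k bits for k message bits (rate 1/5), parity uses n = k + 1 (rate close to 1).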


Two main challenges in coding theory

Problem with the parity example: messages are mapped to codewords which do not differ in many places. We need to pick a lot of codewords that differ a lot from each other.

Efficient decoding. The naive algorithm checks the received word against all codewords.
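The naive algorithm just mentioned is easy to write down. A sketch (illustrative; the toy code and helper names are mine): compare the received word against every codeword and keep the closest, which takes time proportional to the number of codewords, i.e. exponential in the message length k.

```python
# Illustrative sketch of the naive nearest-codeword decoder.
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def naive_decode(received, codewords):
    # O(|codewords| * n): infeasible when there are 2^k codewords.
    return min(codewords, key=lambda c: hamming(c, received))

# Toy code: 3-fold repetitions of all 2-bit messages (4 codewords).
code = [tuple(b for b in m for _ in range(3))
        for m in product((0, 1), repeat=2)]
assert naive_decode((1, 1, 0, 0, 0, 1), code) == (1, 1, 1, 0, 0, 0)
```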


The fundamental tradeoff

Correct as many errors as possible with as little redundancy as possible. Can one achieve the "optimal" tradeoff with efficient encoding and decoding? This talk: the answer is yes.


Overview of the talk

Specify the setup: the model; what is the optimal tradeoff?
Previous work
Construction of a "good" code
High-level idea of why it works
Future directions
Some recent progress


Error-correcting codes

[Diagram: x is encoded to C(x); the channel delivers y; the decoder outputs x, or gives up]

Mapping C: Σ^k → Σ^n.
Message length k, code length n, n ≥ k.
Rate R = k/n ≤ 1.
Efficient means decoding complexity polynomial in n.


Shannon's world

Noise is probabilistic: the Binary Symmetric Channel. Every bit is flipped with probability p. A benign noise model; for example, it does not capture bursty errors.

[Photo: Claude E. Shannon]
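The BSC is equally easy to simulate. A sketch (illustrative; the seed and helper name are arbitrary):

```python
# Illustrative simulation of the Binary Symmetric Channel: each bit is
# flipped independently with probability p.
import random

def bsc(bits, p, rng=random.Random(0)):
    return [b ^ (rng.random() < p) for b in bits]

n, p = 100_000, 0.1
out = bsc([0] * n, p)
flips = sum(out) / n
assert abs(flips - p) < 0.01   # empirical flip rate concentrates near p
```

The concentration seen here is exactly what the adversarial model gives up: a worst-case adversary may place all its flips in one burst, which no independence argument can rule out.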


Hamming's world

Errors are worst case: arbitrary error locations and arbitrary symbol changes, with only a limit on the total number of errors. Much more powerful than Shannon's model; captures bursty errors. We will consider this channel model.

[Photo: Richard W. Hamming]


A "low level" view

Think of each symbol in Σ as being a packet. The setup: the sender wants to send k packets; after encoding, n packets are sent; some packets get corrupted; the receiver needs to recover the original k packets. Packet size: ideally constant, but it can grow with n.


Decoding

C(x) is sent, y is received; x ∈ Σ^k, y ∈ Σ^n, R = k/n.
How much of y must be correct to recover x? At least k packets must be correct, so at most a (n-k)/n = 1-R fraction of errors can be tolerated; 1-R is the information-theoretic limit.
ρ: the fraction of errors the decoder can handle. The information-theoretic limit implies ρ ≤ 1-R.


Can we get to the limit of ρ = 1-R? Not if we always want to uniquely recover the original message. The limit for unique decoding is (1-R)/2.

[Diagram: codewords c1 and c2 at distance 1-R; a received word y within (1-R)/2 of both, so unique decoding beyond (1-R)/2 is impossible]


List decoding [Elias 57, Wozencraft 58]

Always insisting on a unique codeword is restrictive; the "pathological" cases are rare. A "typical" received word can be decoded beyond (1-R)/2: all but an exponentially small (in n) fraction of received words, which is almost all of the space in high dimension.
A better error-recovery model: output a list of answers. This is list decoding. Example: a spell checker.


Advantages of List decoding

Typical received words have an unique closest codeword List decoding will return list size of one such

received words Still deal with worst case errors How to deal with list size

greater than one ? Declare an error; or Use some side information

Spell checker

(1-R)/2


The list decoding problem

Given a code and an error parameter ρ: for any received word y, output all codewords c such that c and y disagree in at most a ρ fraction of places.
Fundamental question: what is the best possible tradeoff between R and ρ, with "small" lists? Can it approach the information-theoretic limit 1-R?
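Stated this way, the problem has an obvious brute-force solution, useful for intuition though hopelessly slow. A sketch (illustrative):

```python
# Illustrative brute-force list decoder: return every codeword within
# Hamming radius rho * n of the received word.
def list_decode(received, codewords, rho):
    n = len(received)
    return [c for c in codewords
            if sum(a != b for a, b in zip(c, received)) <= rho * n]

code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
# At rho = 1/2 a received word can be close to many codewords at once:
assert set(list_decode((0, 0, 1, 1), code, 0.5)) == set(code)
```

This also shows why "small lists" must be part of the question: at large ρ the list can blow up, so the tradeoff is between R, ρ, and the list size.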

May 25, 2007, Ph.D. Final Exam

Other applications of list decoding

Cryptography: cryptanalysis of certain block ciphers [Jakobsen 98]; efficient traitor tracing schemes [Silverberg, Staddon, Walker 03].
Complexity theory: hardcore predicates from one-way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]; worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06].
Other algorithmic applications: IP traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]; guessing secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01].


Overview of the talk

Specify the setup: the model; the optimal tradeoff between rate and fraction of errors
Previous work
Construction of a "good" code
High-level idea of why it works
Future directions
Some recent progress


Information theoretic limit

ρ < 1 - R is the information-theoretic limit: one can handle twice as many errors as with unique decoding.

[Plot: fraction of errors (ρ) vs. rate (R), showing the unique decoding curve and the information-theoretic limit]


Achieving the information theoretic limit

There exist codes that achieve the information-theoretic limit, ρ ≥ 1-R-o(1), by a random coding argument. This is not a useful result: the codes are not explicit, and there are no efficient list decoding algorithms. We need an explicit construction of such codes, together with polynomial-time (list) decodability, which requires the list size to be polynomial.


The challenge

An explicit construction of code(s) with efficient list decoding algorithms up to the information-theoretic limit: for rate R, correct a 1-R fraction of errors.
Shannon's work raised a similar challenge: explicit codes achieving the information-theoretic limit for stochastic models. That challenge has been met [Forney 66; Luby-Mitzenmacher-Shokrollahi-Spielman 01; Richardson-Urbanke 01]. Now do it for the stronger adversarial model.


Guruswami-Sudan

The best until 1998: Reed-Solomon codes [Sudan 95, Guruswami-Sudan 98] can be list decoded up to a 1 - R^{1/2} fraction of errors, better than unique decoding. At R = 0.8: unique decoding 10%, information-theoretic limit 20%, GS 10.56%.

[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, the information-theoretic limit, and the Guruswami-Sudan curve]

Motivating question: close the gap between the Guruswami-Sudan curve and the information-theoretic limit with explicit, efficient codes.
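The three percentages quoted for R = 0.8 follow directly from the formulas (1-R)/2, 1-R, and 1 - R^{1/2}. A quick check (illustrative):

```python
# Recomputing the slide's numbers at R = 0.8: unique decoding handles
# (1-R)/2, the information-theoretic limit is 1-R, and Guruswami-Sudan
# handles 1 - sqrt(R).
R = 0.8
unique = (1 - R) / 2
limit = 1 - R
gs = 1 - R ** 0.5
assert abs(unique - 0.10) < 1e-9
assert abs(limit - 0.20) < 1e-9
assert abs(gs - 0.1056) < 5e-4
```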


The best until 2005

Parvaresh-Vardy: 1 - (sR)^{s/(s+1)} for s ≥ 1 (s = 2 in the plot). Based on Reed-Solomon codes. Improves on GS for R < 1/16.

[Plot: fraction of errors (ρ) vs. rate (R), showing unique decoding, the information-theoretic limit, Guruswami-Sudan, and Parvaresh-Vardy]


Our Result

Folded RS codes [Guruswami, R. 06]: list decodable up to a 1 - R - ε fraction of errors, for any ε > 0.

[Plot: fraction of errors (ρ) vs. rate (R); our work sits essentially on the information-theoretic limit, above Guruswami-Sudan and Parvaresh-Vardy]


Overview of the talk

Specify the setup: the model; the optimal tradeoff between rate and fraction of errors
Previous work
Our construction
High-level idea of why it works
Future directions
Recent progress


The main result

A construction of an algebraic family of codes: for every rate R > 0 and ε > 0, a list decoding algorithm that can correct a 1 - R - ε fraction of errors. Based on Reed-Solomon codes.


Algebra terminology

F will denote a finite field; think of it as the integers mod some prime.
Polynomials: the coefficients come from F. A polynomial of degree 3 over Z_7: f(X) = X^3 + 4X + 5.
Evaluate polynomials at points in F: f(2) = (8 + 8 + 5) mod 7 = 21 mod 7 = 0.
Irreducible polynomials: no non-trivial polynomial factors. X^2 + 1 is irreducible over Z_7, while X^2 - 1 is not.
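These Z_7 examples can be verified mechanically. A sketch (illustrative; `poly_eval` and `has_root` are my helper names, and testing irreducibility by root-finding is only valid for degrees 2 and 3):

```python
# Checking the slide's Z_7 examples directly.
P = 7

def poly_eval(coeffs, x):            # coeffs[i] is the coefficient of X^i
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def has_root(coeffs):
    return any(poly_eval(coeffs, x) == 0 for x in range(P))

f = [5, 4, 0, 1]                     # f(X) = X^3 + 4X + 5
assert poly_eval(f, 2) == 0          # 21 mod 7 = 0

# A degree-2 polynomial is irreducible over Z_7 iff it has no root there.
assert not has_root([1, 0, 1])       # X^2 + 1: irreducible over Z_7
assert has_root([-1, 0, 1])          # X^2 - 1 = (X-1)(X+1): reducible
```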


Reed-Solomon codes

Message: (m_0, m_1, ..., m_{k-1}) ∈ F^k, viewed as the polynomial f(X) = m_0 + m_1·X + ... + m_{k-1}·X^{k-1}.
Encoding: RS(f) = (f(α_1), f(α_2), ..., f(α_n)), where F ⊇ {α_1, α_2, ..., α_n}.
[Guruswami-Sudan] can correct up to a 1 - (k/n)^{1/2} fraction of errors in polynomial time.
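Reed-Solomon encoding is exactly polynomial evaluation, so a toy encoder is short. A sketch over Z_13 (illustrative; real deployments use larger fields and fast evaluation):

```python
# Minimal Reed-Solomon encoder over Z_p: read the message as polynomial
# coefficients and evaluate at n distinct field points.
P = 13

def rs_encode(message, points):
    # message = (m_0, ..., m_{k-1}) -> (f(a) for a in points),
    # where f(X) = m_0 + m_1*X + ... + m_{k-1}*X^{k-1}.
    return [sum(m * pow(a, i, P) for i, m in enumerate(message)) % P
            for a in points]

msg = [3, 1, 4]                        # k = 3: f(X) = 3 + X + 4X^2
cw = rs_encode(msg, list(range(1, 8))) # n = 7 evaluation points
assert cw[0] == (3 + 1 + 4) % P        # f(1) = 8
assert len(cw) == 7
# Two distinct polynomials of degree < k agree on at most k-1 points,
# so distinct codewords differ in at least n-k+1 = 5 positions.
```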


Parvaresh-Vardy codes (of order 2)

[Diagram: each position carries two symbols, f(α_i) on top of g(α_i), where g(X) = f(X)^q mod E(X)]

The extra information from g(X) helps in decoding. Rate: R_PV = k/(2n). [PV 05]: PV codes can correct a 1 - (k/n)^{2/3} = 1 - (2·R_PV)^{2/3} fraction of errors in polynomial time.


Towards our solution

Suppose g(X) = f(X)^q mod E(X) = f(γX). Let us look again at the PV codeword: in the column at evaluation point γ^i, the top symbol is f(γ^i) and the bottom symbol is g(γ^i) = f(γ^{i+1}), which already appears as the top symbol of the next column. The bottom row is redundant.


Folded Reed-Solomon Codes

Suppose g(X) = f(X)^q mod E(X) = f(γX). Don't send the redundant symbols; this reduces the length to n/2, so R = (k/2)/(n/2) = k/n. Using the PV result, the fraction of errors handled is 1 - (k/n)^{2/3} = 1 - R^{2/3}.


Getting to 1 - R - ε

We started with a PV code with s = 2 to get 1 - R^{2/3}. Starting with a PV code with general s gives 1 - R^{s/(s+1)}. Pick s "large" enough to approach 1 - R - ε. The decoding complexity increases over that of Parvaresh-Vardy, but is still polynomial.


What we actually do

We show that for any generator γ of F \ {0}: g(X) = f(X)^q mod E(X) = f(γX). We can achieve similar compression by grouping the evaluation points into orbits of γ: with m' ≈ n/m columns of m symbols each, R ≈ (k/m)/(n/m) = k/n.

[Diagram: an m × m' array of symbols; column j holds f(γ^{(j-1)m}), f(γ^{(j-1)m+1}), ..., f(γ^{jm-1})]


Proving f(X)^q mod E(X) = f(γX)

First use the fact that f(X)^q = f(X^q) over F. So we need to show f(X^q) mod E(X) = f(γX); proving X^q mod E(X) = γX suffices. Equivalently, E(X) divides X^q - γX, and E(X) = X^{q-1} - γ is irreducible.
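The identity can be sanity-checked numerically over a toy field. A sketch (illustrative, not the talk's parameters): take q = 5, the generator γ = 2 of F_5^*, and E(X) = X^4 - 2; then X^5 mod E(X) = 2X, and f(X)^5 mod E(X) should match f(2X) for any f of degree < 4.

```python
# Numeric check of f(X)^q mod E(X) = f(gamma * X) over F_5.
P, GAMMA = 5, 2
E = [-GAMMA % P, 0, 0, 0, 1]           # E(X) = X^4 - gamma, low-to-high

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def poly_mod(a, m):
    a = a[:]
    while len(a) >= len(m):
        c, d = a[-1], len(a) - len(m)
        for i, y in enumerate(m):      # subtract c * X^d * m(X)
            a[d + i] = (a[d + i] - c * y) % P
        while a and a[-1] == 0:
            a.pop()
    return a + [0] * (len(m) - 1 - len(a))

f = [1, 3, 0, 2]                       # f(X) = 1 + 3X + 2X^3, deg < 4
fq = [1]
for _ in range(P):                     # compute f(X)^5 mod E(X)
    fq = poly_mod(poly_mul(fq, f), E)
f_gamma = [(c * pow(GAMMA, i, P)) % P for i, c in enumerate(f)]  # f(2X)
assert fq == f_gamma
```

The check mirrors the proof: over F_5, f(X)^5 = f(X^5), and X^5 = X·E(X) + 2X, so X^5 ≡ 2X mod E(X).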


Our Result

Folded RS codes [Guruswami, R. 06]: list decodable up to a 1 - R - ε fraction of errors, for any ε > 0.

[Plot: fraction of errors (ρ) vs. rate (R); our work sits essentially on the information-theoretic limit, above Guruswami-Sudan and Parvaresh-Vardy]


“Welcome” to the dark side…


Limitations of our work

To get to 1 - R - ε, we need s > 1/ε, so the alphabet size is n^s > n^{1/ε}. Fortunately this can be reduced to 2^{poly(1/ε)} by concatenation + expanders [Guruswami-Indyk 02]; the lower bound is 2^{1/ε}. The list size (and hence running time) is > n^{1/ε}; bringing this down is an open question.


Time to wake up


Overview of the talk

List decoding primer
Previous work on list decoding
Codes over large alphabets: construction of a "good" code; high-level idea of why it works
Codes over small alphabets: the current best codes
Future directions
Some (very) modest recent progress


Optimal tradeoff for list decoding

The best possible ρ is H_q^{-1}(1-R), where H_q(ρ) = ρ·log_q(q-1) - ρ·log_q(ρ) - (1-ρ)·log_q(1-ρ) is the q-ary entropy function. There exists a (H_q^{-1}(1-R-ε), O(1/ε))-list decodable code; a random code of rate R has the property whp. For any code, ρ > H_q^{-1}(1-R+ε) implies a super-polynomial list size. For large q, H_q^{-1}(1-R) → 1-R.
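The entropy function and its inverse are easy to compute numerically. A sketch (illustrative; the bisection helper is mine): for q = 2 the capacity H^{-1}(1-R) at R = 1/2 is about 0.11, and for large q it approaches 1-R.

```python
# q-ary entropy H_q and its inverse on [0, 1 - 1/q], via bisection.
import math

def Hq(rho, q=2):
    # For q = 2, log_q(q-1) = 0 and this is the usual binary entropy.
    if rho == 0:
        return 0.0
    return (rho * math.log(q - 1, q)
            - rho * math.log(rho, q)
            - (1 - rho) * math.log(1 - rho, q))

def Hq_inv(y, q=2):
    # Hq is increasing on [0, 1 - 1/q], so bisection converges.
    lo, hi = 0.0, 1 - 1 / q
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Hq(mid, q) < y else (lo, mid)
    return (lo + hi) / 2

# Binary list decoding capacity at rate 1/2:
assert abs(Hq_inv(0.5) - 0.110) < 1e-3
# For large q, H_q^{-1}(1-R) approaches 1 - R:
assert abs(Hq_inv(0.5, q=2**40) - 0.5) < 0.03
```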


Our Results (q=2)

Optimal tradeoff: H^{-1}(1-R). [Guruswami, R. 06]: the "Zyablov" bound. [Guruswami, R. 07]: the Blokh-Zyablov bound.

[Plot: fraction of errors vs. rate, showing the previous best, the Zyablov bound, the Blokh-Zyablov bound, and the optimal tradeoff]


How do we get binary codes? Concatenation of codes [Forney 66]

C1: (GF(2^k))^K → (GF(2^k))^N ("outer" code)
C2: (GF(2))^k → (GF(2))^n ("inner" code)
C1 ∘ C2: (GF(2))^{kK} → (GF(2))^{nN}

Typically k = O(log N), so brute-force decoding of the inner code is feasible.

[Diagram: the message m = m_1 ... m_K is encoded by C1 into w_1 ... w_N; each w_i is encoded by C2, giving C1 ∘ C2(m) = C2(w_1) ... C2(w_N)]


List decoding a concatenated code

C1 = folded RS code; C2 = a "suitably chosen" binary code. The natural decoding algorithm: divide the received word into blocks of length n; find the closest C2 codeword for each block; run the list decoding algorithm for C1. This loses information!


List decoding C2

[Diagram: received blocks y_1, y_2, ..., y_N ∈ GF(2)^n; list decoding each block under C2 yields lists S_1, S_2, ..., S_N ⊆ GF(2)^k]

How do we "list decode" from lists?


The list recovery problem

Given a code and an error parameter ρ: for any sets of lists S_1, ..., S_N such that |S_i| ≤ s for every i, output all codewords c such that c_i ∈ S_i for at least a 1-ρ fraction of the i's. List decoding is the special case with s = 1.
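List recovery also has a brute-force version, which makes the definition concrete. A sketch (illustrative): with ρ = 1/4 a codeword may miss its list in at most one of four positions, and more than one codeword can qualify.

```python
# Illustrative brute-force list recovery: keep every codeword whose
# i-th symbol lies in S_i for at least a (1 - rho) fraction of i's.
def list_recover(lists, codewords, rho):
    N = len(lists)
    return [c for c in codewords
            if sum(c[i] in lists[i] for i in range(N)) >= (1 - rho) * N]

code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1)]
lists = [{0, 1}, {1}, {0}, {1}]
# rho = 1/4: (0,1,0,1) matches every list; (1,1,1,1) misses only S_3.
assert list_recover(lists, code, 0.25) == [(1, 1, 1, 1), (0, 1, 0, 1)]
```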


List Decoding C1 ∘ C2

[Diagram: the received blocks y_1, y_2, ..., y_N are list decoded under C2 into lists S_1, S_2, ..., S_N, which are then fed to the list recovering algorithm for C1]


Putting it together [Guruswami, R. 06]

If C1 can be list recovered from a ρ_1 fraction of errors and C2 can be list decoded from a ρ_2 fraction of errors, then C1 ∘ C2 can be list decoded from a ρ_1·ρ_2 fraction of errors. Folded RS of rate R is list recoverable from 1-R errors. There exist inner codes of rate r list decodable from H^{-1}(1-r) errors; one can be found by "exhaustive" search. So C1 ∘ C2 is list decodable from (1-R)·H^{-1}(1-r) errors.


Multilevel Concatenated Codes

C1: (GF(2^k))^K → (GF(2^k))^N ("outer" code 1)
C2: (GF(2^k))^L → (GF(2^k))^N ("outer" code 2)
Cin: GF(2)^{2k} → (GF(2))^n ("inner" code)
C1 and C2 are folded RS codes.

[Diagram: m is encoded by C1 into v_1 ... v_N; M is encoded by C2 into w_1 ... w_N; position i carries Cin(v_i, w_i)]


Advantage over rate-rR concatenated codes

C1, C2, Cin have rates R1, R2 and r, so the final rate is r(R1+R2)/2; choose R1 < R.
Step 1: just recover m. List decode Cin up to H^{-1}(1-r) errors, then list recover C1 up to 1-R1 errors. This step can handle (1-R1)·H^{-1}(1-r) > (1-R)·H^{-1}(1-r) errors.


Advantage over concatenated codes

Step 2: just recover M, given m. A subcode of Cin of rate r/2 acts on M; list decode this subcode up to H^{-1}(1-r/2) errors, then list recover C2 up to 1-R2 errors. This step can handle (1-R2)·H^{-1}(1-r/2) errors.


Wrapping it up

The total fraction of errors that can be handled is min{ (1-R1)·H^{-1}(1-r), (1-R2)·H^{-1}(1-r/2) }. This is better than (1-R)·H^{-1}(1-r): since (R1+R2)/2 = R and R1 < R, and since H^{-1}(1-r/2) > H^{-1}(1-r), we can choose R2 a bit larger than R. Then optimize over the choices of r, R1 and R2. This needs nested list decodability of the inner code. The Blokh-Zyablov bound follows from using multiple outer codes.


Our Results (q=2)

Optimal tradeoff: H^{-1}(1-R). [Guruswami, R. 06]: the "Zyablov" bound. [Guruswami, R. 07]: the Blokh-Zyablov bound.

[Plot: fraction of errors vs. rate, showing the previous best, the Zyablov bound, the Blokh-Zyablov bound, and the optimal tradeoff]


How far can concatenated codes go?

Outer code: folded RS. Random and independent inner codes, a different one for each outer symbol. This can get to the information-theoretic limit ρ = H^{-1}(1-R) [Guruswami, R. 08].


To summarize

List decoding is a central coding-theory notion. It permits decoding up to the optimal fraction of adversarial errors, and it bridges the adversarial and probabilistic approaches to information theory: Shannon's information-theoretic limit is p = H^{-1}(1-R), and the list decoding information-theoretic limit is ρ = H^{-1}(1-R). Efficient list decoding is possible for algebraic codes.


Our Contributions

Folded RS codes are explicit codes that achieve the information-theoretic limit for list decoding. Better list decoding for binary codes: concatenated codes can get us to list decoding capacity.


Open Questions

Reduce the decoding complexity of our algorithm.
List decoding for binary codes: explicitly achieve the error bound ρ = H^{-1}(1-R); for erasures, decode when ρ = 1-R.
Non-algebraic codes? Graph-based codes?
Other applications of these new codes: extractors [Guruswami, Umans, Vadhan 07]; approximating NP-witnesses [Guruswami, R. 08].


Thank You

Questions?