What is the information content of an algorithm?

Transcript of "What is the information content of an algorithm?" (AofA 2013, aofa2013.lsi.upc.edu/slides/Buhmann.pdf), 43 pages.

Page 1

29 May 2013

What is the information content of an algorithm?

Joachim M. Buhmann

Computer Science Department, ETH Zurich

Page 2

29 May 2013 Joachim M. Buhmann AofA 2013, Menorca 2

Big Data in Neuroscience

(Figure: white-matter (WM) imaging data.)

Page 3

Connectivity-based Cortex Parcellation

Connectivity-based cortex parcellation is a compact description of the data.

Page 4

Connectivity-based Cortex Parcellation

Pipeline: seed voxels at the cortical boundary → fiber tracking → clustering → cortex parcellation

Biological question: What is the relationship between the morphology of a cortical region of interest (ROI) and differences in its connectivity?

Tractograms of distinct cortical areas show a variety of connectivity patterns

Page 5

Roadmap

§  Big data in neuroscience: cortex parcellation

§  Learning and robust optimization

§  Measuring the information content of algorithms

§  Model validation by information theory

§  Graph cut for gene expression analysis

§  Robust sorting

§  Outlook & vision: low-power computing, resilient programming

Page 6

What is learning?

§  Given: data $X \sim P(X)$ and a hypothesis class $\mathcal{C}$

§  Learning is identifying a set of preferable hypotheses in the hypothesis class!

(Figure: three example clusterings of a 12-node graph.)

Page 7

How can we find good models? Informativeness vs. robustness

§  Learning game: the learner receives two instances and is tested on a third instance.

§  Idea: Convert an optimization problem into a coding problem! A set of approximate solutions represents a code word.

§  Accuracy is bounded by generalization capacity => Information theoretic model validation: Find an identifiable generalization set with minimal size!

(JB, ISIT 2010)

Page 8

How to design optimal generalization sets?

§  Fundamental problem: data often live in much higher-dimensional spaces than solutions! Graphs: $2^{\binom{n}{2}}$; cuts: $|\mathcal{C}| = 2^n$

§  How does noise perturb solutions?

=> Measure sensitivity of solution sets to data noise!

(Figure: two noisy instances of a 12-node graph with their cuts.)

Page 9

Information Theory & Machine Learning

§  IT components
  §  Code book ({strings} = hypothesis class)
  §  Noisy channel: 1100110 -> 0100110
  §  Decoder: minimize Hamming distance; |0100110 − 1100110| < |0100110 − 1010101|

§  Machine learning elements
  §  Set of generalization sets
  §  Noisy optimization problem
  §  Decoding by approximate optimization of the test instance

Code book (16 codewords): 0000000 0100101 1000011 1100110 0001111 0101010 1001100 1101001 0010110 0110011 1010101 1110000 0011001 0111100 1011010 1111111

(Figure: generalization sets of clusterings of a 12-node graph, playing the role of the codewords.)

Page 10

Approximate Optimization

§  Define weights for hypotheses with low costs $R(c, X)$:

$$w_\beta(c, X) = \exp(-\beta R(c, X)), \quad c \in \mathcal{C}(O)$$

§  Total weight of low-cost hypotheses (partition function):

$$Z(X) = \sum_{c \in \mathcal{C}(O)} \exp(-\beta R(c, X))$$

§  Special case: approximation set

$$w_\beta(c, X) = \begin{cases} 1, & R(c, X) \le R(c^\star, X) + 1/\beta;\\ 0, & \text{otherwise}, \end{cases} \qquad c^\star \in \arg\min_{c \in \mathcal{C}} R(c, X)$$
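As a small numerical sketch of these definitions (function names are mine, not from the slides): the Boltzmann weights $w_\beta(c,X) = \exp(-\beta R(c,X))$ and their sum, the partition function $Z(X)$, for a list of hypothesis costs:

```python
import math

def boltzmann_weights(costs, beta):
    # w_beta(c, X) = exp(-beta * R(c, X)) for each hypothesis cost R(c, X).
    return [math.exp(-beta * r) for r in costs]

def partition_function(costs, beta):
    # Z(X): total weight of the hypothesis class at inverse temperature beta.
    return sum(boltzmann_weights(costs, beta))
```

At β = 0 every hypothesis has weight 1, so Z equals |C|; as β grows, the weight concentrates on the empirical minimizer.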

Page 11

Coarsening of hypothesis classes and the two-instance test

§  Quantize the hypothesis class by generalization sets; draw two instances $X' \sim P(X)$ and $X'' \sim P(X)$

§  Size: partition function

$$Z(X') = \sum_{c \in \mathcal{C}(O')} w_\beta(c, X')$$

§  Weight overlap models joint approximations (generalization set with minimizer):

$$\Delta Z(X', X'') = \sum_{c \in \mathcal{C}(O')} w_\beta(c, X')\, w_\beta(c, X'')$$
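The overlap $\Delta Z(X', X'')$ can be sketched the same way; a toy illustration (my naming), assuming both instances share one indexed hypothesis class so the same hypothesis has a cost on each instance:

```python
import math

def delta_z(costs1, costs2, beta):
    # Joint weight sum over a shared hypothesis class:
    # costs1[i] = R(c_i, X'), costs2[i] = R(c_i, X'').
    return sum(math.exp(-beta * (r1 + r2)) for r1, r2 in zip(costs1, costs2))
```

Hypotheses that are cheap on both instances dominate the sum, so two agreeing instances yield a larger overlap than two disagreeing ones.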

Page 12

Asymptotic equipartition property

§  Condition for cost function R(c,X): Convergence of log partition functions towards free energies

$$F' := \lim_{n \to \infty} \frac{-\log Z(X'^{(n)})}{\log |\mathcal{C}(O'^{(n)})|}, \qquad F'' := \lim_{n \to \infty} \frac{-\log Z(X''^{(n)})}{\log |\mathcal{C}(O''^{(n)})|}, \qquad \Delta F := \lim_{n \to \infty} \frac{-\log \Delta Z(X'^{(n)}, X''^{(n)})}{\log |\mathcal{C}(O''^{(n)})|}$$

Page 13

Jointly typical instances

$$\mathcal{A}^{(n)}_\epsilon := \Bigg\{ (X'^{(n)}, X''^{(n)}) \in \mathcal{X}^{(n)} \times \mathcal{X}^{(n)} :\ \left| \frac{-\log Z(X'^{(n)})}{\log |\mathcal{C}(O'^{(n)})|} - F' \right| < \epsilon,\ \left| \frac{-\log Z(X''^{(n)})}{\log |\mathcal{C}(O''^{(n)})|} - F'' \right| < \epsilon,\ \left| \frac{-\log \Delta Z(X'^{(n)}, X''^{(n)})}{\log |\mathcal{C}(O''^{(n)})|} - \Delta F \right| < \epsilon \Bigg\}$$

Page 14

Generalization sets as symbols for coding

§  Set of generalization sets $\mathcal{C}(O')$

§  How to generate sets of generalization sets? Random transformations

$$\mathbb{T} = \{ \tau : R(c, X) = R(\tau \circ c, \tau \circ X) \}$$

§  Cover the hypothesis class densely, but identifiably!

(Figure: a random transformation applied to a 12-node clustering.)
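The defining property of $\mathbb{T}$ — transforming the hypothesis and the data together leaves the cost unchanged — can be checked numerically for a cut cost under object relabelings. A sketch with made-up names and data:

```python
import itertools

def cut_cost(labels, W):
    # Total weight of edges that cross the partition given by labels.
    n = len(labels)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j])

def relabel(labels, W, tau):
    # Apply the object relabeling tau to both the clustering and the data.
    n = len(labels)
    return ([labels[tau[i]] for i in range(n)],
            [[W[tau[i]][tau[j]] for j in range(n)] for i in range(n)])
```

For every permutation tau, cut_cost(*relabel(labels, W, tau)) equals cut_cost(labels, W) whenever W is symmetric — exactly the invariance $R(c, X) = R(\tau \circ c, \tau \circ X)$.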

Page 15

Validating spectral clusterings

§  Graph representation: vertices denote objects, edges express (dis)similarities

§  Hypothesis class: all 3-cuts of a graph

(Figure: adjacency matrices and 3-cuts of a 12-node graph.)

Page 16

Code problem generation for Graph Cut

§  Graph cut code problems $1, \ldots, M$ and a graph cut test problem

(Figure: adjacency matrices and clusterings of the M transformed code problems and of the test problem.)

Page 17

Coding with Graph Cut approximation sets

sender — problem generator (PG) — receiver

§  Define a set of code problems $\{\tau_1, \ldots, \tau_M\}$, i.e. cost functions $R(\cdot, X')$ on transformed data $\tau \circ X'$

(Figure: communication diagram with the sender, the problem generator, and the receiver, each holding clusterings of the data.)

Page 18

Communication by approximate Graph Cuts

sender — problem generator — receiver

1.  Sender sends a permutation index τs to the problem generator.

2.  Problem generator sends a new problem with permuted indices, $\tilde{X} = \tau_s \circ X''$ with $X', X'' \sim P(X)$, to the receiver without revealing τs.

3.  Receiver identifies the permutation $\hat{\tau}$ by comparing generalization sets: it computes $c^\star(\tilde{X})$ by optimizing $R(\cdot, \tau_s \circ X'')$ and estimates the coding error.

Page 19

Communication Process

§  Receiver compares sets of hypothesis weights

§  Decoding by maximizing the weight overlap:

$$\hat{\tau} := \arg\max_\tau \Delta Z_\tau$$

§  Error event: $\hat{\tau} \ne \tau_s$

§  Calculate when $\lim_{n \to \infty} P(\hat{\tau} \ne \tau_s \mid \tau_s) = 0$
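Decoding by maximal weight overlap can be sketched as follows (hypothetical names; per-hypothesis costs are assumed to be tabulated for each transformed code problem and for the permuted test problem):

```python
import math

def overlap(costs_code, costs_test, beta):
    # Delta Z_tau: joint Boltzmann weight of each hypothesis under one
    # transformed code problem and the permuted test problem.
    return sum(math.exp(-beta * (r1 + r2))
               for r1, r2 in zip(costs_code, costs_test))

def decode_transformation(cost_table, costs_test, beta):
    # Receiver's estimate: the index j maximizing the overlap Delta Z_j.
    return max(range(len(cost_table)),
               key=lambda j: overlap(cost_table[j], costs_test, beta))
```

The receiver declares the transformation whose code problem's low-cost hypotheses agree best with those of the test problem.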

Page 20

Error Probability

§  Estimate the error given random transformations $\tau \in \mathbb{T}$:

$$\Delta Z_j := \sum_{c \in \mathcal{C}(X'')} w_\beta(c, \tau_j \circ X')\, w_\beta(c, \tau_s \circ X'')$$

$$P(\hat{\tau} \ne \tau_s \mid \tau_s) = P\Big( \max_{j \ne s} \Delta Z_j > \Delta Z_s \,\Big|\, \tau_s \Big) \le \sum_{j \ne s} P\big( \Delta Z_j > \Delta Z_s \,\big|\, \tau_s \big) \quad \text{(union bound)}$$

$$\le M\, P\big( \Delta Z_{\ne s} > \Delta Z_s \,\big|\, \tau_s \big) \le M\, \mathbb{E}_{X', X''}\, \mathbb{E}_{\tau \ne s}\, \frac{\Delta Z_{\ne s}}{\Delta Z_s} \quad \text{(Markov inequality)}$$

Page 21

Decoupling of generalization sets

§  Intersecting an approximation set with a randomly transformed one yields decoupling!

$$\mathbb{E}_\tau \Delta Z_\tau = \mathbb{E}_\tau \sum_{c \in \mathcal{C}(X'')} w_\beta(c, \tau \circ X')\, w_\beta(c, \tau_s \circ X'') = \sum_{c \in \mathcal{C}(X'')} w_\beta(c, \tau_s \circ X'')\, \frac{1}{|\mathbb{T}|} \sum_{\tau \in \mathbb{T}} w_\beta(\tau^{-1} \circ c, X') \approx \frac{1}{|\mathbb{T}|}\, Z(X'')\, Z(X')$$

Page 22

Generalization capacity from typicality

§  Asymptotically error-free communication, $\lim_{n\to\infty} P(\hat{\tau} \ne \tau_s \mid \tau_s) = 0$, is possible for

$$P(\hat{\tau} \ne \tau_s \mid \tau_s) \le M\, \mathbb{E}_{X', X''}\, \frac{Z(X')\, Z(X'')}{|\mathbb{T}|\, \Delta Z_s(X', X'')} \le \frac{M}{|\mathbb{T}|} \exp\Big( -F' \log|\mathcal{C}'| - (F'' - \Delta F) \log|\mathcal{C}''| \Big)$$

$$\frac{M\, Z(X')\, Z(X'')}{|\mathbb{T}|\, \Delta Z_s(X', X'')} = \exp\Bigg( -\Big( \log\frac{|\mathbb{T}|}{Z'} + \log\frac{|\mathcal{C}''|}{Z''} - \log\frac{|\mathcal{C}''|}{\Delta Z_s} - \log M \Big) \Bigg)$$

§  Interpretation: "mutual information" > rate $\log M$

Page 23

Algorithm selection by maximizing generalization capacity

sender — problem generator — receiver: the communication channel

§  Optimize the communication channel w.r.t. precision β, cost function R(·,·), and algorithm

§  Maximize the "mutual information"

(Diagram: the sender chooses $\tau_s$; the receiver optimizes $R(\cdot, \tau_s \circ X^{(2)})$, with $X^{(1)}, X^{(2)} \sim P(X)$, and outputs $\hat{\tau}$.)

Page 24

Channel capacity of the BSC

§  Hypothesis class: set of binary strings $\xi', \xi'' \in \{-1, 1\}^n$

§  Communication: $\xi' \Rightarrow \xi''$ with bit-flip rate $\epsilon = \frac{1}{n} \big| \{ i : \xi'_i \ne \xi''_i \} \big|$

§  Costs of string s: Hamming distance $R(s, \xi') = \sum_{i=1}^{n} \mathbb{I}\{ s_i \ne \xi'_i \}$

§  Mutual information: $I_\beta = \ln 2 + (1 - \epsilon) \ln\cosh\beta - \ln(\cosh\beta + 1)$; at $\beta^*$ with $dI_\beta / d\beta = 0$: $I_{\beta^*} = \ln 2 + \epsilon \ln \epsilon + (1 - \epsilon) \ln(1 - \epsilon)$

§  ASC for the binary channel is consistent with Shannon information theory

Page 25

Clustering of Gene expression data

§  Spectral clustering of correlations in yeast cell cycle data

§  Pairwise clustering (PC), normalized cut (NCut), correlation clustering (CC), adaptive ratio cut (ARC), dominant set clustering (DS), for different distance measures

Yeast gene expression data: Eisen et al. (1998) PNAS 95,14863

Page 26

Validating relational clustering costs

§  Cost function for pairwise clustering:

$$R^{\mathrm{pc}}(c, X) = -\sum_{k=1}^{K} |G_k|\, \frac{\sum_{(i,j) \in E_{kk}} X_{ij}}{|E_{kk}|}$$

Defs.: $G_k = \{ i : c(i) = k \}$, $E_{kl} = \{ (i, j) : c(i) = k \wedge c(j) = l \}$

§  Normalized cut:

$$R^{\mathrm{nc}}(c; W) = \sum_{v \le k} \frac{\mathrm{cut}(G_v, V \setminus G_v)}{\mathrm{assoc}(G_v, V)}$$
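The normalized-cut cost transcribes directly; a sketch (my naming), assuming W is a symmetric similarity matrix given as nested lists and every cluster has positive association:

```python
def ncut_cost(labels, W):
    # R_nc(c; W) = sum over clusters of cut(G_v, V \ G_v) / assoc(G_v, V).
    n = len(labels)
    total = 0.0
    for v in set(labels):
        inside = [i for i in range(n) if labels[i] == v]
        outside = [i for i in range(n) if labels[i] != v]
        cut = sum(W[i][j] for i in inside for j in outside)
        assoc = sum(W[i][j] for i in inside for j in range(n))
        total += cut / assoc
    return total
```

Separating two disconnected cliques gives cost 0, the global minimum; splitting across them is penalized.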

Page 27

§  Shifted correlation clustering: the shift s balances cluster sizes and should be adapted

$$R(c, X, K, s) = \frac{1}{2} \sum_{1 \le u \le K} \sum_{(i,j) \in E_{uu}} \big( |X_{ij} + s| - X_{ij} - s \big) + \frac{1}{2} \sum_{1 \le u \le K} \sum_{1 \le v < u} \sum_{(i,j) \in E_{uv}} \big( |X_{ij} + s| + X_{ij} + s \big)$$

§  Adaptive ratio cut: the exponent $p \in [-1, 1]$ interpolates between arithmetic and harmonic normalization to balance clusters

$$R(c, X) = \sum_{\alpha=1}^{K} \sum_{\beta=\alpha+1}^{K} \mathrm{cut}(V_\alpha, V_\beta) \Big( |V_\alpha|^{\frac{1}{p}} + |V_\beta|^{\frac{1}{p}} \Big)^{-p}$$

Page 28

Validating Dominant Set clustering

§  Dominant set clustering employs the replicator dynamics of evolutionary game theory to derive a partition (Pavan & Pelillo, PAMI 2007)

§  The replicator dynamics is applied iteratively to identify one cluster after another:

$$x_i(t+1) = x_i(t)\, \frac{(A x(t))_i}{x(t)^T A x(t)}, \qquad x \in \text{simplex of } \mathbb{R}^n$$
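One replicator update, transcribed from the formula (a sketch; A is assumed to be a symmetric non-negative similarity matrix with zero diagonal, x a point in the simplex):

```python
def replicator_step(x, A):
    # x_i(t+1) = x_i(t) * (A x)_i / (x^T A x); the update stays in the simplex.
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    xAx = sum(x_i * ax_i for x_i, ax_i in zip(x, Ax))
    return [x_i * ax_i / xAx for x_i, ax_i in zip(x, Ax)]
```

Iterating from the barycenter, the mass concentrates on a dominant set; peeling that cluster off and repeating yields the partition.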

Page 29

§  Gene expression data of Eisen et al. (1998) PNAS 95,14863

§  Pairwise clustering (PC) vs. dominant set clustering (DS) for different distance measures

Approximation Capacity of Yeast Gene Expression Data

Page 30

Influence of metric on AC

§  Path-based distance
§  Correlation dissimilarity
§  Euclidean distance

Page 31

Consistency of clusterings

CC vs. PC (correlation distance) — legend: PC & CC cluster objects (i,j) together; PC & CC cluster objects (i,j) separately; PC clusters objects (i,j) together, CC clusters separately; PC clusters objects (i,j) separately, CC clusters together.

CC vs. PC (path distance) — legend: CC & PC cluster objects (i,j) together; CC & PC cluster objects (i,j) separately; CC clusters objects (i,j) together, PC clusters separately; CC clusters objects (i,j) separately, PC clusters together.

Page 32

Approximate sorting

§  Among all permutations π, find the one that orders the items best!

$$X = (x_{ij}), \qquad R^{\mathrm{Sorting}}(\pi \mid X) = \sum_{i,j} x_{ij}\, \mathbb{I}\{ \pi_i > \pi_j \}$$

(This cost function counts the number of disagreements between the provided pairwise comparisons ">" and the pairwise comparisons induced by a proposal ranking from the hypothesis class.)
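The sorting cost is easy to transcribe (a sketch; here X[i][j] = 1 encodes an observed comparison placing item i before item j, and pi[i] is the rank of item i):

```python
def sorting_cost(pi, X):
    # R_sorting(pi | X) = sum_{i,j} x_ij * I{pi_i > pi_j}:
    # count observed comparisons that the ranking pi contradicts.
    n = len(pi)
    return sum(X[i][j] for i in range(n) for j in range(n)
               if X[i][j] and pi[i] > pi[j])
```

A ranking consistent with every comparison has cost 0; reversing it pays once per observed pair.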

Page 33

Approximate Sorting

§  Approximately minimize the cost function

$$R^{\mathrm{Sorting}}(\pi \mid X) = \sum_{i,j} x_{ij}\, \mathbb{I}\{ \pi_i > \pi_j \}$$

§  Approximation capacity:

$$\max_\beta I_\beta = \frac{1}{n} \log |\mathbb{T}| + \max_\beta \Bigg\{ \frac{1}{n} \sum_{i=1}^{n} \log \sum_{k=1}^{n} \exp\Big( -\beta \big( E^{(1)}_{ik} + E^{(2)}_{ik} \big) \Big) - \frac{1}{n} \sum_{i=1}^{n} \log \Bigg( \sum_{k=1}^{n} \exp\big( -\beta E^{(1)}_{ik} \big) \sum_{k'=1}^{n} \exp\big( -\beta E^{(2)}_{ik'} \big) \Bigg) \Bigg\}$$

§  Approximate sorting algorithm: Gibbs sampler with optimal β* or deterministic annealing up to β*

Page 34

Prediction Performance of ASC

§  Experiments on synthetic data: preferences are derived from noise contaminated rank values (n=50)

Page 35

Robust chess rankings

§  Tournament data from the German chess league, years 2009-2011

Prediction error of chess ranks:

Method                   Absolute error   Relative error
Joint global minimizer   0.29             14%
ASC                      0.14             6.8%

Page 36

Dynamics of sorting algorithms (joint work with L.M. Busse & M.H. Chehreghani)

§  Information concentration of sorting algorithms: run-time parameterizes the size of the approximation sets (subsets of $S_n$) => coupling of computational to statistical complexity

§  The cardinality of the approximation set is a function of the noise level in the data

Page 37

Pareto-optimality of sorting algorithms

Performance vs. Robustness

Experimental setting: array size n=50, noise ~ Ber(0.1)

Page 38

Approximation set of selection sort

§  SelectionSort algorithm: starting from $C^{\mathrm{SelSort}}_{t=0}$, after step t = 1, 2, 3, … the approximation set shrinks to $|\Delta C^{\mathrm{SelSort}}_t| = (n - t)!$
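The count $(n - t)!$ is immediate to state in code (a sketch; the function name is mine): after t outer iterations of selection sort the first t ranks are fixed, so only the orders of the remaining n − t items are still consistent:

```python
import math

def selection_sort_set_size(n, t):
    # |Delta C_t| = (n - t)!: orders still consistent after t selection steps.
    return math.factorial(n - t)
```

At t = n the set is a single total order, i.e. the array is fully sorted.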

Page 39

Information capacity of MergeSort vs. BubbleSort (Ludwig Busse 2012, Diss. ETH #20600)

§  Experimental setting: sort n=500 items

Page 40

Conclusion

§  Quantization: noise quantizes mathematical structures (hypothesis classes) => symbols
§  These symbols can be used for coding!
§  An optimal error-free coding scheme determines the approximation capacity of a model class.
⇒ Bounds for robust optimization.
⇒ Quantization of the hypothesis class measures structure-specific information in data.

Page 41

Future Work

§  Generalization: replace approximation sets based on cost functions by smoothed outputs of algorithms (“smoothed generalization”)

§  Model reduction in dynamical systems: quantize sets of ODEs or PDEs (systems biology)

§  Relate statistical complexity, i.e. the approximation capacity, to algorithmic or computational complexity.

Page 42

Low-Energy Architecture Trends

§  Novel low-power architectures operate near the transistor threshold voltage (NTV)
  §  e.g., Intel Claremont: 1.5 mW @ 10 MHz (x86)

§  NTV promises 10x more energy efficiency at 10x more parallelism!
  §  10^5 times more soft errors (bits flip stochastically)
  §  Hard to correct in hardware → expose to the programmer?

source: Intel; @ Thorsten Höffler

Page 43

Resilient Programming

§  Distorted computations
  §  Some parts of the program work, e.g., Conjugate Gradient while computing the new direction
  §  Others cannot, e.g., the target address for a jump table

§  NTV is very intriguing for large-scale computing

§  Interesting model for developing algorithms, programming systems, and architectures

Source: MIT, Christine Daniloff; @ Thorsten Höffler