
Transcript
Page 1: Fundamentals cig 4thdec

Fundamentals of Algorithms and Data-Structures in Information-Geometric Spaces

Frank NIELSEN

École Polytechnique, France

Sony Computer Science Laboratories, Inc.

MEXT-ISM Workshop on Information Geometry for Machine Learning

Brain Science Institute, RIKEN

4th December 2014

© 2014 Frank Nielsen 1/75

Page 2: Fundamentals cig 4thdec

Brief historical review of Computational Geometry (CG)

Three research periods:

1. Geometric algorithms: Voronoi/Delaunay, minimum spanning trees, data-structures for proximity queries

2. Geometric computing: robustness, algebraic degree of predicates, programs that work/scale!

3. Computational topology: simplicial complexes, filtrations, input = distance matrix
→ paradigm of Topological Data Analysis (TDA)

Showcasing libraries for CG software:

CGAL http://www.cgal.org/
Geometry Factory http://geometryfactory.com/
Gudhi https://project.inria.fr/gudhi/
Ayasdi http://www.ayasdi.com/

© 2014 Frank Nielsen 1.CG History 2/75

Page 3: Fundamentals cig 4thdec

Outline

Review of the basic algorithmic toolbox in computational geometry: Voronoi diagrams and dual Delaunay, spanning balls

Generalizations of those concepts and toolbox to information spaces:

Riemannian computational information geometry

Dually affine connections computational information geometry

Applications to clustering, learning mixtures, etc.

What is a good/friendly geometric computing space?

© 2014 Frank Nielsen 1.CG History 3/75

Page 4: Fundamentals cig 4thdec

Basics of Euclidean Computational Geometry: Voronoi diagrams and dual Delaunay complexes

© 2014 Frank Nielsen 2.Ordinary CG 4/75

Page 5: Fundamentals cig 4thdec

Euclidean (ordinary) Voronoi diagrams

P = {P_1, ..., P_n}: n distinct point generators in Euclidean space E^d

V(P_i) = {X : D_E(P_i, X) ≤ D_E(P_j, X), ∀ j ≠ i}

Voronoi diagram = cell complex of the V(P_i)'s with their faces

© 2014 Frank Nielsen 2.Ordinary CG 5/75

Page 6: Fundamentals cig 4thdec

Voronoi diagrams from bisectors and ∩ halfspaces

Bisectors: Bi(P, Q) = {X : D_E(P, X) = D_E(Q, X)}
→ are hyperplanes in Euclidean geometry

Voronoi cells as halfspace intersections:
V(P_i) = {X : D_E(P_i, X) ≤ D_E(P_j, X), ∀ j ≠ i} = ∩_{j≠i} Bi^+(P_i, P_j)

D_E(P, Q) = ‖θ(P) − θ(Q)‖_2 = sqrt( Σ_{i=1}^d (θ_i(P) − θ_i(Q))² )

θ(P) = p: Cartesian coordinate system with θ_j(P_i) = p_i^(j).

⇒ Many applications of Voronoi diagrams: crystal growth, codebook/quantization, molecule interfaces/docking, motion planning, etc.

© 2014 Frank Nielsen 2.Ordinary CG 6/75
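As a concrete baseline for this ordinary Euclidean setting, a minimal sketch (assuming NumPy and SciPy are available; not part of the original slides) that builds the Voronoi diagram and its dual Delaunay triangulation of a planar point set:

```python
# Euclidean Voronoi diagram and dual Delaunay triangulation via SciPy's qhull wrappers.
import numpy as np
from scipy.spatial import Voronoi, Delaunay

rng = np.random.default_rng(0)
P = rng.random((10, 2))        # 10 distinct generators in E^2

vor = Voronoi(P)               # Voronoi cell complex
dt = Delaunay(P)               # dual Delaunay simplicial complex

print(vor.vertices)            # Voronoi vertices (circumcenters of Delaunay triangles)
print(dt.simplices)            # triangles given as indices into the generators
```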

Page 7: Fundamentals cig 4thdec

Voronoi diagrams and dual Delaunay simplicial complex

Empty sphere property, max-min angle triangulation, etc.

Voronoi & dual Delaunay triangulation
→ non-degenerate point set = no (d + 2) points co-spherical

Duality: Voronoi k-face ⇔ Delaunay (d − k)-simplex

Bisector Bi(P, Q) perpendicular ⊥ to segment [PQ]

© 2014 Frank Nielsen 2.Ordinary CG 7/75

Page 8: Fundamentals cig 4thdec

Voronoi & Delaunay: Complexity and algorithms

Combinatorial complexity: Θ(n^⌈d/2⌉) (→ quadratic in 3D)
matched for points on the moment curve: t ↦ (t, t², ..., t^d)

Construction: Θ(n log n + n^⌈d/2⌉), optimal

some output-sensitive algorithms but...
Ω(n log n + f): not yet optimal output-sensitive algorithms.

© 2014 Frank Nielsen 2.Ordinary CG 8/75

Page 9: Fundamentals cig 4thdec

Modeling population spaces in information geometry

Population space {P_θ(x)}_θ interpreted as a smooth manifold equipped with the Fisher Information Matrix (FIM):

Riemannian modeling: metric length space with the FIM as metric tensor (orthogonality), and the Levi-Civita metric connection for length-minimizing geodesics

Dual ±1 affine connection modeling: dual geodesics that describe parallel transport, non-metric dual divergences induced by dual potential Legendre convex functions. Dual ±α connections.

→ Algorithmic considerations of these two approaches

Population space, parameter space, object-oriented geometry, etc.

© 2014 Frank Nielsen 3.Information geometry 9/75

Page 10: Fundamentals cig 4thdec

Riemannian computational information geometry from the viewpoint of computing

© 2014 Frank Nielsen 4.Riemannian CIG 10/75

Page 11: Fundamentals cig 4thdec

Population spaces: Hotelling (1930) [12] & Rao (1945) [33]

Birth of differential-geometric methods in statistics.

Fisher information matrix (non-degenerate, positive definite) can be used as a (smooth) Riemannian metric tensor g.

Distance between two populations indexed by θ_1 and θ_2: Riemannian distance (metric length)

First applications in statistics:

Fisher-Hotelling-Rao (FHR) geodesic distance used in classification: find the closest population to a given set of populations

Used in tests of significance (null versus alternative hypothesis), power of a test: P(reject H_0 | H_0 is false) → defines surfaces in population spaces

© 2014 Frank Nielsen 4.Riemannian CIG 11/75

Page 12: Fundamentals cig 4thdec

Rao's distance (1945, introduced by Hotelling 1930 [12])

Infinitesimal squared length element:

ds² = Σ_{i,j} g_ij(θ) dθ_i dθ_j = dθ^⊤ I(θ) dθ

Geodesics and distances are hard to calculate explicitly:

ρ(p(x; θ_1), p(x; θ_2)) = min over curves θ(s), θ(0) = θ_1, θ(1) = θ_2, of ∫_0^1 sqrt( (dθ/ds)^⊤ I(θ) (dθ/ds) ) ds

Rao's distance not known in closed form for multivariate normals

Advantages: metric property of ρ + many tools of differential geometry [1]: Riemannian Log/Exp tangent/manifold mapping

© 2014 Frank Nielsen 4.Riemannian CIG 12/75

Page 13: Fundamentals cig 4thdec

Extrinsic Computational Geometry on tangent planes

Tensor g = Q(x) ≻ 0 defines a smooth inner product ⟨p, q⟩_x = p^⊤ Q(x) q that induces a normed distance:

d_x(p, q) = ‖p − q‖_x = sqrt( (p − q)^⊤ Q(x) (p − q) )

Mahalanobis metric distance on tangent planes:

Δ_Σ(X_1, X_2) = sqrt( (μ_1 − μ_2)^⊤ Σ^{−1} (μ_1 − μ_2) ) = sqrt( Δμ^⊤ Σ^{−1} Δμ )

Cholesky decomposition Σ = L L^⊤:

Δ(X_1, X_2) = D_E(L^{−1} μ_1, L^{−1} μ_2)

CG on tangent planes = ordinary CG on transformed points x′ ← L^{−1} x.

Extrinsic vs intrinsic means [10]

© 2014 Frank Nielsen 4.Riemannian CIG-1.Mahalanobis 13/75
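A minimal sketch (NumPy assumed, not from the slides) of the reduction above: the Mahalanobis distance computed directly, and again as an ordinary Euclidean distance after Cholesky whitening x′ ← L⁻¹x:

```python
import numpy as np

def mahalanobis(x, y, Sigma):
    d = x - y
    return np.sqrt(d @ np.linalg.solve(Sigma, d))      # sqrt(d^T Sigma^{-1} d)

def mahalanobis_whitened(x, y, Sigma):
    L = np.linalg.cholesky(Sigma)                       # Sigma = L L^T
    xw = np.linalg.solve(L, x)                          # x' = L^{-1} x
    yw = np.linalg.solve(L, y)
    return np.linalg.norm(xw - yw)                      # ordinary Euclidean distance

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x, y = np.array([1.0, 2.0]), np.array([0.0, -1.0])
assert np.isclose(mahalanobis(x, y, Sigma), mahalanobis_whitened(x, y, Sigma))
```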

Page 14: Fundamentals cig 4thdec

Mahalanobis Voronoi diagrams on tangent planes (extrinsic)

In statistics, the covariance matrix Σ accounts for both correlation and dimension (feature) scaling

Dual structure ≡ anisotropic Delaunay triangulation
⇒ empty circumellipse property (Cholesky decomposition)

© 2014 Frank Nielsen 4.Riemannian CIG-1.Mahalanobis 14/75

Page 15: Fundamentals cig 4thdec

Riemannian Mahalanobis metric tensor (Σ^{−1}, PSD)

ρ(p_1, p_2) = sqrt( (p_1 − p_2)^⊤ Σ^{−1} (p_1 − p_2) ),   g(p) = Σ^{−1} = [ 1  −1 ; −1  2 ]

non-conformal geometry: g(p) ≠ f(p) I

© 2014 Frank Nielsen 4.Riemannian CIG-1.Mahalanobis 15/75

Page 16: Fundamentals cig 4thdec

Riemannian statistical Voronoi diagrams

... for statistical population spaces:

Location-scale 2D families have constant non-positive curvature (Hotelling, 1930): Riemannian statistical Voronoi diagrams amount to hyperbolic Voronoi diagrams or Euclidean diagrams (location families only, like isotropic Gaussians)

Multinomial family has spherical geometry on the positive orthant: spherical Voronoi diagram
(compute via stereographic projection ∝ Euclidean Voronoi diagrams)

But for arbitrary families p(x|θ): geodesics not in closed form → limited computational framework in practice (ray shooting, etc.)

© 2014 Frank Nielsen 4.Riemannian CIG-1.Mahalanobis 16/75

Page 17: Fundamentals cig 4thdec

Normal/Gaussian family and 2D location-scale families

Fisher Information Matrix (FIM):

I(θ) = [ I_{i,j}(θ) = E_θ[ ∂/∂θ_i log p(x|θ) · ∂/∂θ_j log p(x|θ) ] ]

FIM for univariate normal/multivariate spherical distributions:

I(μ, σ) = [ 1/σ²  0 ; 0  2/σ² ] = (1/σ²) [ 1  0 ; 0  2 ],   I(μ, σ) = diag( 1/σ², ..., 1/σ², 2/σ² )

→ amounts to the Poincaré metric (dx² + dy²)/y², hyperbolic geometry in the upper half plane/space.

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 17/75

Page 18: Fundamentals cig 4thdec

Riemannian Poincaré upper plane metric tensor (conformal)

cosh ρ(p_1, p_2) = 1 + ‖p_1 − p_2‖² / (2 y_1 y_2),   g(p) = [ 1/y²  0 ; 0  1/y² ] = (1/y²) I

conformal: g(p) = (1/y²) I

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 18/75
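A minimal sketch (NumPy assumed) of the closed-form hyperbolic distance in the Poincaré upper half-plane quoted above:

```python
import numpy as np

def poincare_upper_half_plane_distance(p1, p2):
    # cosh(rho) = 1 + ||p1 - p2||^2 / (2 y1 y2), with y = last coordinate > 0
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    return np.arccosh(1.0 + np.sum((p1 - p2) ** 2) / (2.0 * p1[-1] * p2[-1]))

print(poincare_upper_half_plane_distance([0.0, 1.0], [0.0, 2.0]))  # = log 2 along a vertical geodesic
```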

Page 19: Fundamentals cig 4thdec

Matrix SPD spaces and hyperbolic geometry

Symmetric Positive Definite matrices M: ∀x ≠ 0, x^⊤ M x > 0.

2D SPD(2) matrix space has dimension d = 3: a positive cone.

SPD(2) = { (a, b, c) ∈ R³ : a > 0, ab − c² > 0 }

Can be peeled into sheets of dimension 2, each sheet corresponding to a constant value of the determinant of the elements [8]:

SPD(2) = SSPD(2) × R^+,  where SSPD(2) = { (a, b, c) : a > 0, ab − c² = 1 }

Mapping M(a, b, c) → H²:

( x_0 = (a+b)/2 ≥ 1, x_1 = (a−b)/2, x_2 = c ) in the hyperboloid model [28]

z = (a − b + 2ic)/(2 + a + b) in the Poincaré disk [28].

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 19/75

Page 20: Fundamentals cig 4thdec

Riemannian manifolds: Choice of equivalent models?

Many equivalent models of hyperbolic geometry:

Conformal (good for visualization since we can measure angles) versus non-conformal (computationally friendly for geodesics) models.

Convert equivalently to other models of hyperbolic geometry: Poincaré disk, upper half space, hyperboloid, Beltrami hemisphere, etc.

Two questions:

Given a metric tensor g and its induced metric distance ρ_g(p, q), what are the equivalent metric tensors g′ ∼ g such that ρ_g(p, q) = ρ_{g′}(p′, q′)? Is one metric tensor better as a computing space?

Metrics yielding straight geodesics are fully characterized in 2D, but in higher dimensions?

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 20/75

Page 21: Fundamentals cig 4thdec

Riemannian Poincaré disk metric tensor (conformal)

→ often used in Human-Computer Interfaces, network routing (embedding trees), etc.

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 21/75

Page 22: Fundamentals cig 4thdec

Riemannian Klein disk metric tensor (non-conformal)

recommended as a computing space since geodesics are straight line segments

Klein is also conformal at the origin (so we can perform translation from and back to the origin)

Geodesics passing through O in the Poincaré disk are straight (so we can perform translation from and back to the origin)

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 22/75

Page 23: Fundamentals cig 4thdec

Hyperbolic Voronoi diagrams [25, 29]

In arbitrary dimension, H^d:

In the Klein disk, the hyperbolic Voronoi diagram amounts to a clipped affine Voronoi diagram, or a clipped power diagram with an efficient clipping algorithm [5].

Then convert to other models of hyperbolic geometry: Poincaré disk, upper half space, hyperboloid, Beltrami hemisphere, etc.

Conformal (good for visualization) versus non-conformal (good for computing) models.

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 23/75
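A minimal sketch (NumPy assumed, not from the slides) of the model-conversion step mentioned above, mapping points of the open unit disk between the Poincaré and Klein models:

```python
import numpy as np

def poincare_to_klein(p):
    p = np.asarray(p, dtype=float)
    return 2.0 * p / (1.0 + p @ p)          # k = 2p / (1 + ||p||^2)

def klein_to_poincare(k):
    k = np.asarray(k, dtype=float)
    return k / (1.0 + np.sqrt(1.0 - k @ k)) # p = k / (1 + sqrt(1 - ||k||^2))

p = np.array([0.3, 0.4])
k = poincare_to_klein(p)
assert np.allclose(klein_to_poincare(k), p)  # round-trip consistency
```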

Page 24: Fundamentals cig 4thdec

Hyperbolic Voronoi diagrams [25, 29]

Hyperbolic Voronoi diagram in the Klein disk = clipped power diagram.

Power distance: ‖x − p‖² − w_p

→ additively weighted ordinary Voronoi = ordinary CG

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 24/75

Page 25: Fundamentals cig 4thdec

Hyperbolic Voronoi diagrams [25, 29]

5 common models of the abstract hyperbolic geometry

https://www.youtube.com/watch?v=i9IUzNxeH4o (5 min. video)

ACM Symposium on Computational Geometry (SoCG'14)

© 2014 Frank Nielsen 4.Riemannian CIG-2.Hyperbolic geometry 25/75

Page 26: Fundamentals cig 4thdec

Dually affine connection computational information geometry

© 2014 Frank Nielsen 5.Dually flat CIG 26/75

Page 27: Fundamentals cig 4thdec

Dually flat space construction from convex functions F

A convex and strictly differentiable function F(θ) admits a Legendre-Fenchel convex conjugate F*(η):

F*(η) = sup_θ (θ^⊤ η − F(θ)),   ∇F(θ) = η = (∇F*)^{−1}(θ)

Young's inequality gives rise to the canonical divergence [15]:

F(θ) + F*(η′) ≥ θ^⊤ η′  ⇒  A_{F,F*}(θ, η′) = F(θ) + F*(η′) − θ^⊤ η′

Writing it using a single coordinate system, we get the dual Bregman divergences:

B_F(θ_p : θ_q) = F(θ_p) − F(θ_q) − (θ_p − θ_q)^⊤ ∇F(θ_q)
= B_{F*}(η_q : η_p) = A_{F,F*}(θ_p, η_q) = A_{F*,F}(η_q : θ_p)

dual affine coordinate systems with straight geodesics:
η = ∇F(θ) ⇔ θ = ∇F*(η).  Tensor g(θ) = g*(η)

© 2014 Frank Nielsen 5.Dually flat CIG 27/75
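A minimal sketch (NumPy assumed) of the Bregman divergence B_F induced by a generator F, instantiated for two classic generators; the squared Euclidean and extended Kullback-Leibler cases below are standard examples, not taken from the slides:

```python
import numpy as np

def bregman(F, gradF, p, q):
    # B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)>
    return F(p) - F(q) - (p - q) @ gradF(q)

F_sq   = lambda x: 0.5 * x @ x                  # F(x) = 1/2 ||x||^2
gF_sq  = lambda x: x
F_ent  = lambda x: np.sum(x * np.log(x) - x)    # Shannon negative entropy generator
gF_ent = lambda x: np.log(x)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(bregman(F_sq, gF_sq, p, q))    # = 1/2 ||p - q||^2
print(bregman(F_ent, gF_ent, p, q))  # = extended KL(p : q) = sum p log(p/q) + q - p
```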

Page 28: Fundamentals cig 4thdec

Dual divergence/Bregman dual bisectors [6, 24, 26]

Bregman sided (reference) bisectors related by convex duality:

Bi_F(θ_1, θ_2) = { θ ∈ Θ | B_F(θ : θ_1) = B_F(θ : θ_2) }
Bi_{F*}(η_1, η_2) = { η ∈ H | B_{F*}(η : η_1) = B_{F*}(η : η_2) }

Right-sided bisector: → θ-hyperplane, η-hypersurface

H_F(p, q) = { x ∈ X | B_F(x : p) = B_F(x : q) }.
H_F: ⟨∇F(p) − ∇F(q), x⟩ + (F(p) − F(q) + ⟨q, ∇F(q)⟩ − ⟨p, ∇F(p)⟩) = 0

Left-sided bisector: → θ-hypersurface, η-hyperplane

H′_F(p, q) = { x ∈ X | B_F(p : x) = B_F(q : x) }
H′_F: ⟨∇F(x), q − p⟩ + F(p) − F(q) = 0

hyperplane = autoparallel submanifold of dimension d − 1

© 2014 Frank Nielsen 5.Dually flat CIG-1.bisector 28/75

Page 29: Fundamentals cig 4thdec

Visualizing Bregman bisectors

Primal coordinates θ (natural parameters) and dual coordinates η (expectation parameters).

[Figure: Itakura-Saito source space with p(0.52977081, 0.72041688), q(0.85824458, 0.29083834), D(p,q) = 0.66969016, D(q,p) = 0.44835617; Itakura-Saito dual gradient space with p′(−1.88760873, −1.38808518), q′(−1.16516903, −3.43833618), D*(p′,q′) = 0.44835617, D*(q′,p′) = 0.66969016]

Bi(P, Q) and Bi*(P, Q) can be expressed in either the θ or the η coordinate system

© 2014 Frank Nielsen 5.Dually flat CIG-1.bisector 29/75

Page 30: Fundamentals cig 4thdec

Spaces of spheres: 1-to-1 mapping between d-spheres and (d + 1)-hyperplanes using potential functions

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 30/75

Page 31: Fundamentals cig 4thdec

Space of Bregman spheres and Bregman balls [6]

Dual sided Bregman balls (bounding Bregman spheres):

Ball^r_F(c, r) = { x ∈ X | B_F(x : c) ≤ r }
Ball^l_F(c, r) = { x ∈ X | B_F(c : x) ≤ r }

Legendre duality:

Ball^l_F(c, r) = (∇F)^{−1}( Ball^r_{F*}(∇F(c), r) )

Illustration for the Itakura-Saito divergence, F(x) = − log x

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 31/75

Page 32: Fundamentals cig 4thdec

Space of Bregman spheres: Lifting map [6]

F: x ↦ x̂ = (x, F(x)), a hypersurface in R^{d+1}, the potential function graph

H_p: tangent hyperplane at p̂, z = H_p(x) = ⟨x − p, ∇F(p)⟩ + F(p)

Bregman sphere σ ⟶ σ̂ with supporting hyperplane
H_σ: z = ⟨x − c, ∇F(c)⟩ + F(c) + r.
(parallel to H_c and shifted vertically by r)
σ̂ = F ∩ H_σ.

The intersection of any hyperplane H with F projects onto X as a Bregman sphere:

H: z = ⟨x, a⟩ + b  →  σ: Ball_F( c = (∇F)^{−1}(a), r = ⟨a, c⟩ − F(c) + b )

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 32/75

Page 33: Fundamentals cig 4thdec

Lifting/Polarity: Potential function graph F

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 33/75

Page 34: Fundamentals cig 4thdec

Space of Bregman spheres: Algorithmic applications [6]

Union/intersection of Bregman d-spheres from the representational (d + 1)-polytope [6]

Radical axis of two Bregman balls is a hyperplane: applications to nearest neighbor search trees like Bregman ball trees or Bregman vantage point trees [31].

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 34/75

Page 35: Fundamentals cig 4thdec

Bregman proximity data structures [31]

Vantage point trees: partition space according to Bregman balls

Partitioning space with intersections of Kullback-Leibler balls
→ efficient nearest neighbour queries in information spaces

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 35/75

Page 36: Fundamentals cig 4thdec

Application: Minimum Enclosing Ball [23, 32]

To a hyperplane H_σ = H(a, b): z = ⟨a, x⟩ + b in R^{d+1} corresponds a ball σ = Ball(c, r) in R^d with center c = ∇F*(a) and radius:

r = ⟨a, c⟩ − F(c) + b = ⟨a, ∇F*(a)⟩ − F(∇F*(a)) + b = F*(a) + b

since F(∇F*(a)) = ⟨∇F*(a), a⟩ − F*(a) (Young equality)

SEB: find the halfspace H(a, b)^−: z ≤ ⟨a, x⟩ + b that contains all lifted points:

min_{a,b} r = F*(a) + b,
∀i ∈ {1, ..., n}, ⟨a, x_i⟩ + b − F(x_i) ≥ 0

→ Convex Program (CP) with linear inequality constraints

F(θ) = F*(η) = (1/2) x^⊤ x: CP → Quadratic Programming (QP) [11] used in SVM. Smallest enclosing ball used as a primitive in SVM [34]

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 36/75
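A minimal sketch (SciPy assumed; a generic nonlinear solver stands in for the dedicated QP solver cited above) of this convex program for the generator F(x) = ½x⊤x, i.e. minimize F*(a) + b subject to ⟨a, x_i⟩ + b − F(x_i) ≥ 0:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.random((20, 2))                          # input point set in R^2
F = lambda x: 0.5 * np.dot(x, x)                 # generator; here F* = F

def objective(ab):                               # r = F*(a) + b
    a, b = ab[:-1], ab[-1]
    return 0.5 * np.dot(a, a) + b

cons = [{"type": "ineq",
         "fun": lambda ab, xi=xi: ab[:-1] @ xi + ab[-1] - F(xi)}
        for xi in X]                             # <a, x_i> + b - F(x_i) >= 0

res = minimize(objective, x0=np.zeros(3), constraints=cons)
a, b = res.x[:-1], res.x[-1]
center = a                                       # c = grad F*(a) = a
r = 0.5 * a @ a + b                              # Bregman radius r = max_i 1/2 ||x_i - c||^2
print(center, np.sqrt(2 * r))                    # Euclidean center and radius
```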

Page 37: Fundamentals cig 4thdec

Smallest Bregman enclosing balls [32, 22]

Algorithm 1: BBCA(P, l)
c_1 ← choose a point at random in P;
for i = 2 to l − 1 do
  // farthest point from c_i wrt B_F
  s_i ← argmax_{j=1..n} B_F(c_i : p_j);
  // update the center: walk on the η-segment [c_i, p_{s_i}]_η
  c_{i+1} ← ∇F^{−1}( ∇F(c_i) #_{1/(i+1)} ∇F(p_{s_i}) );
end
// Return the SEBB approximation
return Ball(c_l, r_l = B_F(c_l : X));

θ-, η-geodesic segments in dually flat geometry.

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 37/75
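A minimal sketch (NumPy assumed, not the authors' implementation) of the BBCA iteration above, instantiated for the extended Kullback-Leibler divergence, whose generator F(x) = Σ x_i log x_i − x_i gives ∇F = log and (∇F)⁻¹ = exp:

```python
import numpy as np

def bregman_kl(p, q):                          # B_F(p : q) = extended KL divergence
    return np.sum(p * np.log(p / q) - p + q)

def bbca_kl(P, iterations=100):
    c = P[0].copy()                            # c_1: first (or a random) point of P
    for i in range(1, iterations):
        j = np.argmax([bregman_kl(c, p) for p in P])   # farthest point wrt B_F(c_i : .)
        eta_c, eta_j = np.log(c), np.log(P[j])         # switch to eta-coordinates
        eta_c = eta_c + (eta_j - eta_c) / (i + 1)      # walk 1/(i+1) along the eta-segment
        c = np.exp(eta_c)                              # back to primal coordinates
    r = max(bregman_kl(c, p) for p in P)
    return c, r

rng = np.random.default_rng(2)
P = rng.random((50, 3)) + 0.1                  # positive vectors
center, radius = bbca_kl(P)
```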

Page 38: Fundamentals cig 4thdec

Smallest enclosing balls: Core-sets [32]

Core-set C ⊆ S: SOL(S) ≤ SOL(C) ≤ (1 + ε) SOL(S)

extended Kullback-Leibler / Itakura-Saito

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 38/75

Page 39: Fundamentals cig 4thdec

InSphere predicates wrt Bregman divergences [6]

Implicit representation of Bregman spheres/balls: consider d + 1 support points on the boundary

Is x inside the Bregman ball defined by d + 1 support points?

InSphere(x; p_0, ..., p_d) = det [ 1 ... 1 1 ; p_0 ... p_d x ; F(p_0) ... F(p_d) F(x) ]

sign of a (d + 2) × (d + 2) matrix determinant

InSphere(x; p_0, ..., p_d) is negative, null or positive depending on whether x lies inside, on, or outside σ.

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 39/75
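A minimal sketch (NumPy assumed) of the determinant predicate, here with the generator F(x) = ½‖x‖², which recovers the classical Euclidean in-circle test; the sign convention depends on the orientation of the support points:

```python
import numpy as np

def insphere(x, supports, F):
    pts = list(supports) + [x]                        # d+1 support points then the query
    M = np.vstack([np.ones(len(pts)),                 # row of ones
                   np.column_stack(pts),              # one column of coordinates per point
                   [F(p) for p in pts]])              # lifted coordinates F(p)
    return np.sign(np.linalg.det(M))                  # sign of the (d+2) x (d+2) determinant

F = lambda p: 0.5 * np.dot(p, p)
supports = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
print(insphere(np.array([1.0, 1.0]), supports, F))    # negative: inside the circumcircle
print(insphere(np.array([3.0, 3.0]), supports, F))    # positive: outside
```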

Page 40: Fundamentals cig 4thdec

Smallest enclosing ball in Riemannian manifolds [2]

c = a #^M_t b: point γ(t) on the geodesic line segment [ab] wrt M such that ρ_M(a, c) = t × ρ_M(a, b) (with ρ_M the metric distance on manifold M)

Algorithm 2: GeoA
c_1 ← choose a point at random in P;
for i = 2 to l do
  // farthest point from c_i
  s_i ← argmax_{j=1..n} ρ(c_i, p_j);
  // update the center: walk on the geodesic line segment [c_i, p_{s_i}]
  c_{i+1} ← c_i #^M_{1/(i+1)} p_{s_i};
end
// Return the SEB approximation
return Ball(c_l, r_l = ρ(c_l, P));

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 40/75
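A minimal sketch (NumPy assumed, not the authors' applet) instantiating GeoA on the hyperboloid model of H^d, where the geodesic walk a #_t b has the standard closed form used below:

```python
import numpy as np

def minkowski(a, b):                        # Lorentzian inner product <a,b>_L
    return a[0] * b[0] - a[1:] @ b[1:]

def dist(a, b):
    return np.arccosh(max(minkowski(a, b), 1.0))

def geodesic_point(a, b, t):                # a #_t b with rho(a, c) = t * rho(a, b)
    d = dist(a, b)
    if d == 0.0:
        return a
    return (np.sinh((1 - t) * d) * a + np.sinh(t * d) * b) / np.sinh(d)

def geo_a(P, iterations=1000):
    c = P[0]
    for i in range(1, iterations):
        far = max(P, key=lambda p: dist(c, p))        # farthest point from c_i
        c = geodesic_point(c, far, 1.0 / (i + 1))     # walk 1/(i+1) of the way
    return c, max(dist(c, p) for p in P)

# lift planar points x onto the hyperboloid: (sqrt(1 + |x|^2), x)
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
P = [np.concatenate(([np.sqrt(1 + x @ x)], x)) for x in X]
center, radius = geo_a(P)
```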

Page 41: Fundamentals cig 4thdec

Approximating the smallest enclosing ball in hyperbolic space

Initialization, first iteration, second iteration, third iteration, fourth iteration, ..., after 10^4 iterations

http://www.sonycsl.co.jp/person/nielsen/infogeo/RiemannMinimax/

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 41/75

Page 42: Fundamentals cig 4thdec

Bregman dual regular/Delaunay triangulations

Embedded geodesic Delaunay triangulations + empty Bregman balls

Delaunay / Exponential Del. / Hellinger-like Del.

empty Bregman sphere property,
geodesic triangles: embedded Delaunay.

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 42/75

Page 43: Fundamentals cig 4thdec

Dually orthogonal Bregman Voronoi & triangulations

Ordinary Voronoi diagram is perpendicular to the Delaunay triangulation:
Voronoi k-face ⊥ Delaunay (d − k)-face

Bi(P, Q) ⊥ γ*(P, Q)
γ(P, Q) ⊥ Bi*(P, Q)

© 2014 Frank Nielsen 5.Dually flat CIG-2.Space of spheres 43/75

Page 44: Fundamentals cig 4thdec

Synthetic geometry: Exact characterization of the Bayesian error exponent but no closed form known

© 2014 Frank Nielsen 6.Bayesian error exponent 44/75

Page 45: Fundamentals cig 4thdec

Bayesian hypothesis testing, MAP rule and probability of error P_e

Mixture p(x) = Σ_i w_i p_i(x). Task = classify x: which component?

Prior probabilities: w_i = P(X ∼ P_i) > 0 (with Σ_{i=1}^n w_i = 1)
Conditional probabilities: P(X = x | X ∼ P_i).

P(X = x) = Σ_{i=1}^n P(X ∼ P_i) P(X = x | X ∼ P_i) = Σ_{i=1}^n w_i P(X | P_i)

Best rule = Maximum a posteriori probability (MAP) rule:

map(x) = argmax_{i ∈ {1,...,n}} w_i p_i(x)

where p_i(x) = P(X = x | X ∼ P_i) are the conditional probabilities.

For w_1 = w_2 = 1/2, probability of error

P_e = (1/2) ∫ min(p_1(x), p_2(x)) dx ≤ (1/2) ∫ p_1(x)^α p_2(x)^{1−α} dx, for α ∈ (0, 1).

Best exponent α*?

© 2014 Frank Nielsen 6.Bayesian error exponent 45/75

Page 46: Fundamentals cig 4thdec

Error exponent for exponential families

Exponential families have finite-dimensional sufficient statistics: → reduce n data to D statistics.

∀x ∈ X, P(x|θ) = exp(θ^⊤ t(x) − F(θ) + k(x))

F(·): log-normalizer/cumulant/partition function, k(x): auxiliary term for the carrier measure.

Maximum likelihood estimator (MLE): ∇F(θ) = (1/n) Σ_i t(X_i) = η

Bijection between exponential families and Bregman divergences:

log p(x|θ) = −B_{F*}(t(x) : η) + F*(t(x)) + k(x)

Exponential families are log-concave

© 2014 Frank Nielsen 6.Bayesian error exponent 46/75

Page 47: Fundamentals cig 4thdec

Geometry of the best error exponent

On the exponential family manifold, the Chernoff α-coefficient [7]:

c_α(P_{θ_1} : P_{θ_2}) = ∫ p_{θ_1}^α(x) p_{θ_2}^{1−α}(x) dμ(x) = exp( −J_F^{(α)}(θ_1 : θ_2) )

Skew Jensen divergence [20] on the natural parameters:

J_F^{(α)}(θ_1 : θ_2) = α F(θ_1) + (1 − α) F(θ_2) − F(θ_{12}^{(α)})

Chernoff information = Bregman divergence for exponential families:

C(P_{θ_1} : P_{θ_2}) = B(θ_1 : θ_{12}^{(α*)}) = B(θ_2 : θ_{12}^{(α*)})

Finding the best error exponent α*?

© 2014 Frank Nielsen 6.Bayesian error exponent 47/75

Page 48: Fundamentals cig 4thdec

Geometry of the best error exponent: binary hypothesis [17]

Chernoff distribution P*:

P* = P_{θ*_{12}} = G_e(P_1, P_2) ∩ Bi_m(P_1, P_2)

e-geodesic:

G_e(P_1, P_2) = { E_{12}^{(λ)} | θ(E_{12}^{(λ)}) = (1 − λ)θ_1 + λθ_2, λ ∈ [0, 1] },

m-bisector:

Bi_m(P_1, P_2): { P | F(θ_1) − F(θ_2) + η(P)^⊤ Δθ = 0 },

Optimal natural parameter of P*:

θ* = θ_{12}^{(α*)} = argmin_{θ ∈ Θ} B(θ_1 : θ) = argmin_{θ ∈ Θ} B(θ_2 : θ).

→ closed form for order-1 families, or efficient bisection search.

© 2014 Frank Nielsen 6.Bayesian error exponent 48/75
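A minimal sketch (NumPy assumed) of the bisection search mentioned above, locating the optimal point θ* on the e-geodesic for two Poisson distributions (F(θ) = e^θ, θ = log rate), at which the two Bregman "radii" B(θ_1 : θ) and B(θ_2 : θ) coincide:

```python
import numpy as np

F = np.exp
gradF = np.exp

def bregman(t1, t2):                                    # B_F(t1 : t2) on natural parameters
    return F(t1) - F(t2) - (t1 - t2) * gradF(t2)

def chernoff_point(theta1, theta2, tol=1e-12):
    lo, hi = 0.0, 1.0                                   # lambda parametrizes the e-geodesic
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        t = (1 - lam) * theta1 + lam * theta2
        if bregman(theta1, t) < bregman(theta2, t):     # still closer to theta1: move toward theta2
            lo = lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

theta1, theta2 = np.log(1.0), np.log(10.0)              # Poisson rates 1 and 10
lam_star = chernoff_point(theta1, theta2)
t_star = (1 - lam_star) * theta1 + lam_star * theta2
print(lam_star, bregman(theta1, t_star), bregman(theta2, t_star))  # equal Bregman radii at theta*
```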

Page 49: Fundamentals cig 4thdec

Geometry of the best error exponent: binary hypothesis

P* = P_{θ*_{12}} = G_e(P_1, P_2) ∩ Bi_m(P_1, P_2)

[Figure, in the η-coordinate system: the m-bisector Bi_m(P_{θ_1}, P_{θ_2}) and the e-geodesic G_e(P_{θ_1}, P_{θ_2}) intersect at p_{θ*_{12}}, with C(θ_1 : θ_2) = B(θ_1 : θ*_{12})]

Binary hypothesis testing: P_e bounded using the Bregman divergence between the Chernoff distribution and the class-conditional distributions.

© 2014 Frank Nielsen 6.Bayesian error exponent 49/75

Page 50: Fundamentals cig 4thdec

Clustering and learning finite statistical mixtures

© 2014 Frank Nielsen 6.Bayesian error exponent 50/75

Page 51: Fundamentals cig 4thdec

α-divergences

For α ∈ R, α ≠ ±1, the α-divergences [9] on positive arrays [36]:

D_α(p : q) := Σ_{i=1}^d 4/(1 − α²) ( (1 − α)/2 · p_i + (1 + α)/2 · q_i − p_i^{(1−α)/2} q_i^{(1+α)/2} )

with D_α(p : q) = D_{−α}(q : p), and in the limit cases D_{−1}(p : q) = KL(p : q) and D_1(p : q) = KL(q : p), where KL is the extended Kullback-Leibler divergence KL(p : q) := Σ_{i=1}^d ( p_i log(p_i/q_i) + q_i − p_i )

α-divergences belong to the class of Csiszár f-divergences I_f(p : q) := Σ_{i=1}^d q_i f(p_i/q_i), with the following generator:

f(t) = 4/(1 − α²) (1 − t^{(1+α)/2}) if α ≠ ±1;   t ln t if α = 1;   − ln t if α = −1

Information monotonicity

© 2014 Frank Nielsen 6.Bayesian error exponent 51/75
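A minimal sketch (NumPy assumed) of D_α on positive arrays, with the reference duality D_α(p : q) = D_{−α}(q : p) and the KL limit cases checked numerically:

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    if np.isclose(alpha, -1.0):                       # limit case: extended KL(p : q)
        return np.sum(p * np.log(p / q) + q - p)
    if np.isclose(alpha, 1.0):                        # limit case: extended KL(q : p)
        return np.sum(q * np.log(q / p) + p - q)
    c = 4.0 / (1.0 - alpha ** 2)
    return c * np.sum((1 - alpha) / 2 * p + (1 + alpha) / 2 * q
                      - p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))

p = np.array([0.1, 0.6, 0.3])
q = np.array([0.3, 0.3, 0.4])
print(alpha_divergence(p, q, 0.5), alpha_divergence(q, p, -0.5))      # equal by duality
print(alpha_divergence(p, q, -0.999), alpha_divergence(p, q, -1.0))   # approaches KL(p : q)
```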

Page 52: Fundamentals cig 4thdec

Mixed divergences [30]

Defined on three parameters p, q and r:

M_λ(p : q : r) := λ D(p : q) + (1 − λ) D(q : r)

for λ ∈ [0, 1].

Mixed divergences include:

the sided divergences for λ ∈ {0, 1},

the symmetrized (arithmetic mean) divergence for λ = 1/2, or the skew symmetrized divergences for λ ≠ 1/2.

© 2014 Frank Nielsen 7.Mixed divergences 52/75

Page 53: Fundamentals cig 4thdec

Symmetrizing α-divergences

S_α(p, q) = (1/2) (D_α(p : q) + D_α(q : p)) = S_{−α}(p, q)
= M_{1/2}(p : q : p),

For α = ±1, we get half of Jeffreys divergence:

S_{±1}(p, q) = (1/2) Σ_{i=1}^d (p_i − q_i) log(p_i/q_i)

Centroids for the symmetrized α-divergence are usually not in closed form.
How to perform center-based clustering without closed-form centroids?

© 2014 Frank Nielsen 7.Mixed divergences 53/75

Page 54: Fundamentals cig 4thdec

Jeffreys positive centroid [16]

Jeffreys divergence is the symmetrized α = ±1 divergence.

The Jeffreys positive centroid c = (c^1, ..., c^d) of a set h_1, ..., h_n of n weighted positive histograms with d bins can be calculated component-wise exactly using the Lambert W analytic function:

c^i = a^i / W( (a^i / g^i) e )

where a^i = Σ_{j=1}^n π_j h_j^i denotes the coordinate-wise arithmetic weighted means and g^i = Π_{j=1}^n (h_j^i)^{π_j} the coordinate-wise geometric weighted means.

The Lambert analytic function W [4] (positive branch) is defined by W(x) e^{W(x)} = x for x ≥ 0.

→ Jeffreys k-means clustering. But for α ≠ 1, how to cluster?

© 2014 Frank Nielsen 7.Mixed divergences 54/75
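A minimal sketch (NumPy/SciPy assumed, not the paper's code) of the component-wise Jeffreys positive centroid via the principal branch of the Lambert W function:

```python
import numpy as np
from scipy.special import lambertw

def jeffreys_positive_centroid(H, weights):
    H = np.asarray(H, dtype=float)                     # shape (n, d), positive entries
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    a = w @ H                                          # coordinate-wise arithmetic weighted mean
    g = np.exp(w @ np.log(H))                          # coordinate-wise geometric weighted mean
    return a / np.real(lambertw(a * np.e / g))         # c^i = a^i / W(a^i e / g^i)

H = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 3.0, 1.0]])
c = jeffreys_positive_centroid(H, weights=[0.2, 0.5, 0.3])
print(c)   # lies coordinate-wise between the geometric and arithmetic means
```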

Page 55: Fundamentals cig 4thdec

Mixed α-divergences / α-Jeffreys symmetrized divergence

Mixed α-divergence between a histogram x and two histograms p and q:

M_{λ,α}(p : x : q) = λ D_α(p : x) + (1 − λ) D_α(x : q)
= λ D_{−α}(x : p) + (1 − λ) D_{−α}(q : x)
= M_{1−λ,−α}(q : x : p),

The α-Jeffreys symmetrized divergence is obtained for λ = 1/2:

S_α(p, q) = M_{1/2,α}(q : p : q) = M_{1/2,α}(p : q : p)

The skew symmetrized α-divergence is defined by:

S_{λ,α}(p : q) = λ D_α(p : q) + (1 − λ) D_α(q : p)

© 2014 Frank Nielsen 7.Mixed divergences 55/75

Page 56: Fundamentals cig 4thdec

Mixed divergence-based k-means clustering

k distinct seeds from the dataset with l_i = r_i.

Input: Weighted histogram set H, divergence D(·, ·), integer k > 0, real λ ∈ [0, 1];
Initialize left-sided/right-sided seeds C = {(l_i, r_i)}_{i=1}^k;
repeat
  // Assignment
  for i = 1, 2, ..., k do
    C_i ← { h ∈ H : i = argmin_j M_λ(l_j : h : r_j) };
  end
  // Dual-sided centroid relocation
  for i = 1, 2, ..., k do
    r_i ← argmin_x D(C_i : x) = Σ_{h ∈ C_i} w_j D(h : x);
    l_i ← argmin_x D(x : C_i) = Σ_{h ∈ C_i} w_j D(x : h);
  end
until convergence;

Different from k-means clustering with respect to the symmetrized divergences.

© 2014 Frank Nielsen 7.Mixed divergences 56/75

Page 57: Fundamentals cig 4thdec

Mixed α-hard clustering: MAhC(H, k, λ, α)

Input: Weighted histogram set H, integer k > 0, real λ ∈ [0, 1], real α ∈ R;
Let C = {(l_i, r_i)}_{i=1}^k ← MAS(H, k, λ, α);
repeat
  // Assignment
  for i = 1, 2, ..., k do
    A_i ← { h ∈ H : i = argmin_j M_{λ,α}(l_j : h : r_j) };
  end
  // Centroid relocation
  for i = 1, 2, ..., k do
    r_i ← ( Σ_{h ∈ A_i} w_i h^{(1−α)/2} )^{2/(1−α)};
    l_i ← ( Σ_{h ∈ A_i} w_i h^{(1+α)/2} )^{2/(1+α)};
  end
until convergence;

© 2014 Frank Nielsen 7.Mixed divergences 57/75

Page 58: Fundamentals cig 4thdec

Coupled k-means++ α-seeding

Algorithm 3: Mixed α-seeding; MAS(H, k, λ, α)
Input: Weighted histogram set H, integer k ≥ 1, real λ ∈ [0, 1], real α ∈ R;
Let C ← h_j with uniform probability;
for i = 2, 3, ..., k do
  Pick at random a histogram h ∈ H with probability:

  π_H(h) := w_h M_{λ,α}(c_h : h : c_h) / Σ_{y ∈ H} w_y M_{λ,α}(c_y : y : c_y),   (1)

  // where (c_h, c_h) := argmin_{(z, z) ∈ C} M_{λ,α}(z : h : z);
  C ← C ∪ {(h, h)};
end
Output: Set of initial cluster centers C;

→ Guaranteed probabilistic bound. Just need to initialize! No centroid computations

© 2014 Frank Nielsen 7.Mixed divergences 58/75

Page 59: Fundamentals cig 4thdec

Learning MMs: A geometric hard clustering viewpoint

Learn the parameters of a mixture m(x) = Σ_{i=1}^k w_i p(x|θ_i)

Maximize the complete data likelihood = clustering objective function:

max_{W,Λ} l_c(W, Λ) = Σ_{i=1}^n Σ_{j=1}^k z_{i,j} log( w_j p(x_i|θ_j) )
= max_Λ Σ_{i=1}^n max_{j=1..k} log( w_j p(x_i|θ_j) )
≡ min_{W,Λ} Σ_{i=1}^n min_{j=1..k} D_j(x_i),

where c_j = (w_j, θ_j) (cluster prototype) and D_j(x_i) = − log p(x_i|θ_j) − log w_j are potential distance-like functions.

Further attach to each cluster a different family of probability distributions.

© 2014 Frank Nielsen 7.Mixed divergences 59/75

Page 60: Fundamentals cig 4thdec

Generalized k-MLE for learning statistical mixtures

Model-based clustering: assignment of points to clusters:

D_{w_j,θ_j,F_j}(x) = − log p_{F_j}(x; θ_j) − log w_j

k-GMLE:

1. Initialize weights W ∈ Δ_k and family types (F_1, ..., F_k) for each cluster

2. Solve min_Λ Σ_i min_j D_j(x_i) (center-based clustering for W fixed) with potential functions: D_j(x_i) = − log p_{F_j}(x_i|θ_j) − log w_j

3. Solve the family types maximizing the MLE in each cluster C_j by choosing the parametric family of distributions F_j = F(γ_j) that yields the best likelihood: min_{F_1=F(γ_1),...,F_k=F(γ_k) ∈ F(γ)} Σ_i min_j D_{w_j,θ_j,F_j}(x_i).
∀l, γ_l = max_j F*_j( η_l = (1/n_l) Σ_{x ∈ C_l} t_j(x) ) + (1/n_l) Σ_{x ∈ C_l} k(x).

4. Update the weights W as the cluster point proportions

5. Test for convergence and go to step 2) otherwise.

Drawback = biased, non-consistent estimator due to Voronoi support truncation.

© 2014 Frank Nielsen 8.k-GMLE 60/75

Page 61: Fundamentals cig 4thdec

Computing f-divergences for generic f: beyond stochastic numerical integration

© 2014 Frank Nielsen 9.Computing f-divergences 61/75

Page 62: Fundamentals cig 4thdec

f-divergences

I_f(X_1 : X_2) = ∫ x_1(x) f( x_2(x)/x_1(x) ) dν(x) ≥ 0

Name of the f-divergence, formula I_f(P : Q), and generator f(u) with f(1) = 0:

Total variation (metric): (1/2) ∫ |p(x) − q(x)| dν(x); generator f(u) = (1/2)|u − 1|
Squared Hellinger: ∫ (√p(x) − √q(x))² dν(x); generator f(u) = (√u − 1)²
Pearson χ²_P: ∫ (q(x) − p(x))²/p(x) dν(x); generator f(u) = (u − 1)²
Neyman χ²_N: ∫ (p(x) − q(x))²/q(x) dν(x); generator f(u) = (1 − u)²/u
Pearson-Vajda χ^k_P: ∫ (q(x) − λp(x))^k / p^{k−1}(x) dν(x); generator f(u) = (u − 1)^k
Pearson-Vajda |χ|^k_P: ∫ |q(x) − λp(x)|^k / p^{k−1}(x) dν(x); generator f(u) = |u − 1|^k
Kullback-Leibler: ∫ p(x) log(p(x)/q(x)) dν(x); generator f(u) = − log u
reverse Kullback-Leibler: ∫ q(x) log(q(x)/p(x)) dν(x); generator f(u) = u log u
α-divergence: 4/(1−α²) (1 − ∫ p^{(1−α)/2}(x) q^{(1+α)/2}(x) dν(x)); generator f(u) = 4/(1−α²) (1 − u^{(1+α)/2})
Jensen-Shannon: (1/2) ∫ ( p(x) log(2p(x)/(p(x)+q(x))) + q(x) log(2q(x)/(p(x)+q(x))) ) dν(x); generator f(u) = −(u + 1) log((1+u)/2) + u log u

© 2014 Frank Nielsen 9.Computing f-divergences 62/75

Page 63: Fundamentals cig 4thdec

f-divergences and higher-order Vajda χ^k divergences

I_f(X_1 : X_2) = Σ_{k=0}^∞ (f^{(k)}(1)/k!) χ^k_P(X_1 : X_2)

χ^k_P(X_1 : X_2) = ∫ (x_2(x) − x_1(x))^k / x_1(x)^{k−1} dν(x),
|χ|^k_P(X_1 : X_2) = ∫ |x_2(x) − x_1(x)|^k / x_1(x)^{k−1} dν(x),

are f-divergences for the generators (u − 1)^k and |u − 1|^k.

When k = 1, χ^1_P(X_1 : X_2) = ∫ (x_2(x) − x_1(x)) dν(x) = 0 (never discriminative), and |χ|^1_P(X_1, X_2) is twice the total variation distance.

χ^k_P is a signed distance

© 2014 Frank Nielsen 9.Computing f-divergences 63/75

Page 64: Fundamentals cig 4thdec

Affine exponential families

Canonical decomposition of the probability measure:

p_θ(x) = exp( ⟨t(x), θ⟩ − F(θ) + k(x) ),

consider the natural parameter space Θ affine (like multinomials).

Poi(λ): p(x|λ) = λ^x e^{−λ} / x!,  λ > 0, x ∈ {0, 1, ...}
Nor_I(μ): p(x|μ) = (2π)^{−d/2} e^{−(1/2)(x−μ)^⊤(x−μ)},  μ ∈ R^d, x ∈ R^d

Family | θ | Θ | F(θ) | k(x) | t(x) | ν
Poisson | log λ | R | e^θ | − log x! | x | ν_c
Iso. Gaussian | μ | R^d | (1/2) θ^⊤θ | −(d/2) log 2π − (1/2) x^⊤x | x | ν_L

© 2014 Frank Nielsen 9.Computing f-divergences 64/75

Page 65: Fundamentals cig 4thdec

Higher-order Vajda χ^k divergences

The (signed) χ^k_P distance between members X_1 ∼ EF(θ_1) and X_2 ∼ EF(θ_2) of the same affine exponential family is (for k ∈ N) always bounded and equal to:

χ^k_P(X_1 : X_2) = Σ_{j=0}^k (−1)^{k−j} (k choose j) e^{F((1−j)θ_1 + jθ_2)} / e^{(1−j)F(θ_1) + jF(θ_2)}

For Poisson/Normal distributions, we get closed-form formulas:

χ^k_P(λ_1 : λ_2) = Σ_{j=0}^k (−1)^{k−j} (k choose j) e^{ λ_1^{1−j} λ_2^j − ((1−j)λ_1 + jλ_2) },

χ^k_P(μ_1 : μ_2) = Σ_{j=0}^k (−1)^{k−j} (k choose j) e^{ (1/2) j(j−1) (μ_1 − μ_2)^⊤(μ_1 − μ_2) }.

© 2014 Frank Nielsen 9.Computing f-divergences 65/75
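A minimal sketch (NumPy/SciPy assumed) of the closed-form Poisson formula above, cross-checked against a direct summation over the probability mass functions:

```python
import numpy as np
from scipy.special import comb
from scipy.stats import poisson

def chi_k_poisson(lam1, lam2, k):
    # sum_{j=0}^k (-1)^{k-j} C(k,j) exp(lam1^(1-j) lam2^j - ((1-j) lam1 + j lam2))
    return sum((-1) ** (k - j) * comb(k, j, exact=True)
               * np.exp(lam1 ** (1 - j) * lam2 ** j - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def chi_k_numeric(lam1, lam2, k, support=50):
    x = np.arange(support)
    p1, p2 = poisson.pmf(x, lam1), poisson.pmf(x, lam2)
    return np.sum((p2 - p1) ** k / p1 ** (k - 1))       # truncated sum over the support

lam1, lam2, k = 2.0, 3.5, 3
print(chi_k_poisson(lam1, lam2, k), chi_k_numeric(lam1, lam2, k))   # should agree closely
```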

Page 66: Fundamentals cig 4thdec

f-divergences: Analytic formula [14]

For λ = 1 ∈ int(dom(f^{(i)})), f-divergence (Theorem 1 of [3]):

| I_f(X_1 : X_2) − Σ_{k=0}^s (f^{(k)}(1)/k!) χ^k_P(X_1 : X_2) | ≤ (1/(s+1)!) ‖f^{(s+1)}‖_∞ (M − m)^s,

where ‖f^{(s+1)}‖_∞ = sup_{t ∈ [m,M]} |f^{(s+1)}(t)| and m ≤ p/q ≤ M.

For λ = 0 (whenever 0 ∈ int(dom(f^{(i)}))) and affine exponential families, a simpler expression:

I_f(X_1 : X_2) = Σ_{i=0}^∞ (f^{(i)}(0)/i!) I_{1−i,i}(θ_1 : θ_2),

I_{1−i,i}(θ_1 : θ_2) = e^{F(iθ_2 + (1−i)θ_1)} / e^{iF(θ_2) + (1−i)F(θ_1)}.

© 2014 Frank Nielsen 9.Computing f-divergences 66/75

Page 67: Fundamentals cig 4thdec

Designing conformal divergences: Finding graphical gaps!

© 2014 Frank Nielsen 10.Conformal divergences 67/75

Page 68: Fundamentals cig 4thdec

Geometrically designed divergences

Plot of the convex generator F.

[Figure: graph F: (x, F(x)) with the points (p, F(p)), (q, F(q)) and the midpoint (p+q)/2, illustrating the gaps that realize B(p : q), J(p, q) and tB(p : q)]

© 2014 Frank Nielsen 10.Conformal divergences 68/75

Page 69: Fundamentals cig 4thdec

Divergences: skew Jensen & Bregman divergences

F: a smooth convex function, the generator.

Skew Jensen divergences:

J′_α(p : q) = αF(p) + (1 − α)F(q) − F(αp + (1 − α)q)
= (F(p)F(q))_α − F((pq)_α),

where (pq)_γ = γp + (1 − γ)q = q + γ(p − q) and (F(p)F(q))_γ = γF(p) + (1 − γ)F(q) = F(q) + γ(F(p) − F(q)).

Bregman divergences:

B(p : q) = F(p) − F(q) − ⟨p − q, ∇F(q)⟩,
lim_{α→0} J_α(p : q) = B(p : q),
lim_{α→1} J_α(p : q) = B(q : p).

Statistical skewed Bhattacharyya divergence:

Bhat(p_1 : p_2) = − log ∫ p_1(x)^α p_2(x)^{1−α} dν(x) = J′_α(θ_1 : θ_2)

for exponential families [21].

© 2014 Frank Nielsen 10.Conformal divergences 69/75

Page 70: Fundamentals cig 4thdec

Total Bregman divergences [13]

Conformal divergence, conformal factor ρ:

D′(p : q) = ρ(p, q) D(p : q)

plays the role of a regularizer [35]

Invariance by rotation of the axes of the design space:

tB(p : q) = B(p : q) / sqrt( 1 + ⟨∇F(q), ∇F(q)⟩ ) = ρ_B(q) B(p : q),
ρ_B(q) = 1 / sqrt( 1 + ⟨∇F(q), ∇F(q)⟩ ).

For example, the total squared Euclidean divergence:

tE(p, q) = (1/2) ⟨p − q, p − q⟩ / sqrt( 1 + ⟨q, q⟩ ).

© 2014 Frank Nielsen 10.Conformal divergences 70/75

Page 71: Fundamentals cig 4thdec

Total skew Jensen divergences [27]

tB(p : q) = ρ_B(q) B(p : q),   ρ_B(q) = 1 / sqrt( 1 + ⟨∇F(q), ∇F(q)⟩ )

tJ_α(p : q) = ρ_J(p, q) J_α(p : q),   ρ_J(p, q) = 1 / sqrt( 1 + (F(p) − F(q))² / ⟨p − q, p − q⟩ )

Jensen-Shannon divergence, whose square root is a metric:

JS(p, q) = (1/2) Σ_{i=1}^d p_i log( 2p_i/(p_i + q_i) ) + (1/2) Σ_{i=1}^d q_i log( 2q_i/(p_i + q_i) )

But the square root of the total Jensen-Shannon divergence is not a metric.

© 2014 Frank Nielsen 10.Conformal divergences 71/75

Page 72: Fundamentals cig 4thdec

Summary: Geometric Computing in Information Spaces

Location-scale families, spherical normals, symmetric positive definite matrices → hyperbolic geometry.

Hyperbolic geometry: CG affine constructions in the Klein disk

Space of spheres in dually affine connection geometry

Synthetic geometry for characterizing the best error exponent in Bayes error

Conformal divergences: total Bregman/total Jensen divergences

Clustering using a pair of centroids per cluster, using mixed divergences for symmetrized α-divergences

Learning statistical mixtures maximizing the complete likelihood as a sequence of geometric clustering problems: k-GMLE

In search of closed-form solutions: Jeffreys centroid using the Lambert W function, f-divergence approximation for affine exponential families.

© 2014 Frank Nielsen 10.Conformal divergences 72/75

Page 73: Fundamentals cig 4thdec

Computational Information Geometry (Edited books)

[19] [18]

http://www.springer.com/engineering/signals/book/978-3-642-30231-2
http://www.sonycsl.co.jp/person/nielsen/infogeo/MIG/MIGBOOKWEB/
http://www.springer.com/engineering/signals/book/978-3-319-05316-5
http://www.sonycsl.co.jp/person/nielsen/infogeo/GTI/GeometricTheoryOfInformation.html

© 2014 Frank Nielsen 11.References 73/75

Page 74: Fundamentals cig 4thdec

Geometric Sciences of Information (GSI) 2015

October 28-30th 2015. Deadline 1st March 2015

http://www.gsi2015.org/

© 2014 Frank Nielsen 11.References 74/75

Page 75: Fundamentals cig 4thdec

Thank you!

© 2014 Frank Nielsen 11.References 75/75

Page 76: Fundamentals cig 4thdec

Marc Arnaudon and Frank Nielsen.
On approximating the Riemannian 1-center.
Comput. Geom. Theory Appl., 46(1):93-104, January 2013.

Marc Arnaudon and Frank Nielsen.
On approximating the Riemannian 1-center.
Computational Geometry, 46(1):93-104, 2013.

N.S. Barnett, P. Cerone, S.S. Dragomir, and A. Sofo.
Approximating Csiszár f-divergence by the use of Taylor's formula with integral remainder.
Mathematical Inequalities & Applications, 5(3):417-434, 2002.

D. A. Barry, P. J. Culligan-Hensley, and S. J. Barry.
Real values of the W-function.
ACM Trans. Math. Softw., 21(2):161-171, June 1995.

Jean-Daniel Boissonnat and Christophe Delage.
Convex hull and Voronoi diagram of additively weighted points.
In Gerth Stølting Brodal and Stefano Leonardi, editors, ESA, volume 3669 of Lecture Notes in Computer Science, pages 367-378. Springer, 2005.

Jean-Daniel Boissonnat, Frank Nielsen, and Richard Nock.
Bregman Voronoi diagrams.
Discrete and Computational Geometry, 44(2):281-307, April 2010.

Herman Chernoff.
A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations.
Annals of Mathematical Statistics, 23:493-507, 1952.

Pascal Chossat and Olivier P. Faugeras.
Hyperbolic planforms in relation to visual edges and textures perception.
PLoS Computational Biology, 5(12), 2009.

Andrzej Cichocki, Sergio Cruces, and Shun-ichi Amari.
Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization.

© 2014 Frank Nielsen 11.References 75/75

Page 77: Fundamentals cig 4thdec

Entropy, 13(1):134-170, 2011.

P. Thomas Fletcher, Conglin Lu, Stephen M. Pizer, and Sarang C. Joshi.
Principal geodesic analysis for the study of nonlinear statistics of shape.
IEEE Trans. Med. Imaging, 23(8):995-1005, 2004.

Bernd Gärtner and Sven Schönherr.
An efficient, exact, and generic quadratic programming solver for geometric optimization.
In Proceedings of the sixteenth annual symposium on Computational geometry, pages 110-118. ACM, 2000.

Harold Hotelling.

Meizhu Liu, Baba C. Vemuri, Shun-ichi Amari, and Frank Nielsen.
Shape retrieval using hierarchical total Bregman soft clustering.
Transactions on Pattern Analysis and Machine Intelligence, 34(12):2407-2419, 2012.

F. Nielsen and R. Nock.
On the chi square and higher-order chi distances for approximating f-divergences.
Signal Processing Letters, IEEE, 21(1):10-13, 2014.

Frank Nielsen.
Legendre transformation and information geometry.
Technical Report CIG-MEMO2, September 2010.

Frank Nielsen.
Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms.
Signal Processing Letters, IEEE, PP(99):1-1, 2013.

Frank Nielsen.
Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means.
Pattern Recognition Letters, 42:25-34, 2014.

Frank Nielsen.
Geometric Theory of Information.

© 2014 Frank Nielsen 11.References 75/75

Page 78: Fundamentals cig 4thdec

Springer, 2014.

Frank Nielsen and Rajendra Bhatia, editors.
Matrix Information Geometry (Revised Invited Papers). Springer, 2012.

Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455-5466, 2011.

Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455-5466, August 2011.

Frank Nielsen and Richard Nock.
On approximating the smallest enclosing Bregman balls.
In Proceedings of the Twenty-second Annual Symposium on Computational Geometry, SCG '06, pages 485-486, New York, NY, USA, 2006. ACM.

Frank Nielsen and Richard Nock.
On the smallest enclosing information disk.
Information Processing Letters (IPL), 105(3):93-97, 2008.

Frank Nielsen and Richard Nock.
The dual Voronoi diagrams with respect to representational Bregman divergences.
In International Symposium on Voronoi Diagrams (ISVD), pages 71-78, 2009.

Frank Nielsen and Richard Nock.
Hyperbolic Voronoi diagrams made easy.
In 13th International Conference on Computational Science and Its Applications, pages 74-80. IEEE, 2010.

Frank Nielsen and Richard Nock.
Hyperbolic Voronoi diagrams made easy.
In International Conference on Computational Science and its Applications (ICCSA), volume 1, pages 74-80, Los Alamitos, CA, USA, March 2010. IEEE Computer Society.

Frank Nielsen and Richard Nock.

© 2014 Frank Nielsen 11.References 75/75

Page 79: Fundamentals cig 4thdec

Total Jensen divergences: Definition, properties and k-means++ clustering.
CoRR, abs/1309.7109, 2013.

Frank Nielsen and Richard Nock.
Visualizing hyperbolic Voronoi diagrams.
In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, SOCG'14, pages 90:90-90:91, New York, NY, USA, 2014. ACM.

Frank Nielsen and Richard Nock.
Visualizing hyperbolic Voronoi diagrams.
In Symposium on Computational Geometry, page 90, 2014.

Frank Nielsen, Richard Nock, and Shun-ichi Amari.
On clustering histograms with k-means by using mixed α-divergences.
Entropy, 16(6):3273-3301, 2014.

Frank Nielsen, Paolo Piro, and Michel Barlaud.
Bregman vantage point trees for efficient nearest neighbor queries.
In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo (ICME), pages 878-881, 2009.

Richard Nock and Frank Nielsen.
Fitting the smallest enclosing Bregman ball.
In Machine Learning, volume 3720 of Lecture Notes in Computer Science, pages 649-656. Springer Berlin Heidelberg, 2005.

Calyampudi Radhakrishna Rao.
Information and the accuracy attainable in the estimation of statistical parameters.
Bulletin of the Calcutta Mathematical Society, 37:81-89, 1945.

Ivor W. Tsang, Andras Kocsor, and James T. Kwok.
Simpler core vector machines with enclosing balls.
In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 911-918, New York, NY, USA, 2007. ACM.

Baba Vemuri, Meizhu Liu, Shun-ichi Amari, and Frank Nielsen.
Total Bregman divergence and its applications to DTI analysis.

© 2014 Frank Nielsen 11.References 75/75

Page 80: Fundamentals cig 4thdec

IEEE Transactions on Medical Imaging, pages 475-483, 2011.

Huaiyu Zhu and Richard Rohwer.
Measurements of generalisation based on information geometry.
In Stephen W. Ellacott, John C. Mason, and Iain J. Anderson, editors, Mathematics of Neural Networks, volume 8 of Operations Research/Computer Science Interfaces Series, pages 394-398. Springer US, 1997.

© 2014 Frank Nielsen 11.References 75/75