SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data...

27
SOM for non vectorial data SOMbrero References SOMbrero: an R package for numeric and non-numeric self-organizing maps Nathalie Villa-Vialaneix with J. Boelaert, L. Bendhaïba, M. Olteanu [email protected] http://www.nathalievilla.org WSOM 2014 - Mittweida, Germany - July 4th Nathalie Villa-Vialaneix | SOMbrero 1/13

Transcript of SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data...

Page 1: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

SOMbrero: an R package for numeric andnon-numeric self-organizing maps

Nathalie Villa-Vialaneixwith J. Boelaert, L. Bendhaïba, M. Olteanu

[email protected]://www.nathalievilla.org

WSOM 2014 - Mittweida, Germany - July 4th

Nathalie Villa-Vialaneix | SOMbrero 1/13

Page 2: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Outline

1 a short review of Self-Organizing Maps for non vectorial data

2 SOMbrero

Nathalie Villa-Vialaneix | SOMbrero 2/13

Page 3: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Outline

1 a short review of Self-Organizing Maps for non vectorial data

2 SOMbrero

Nathalie Villa-Vialaneix | SOMbrero 3/13

Page 4: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Basics on stochastic SOM[Kohonen, 2001]

x

x

x

(xi)i=1,...,n ⊂ Rd are affected to a unit C(xi) ∈ {1, . . . ,U}

the grid is equipped with a “distance” between units: d(u, u′)and observations affected to close units are close in Rd

every unit u corresponds to a prototype, pu (x) in Rd

x

x

x

Iterative learning (representation step): all prototypes inneighboring units are updated with a gradient descent like step:

pt+1u ←− pt

u + µ(t)Ht(d(C(xi), u))(xi − ptu)

Nathalie Villa-Vialaneix | SOMbrero 4/13

Page 5: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Basics on stochastic SOM[Kohonen, 2001]

x

x

x

Iterative learning (affectation step): xi is picked at random within(xk )k and affected to best matching unit:

C(xi) = arg minu‖xi − pu‖

2

x

x

x

Iterative learning (representation step): all prototypes inneighboring units are updated with a gradient descent like step:

pt+1u ←− pt

u + µ(t)Ht(d(C(xi), u))(xi − ptu)

Nathalie Villa-Vialaneix | SOMbrero 4/13

Page 6: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Basics on stochastic SOM[Kohonen, 2001]

x

x

x

Iterative learning (representation step): all prototypes inneighboring units are updated with a gradient descent like step:

pt+1u ←− pt

u + µ(t)Ht(d(C(xi), u))(xi − ptu)

Nathalie Villa-Vialaneix | SOMbrero 4/13

Page 7: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 1KORRESP [Cottrell et al., 1993]

Data: contingency table T = (nij)ij with p rows and q columnstransformed into a numeric dataset X:

X =

columns rows

columns

rows

column profile

row profile

with∀ i = 1, . . . , p and ∀ j = 1, . . . , q, xij =

nijni.×

√nn.j

X =

columns rows

columns

rows

augmentedcolumn profile

augmented rowprofile

column profile

row profile

affectation uses reduced profilerepresentation uses augmented profilealternatively process row profiles and column profiles

Nathalie Villa-Vialaneix | SOMbrero 5/13

Page 8: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 1KORRESP [Cottrell et al., 1993]

Data: contingency table T = (nij)ij with p rows and q columnstransformed into a numeric dataset X:

X =

columns rows

columns

rows

augmentedcolumn profile

augmented rowprofile

with∀ i = 1, . . . , p and ∀ j = q + 1, . . . , q + p, xij = xk(i)+p,j withk(i) = arg maxk=1,...,q xik

X =

columns rows

columns

rows

augmentedcolumn profile

augmented rowprofile

column profile

row profile

affectation uses reduced profilerepresentation uses augmented profilealternatively process row profiles and column profiles

Nathalie Villa-Vialaneix | SOMbrero 5/13

Page 9: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 1KORRESP [Cottrell et al., 1993]

Data: contingency table T = (nij)ij with p rows and q columnstransformed into a numeric dataset X:

X =

columns rows

columns

rows

augmentedcolumn profile

augmented rowprofile

column profile

row profile

affectation uses reduced profilerepresentation uses augmented profilealternatively process row profiles and column profiles

Nathalie Villa-Vialaneix | SOMbrero 5/13

Page 10: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 2Relational SOM

[Hammer and Hasenfuss, 2010, Olteanu and Villa-Vialaneix, 2014]

Data: described by a dissimilarity matrix D = (δ(xi , xj))i, j=1,...,n

((xi)i not necessarily vectorial)

Adaptations of the SOM algorithm:prototypes: expressed as (symbolic) convex combination of(xi)i : pu ∼

∑ni=1 γuixi , γui ≥ 0 and

∑i γui = 1

distance computation: ‖xi − pu‖2 replaced by

(Dγu)i −12γT

u Dγu

in reference to a pseudo-Euclidean framework [Goldfarb, 1984]

representation: replaced by an update of (γu)u:

γt+1u ← γt

u + µ(t)Ht(d(C(xi), u))(1i − γ

tu

)with 1il = 1 if l = i and 0 otherwise.

Nathalie Villa-Vialaneix | SOMbrero 6/13

Page 11: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 2Relational SOM

[Hammer and Hasenfuss, 2010, Olteanu and Villa-Vialaneix, 2014]

Data: described by a dissimilarity matrix D = (δ(xi , xj))i, j=1,...,n

((xi)i not necessarily vectorial)Adaptations of the SOM algorithm:

prototypes: expressed as (symbolic) convex combination of(xi)i : pu ∼

∑ni=1 γuixi , γui ≥ 0 and

∑i γui = 1

distance computation: ‖xi − pu‖2 replaced by

(Dγu)i −12γT

u Dγu

in reference to a pseudo-Euclidean framework [Goldfarb, 1984]

representation: replaced by an update of (γu)u:

γt+1u ← γt

u + µ(t)Ht(d(C(xi), u))(1i − γ

tu

)with 1il = 1 if l = i and 0 otherwise.

Nathalie Villa-Vialaneix | SOMbrero 6/13

Page 12: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 2Relational SOM

[Hammer and Hasenfuss, 2010, Olteanu and Villa-Vialaneix, 2014]

Data: described by a dissimilarity matrix D = (δ(xi , xj))i, j=1,...,n

((xi)i not necessarily vectorial)Adaptations of the SOM algorithm:

prototypes: expressed as (symbolic) convex combination of(xi)i : pu ∼

∑ni=1 γuixi , γui ≥ 0 and

∑i γui = 1

distance computation: ‖xi − pu‖2 replaced by

(Dγu)i −12γT

u Dγu

in reference to a pseudo-Euclidean framework [Goldfarb, 1984]

representation: replaced by an update of (γu)u:

γt+1u ← γt

u + µ(t)Ht(d(C(xi), u))(1i − γ

tu

)with 1il = 1 if l = i and 0 otherwise.

Nathalie Villa-Vialaneix | SOMbrero 6/13

Page 13: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Extensions to non vectorial data 2Relational SOM

[Hammer and Hasenfuss, 2010, Olteanu and Villa-Vialaneix, 2014]

Data: described by a dissimilarity matrix D = (δ(xi , xj))i, j=1,...,n

((xi)i not necessarily vectorial)Adaptations of the SOM algorithm:

prototypes: expressed as (symbolic) convex combination of(xi)i : pu ∼

∑ni=1 γuixi , γui ≥ 0 and

∑i γui = 1

distance computation: ‖xi − pu‖2 replaced by

(Dγu)i −12γT

u Dγu

in reference to a pseudo-Euclidean framework [Goldfarb, 1984]

representation: replaced by an update of (γu)u:

γt+1u ← γt

u + µ(t)Ht(d(C(xi), u))(1i − γ

tu

)with 1il = 1 if l = i and 0 otherwise.

Nathalie Villa-Vialaneix | SOMbrero 6/13

Page 14: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Outline

1 a short review of Self-Organizing Maps for non vectorial data

2 SOMbrero

Nathalie Villa-Vialaneix | SOMbrero 7/13

Page 15: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

What is it?

SOMbrero is an R package implementing stochastic variantsof SOM for non vectorial data (see yasomi for batch versions)

first release: March 2013; latest release: November 2013(version 0.4-1)

depends on R (version ≥ 3.0) http://www.r-project.org

and on several packages available on CRAN:

wordcloud, igraph, RColorBrewer, scatterplot3d, knitr,shiny

available at http://sombrero.r-forge.r-project.org(licence GPL) and can be installed from inside R using

install.packages("SOMbrero",repos="http://R-Forge.R-project.org")

Nathalie Villa-Vialaneix | SOMbrero 8/13

Page 16: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Features

1 3 algorithms available through one function trainSOMnumeric SOM (input: (n × p)-matrix with n observations of pvariables)KORRESP (input: (p × q)-contingency table)relational SOM (input: (n × n)-dissimilarity matrix for nindividuals)

2 many graphics3 super-clustering (HC on prototypes) with associated graphics4 quality measures (quantization error, topographic error) withquality

Nathalie Villa-Vialaneix | SOMbrero 9/13

Page 17: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Features

1 3 algorithms available through one function trainSOM2 many graphics available through one function plot with two

main arguments what (prototypes, observations, additionalvariable) and type (color, 3d, barplot, poly.dist, words,pie...)

3 super-clustering (HC on prototypes) with associated graphics4 quality measures (quantization error, topographic error) withquality

Nathalie Villa-Vialaneix | SOMbrero 9/13

Page 18: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Features

1 3 algorithms available through one function trainSOM2 many graphics3 super-clustering (HC on prototypes) with associated graphics

through functions superClass and plot

4 quality measures (quantization error, topographic error) withquality

Nathalie Villa-Vialaneix | SOMbrero 9/13

Page 19: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Features

1 3 algorithms available through one function trainSOM2 many graphics3 super-clustering (HC on prototypes) with associated graphics4 quality measures (quantization error, topographic error) withquality

Nathalie Villa-Vialaneix | SOMbrero 9/13

Page 20: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Start with SOMbrero

3 datasets corresponding to the three algorithms (iris,presidentielles2002 and lesmis, a graph from « LesMisérables »)

comprehensive (HTML) vignettes included in the package andavailable on the websiteWeb User Interface (made with shiny) for using the packageeven if you do not know R programming language (included inthe package or available online athttp://shiny.nathalievilla.org/sombrero but can bevery slow)

Tested on an historian and approved!

Nathalie Villa-Vialaneix | SOMbrero 10/13

Page 21: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Start with SOMbrero

3 datasets corresponding to the three algorithms (iris,presidentielles2002 and lesmis, a graph from « LesMisérables »)comprehensive (HTML) vignettes included in the package andavailable on the website

Web User Interface (made with shiny) for using the packageeven if you do not know R programming language (included inthe package or available online athttp://shiny.nathalievilla.org/sombrero but can bevery slow)

Tested on an historian and approved!

Nathalie Villa-Vialaneix | SOMbrero 10/13

Page 22: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Start with SOMbrero

3 datasets corresponding to the three algorithms (iris,presidentielles2002 and lesmis, a graph from « LesMisérables »)comprehensive (HTML) vignettes included in the package andavailable on the websiteWeb User Interface (made with shiny) for using the packageeven if you do not know R programming language (included inthe package or available online athttp://shiny.nathalievilla.org/sombrero but can bevery slow)

Tested on an historian and approved!

Nathalie Villa-Vialaneix | SOMbrero 10/13

Page 23: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

The demo...

Let’s go into SOMbrero...

disclaimer: as is standard during demos, something nasty might happen

and nothing would work due to some weird technical issues... and the

speaker would look like an idiot!

Licence CC BY-SA 2.5, Soljaguar

Nathalie Villa-Vialaneix | SOMbrero 11/13

Page 24: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Conclusion

SOMbrero

is easy to use (with a simple graphical interface)

can be used with various data

contains many tools for interpreting the results

... has unfortunately been implemented by girls so defaultcolors may not be suited for men (but they can easily changethem)

Perspectives

speed up the code

more quality criteria

more options (i.e., Gaussian neighborhood, weightedobservations...)

Nathalie Villa-Vialaneix | SOMbrero 12/13

Page 25: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Conclusion

SOMbrero

is easy to use (with a simple graphical interface)

can be used with various data

contains many tools for interpreting the results

... has unfortunately been implemented by girls so defaultcolors may not be suited for men (but they can easily changethem)

Perspectives

speed up the code

more quality criteria

more options (i.e., Gaussian neighborhood, weightedobservations...)

Nathalie Villa-Vialaneix | SOMbrero 12/13

Page 26: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Thank you for your attention...

... questions?

Nathalie Villa-Vialaneix | SOMbrero 13/13

Page 27: SOMbrero: an R package for numeric and non-numeric self ...€¦ · SOM for non vectorial data SOMbrero References Extensions to non vectorial data 1 KORRESP [Cottrell et al., 1993]

SOM for non vectorial data SOMbrero References

Cottrell, M., Letrémy, P., and Roy, E. (1993).Analyzing a contingency table with Kohonen maps: a factorial correspondence analysis.In Cabestany, J., Mary, J., and Prieto, A. E., editors, Proceedings of International Workshop on Artificial NeuralNetworks (IWANN 93), Lecture Notes in Computer Science, pages 305–311. Springer Verlag.

Goldfarb, L. (1984).A unified approach to pattern recognition.Pattern Recognition, 17(5):575–582.

Hammer, B. and Hasenfuss, A. (2010).Topographic mapping of large dissimilarity data sets.Neural Computation, 22(9):2229–2284.

Kohonen, T. (2001).Self-Organizing Maps, 3rd Edition, volume 30.Springer, Berlin, Heidelberg, New York.

Olteanu, M. and Villa-Vialaneix, N. (2014).On-line relational and multiple relational SOM.Neurocomputing.Forthcoming.

Nathalie Villa-Vialaneix | SOMbrero 13/13