SCIS&ISIS 2016, 2016.08.27
Families of Triangular Norm Based Kernel Function and Its Application to Kernel k-means
Kazushi Okamoto The University of Electro-Communications
Introduction
• A kernel method is a fundamental and important pattern-analysis approach based on a kernel function
• It is used in machine learning tasks such as classification, clustering, and dimension reduction
• A kernel function corresponds to a similarity measure between two data points:
  • each data point is mapped to a high-dimensional feature space
  • the kernel value is the inner product in that space
Existing Kernel Functions
linear kernel: $K_{\mathrm{lin}}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} x_i y_i$

polynomial kernel: $K_{\mathrm{pol}}(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} x_i y_i + l \right)^{p}$

RBF kernel: $K_{\mathrm{rbf}}(\mathbf{x}, \mathbf{y}) = \exp\left( -\frac{\sum_{i=1}^{d} (x_i - y_i)^2}{2\sigma^2} \right)$

intersection kernel: $K_{\mathrm{int}}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} \min\{x_i, y_i\}$
The minimum and product operations are both triangular norms (generalizations of the intersection operation)
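The four kernels listed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper; the parameter defaults ($l$, $p$, $\sigma$) and the sample vectors are assumptions.

```python
import numpy as np

def linear_kernel(x, y):
    # K_lin(x, y) = sum_i x_i * y_i
    return float(np.dot(x, y))

def polynomial_kernel(x, y, l=1.0, p=2):
    # K_pol(x, y) = (sum_i x_i * y_i + l)^p
    return float((np.dot(x, y) + l) ** p)

def rbf_kernel(x, y, sigma=1.0):
    # K_rbf(x, y) = exp(-sum_i (x_i - y_i)^2 / (2 * sigma^2))
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def intersection_kernel(x, y):
    # K_int(x, y) = sum_i min(x_i, y_i)
    return float(np.sum(np.minimum(x, y)))

x = np.array([0.2, 0.5, 0.8])
y = np.array([0.4, 0.3, 0.8])
print(linear_kernel(x, y))        # 0.2*0.4 + 0.5*0.3 + 0.8*0.8 = 0.87
print(intersection_kernel(x, y))  # 0.2 + 0.3 + 0.8 = 1.3
```

Note that the intersection kernel compares vectors coordinate-wise with $\min$, which is exactly the logical-product t-norm applied per dimension — the observation the rest of the talk generalizes.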
Positive Semi-Definite Kernel

A function $K : \Omega \times \Omega \to \mathbb{R}$ is a positive semi-definite kernel if and only if, for all $\mathbf{x}, \mathbf{y} \in \Omega$, all $m \in \mathbb{N}^+$, and all $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m \in \Omega$:

1) $K(\mathbf{x}, \mathbf{y}) = K(\mathbf{y}, \mathbf{x})$

2) $\sum_{i=1}^{m} \sum_{j=1}^{m} c_i c_j K(\mathbf{x}_i, \mathbf{x}_j) \ge 0 \quad \forall c_i, c_j \in \mathbb{R}$ (quadratic form)

The set $\Omega$ may consist of, e.g., real-valued vectors, graphs, or strings.

Kernel calculation means an inner product in the feature space: $K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$
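The quadratic-form condition can be checked numerically on a finite sample: build the Gram matrix and test whether its smallest eigenvalue is nonnegative. A minimal sketch (the sample points, the tolerance, and the helper names are assumptions; this is a spot check, not a proof of positive semi-definiteness):

```python
import numpy as np

def gram_matrix(kernel, xs):
    # Build the m x m Gram matrix with entries K(x_i, x_j)
    m = len(xs)
    return np.array([[kernel(xs[i], xs[j]) for j in range(m)] for i in range(m)])

def is_psd(gram, tol=1e-10):
    # Symmetric with all eigenvalues >= 0  <=>  the quadratic form is nonnegative
    if not np.allclose(gram, gram.T):
        return False
    return bool(np.min(np.linalg.eigvalsh(gram)) >= -tol)

xs = [0.1, 0.4, 0.7, 1.0]
print(is_psd(gram_matrix(lambda a, b: a * b, xs)))    # product kernel: True
print(is_psd(gram_matrix(lambda a, b: min(a, b), xs)))  # min kernel: True
```

Both the scalar product kernel (a rank-one Gram matrix) and the scalar min kernel pass the check, consistent with the linear and intersection kernels being positive semi-definite.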
Additive Kernel

$K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, defined for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ by

$K(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} K_i(x_i, y_i)$, where each $K_i : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$

If every $K_i$ is positive semi-definite, then the additive kernel $K$ is also positive semi-definite, since (writing $x_{j,i}$ for the $i$-th coordinate of $\mathbf{x}_j$)

$\sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k K(\mathbf{x}_j, \mathbf{x}_k) = \sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k \sum_{i=1}^{d} K_i(x_{j,i}, x_{k,i}) = \sum_{i=1}^{d} \sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k K_i(x_{j,i}, x_{k,i}) \ge 0$
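The construction above just sums one scalar kernel per coordinate. A minimal sketch (the particular per-dimension kernels and test vectors are illustrative assumptions):

```python
def additive_kernel(per_dim_kernels, x, y):
    # K(x, y) = sum_i K_i(x_i, y_i): one scalar kernel K_i per coordinate
    return sum(k(xi, yi) for k, xi, yi in zip(per_dim_kernels, x, y))

# Example: product kernel on dim 0, min kernel on dim 1
# (both positive semi-definite, so the sum is too)
ks = [lambda a, b: a * b, lambda a, b: min(a, b)]
x, y = [0.5, 0.2], [0.4, 0.9]
print(additive_kernel(ks, x, y))  # 0.5*0.4 + min(0.2, 0.9) = 0.4
```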
Triangular Norm (t-norm)

A function $T : [0,1] \times [0,1] \to [0,1]$ is called a t-norm if and only if, for all $x, y, z \in [0,1]$:

1) $T(x, 1) = x$
2) $T(x, y) = T(y, x)$
3) $T(x, T(y, z)) = T(T(x, y), z)$
4) $T(x, y) \le T(x, z)$ if $y \le z$

In fuzzy logic, t-norms represent intersection operations.
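The four axioms can be checked numerically on a grid — a quick sanity check, not a proof. A minimal sketch (the grid and tolerance are assumptions):

```python
import itertools

def is_tnorm(T, grid=None, tol=1e-9):
    # Numerically check the four t-norm axioms on a finite grid of [0, 1]
    grid = grid or [0.0, 0.25, 0.5, 0.75, 1.0]
    for x, y, z in itertools.product(grid, repeat=3):
        if abs(T(x, 1.0) - x) > tol:                  # 1) boundary condition
            return False
        if abs(T(x, y) - T(y, x)) > tol:              # 2) commutativity
            return False
        if abs(T(x, T(y, z)) - T(T(x, y), z)) > tol:  # 3) associativity
            return False
        if y <= z and T(x, y) > T(x, z) + tol:        # 4) monotonicity
            return False
    return True

print(is_tnorm(min))                  # logical product: True
print(is_tnorm(lambda x, y: x * y))   # algebraic product: True
```

The arithmetic mean $(x + y)/2$, for instance, fails the boundary axiom $T(x, 1) = x$, so it is not a t-norm.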
Examples of t-norms

Hamacher t-norm: $T_h(x, y) = \dfrac{xy}{p + (1 - p)(x + y - xy)}$, $p \in [0, 1]$

Dubois t-norm: $T_{db}(x, y) = \dfrac{xy}{\max\{x, y, p\}}$, $p \in [0, 1]$

[Figure: surface plots of the Hamacher t-norm ($p = 0.4$) and the Dubois t-norm ($p = 0.4$) over $(x, y) \in [0, 1]^2$]
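Both families are one-liners to implement. A minimal sketch of the two formulas above (the zero-denominator guard for the Hamacher case $p = 0$, $x = y = 0$ is an assumption):

```python
def hamacher(x, y, p):
    # T_h(x, y) = xy / (p + (1 - p)(x + y - xy)),  p in [0, 1]
    denom = p + (1.0 - p) * (x + y - x * y)
    return 0.0 if denom == 0.0 else x * y / denom

def dubois(x, y, p):
    # T_db(x, y) = xy / max(x, y, p),  p in (0, 1]
    return x * y / max(x, y, p)

print(hamacher(0.5, 1.0, 0.4))  # boundary axiom T(x, 1) = x -> 0.5
print(dubois(0.6, 0.8, 0.4))    # 0.48 / max(0.6, 0.8, 0.4) = 0.6
```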
Is a t-norm Positive Semi-Definite on [0, 1]?

Gram matrix of $T$ on sample points $x_1, \cdots, x_m$:

$A = \begin{pmatrix} T(x_1, x_1) & T(x_1, x_2) & \cdots & T(x_1, x_m) \\ T(x_2, x_1) & T(x_2, x_2) & \cdots & T(x_2, x_m) \\ \vdots & \vdots & \ddots & \vdots \\ T(x_m, x_1) & T(x_m, x_2) & \cdots & T(x_m, x_m) \end{pmatrix}$

Condition of positive semi-definiteness:

$\sum_{i=1}^{m} \sum_{j=1}^{m} c_i c_j T(x_i, x_j) \ge 0 \quad \forall c_i, c_j \in \mathbb{R},\ \forall m \in \mathbb{N}^+$
$\iff A$ is positive semi-definite $\iff$ all principal minors of $A$ are $\ge 0$
Is a t-norm Positive Semi-Definite on [0, 1]?

Case $m = 1$:
$\sum_{i=1}^{1} \sum_{j=1}^{1} c_i c_j T(x_i, x_j) = c_1^2 T(x_1, x_1) \ge 0$

Case $m = 2$: all principal minors of $\begin{pmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{pmatrix}$ must be $\ge 0$:
$|T(x_1, x_1)| \ge 0$, $|T(x_2, x_2)| \ge 0$, and $\begin{vmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{vmatrix} \ge 0$
Is a t-norm Positive Semi-Definite on [0, 1]?

$\begin{vmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{vmatrix} = T(x_1, x_1) T(x_2, x_2) - T^2(x_1, x_2)$

Assume $z \cdot T(x, y) \le T(x, zy)$ for all $x, y, z$, which implies $T(x, y) \ge T_a(x, y) = xy$ (the algebraic product).

For $0 < x_1 < x_2$: since $T(0, x_2) = 0 < x_1 < x_2 = T(1, x_2)$, a continuous $T$ yields some $w$ with $x_1 = T(w, x_2)$. Then

$\dfrac{T(x_2, x_2)}{T(x_1, x_2)} \ge \dfrac{T(w, T(x_2, x_2))}{T(w, T(x_1, x_2))} = \dfrac{T(T(w, x_2), x_2)}{T(x_1, T(w, x_2))} = \dfrac{T(x_1, x_2)}{T(x_1, x_1)}$

so $T(x_1, x_1) T(x_2, x_2) \ge T^2(x_1, x_2)$, and the determinant is nonnegative.
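The 2×2 principal-minor condition $T(x_1, x_1)\,T(x_2, x_2) - T^2(x_1, x_2) \ge 0$ is easy to spot-check numerically on a grid. A minimal sketch (a finite grid check, not a proof; the grid and tolerance are assumptions):

```python
def minor2_ok(T, grid, tol=1e-12):
    # Check T(a, a) * T(b, b) - T(a, b)^2 >= 0 for every pair in the grid
    return all(T(a, a) * T(b, b) - T(a, b) ** 2 >= -tol
               for a in grid for b in grid)

grid = [i / 10.0 for i in range(11)]
print(minor2_ok(min, grid))                  # logical product (min): True
print(minor2_ok(lambda x, y: x * y, grid))   # algebraic product: True
```

For the algebraic product the determinant is identically zero; for min it reduces to $ab \ge \min(a, b)^2$, which always holds on $[0, 1]$.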
Kernel k-means

A partitioning algorithm that, for data $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n \in \mathbb{R}^d$, minimizes the objective function

$J = \min \sum_{\mu_i \in M} \sum_{\mathbf{x} \in C_i} || \Phi(\mathbf{x}) - \mu_i ||^2$

Kernel trick:

$||\Phi(\mathbf{x}) - \mu_i||^2 = \left|\left| \Phi(\mathbf{x}) - \frac{1}{|C_i|} \sum_{\mathbf{x}' \in C_i} \Phi(\mathbf{x}') \right|\right|^2 = K(\mathbf{x}, \mathbf{x}) - \frac{2}{|C_i|} \sum_{\mathbf{x}' \in C_i} K(\mathbf{x}, \mathbf{x}') + \frac{1}{|C_i|^2} \sum_{\mathbf{x}' \in C_i} \sum_{\mathbf{x}'' \in C_i} K(\mathbf{x}', \mathbf{x}'')$
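The kernel-trick expansion above lets the assignment step run on the Gram matrix alone, without ever forming $\Phi(\mathbf{x})$. A minimal NumPy sketch, not the author's implementation; the deterministic initialization, iteration cap, and toy data are assumptions:

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100):
    # Kernel k-means on a precomputed n x n Gram matrix K.
    # Squared distance to each cluster mean uses only kernel values:
    # ||phi(x) - mu_i||^2 = K(x,x) - (2/|C_i|) sum_{x' in C_i} K(x,x')
    #                       + (1/|C_i|^2) sum_{x',x'' in C_i} K(x',x'')
    n = K.shape[0]
    labels = np.arange(n) % k  # simple deterministic initial partition
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for i in range(k):
            members = labels == i
            size = int(members.sum())
            if size == 0:
                continue  # empty cluster: leave its distances at infinity
            dist[:, i] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
    return labels

# Two well-separated groups with a linear kernel K = X X^T
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = kernel_kmeans(X @ X.T, 2)
print(labels)  # the two nearby pairs end up in the same clusters
```

Swapping in any positive semi-definite Gram matrix (RBF, t-norm kernel, etc.) changes only how `K` is built, not the clustering loop.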
Conditions of the Clustering Experiment
• each clustering process was terminated when the number of iterations reached 1,000 or when the difference between the previous and current objective function values was less than $10^{-4}$
• the partition that minimized the objective function was selected from 100 attempts with different initial partitions
• the number of clusters was set depending on the data set
Applied Kernel Functions

linear kernel: $K_{\mathrm{lin}}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} x_i y_i$

RBF kernel: $K_{\mathrm{rbf}}(\mathbf{x}, \mathbf{y}) = \exp\left( -\frac{\sum_{i=1}^{d} (x_i - y_i)^2}{2\sigma^2} \right)$

t-norm kernel: $K_t(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} T_i(x_i, y_i)$

Applied non-parameterized t-norms:

logical product: $T_l(x, y) = \min\{x, y\}$

Mizumoto product: $T_{mp}(x, y) = \dfrac{2}{\pi} \cot^{-1}\!\left( \cot\frac{\pi}{2}x + \cot\frac{\pi}{2}y \right)$
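A minimal sketch of the t-norm kernel with the two non-parameterized t-norms above, assuming the same t-norm in every coordinate and inputs scaled to $[0, 1]$ (the zero-input guard in the Mizumoto product, which avoids the cotangent pole, is an assumption):

```python
import math

def mizumoto_product(x, y):
    # T_mp(x, y) = (2/pi) * arccot( cot(pi*x/2) + cot(pi*y/2) )
    if x == 0.0 or y == 0.0:
        return 0.0  # cot(0) diverges; the limit of T_mp is 0
    s = 1.0 / math.tan(math.pi * x / 2.0) + 1.0 / math.tan(math.pi * y / 2.0)
    return (2.0 / math.pi) * math.atan2(1.0, s)  # atan2(1, s) = arccot(s) for s >= 0

def tnorm_kernel(T, x, y):
    # K_t(x, y) = sum_i T(x_i, y_i), coordinates assumed scaled to [0, 1]
    return sum(T(xi, yi) for xi, yi in zip(x, y))

x, y = [0.2, 0.9], [0.7, 0.9]
print(tnorm_kernel(min, x, y))                           # 0.2 + 0.9 = 1.1
print(abs(mizumoto_product(0.3, 1.0) - 0.3) < 1e-9)      # boundary T(x, 1) = x: True
```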
Applied Parameterized t-norms

Dombi t-norm: $T_{dm}(x, y) = \dfrac{1}{1 + \left( \left( \frac{1-x}{x} \right)^p + \left( \frac{1-y}{y} \right)^p \right)^{1/p}}$

Dubois t-norm: $T_{db}(x, y) = \dfrac{xy}{\max\{x, y, p\}}$

Frank t-norm: $T_f(x, y) = \log_p\left( 1 + \dfrac{(p^x - 1)(p^y - 1)}{p - 1} \right)$

Hamacher t-norm: $T_h(x, y) = \dfrac{xy}{p + (1 - p)(x + y - xy)}$

Schweizer t-norm 2: $T_{s2}(x, y) = \dfrac{1}{\sqrt[p]{\frac{1}{x^p} + \frac{1}{y^p} - 1}}$

Schweizer t-norm 3: $T_{s3}(x, y) = 1 - \sqrt[p]{(1-x)^p + (1-y)^p - (1-x)^p (1-y)^p}$
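Three of the parameterized families, implemented directly from the formulas above. A minimal sketch (the zero-input guards and the parameter ranges noted in the comments are assumptions made so the expressions stay finite):

```python
import math

def dombi(x, y, p):
    # T_dm(x, y) = 1 / (1 + (((1-x)/x)^p + ((1-y)/y)^p)^(1/p)),  p > 0
    if x == 0.0 or y == 0.0:
        return 0.0  # (1-x)/x diverges at 0; the limit of T_dm is 0
    s = ((1.0 - x) / x) ** p + ((1.0 - y) / y) ** p
    return 1.0 / (1.0 + s ** (1.0 / p))

def frank(x, y, p):
    # T_f(x, y) = log_p(1 + (p^x - 1)(p^y - 1) / (p - 1)),  p > 0, p != 1
    return math.log(1.0 + (p ** x - 1.0) * (p ** y - 1.0) / (p - 1.0), p)

def schweizer2(x, y, p):
    # T_s2(x, y) = (1/x^p + 1/y^p - 1)^(-1/p),  p > 0
    if x == 0.0 or y == 0.0:
        return 0.0  # 1/x^p diverges at 0; the limit of T_s2 is 0
    return (x ** -p + y ** -p - 1.0) ** (-1.0 / p)

# Each family satisfies the boundary axiom T(x, 1) = x:
for T in (dombi, frank, schweizer2):
    print(round(T(0.37, 1.0, 2.5), 12))  # 0.37
```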
Evaluation Measure: Adjusted Rand Index (ARI)

For partitions $U = \{u_1, u_2, \cdots, u_M\}$ and $V = \{v_1, v_2, \cdots, v_N\}$:

$n_{ij} = |u_i \cap v_j|$, $\quad n_{i\cdot} = \sum_{j=1}^{N} n_{ij}$, $\quad n_{\cdot j} = \sum_{i=1}^{M} n_{ij}$, $\quad n = \sum_{i=1}^{M} \sum_{j=1}^{N} n_{ij}$

$\mathrm{ARI} = \dfrac{\sum_{i=1}^{M} \sum_{j=1}^{N} \binom{n_{ij}}{2} - \dfrac{ab}{\binom{n}{2}}}{\dfrac{1}{2}(a + b) - \dfrac{ab}{\binom{n}{2}}}$, where $a = \sum_{i=1}^{M} \binom{n_{i\cdot}}{2}$, $\quad b = \sum_{j=1}^{N} \binom{n_{\cdot j}}{2}$
Four Data Sets Used in the Numerical Experiment

[Figure: scatter plots of Data Set A, Data Set B, Data Set C, and Data Set D on $[0, 1]^2$]
Best ARI Values for Each Kernel and Data Set
Data Set A Data Set B Data Set C Data Set D
linear kernel 0.4535 0.5767 0.4650 -0.0054
RBF kernel 0.4880 0.5767 0.7611 0.1375
t-norm kernel (logical product) 0.0240 0.5146 0.2990 -0.0037
t-norm kernel (Mizumoto product) 0.4997 0.5528 0.4650 -0.0050
t-norm kernel (Dombi t-norm) 0.5237 0.5612 0.4717 0.0462
t-norm kernel (Dubois t-norm) 0.5117 0.5853 0.4757 0.0315
t-norm kernel (Frank t-norm) 0.4880 0.5767 0.4688 -0.0049
t-norm kernel (Hamacher t-norm) 0.4880 0.5767 0.4650 -0.0046
t-norm kernel (Schweizer t-norm 2) 0.5237 0.5767 0.4717 0.0477
t-norm kernel (Schweizer t-norm 3) 0.5237 0.5767 0.4717 0.0445
Data Set A

[Figure: clustering results — correct cluster, linear kernel, RBF kernel (σ = 8.52), t-norm kernel (Dombi, p = 1.98)]
Data Set B

[Figure: clustering results — correct cluster, linear kernel, RBF kernel (σ = 9.99), t-norm kernel (Dubois, p = 0.76)]
Data Set C

[Figure: clustering results — correct cluster, linear kernel, RBF kernel (σ = 0.28), t-norm kernel (Dubois, p = 0.38)]
Data Set D

[Figure: clustering results — correct cluster, linear kernel, RBF kernel (σ = 0.33), t-norm kernel (Dombi, p = 8.95)]
Conclusion
• The concept of the t-norm based additive kernel was proposed
• Numerical experiment:
  • the ARI values obtained by the proposal were almost the same as or higher than those of the linear kernel on all of the data sets
  • the proposal slightly improved the ARI values for some data sets compared with the RBF kernel
  • the proposed method maps data to a higher-dimensional feature space than the linear kernel, but the dimension is lower than that of the RBF kernel
• The t-norm kernel with the Dubois t-norm had a low calculation cost compared with the RBF kernel