Mobile Phone Spam Image Detection based on Graph Partitioning with Pyramid Histogram of Visual Words Image Descriptor

SO YEON KIM, KYUNG-AH SOHN

DEPARTMENT OF INFORMATION AND COMPUTER ENGINEERING, AJOU UNIVERSITY

Contents

Introduction
• Motivation and challenges

Methods
• Feature extraction
• Database construction
• Image classification and evaluation

Results
• Dataset
• Performance comparison in 5-fold cross validation
• Averaged performance comparison in optimal parameter
• Misclassified samples in best-performing cluster

Conclusion
• Summary and future work

Introduction

Motivation and challenges

Introduction

Image spam, rather than text spam, is rapidly increasing on mobile phones.

Introduction

We want to detect spam images on mobile phones.

66 spam images
405 non-spam images

377 images for training (80%)

94 images for test (20%)

This dataset is too small to train a predictive model.

Some e-mail image spam has features similar to mobile phone spam.

Methods

Feature extraction
Database construction
Image classification and evaluation

Methods

Feature extraction
◦ RGB histogram
◦ Pyramid Histogram of Visual Words (PHOW); color mode: gray / RGB / opponent

[Figure: feature extraction pipeline. Input image → RGB histogram feature extraction → feature vector. Input image → SIFT feature extraction → bag of visual words → concatenation of spatial histograms → feature vector.]
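The two feature types named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the bin count, the pyramid levels, and the input layout (pixels as `(r, g, b)` tuples, visual words as `(x, y, word_id)` with coordinates normalized to `[0, 1)`) are all assumptions. The SIFT extraction and vocabulary quantization steps are assumed to have already produced the word IDs.

```python
def rgb_histogram(pixels, bins=8):
    """Concatenate per-channel histograms of (r, g, b) pixels (values 0..255).

    The result has 3 * bins entries and is L1-normalized.
    """
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def pyramid_histogram(words, vocab_size, levels=(1, 2)):
    """Concatenate visual-word histograms computed over g x g spatial grids.

    `words` holds (x, y, word_id) tuples with x, y in [0, 1).
    `levels` lists the grid sizes g (hypothetical default: 1x1 and 2x2).
    """
    feature = []
    for g in levels:
        cells = [[0.0] * vocab_size for _ in range(g * g)]
        for x, y, word_id in words:
            col = min(int(x * g), g - 1)
            row = min(int(y * g), g - 1)
            cells[row * g + col][word_id] += 1.0
        for cell in cells:
            feature.extend(cell)
    total = sum(feature) or 1.0
    return [v / total for v in feature]
```

The spatial grid is what lets the pyramid histogram keep coarse layout information that a plain color histogram discards.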

Methods

Database construction
◦ K-means clustering
◦ Elkan k-means clustering algorithm
◦ K-means++ algorithm for initializing centroids

[Figure: mobile phone spam images and e-mail spam images → feature vectors → Euclidean distance matrix (e-mail image × phone image) → k-means clustering → most similar e-mail images + phone images → Phone + Email Spam Image Dataset.]
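As a rough sketch of the clustering step, the code below runs k-means with k-means++ seeding on small feature vectors. Plain Lloyd iterations stand in here for Elkan's algorithm, which reaches the same assignments while skipping distance computations via the triangle inequality. The function names, the iteration count, and the seed are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: pick each new centroid with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0.0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centroids.append(p)
                break
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids
```

In a setting like the one on this slide, `points` would be the feature vectors of phone and e-mail images, and e-mail images sharing a cluster with phone spam images would be candidates for the combined dataset.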

Methods

Database construction
◦ Spectral clustering

[Figure: mobile phone spam images and e-mail spam images → feature vectors → Euclidean distance matrix (e-mail image × phone image) → … → similarity matrix (e-mail image × e-mail image).]

Methods

Database construction
◦ Spectral clustering

[Figure: spectral clustering (normalized cut) → phone images → Phone + Email Spam Image Dataset.]
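The step from a distance matrix to a similarity matrix, and the normalized cut objective that spectral clustering minimizes, can be illustrated as follows. The Gaussian kernel with width `sigma` is an assumption (the slides only show a σ parameter in the results), and the function names are hypothetical. The sketch evaluates the normalized cut of a given two-way partition rather than solving the eigenproblem that finds the best one.

```python
import math


def gaussian_similarity(dist, sigma):
    """Turn a Euclidean distance matrix into a similarity matrix.

    Assumed kernel: W[i][j] = exp(-d_ij^2 / (2 * sigma^2)).
    """
    return [[math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in row]
            for row in dist]


def normalized_cut(W, labels):
    """Normalized cut of a two-way partition (labels are 0 or 1):

    Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
    """
    n = len(W)
    cut = sum(W[i][j] for i in range(n) for j in range(n)
              if labels[i] == 0 and labels[j] == 1)
    assoc = [sum(W[i][j] for i in range(n) if labels[i] == side
                 for j in range(n))
             for side in (0, 1)]
    return cut / assoc[0] + cut / assoc[1]
```

A partition that separates two tight groups gives a value near 0, while one that splits the groups gives a value near 1, which is why a smaller normalized cut indicates a better grouping.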

Methods

Image classification and evaluation

[Figure: the Phone + Email Spam Image Dataset is split into a training set (80%) and a test set (20%) for 5-fold cross validation; SVM classification labels each image as spam or ham. Figure labels: e-mail, phone.]
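The 80% / 20% split repeated over five folds can be sketched as below. Stratifying by class is an assumption (the slides do not say how folds were formed), as are the function name and the seed. The SVM itself is left out, since any classifier can be trained on `train` and scored on `test`.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds.

    Each class is shuffled and dealt round-robin into the folds, so every
    fold keeps roughly the overall spam / non-spam ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Class sizes taken from the slides: 66 spam (1) and 405 non-spam (0) images.
labels = [1] * 66 + [0] * 405
splits = list(stratified_kfold(labels, k=5))
```

With 471 images, each test fold holds 94 or 95 images and the matching training set holds the rest, in line with the 377 / 94 figures given in the introduction.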

Results

Dataset
Performance comparison in 5-fold cross validation
Averaged performance comparison in optimal parameter
Misclassified samples in best-performing cluster

Results

Dataset
◦ Similar sub-set of e-mail spam images from the Image Spam Hunter dataset.

                        Phone   E-mail   Total
Spam (RGB histogram)      66      12       78
Spam (PHOW-gray)          66     201      267
Spam (PHOW-RGB)           66      20       86
Spam (PHOW-opponent)      66     324      390
Non-spam                 405       -      405

Similar sub-set from spectral clustering

Results

Performance comparison in 5-fold cross validation
◦ Evaluation measure

                   Predicted spam   Predicted non-spam
Actual spam              TP                 FN
Actual non-spam          FP                 TN

Confusion matrix
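The four measures reported in the result tables follow from the confusion matrix entries. The sketch below uses the standard definitions, with spam as the positive class; reading "F-score" as the F1 score is an assumption, and the function name and the example counts are illustrative only.

```python
def evaluation_measures(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity and F1 from confusion counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # share of actual spam that is caught
    specificity = tn / (tn + fp)   # share of actual non-spam that is kept
    precision = tp / (tp + fp)     # share of predicted spam that is spam
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Illustrative counts only, not taken from the experiments.
measures = evaluation_measures(tp=8, fn=2, fp=1, tn=9)
```

Because non-spam images outnumber spam images roughly six to one in this dataset, accuracy alone can look high even when sensitivity is poor, which is why the F-score is reported alongside it.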

Results

Performance comparison in 5-fold cross validation
◦ RGB-histogram feature
◦ PHOW feature (gray mode)
◦ PHOW feature (RGB color mode)
◦ PHOW feature (opponent color mode)

Results

Sample e-mail spam images
◦ These images are correctly grouped into the same cluster with the PHOW descriptor but fall into a different cluster with the RGB histogram feature.

The PHOW descriptor considers not only the color distribution but also the geometric information of images.

Results

Averaged performance comparison in optimal parameter

PHOW descriptors outperform the RGB histogram feature.

The color mode of the PHOW descriptor does not significantly affect performance.

Results

Averaged performance comparison in k-means clustering

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   Random (10%)
Accuracy         73.47%         95.12%        95.54%         94.27%           72.25%
Sensitivity      42.42%         92.42%        92.42%         87.91%           32.03%
Specificity      78.52%         95.56%        96.05%         95.31%           78.81%
F-score          30.73%         84.19%        85.49%         81.15%           24.14%

Results

Averaged performance comparison in spectral clustering

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   Random (10%)
Accuracy         81.75%         96.39%        96.82%         96.39%           72.25%
Sensitivity      30.55%         95.45%        87.91%         84.95%           32.03%
Specificity      90.12%         96.54%        98.27%         98.27%           78.81%
F-score          32.31%         88.28%        88.48%         86.76%           24.14%

σ values listed in the table header: σ=0.3, σ=0.6

Results

Misclassified samples in best-performing cluster

[Figure: (a) False positives (FP); (b) False negatives (FN)]

Conclusion

Summary and future work

Conclusion

We proposed a mobile phone spam image filtering system that uses a large set of e-mail spam images to address the shortage of phone spam image data.

We compared phone spam image classification performance using the RGB histogram and the PHOW descriptor with various color modes (gray, RGB, opponent).

The PHOW descriptor, which considers both geometric and color information, improves performance.

An advanced clustering technique such as spectral clustering improves performance further.

Thank you!

Q & A