
Ca’ Foscari, Dorsoduro 3246, 30123 Venezia

Università Ca’ Foscari Venezia

Master’s Degree programme - Second Cycle (D.M. 270/2004) in Informatica - Computer Science

Final Thesis

The visual narrative of Venice: an analysis of the touristic photographs in social media

Supervisor: Ch. Prof. Andrea Torsello

Candidate: Eric Boscaro
Matriculation number 835651

Academic Year 2014/2015


Dedicated to my family

Abstract

The popularity and diffusion of social media have been growing constantly in recent years, making the automatic understanding of the huge amount of data produced fundamental to discovering recurrent patterns and other important information. While a large body of work can be found in the literature on extracting ‘mood’ information about a topic from textual information, very little work has been done on the problem of automatically analyzing the visual content of images in social media. In this thesis, images retrieved from social media are used to analyze how Venice is represented in touristic photographs at different times of the year and in its different areas (sestieri). To this end, using techniques borrowed from the object classification literature, we built a classifier able to assign new photos to a category, and analyzed the variation of the class distribution in space and time, thus providing a quantitative characterization of the visual narrative of Venice in social media.


Contents

1 Introduction
    1.0.1 Structure of the thesis

2 The State of Art

3 The Bag of Words model
    3.1 The Bag of Words model
    3.2 The BoW model in Computer Vision
    3.3 BoW: Feature detection and description
        3.3.1 SIFT
        3.3.2 Standard Sift descriptor
        3.3.3 Dense SIFT
    3.4 BoW: Codebook Formation
    3.5 BoW: Learning and classification
        3.5.1 SVM: Support Vector Machines
        3.5.2 Non-linear SVM
        3.5.3 Multi-class SVM

4 Construction of the classifier
    4.1 The Dataset
    4.2 The classifier
    4.3 Feature extraction from the images
    4.4 Clustering
    4.5 Construction of the Bag of keypoints
    4.6 Classification
        4.6.1 The parameters of SVM
        4.6.2 The Varies class problem
        4.6.3 Final considerations
    4.7 Display of the results
        4.7.1 Considerations on the classifier results

5 Analysis of the results
    5.1 The year analysis
        5.1.1 General Quantitative Analysis
        5.1.2 General Normalized Analysis
        5.1.3 Districts focused Analysis
        5.1.4 Analysis of categories densities
        5.1.5 The distribution of Varies
    5.2 A case of study: Carnival 2015

6 Conclusions
    6.1 Future work

Appendices

Appendix A
    A.1 K-means
    A.2 Normalized rate of the districts
    A.3 Numerical results of the Carnival 2015 analysis

List of Figures

3.1 Sift Keypoint in an image
3.2 Sift Descriptors: spatial histogram of the image gradient
3.3 Canonical SIFT descriptor and spatial binning functions
3.4 Geometry of the Dsift descriptors
3.5 The SVM hyperplane example

4.1 First category, Lagoon landscape
4.2 Second category, Townscape
4.3 Third category, Art
4.4 Fourth category, Folklore
4.5 Fifth category, Food
4.6 Sixth category, Varies
4.7 The flow chart of the classification construction phases and recognition
4.8 A singular feature
4.9 Test of the rate of correctness, changing parameter K
4.10 Example of a class Histogram
4.11 LinearSvm, test of parameter C
4.12 Comparison of the Confusion Matrix with parameter M=1 and M=2
4.13 Test 1, correctly classified Folklore image
4.14 Test 2, correctly classified Food image
4.15 Test 3, correctly classified Townscape image
4.16 Test 4, incorrectly classified Townscape image
4.17 Test 5, incorrectly classified Folklore image

5.1 Comparison: Images Retrieved and Touristic Affluence in Venice on years 2014-2015
5.2 Quantitative representation of the categories over the months of the years 2014-2015
5.3 Normalized representation of the data categories over the months of the years 2014-2015
5.4 Normalized representation of Folklore over the months of the years 2014-2015
5.5 Normalized representation of Lagoon Landscape over the months of the years 2014-2015
5.6 Normalized representation of Townscape over the months of the years 2014-2015
5.7 Normalized representation of Art over the months of the years 2014-2015
5.8 Normalized representation of Food over the months of the years 2014-2015
5.9 Heat Map of Lagoon Landscape category, years 2014-2015
5.10 Heat Map of Townscape category, years 2014-2015
5.11 Heat Map of Art category, years 2014-2015
5.12 Heat Map of Folklore category, years 2014-2015
5.13 Heat Map of Food category, years 2014-2015
5.14 Heat map of Varies vs the other classes, 2014
5.15 Heat map of Varies vs the other classes, 2015
5.16 Folklore category, Carnival vs After Carnival
5.17 Heat map of the Carnival vs After Carnival periods, 2015
5.18 Lagoon and Food, Carnival vs After Carnival

A.1 Normalized Rate of the categories over the Venetian Districts, 2014
A.2 Normalized Rate of the categories over the Venetian Districts, 2015

List of Tables

4.1 Test on the parameters C and γ on the SVC with rbf kernel
4.2 Votes and probabilities of Test 1
4.3 Votes and probabilities of Test 2
4.4 Votes and probabilities of Test 3
4.5 Votes and probabilities of Test 4
4.6 Votes and probabilities of Test 5

5.1 Quantitative data of the year 2014
5.2 Quantitative data of the year 2015
5.3 Touristic presence in Venice during the years 2014 and 2015
5.4 Normalized distribution of the data over the months of the year 2015
5.5 Normalized distribution of the data over the months of the year 2015
5.6 Normalized category rates, Carnival
5.7 Normalized category rates, after Carnival

A.1 Category results Carnival 2015
A.2 Category results after Carnival 2015

Chapter 1

Introduction

In this work we show how to construct a classifier for pictures of Venice obtained from social media, using an approach inherited from text classification called “Bag of Visual Words”, and how the results can be analyzed to discover interesting patterns in the behavior of the tourists visiting the lagoon city.

1.0.1 Structure of the thesis

The thesis is organized as follows:

1. In chapter 2 we present a summary of the historical approaches to the problem, focusing then on the concept of “Bag of Words” and similar strategies.

2. In chapter 3 we give the reader an overview of the “Bag of Words” model and of how it can be adapted to computer vision problems, focusing in particular on the algorithms used: “SIFT” for feature extraction, “K-means” for clustering and “SVM” for classification.

3. In chapter 4 we discuss the classification algorithm constructed, describing in detail every issue encountered and the performance results of each phase.

4. In chapter 5 we present the results of the classifier applied to the case study of the districts of Venice during the years 2014 and 2015, and in particular to the Carnival of 2015.

5. In chapter 6 we report some concluding considerations and discuss possible future developments.

Chapter 2

The State of Art

Extracting meaningful information from images and using it to learn “scene categories” has been an important research topic since the birth of the machine learning and computer vision fields. This chapter gives an overview of the main methods, focusing then on the “Bag of Keypoints” approaches and in particular on the work of Csurka and Dance [9].

In the first years of research, efforts focused on the categorization of specific patterns in images, such as faces [23] or people [24]; then, as in the notable works of Schneiderman and Kanade [26] (cars and pedestrians) and of Fergus, Perona and Zisserman [12], many different elements were included in the categorization.

Since those years the effort on the topic has grown exponentially and many different techniques for the visual categorization of images have been implemented; however, the great majority of those works can be traced back to four main approaches. The first is Fine-Grained Recognition: in this set of methods the objective is to distinguish between particular subsets of categories, like car models [28] or dog breeds [16]. The key is to localize important details and represent them as global clues: since overall features like shapes and colors cannot capture small differences, a dictionary of “fine grain” information is needed to distinguish between the classes. Many difficult problems, such as distinguishing between flower species [22] or fungi [10], can be addressed using these methods. In this case study, however, such a “fine grain” distinction is not necessary, since the chosen categories are quite different from one another, so a different approach has been preferred.

Another technique, called Annotation-based, uses articulated input from humans, such as asking them questions about the objects and having them click on particular attributes, in a method called “human-in-the-loop” [32], to create human-customized feature vectors for visual classification. This method is used in case studies with very detailed class categorization, and the humans performing the tasks are usually domain experts; for this reason this kind of approach is not ideal for completely automatic learning, where the adaptability of the algorithm is a key factor.

The idea behind the Template-based approach is to create feature response maps by matching images with a large set of randomly generated image templates. It has been shown that, using those predefined templates as “filters”, a classifier that performs well can be created for tasks such as object recognition [18] or body part recognition [20].

The final approach, considered the state of the art for solving a general Visual Categorization problem, is Bag of Visual Words or Bag of Keypoints. Motivated by the Bag of Words learning methods used with success for text classification by Joachims [15] and Tong [30], it has been adapted to solve the problem of image categorization: this approach consists of clustering the descriptors of particular image patches, then building a histogram of the number of occurrences of image patterns and using it as the feature vector for the classification process. The method, which will be explained in detail in the next chapters, has been used as a base for visual categorization combined with different classification approaches: in the work of Sudderth, Torralba, Freeman and Willsky [29] a Transformed Dirichlet model is used to categorize after the “Bag of keypoints” construction phase; Sivic, Russell, Efros, Zisserman and Freeman [27] instead use Latent Semantic Analysis and Latent Dirichlet Allocation models, while Fei-Fei and Perona [11] use a Bayesian Hierarchical Method.

In more recent works a Bag of Visual Words model paired with the Pyramid match kernel introduced by Grauman and Darrell [13] showed promising results [33]. This algorithm, based on the idea of mapping the BoW features to multi-dimensional multi-resolution histograms, has two main advantages: linear complexity (the other methods are generally quadratic or worse) and the ability of the multi-resolution histograms to capture co-occurring features. Despite those advantages, the more solid, state-of-the-art solution of SVM has been chosen, in order to focus more on the analysis of the results of the model on social media and Venice, and less on building experimental solutions for the classifier.

The work in this thesis is an extension of the previously cited paper of Csurka and Dance: after the image feature extraction and histogram construction phases, a modified version of the Support Vector Machines algorithm is used to adapt the classification to the selected categories of photos of Venice extracted from social networks. Such a dataset contains pictures taken with many different cameras (usually mobile phones), so orientation, light and scale may change significantly; at the same time, many of the retrieved images may be useless for the analysis and must be removed. Therefore a method to handle the high data variability and to remove undesirable elements has also been studied and implemented.

Chapter 3

The Bag of Words model

This chapter explains in detail the Bag of Words model for text categorization, and how this model can also be used in computer vision to classify images.

3.1 The Bag of Words model

In the Bag of Words model (or BoW model), any text such as a sentence or a document is represented as the bag (multiset) of its words: the multiplicity of each word is kept, while word order and grammar are ignored. The frequency of occurrence of every word can be used as a feature vector in a classifier to perform document categorization. For example, consider two simple sentences:

1. Leo wants to win the Oscar, Matt wants the same

2. Matt eventually wins the Oscar


The corresponding dictionary is constructed as the list of the words, excluding repetitions:

[Leo, wants, to, win, the, Oscar, Matt, same, eventually, wins]

Since the dictionary is composed of 10 distinct words, each sentence can be represented by a vector of 10 elements:

(1) [1, 2, 1, 1, 2, 1, 1, 1, 0, 0]
(2) [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]

where each element of the vector is the number of occurrences of the corresponding word in the sentence; for example, the second element of the first vector corresponds to the word “wants” and its value is 2 because the word appears twice in the first sentence.
This representation is also called a histogram, and it is used with success as a feature vector in many applications such as text categorization or e-mail filtering [25].
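The construction of the dictionary and of the two count vectors can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names (`tokenize`, `build_dictionary`, `bow_vector`) are illustrative and not part of the thesis code, and the tokenizer simply keeps alphabetic words and preserves case.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Split a sentence into words, dropping punctuation (case is preserved)."""
    return re.findall(r"[A-Za-z]+", sentence)

def build_dictionary(sentences):
    """Return the distinct words in order of first appearance."""
    dictionary = []
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in dictionary:
                dictionary.append(word)
    return dictionary

def bow_vector(sentence, dictionary):
    """Count how many times each dictionary word occurs in the sentence."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in dictionary]

sentences = [
    "Leo wants to win the Oscar, Matt wants the same",
    "Matt eventually wins the Oscar",
]
dictionary = build_dictionary(sentences)
vectors = [bow_vector(s, dictionary) for s in sentences]
```

Note that "win" and "wins" are kept as distinct words, exactly as in the example; a real text pipeline would usually add stemming or lower-casing, which is left out here on purpose.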

3.2 The BoW model in Computer Vision

In computer vision, a Bag of Visual Words (or Keypoints) is the set of occurrences of local image features over a vocabulary of local features: an image is treated in the same way as a text document, and the resulting problem is to define what a word is in the image context.
To this end, three main phases are usually followed: feature detection, feature description and codebook formation.

3.3 BoW: Feature detection and description

Starting from an initial set of measured data, feature extraction builds a new set of derived values, called features, which are informative and non-redundant, and which facilitate the subsequent learning and generalization phases of machine learning algorithms. It is also used to reduce the number of informative elements and their repetitiveness: in the case of a Bag of Visual Words model, and in particular in this thesis, where the dataset is formed by pictures, using all the pixels of the whole image set as training data is computationally impracticable and may also cause a considerable loss in the informative capacity of the dataset.

In this project the features are extracted using a dense version of the Scale-Invariant Feature Transform algorithm, and the corresponding feature description is based on the SIFT descriptors produced; SIFT is described in detail in the next section.


3.3.1 SIFT: Scale Invariant Feature Transform

SIFT is a feature generation algorithm published by David Lowe in 1999 [19]. The approach transforms the image into a large set of feature vectors that are invariant to image translation, scaling and rotation.

A SIFT feature is therefore a selected region of the image, called a keypoint, with an associated descriptor vector. In particular, a SIFT keypoint is a circular region of the image with an orientation, described by four parameters: the coordinates x and y of the center, the scale (the radius of the circle) and the orientation (an angle expressed in radians). By searching for blobs (the keypoint structure) at multiple locations and scales, the SIFT detector is invariant to translation, rotation and scaling of the image.

Figure 3.1: Sift Keypoint in an image

To find the “best” keypoints of an image, a Gaussian scale space is constructed, which is basically a set of images convolved with a DoG (Difference of Gaussians) filter at different levels of σ; the best keypoints are those corresponding to the maxima and minima of this function. However, in the “dense” version of the algorithm used in this thesis the detection phase is not performed, because the keypoint locations are selected every fixed number of pixels, a parameter called binsize.


A SIFT descriptor is a three-dimensional spatial histogram of the image gradients: samples are weighted by the gradient norm and accumulated in a 3-D histogram indexed by pixel location and gradient orientation. The spatial coordinates x and y are quantized into four bins each and the orientation into eight bins, so in practical terms the SIFT descriptor of a point is a 128-dimensional vector (8x4x4 = 128 bins). Finally, an additional Gaussian weighting function is applied to give less importance to gradients farther away from the keypoint center.

[Figure: panels “Image” and “Descriptor Geometry”, with axes x and y]

Figure 3.2: Sift Descriptors: spatial histogram of the image gradient
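To make the 8x4x4 layout concrete, the following sketch accumulates gradient norms of a square grey-level patch into 4x4 spatial bins and 8 orientation bins. It is a deliberately simplified illustration and not the SIFT descriptor itself: samples are hard-assigned to a single bin, and both the trilinear interpolation and the Gaussian weighting are omitted. The function name and the toy patch are assumptions made for this example.

```python
import math

def simplified_descriptor(patch, n_spatial=4, n_orient=8):
    """Hard-binned gradient histogram of a square grey-level patch.

    patch is a list of rows of floats whose side is a multiple of n_spatial.
    Returns a flat list of n_orient * n_spatial * n_spatial values.
    """
    side = len(patch)
    cell = side // n_spatial  # pixels per spatial bin
    hist = [0.0] * (n_orient * n_spatial * n_spatial)
    # Skip the one-pixel border because central differences are used.
    for y in range(1, side - 1):
        for x in range(1, side - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            norm = math.hypot(gx, gy)
            if norm == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Index of the nearest of the n_orient orientation bin centres.
            t = int(math.floor(angle / (2 * math.pi) * n_orient + 0.5)) % n_orient
            i, j = x // cell, y // cell  # spatial bin indices
            hist[(j * n_spatial + i) * n_orient + t] += norm
    return hist

# Toy patch: a horizontal intensity ramp, so every gradient points along +x.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = simplified_descriptor(ramp)
```

On the ramp patch all the gradient mass falls into the orientation bin t = 0 of each spatial cell, which is a quick sanity check of the bin layout.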

The gradient vector computed at the scale σ can be denoted as:

\[
J(x, y) = \nabla I_\sigma(x, y) = \begin{bmatrix} \dfrac{\partial I_\sigma}{\partial x} & \dfrac{\partial I_\sigma}{\partial y} \end{bmatrix}
\]

The SIFT descriptor is a three-dimensional spatial histogram of the distribution of J(x, y). To describe how it is constructed, it is convenient to work in the canonical frame. In this special frame, the axes of the descriptor coincide with those of the image and each spatial bin has size 1. The histogram has $N_\theta \times N_x \times N_y$ bins (usually 8x4x4), as shown in the following figure.

Bins are indexed by the triplet (t, i, j) and their centers are located in:


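[Figure: canonical descriptor frame with axes x and y, bin indices i and j, bin centers x_i and y_j, extents N_x/2 and N_y/2, and the binning functions w(x - x_i) and w(y - y_j)]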

Figure 3.3: Canonical SIFT descriptor and spatial binning functions

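\[
\begin{aligned}
\theta_t &= \frac{2\pi}{N_\theta}\, t, & t &= 0, \dots, N_\theta - 1, \\
x_i &= i - \frac{N_x - 1}{2}, & i &= 0, \dots, N_x - 1, \\
y_j &= j - \frac{N_y - 1}{2}, & j &= 0, \dots, N_y - 1.
\end{aligned}
\]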

The histogram is constructed using trilinear interpolation, i.e. by weighting each contribution with the binning functions:

\[
w(z) = \max(0,\, 1 - |z|), \qquad
w_{\mathrm{ang}}(z) = \sum_{k=-\infty}^{+\infty} w\!\left(\frac{N_\theta}{2\pi}\, z + N_\theta k\right)
\]

The gradient vector field is then transformed into a 3-D density map of weighted contributions:

\[
f(\theta, x, y) = |J(x, y)|\, \delta(\theta - \angle J(x, y))
\]
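The bin centers and the binning functions can be checked numerically with a short Python sketch. It assumes the triangular window w(z) = max(0, 1 - |z|) and the usual 8x4x4 layout; the infinite sum in the angular window is evaluated by wrapping its argument into one period, which is valid because w vanishes outside (-1, 1). Function names are illustrative.

```python
import math

N_THETA, N_X, N_Y = 8, 4, 4  # the usual 8x4x4 layout

def bin_centers(n_theta=N_THETA, n_x=N_X, n_y=N_Y):
    """Centres of the orientation and spatial bins in the canonical frame."""
    thetas = [2 * math.pi * t / n_theta for t in range(n_theta)]
    xs = [i - (n_x - 1) / 2 for i in range(n_x)]
    ys = [j - (n_y - 1) / 2 for j in range(n_y)]
    return thetas, xs, ys

def w(z):
    """Triangular (linear interpolation) binning function."""
    return max(0.0, 1.0 - abs(z))

def w_ang(z, n_theta=N_THETA):
    """Periodic angular binning function: sum over k of w(n_theta*z/(2*pi) + n_theta*k)."""
    u = n_theta * z / (2 * math.pi)
    # Only one shifted copy of w can be non-zero, so wrap u into [-n_theta/2, n_theta/2).
    u = (u + n_theta / 2) % n_theta - n_theta / 2
    return w(u)

thetas, xs, ys = bin_centers()
```

A useful property of this choice is that, for any angle z, the weights `w_ang(z - theta_t)` over all orientation bins sum to one, so each gradient sample distributes its whole norm among the two nearest orientation bins.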

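The histogram is localized in the keypoint support by a Gaussian window of standard deviation $\sigma_{\mathrm{win}}$. The histogram can be obtained using the following formula:

\[
\begin{aligned}
h(t, i, j) &= \int g_{\sigma_{\mathrm{win}}}(x, y)\, w_{\mathrm{ang}}(\theta - \theta_t)\, w(x - x_i)\, w(y - y_j)\, f(\theta, x, y)\, d\theta\, dx\, dy \\
&= \int g_{\sigma_{\mathrm{win}}}(x, y)\, w_{\mathrm{ang}}(\angle J(x, y) - \theta_t)\, w(x - x_i)\, w(y - y_j)\, |J(x, y)|\, dx\, dy
\end{aligned}
\]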

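In practice, the descriptors are not computed in the canonical frame but directly in the image frame. Using a hat to denote quantities in the canonical frame and no hat for quantities in the image frame, the two frames are related by an affine transformation:

\[
\mathbf{x} = A\hat{\mathbf{x}} + T, \qquad
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad
\hat{\mathbf{x}} = \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix}
\]

All the descriptor quantities can then be computed directly in the image frame; the canonical image at scale $\hat\sigma$ is related to the image as follows:

\[
\hat{I}_0(\hat{\mathbf{x}}) = I_0(\mathbf{x}), \qquad
\hat{I}_{\hat\sigma}(\hat{\mathbf{x}}) = I_{A\hat\sigma}(\mathbf{x}), \qquad
\mathbf{x} = A\hat{\mathbf{x}} + T
\]

where, generalizing the previous definitions:

\[
I_{A\hat\sigma}(\mathbf{x}) = (g_{A\hat\sigma} * I_0)(\mathbf{x}), \qquad
g_{A\hat\sigma}(\mathbf{x}) = \frac{1}{2\pi |A| \hat\sigma^2} \exp\left(-\frac{1}{2}\, \frac{\mathbf{x}^\top A^{-\top} A^{-1} \mathbf{x}}{\hat\sigma^2}\right)
\]

Differentiating shows how the gradient fields are related:

\[
\hat{J}(\hat{\mathbf{x}}) = J(\mathbf{x})\, A, \qquad
J(\mathbf{x}) = (\nabla I_{A\hat\sigma})(\mathbf{x}), \qquad
\mathbf{x} = A\hat{\mathbf{x}} + T
\]

Therefore the descriptor can be computed either in the canonical frame or in the image frame, using the following two equivalent formulas:

\[
\begin{aligned}
h(t, i, j) &= \int g_{\hat\sigma_{\mathrm{win}}}(\hat{\mathbf{x}})\, w_{\mathrm{ang}}(\angle \hat{J}(\hat{\mathbf{x}}) - \theta_t)\, w_{ij}(\hat{\mathbf{x}})\, |\hat{J}(\hat{\mathbf{x}})|\, d\hat{\mathbf{x}} \\
&= \int g_{A\hat\sigma_{\mathrm{win}}}(\mathbf{x} - T)\, w_{\mathrm{ang}}(\angle J(\mathbf{x})A - \theta_t)\, w_{ij}(A^{-1}(\mathbf{x} - T))\, |J(\mathbf{x})A|\, d\mathbf{x}
\end{aligned}
\]

where the product of the two spatial binning functions is defined as:

\[
w_{ij}(\hat{\mathbf{x}}) = w(\hat{x} - \hat{x}_i)\, w(\hat{y} - \hat{y}_j)
\]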

3.3.2 Standard Sift Descriptor

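Considering a SIFT keypoint centered at T, with scale σ and orientation θ, the affine transformation (A, T) reduces to the similarity transformation:

\[
\mathbf{x} = m\sigma R(\theta)\hat{\mathbf{x}} + T
\]

where R(θ) is a counter-clockwise rotation of θ radians and m is the descriptor magnification factor, which expresses the difference in scale between a descriptor bin (of size mσ) and the keypoint scale σ. The standard SIFT descriptor computes the gradient of the image at the scale of the keypoint, which is equivalent to a smoothing of $\hat\sigma = 1/m$ in the canonical frame; since the default Gaussian window has a standard deviation of $\hat\sigma_{\mathrm{win}} = 2$, the resulting formula is:

\[
\begin{aligned}
h(t, i, j) &= m\sigma \int g_{\sigma_{\mathrm{win}}}(\mathbf{x} - T)\, w_{\mathrm{ang}}(\angle J(\mathbf{x}) - \theta - \theta_t)\, w_{ij}\!\left(\frac{R(\theta)^\top (\mathbf{x} - T)}{m\sigma}\right) |J(\mathbf{x})|\, d\mathbf{x}, \\
\sigma_{\mathrm{win}} &= m\sigma\, \hat\sigma_{\mathrm{win}}, \\
J(\mathbf{x}) &= \nabla (g_{m\sigma\hat\sigma} * I)(\mathbf{x}) = \nabla I_\sigma(\mathbf{x}).
\end{aligned}
\]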


3.3.3 Dense SIFT

The dense version of the Scale Invariant Feature Transform algorithm computes the descriptors on a dense grid of locations with fixed scale and orientation; depending on the density of the grid, the algorithm can be faster than the “original” version, because several simplifications can be applied.

[Figure: image domain covered by a grid of descriptors, showing the sampling step and the bin size]

Figure 3.4: Geometry of the Dsift descriptors
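The geometry of the dense grid can be sketched as follows. The function below only enumerates descriptor centers; the parameter names (`step`, `bin_size`) and the border policy (a descriptor of 4 x 4 spatial bins must fit entirely inside the image) are assumptions made for this illustration and may differ from what a specific DSIFT implementation does.

```python
def dense_grid(width, height, step, bin_size, n_spatial=4):
    """Centres of the descriptors sampled on a regular grid.

    Each descriptor spans n_spatial * bin_size pixels per side; centres are
    placed every `step` pixels so that the whole support stays in the image.
    """
    half = (bin_size * n_spatial) // 2  # half side of the descriptor support
    xs = range(half, width - half + 1, step)
    ys = range(half, height - half + 1, step)
    return [(x, y) for y in ys for x in xs]

# Toy example: a 32x32 image, one descriptor every 8 pixels, bins of 4 pixels.
centers = dense_grid(32, 32, step=8, bin_size=4)
```

With these toy values the grid contains 3 x 3 descriptor centers; decreasing `step` increases the number of descriptors per image and therefore the cost of the following clustering phase.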

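When computing descriptors that differ only in their location and have null orientation, the transformation and the histogram become:

\[
\mathbf{x} = m\sigma\hat{\mathbf{x}} + T
\]

\[
h(t, i, j) = m\sigma \int g_{\sigma_{\mathrm{win}}}(\mathbf{x} - T)\, w_{\mathrm{ang}}(\angle J(\mathbf{x}) - \theta_t)\, w\!\left(\frac{x - T_x}{m\sigma} - \hat{x}_i\right) w\!\left(\frac{y - T_y}{m\sigma} - \hat{y}_j\right) |J(\mathbf{x})|\, d\mathbf{x}
\]

Since many different values of T are sampled, the histogram formula can be expressed as a convolution with separable components. Translating by $\mathbf{x}_{ij} = (x_i, y_j)^\top = m\sigma(\hat{x}_i, \hat{y}_j)^\top$ and using the symmetry of the binning and windowing functions:

\[
T' = T + m\sigma \begin{bmatrix} \hat{x}_i \\ \hat{y}_j \end{bmatrix},
\]

\[
h(t, i, j) = m\sigma \int g_{\sigma_{\mathrm{win}}}(T' - \mathbf{x} - \mathbf{x}_{ij})\, w_{\mathrm{ang}}(\angle J(\mathbf{x}) - \theta_t)\, w\!\left(\frac{T'_x - x}{m\sigma}\right) w\!\left(\frac{T'_y - y}{m\sigma}\right) |J(\mathbf{x})|\, d\mathbf{x}
\]

Defining then the kernels for the x and y components as:

\[
k_i(x) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{win}}} \exp\left(-\frac{1}{2}\, \frac{(x - x_i)^2}{\sigma_{\mathrm{win}}^2}\right) w\!\left(\frac{x}{m\sigma}\right), \qquad
k_j(y) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{win}}} \exp\left(-\frac{1}{2}\, \frac{(y - y_j)^2}{\sigma_{\mathrm{win}}^2}\right) w\!\left(\frac{y}{m\sigma}\right)
\]

the following simplified formulas are obtained for the histogram and the gradient vector:

\[
h(t, i, j) = (k_i k_j * \bar{J}_t)\!\left(T + m\sigma \begin{bmatrix} \hat{x}_i \\ \hat{y}_j \end{bmatrix}\right), \qquad
\bar{J}_t(\mathbf{x}) = w_{\mathrm{ang}}(\angle J(\mathbf{x}) - \theta_t)\, |J(\mathbf{x})|
\]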

The main advantages of the SIFT algorithm are the good recall and accuracy obtained when its descriptors are used to compare images, its robustness to occlusion, rotation and scale changes, and the fact that slightly different implementations of SIFT such as DSIFT (not Lowe’s original version) are free to use and included in various machine learning libraries. Compared with a more modern algorithm like SURF, SIFT gives comparably good accuracy at the cost of a longer computation time; since real-time execution is not required in this project, SIFT is a solid choice. Another advantage of SIFT is its larger descriptor dimension (128 vs 64), which may carry more information into the next steps of the classification algorithm. More information about SIFT and its variants can be found in the research done by Wu [14].


3.4 BoW: Codebook Formation

This phase is the final step of the Bag of Words model and is dedicated to the creation of the words of the image domain, called codewords, and of the corresponding dictionary, called the codebook. A simple way to perform this task is to run K-means clustering (theory in Appendix A) over the descriptors of the image patches extracted in the previous phase: the centroids (cluster centers) obtained at the end of the clustering correspond to the codewords, and their set is the codebook, or dictionary of the images. Finally, each patch descriptor of an image is assigned to the nearest codeword so that, as in the textual case, an image can be represented by a histogram of the codewords.
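The codebook formation and the histogram construction can be illustrated with a small pure-Python sketch. It uses a plain Lloyd-style K-means on toy two-dimensional "descriptors" instead of 128-dimensional SIFT vectors; the function names, the toy data and the fixed number of iterations are assumptions made for this example, not the implementation used in the thesis.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: returns k centroids (the codewords)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[idx] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids

def bow_histogram(descriptors, codebook):
    """Count how many descriptors of one image fall on each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist

# Toy training descriptors forming two well separated groups.
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = sorted(kmeans(training, k=2))

# Descriptors of a new "image", quantized against the codebook.
image_descriptors = [[0.2, 0.1], [9.5, 10.2], [10.1, 9.9]]
histogram = bow_histogram(image_descriptors, codebook)
```

The resulting histogram has one entry per codeword and plays the same role as the word-count vector of the textual case; its length is the number of clusters K, which is therefore a parameter to be tuned.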

3.5 BoW: Learning and classification

The feature histograms constructed with the Bag of Words model can be used as feature vectors in many computer vision applications; in this thesis, image categorization is performed using a model also used with success in text categorization, Support Vector Machines.

3.5.1 SVM: Support Vector Machines

Support Vector Machines, or kernel machines, are a set of supervised learning methods for the classification or regression of patterns, created by Vapnik in 1995 [8]. The main idea is to construct a hyperplane in a space with a high or potentially infinite number of dimensions that separates the data well; this is achieved by the hyperplane with the largest distance from the nearest training points of any class because, in general, the larger the “margin” of the hyperplane, the lower the generalization error of the classifier.

[Figure: two classes of points in the (X1, X2) plane, separated by the optimal margin hyperplane with maximum margin]

Figure 3.5: The SVM hyperplane example


Given some training data D, composed of a set of points of the following form:

\[
D = \{(X_i, Y_i) \mid X_i \in \mathbb{R}^p,\ Y_i \in \{-1, +1\}\}_{i=1}^{n}
\]

where X_i is the feature vector and Y_i is the class label, which can be either -1 or +1.
The objective is to find the hyperplane that divides the elements having Y_i = -1 from those having Y_i = +1 with the maximum possible margin.

A hyperplane can be written as the set of points X satisfying:

\[
w \cdot X - b = 0
\]

where w is the normal vector to the hyperplane and · is the dot product.
If the data are linearly separable, two parallel hyperplanes can be selected so that there are no points between them, and their distance can then be maximized:

\[
w \cdot X_i - b \ge 1 \ \text{ for the first class}, \qquad
w \cdot X_i - b \le -1 \ \text{ for the second class}
\]

which can be compressed into the single formula:

\[
Y_i (w \cdot X_i - b) \ge 1 \quad \text{for all } 1 \le i \le n
\]

Since the distance between the two hyperplanes is $2/\|w\|$, maximizing it is equivalent to the following optimization problem:

\[
\text{minimize } \|w\|^2 \quad \text{subject to} \quad Y_i (w \cdot X_i - b) \ge 1 \ \text{ for any } i = 1, \dots, n
\]


Introducing the KKT multipliers [17] and replacing the objective function with $\frac{1}{2}\|w\|^2$ for mathematical convenience, the resulting problem with added constraints becomes, in its primal form:

\[
\arg\min_{w, b}\ \max_{\alpha \ge 0} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ Y_i (w \cdot X_i - b) - 1 \right] \right\}
\]

In 1995 Cortes and Vapnik suggested a modified maximum margin version called “Soft Margin” that allows for mislabeled examples: non-negative slack variables ξ_i are introduced to measure the degree of misclassification of each data point X_i.
The problem then becomes:

\[
\arg\min_{w, b, \xi} \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \right\}
\]

subject to (for any i = 1, ..., n):

\[
Y_i (w \cdot X_i - b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
\]
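The role of the slack variables and of the parameter C can be seen by evaluating the soft-margin objective for a fixed hyperplane. The sketch below does not train an SVM: it only computes, for a given w and b, the smallest slack satisfying each constraint and the corresponding value of the objective. The toy data, the chosen hyperplane and the function names are assumptions made for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def slack(w, b, x, y):
    """Smallest xi >= 0 such that y * (w.x - b) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) - b))

def soft_margin_objective(w, b, X, Y, C):
    """Value of 1/2 * ||w||^2 + C * sum(xi) for the hyperplane (w, b)."""
    slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

# Toy 2-D data: the last point lies inside the margin of the chosen hyperplane.
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 0.0], [0.5, 0.0]]
Y = [1, 1, -1, -1, -1]
w, b = [1.0, 0.0], 1.0

objective, slacks = soft_margin_objective(w, b, X, Y, C=10.0)
```

Points that respect the margin have zero slack, while the point inside the margin contributes a positive slack that is multiplied by C: a large C penalizes margin violations heavily, a small C tolerates them in exchange for a wider margin.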


3.5.2 Non-linear SVM

If the data are not linearly separable, SVM can still be used as a non-linear classifier by means of the “kernel trick” [17]: the data are mapped into a richer feature space that includes non-linear features, and a hyperplane separating the data is constructed in that space. More formally:

\[
x \mapsto \varphi(x)
\]

where φ is the non-linear mapping associated with the kernel function, so the resulting decision function becomes:

\[
f(x) = w \cdot \varphi(x) + b
\]

The most commonly used kernels are:

• Linear, $\langle x, x' \rangle$

• Polynomial, $(\gamma \langle x, x' \rangle + r)^2$

• Radial basis function (RBF), $\exp(-\gamma \|x - x'\|^2)$

• Sigmoid, $\tanh(\gamma \langle x, x' \rangle + r)$
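The four kernels can be written directly as Python functions of two feature vectors. This is a minimal sketch with illustrative default values for γ and r; the polynomial kernel exposes the exponent as a `degree` parameter (set to 2 by default, an assumption of this example), since in general any positive integer degree can be used.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, r=0.0, degree=2):
    return (gamma * dot(x, z) + r) ** degree

def rbf_kernel(x, z, gamma=1.0):
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_norm)

def sigmoid_kernel(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, z) + r)

x, z = [1.0, 2.0], [3.0, 4.0]
values = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z, gamma=1.0, r=1.0),
    "rbf": rbf_kernel(x, z, gamma=0.5),
    "sigmoid": sigmoid_kernel(x, z, gamma=0.1),
}
```

Each function returns a similarity between the two vectors without ever computing the mapping φ explicitly, which is the point of the kernel trick; note that the RBF kernel of a vector with itself is always 1.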

3.5.3 Multi-class SVM

For classification with more than two labels (as in this thesis) the main approaches are:

• Building binary classifiers that each distinguish one label from all the rest (one-vs-all approach): the class assigned is that of the classifier with the highest output function;

• Building a binary classifier for every pair of classes (one-vs-one approach): every classifier assigns a vote to one of its two classes, and the class with the maximum number of votes determines the final classification result.
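The one-vs-one voting scheme can be sketched in a few lines. The pairwise classifiers are replaced here by hypothetical decision functions on a one-dimensional input (a positive output votes for the first class of the pair, otherwise for the second); in a real system each of them would be a trained binary SVM. Class names are taken from the categories of this thesis only as labels.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise_decision):
    """Return the most voted class and the votes collected by every class.

    pairwise_decision[(a, b)](x) > 0 is a vote for a, otherwise for b.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        winner = a if pairwise_decision[(a, b)](x) > 0 else b
        votes[winner] += 1
    return votes.most_common(1)[0][0], dict(votes)

classes = ["Art", "Food", "Townscape"]
# Hypothetical stand-ins for the trained binary classifiers.
pairwise_decision = {
    ("Art", "Food"): lambda x: 1.0 - x,
    ("Art", "Townscape"): lambda x: 2.0 - x,
    ("Food", "Townscape"): lambda x: 3.0 - x,
}

label, votes = one_vs_one_predict(2.5, classes, pairwise_decision)
```

With k classes this scheme requires k(k - 1)/2 binary classifiers (3 in the toy example), each trained only on the data of its two classes, whereas the one-vs-all scheme requires k classifiers trained on the whole dataset.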

SVM has many advantages. First of all, the underlying theory has a sound geometrical interpretation, so understanding how SVM operates is simple. The ability to use different kernels gives the algorithm great adaptability, with good performance on many different kinds of problems. Moreover, the training problem is convex (the “convexity property”), so the algorithm does not get stuck in local minima, even when non-linear kernels are used, and the solution obtained is always the global one.
On the other hand, considerable limitations of SVM are the speed and memory requirements of both the training and testing phases, and the need for good knowledge of the addressed problem in order to choose the right kernel and the optimal kernel parameters.

Chapter 4

Construction of the classifier

4.1 The Dataset

To retrieve a large set of photos taken in Venice with attached geographic coordinates, the social network Instagram has been chosen: the media library from which to select the images is huge, and the API is free and simple to use. As a drawback, the images must go through a first level of filtering to remove a considerable number of pictures that are useless for the project; once this problem is sorted out, however, the remaining ones show a lot of variability within each category, which could lead to a good level of adaptability of the final classification algorithm.

In practice, to obtain the image set, a Python wrapper of the Instagram API has been used. In particular, one method of the API, called media search, takes as input a geographical coordinate and a date and returns a JSON structure containing all the media information of a set of retrieved images. The fields most important to the analysis are:

1. The unique id of the image


2. The url of the image in low resolution

3. Longitude and latitude of the picture

4. Tags, comments and other information

This method has been called along a grid of coordinates covering the rectangular area that corresponds to the geographical zone of Venice, to get the maximum coverage possible without overlapping. Another possible solution would have been to simply select a coordinate in the center of Venice and a radius big enough to contain the whole city area, but this solution has been rejected because the API limit on the maximum number of results per request would have excluded an important part of the dataset.
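A minimal sketch of how such a grid of query coordinates could be generated is given below. The bounding box, the grid resolution and the function name are assumptions made for illustration; the text states neither the actual values nor the search radius used for each request, so the radius is left out.

```python
def coordinate_grid(lat_min, lat_max, lng_min, lng_max, rows, cols):
    """Return the centers of a rows x cols grid of cells tiling the rectangle."""
    lat_step = (lat_max - lat_min) / rows
    lng_step = (lng_max - lng_min) / cols
    return [
        (lat_min + (r + 0.5) * lat_step, lng_min + (c + 0.5) * lng_step)
        for r in range(rows)
        for c in range(cols)
    ]


# Placeholder bounding box roughly around Venice (assumed values).
centers = coordinate_grid(45.42, 45.45, 12.30, 12.37, rows=3, cols=7)
```

Each returned pair would be the coordinate of one request; splitting the area into many small cells keeps every request under the per-request result limit mentioned above.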

To recap, the dataset is composed of a set of images of Venice, obtained from the users’ photos uploaded to the social network “Instagram”. Since no preprocessing or filtering operation has been applied, the photos are very different from each other, with considerable variations in the subject pictured, scale, orientation and illumination.
The categories in our analysis are selected with the objective of discovering a pattern in the type of photos taken in the various districts of Venice, but due to the high variance in the data they must also be general enough to generalize effectively; the chosen categories are the following:


1. Lagoon landscape, photos taken near or inside the lagoon, characterized by water, boats and “bricole”.

Figure 4.1: First category, Lagoon landscape

2. Townscape, photos shot in the city depicting bridges, monuments, squares and churches.

Figure 4.2: Second category, Townscape


3. Art, photos taken inside museums and art galleries

Figure 4.3: Third category, Art

4. Folklore, photos of Carnival events, masks, or Venetian folklore such as “Gondole” or “Murano Glass blowers”

Figure 4.4: Fourth category, Folklore


5. Food, photos of Venetian gastronomic specialties or drinks

Figure 4.5: Fifth category, Food

6. Varies, this category contains every photo that doesn’t belong to any of the other categories.

Figure 4.6: Sixth category, Varies


As previously stated, the dataset contains a lot of different images, but some recurrent patterns can be found inside each category (e.g. water in the bottom part of the image in lagoon landscape, buildings in townscape), with the exception of the last category: this class contains a very heterogeneous set of images that don’t belong to any of the other classes, so they are basically useless in the analysis and must be removed from the data set.


4.2 The classifier

The main phases of the classifier are described in the flow chart:

Figure 4.7: The flow chart of the classification construction and recognition phases

The training steps can be summarized as:

• Selection of a meaningful dataset, with assigned class labels, used for training the classifier

• Detection of meaningful parts of the images and construction of an appropriate description

• Discovery of a “vocabulary” of clusters and assignment of the descriptors to it (Form Codebook)

• Construction of a “Bag of keypoints”, counting the number of descriptors assigned to each cluster (Create Class Histograms)

• Use of the bag of keypoints as a feature vector to train a multi-class classifier that determines the category of the image

In the testing phase, the feature extraction is the same as in training, but the clustering phase is not performed again: the extracted keypoints are assigned to the nearest codeword (or centroid), and the histogram of the image is then created and used as the final feature in the classification phase to label the test image.

In the next section the phases will be described in detail.


4.3 Feature extraction from the images

To select the image patches and the appropriate descriptors, the Dense SIFT algorithm (described in chapter 3) as implemented by the VLFeat library [4] has been chosen: the algorithm selects a grid of keypoints, each separated by a fixed number of pixels, and for each of those keypoints a SIFT descriptor of 128 components is computed. SIFT descriptors are a solid choice for image recognition, as stated by Mikolajczyk in his work [21], because of their invariance to scale, orientation and distortion; moreover, compared with other descriptors, SIFT has a high number of components, which means a more complete representation.
The x and y location of each keypoint has been appended to its descriptor, with the purpose of adding a location component to the features. A small analysis of the dataset shows a recurrent pattern in the photos belonging to the same category: it is very likely that similar patches or areas of images of the same category are also located in the same or a nearby position, so two parameters are added to the feature vector to capture this meaningful information.

Figure 4.8: A single feature (components 1 to 128: SIFT descriptor; x, y: keypoint location)
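The layout of a single feature can be sketched as follows. The code does not call VLFeat: the descriptors are all-zero placeholders, and the grid origin, image size and function names are assumptions chosen only to show how the 130-component vector is assembled.

```python
def dense_grid(width, height, step):
    """Keypoint locations on a regular grid, `step` pixels apart."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def build_feature(descriptor, x, y):
    """Append the keypoint location to a 128-component descriptor."""
    if len(descriptor) != 128:
        raise ValueError("expected a 128-component SIFT descriptor")
    return list(descriptor) + [x, y]


keypoints = dense_grid(width=20, height=10, step=5)
# Placeholder descriptors (all zeros) stand in for real SIFT output.
features = [build_feature([0] * 128, x, y) for x, y in keypoints]
```

Because x and y are simply two extra components, the clustering step treats spatial proximity like any other descriptor dimension; their relative weight with respect to the SIFT components is not discussed in the text.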

The main problem in this phase is selecting the distance between one keypoint chosen for the feature extraction and the next. This parameter, called “binsize”, has been chosen experimentally, trying to balance the amount of information and the number of keypoints: a binsize that is too small leads to overlapping, and therefore redundant, information from the SIFT descriptors, while a binsize that is too big means that fewer keypoints are selected from the photo, significantly reducing the amount of valuable categorization information extracted from the images. An experimentally good binsize is 5.

This process has been applied to a subset of 100 images for each category, and the resulting descriptors have been concatenated, creating a descriptor matrix of about 4 million × 130 elements.
An important consideration in this phase is that the size of the retrieved pictures is not standardized and may change a lot from photo to photo depending on the camera used, so the number of keypoints per image varies. It is therefore necessary to keep track of the descriptor-category relation, that is, the ranges of keypoint indices that correspond to each class (e.g. keypoints from 1 to 627 000 correspond to the first category, and so on).

4.4 Clustering

Clustering is performed over the descriptor matrix, giving as result the Keypoints Dictionary: the cluster centers obtained in the last step of the clustering algorithm are in fact the set of feature vectors that form the dictionary of the classes. They are called Keypoints by analogy with the Keywords of text categorization; however, they may not have an understandable and repeatable meaning such as “boats” for lagoon landscape photos or “masks” for folklore ones, so selecting an ideal set of keypoints is not obvious. For this reason the objective is to select the set that gives the best categorization rate on our dataset.


Having no information on the distribution of the keypoints in the “key-space”, choosing K-means as the clustering method is a good option: because of its quick execution, it can be tested with many different values of its parameter K in a relatively small amount of time, and the resulting clusters also lead to quite good results, as we will see in the next paragraph.
The main problem in the clustering phase is how to choose the number of clusters K. Tests have been executed with an increasing number of clusters (from K=100 to K=1200) in 5-fold cross-validation, using an RBF kernel with parameters C=2000, gamma=2e-07; the results are the following:

Figure 4.9: Test of the rate of correctness, changing parameter K

The mean rate over the five iterations of the cross-validation is shown in the figure: the rate increases from K=100 up to K=600, after which it remains basically constant around 0.81 with little fluctuations, so a good value of the parameter lies in the range between K=600 and K=1200. Picking a K value bigger than 1200 is both computationally too costly, because the clustering algorithm then takes more than a day, and not optimal, because it also increases the dimension of the features, which will eventually lead to a decrease in the correctness rate. For the next tests a number of clusters of 1000 has been chosen.
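For reference, the clustering step can be illustrated with a plain implementation of Lloyd's K-means algorithm. This is a generic sketch and not the implementation used in the thesis, which is not specified in the text; taking the first k points as initial centroids is a simplification adopted here to keep the example deterministic, and the six two-dimensional points are made up.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Made-up 2-D points forming two well separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
codebook = kmeans(points, k=2)
```

In the setting described above the points would be the 130-component features and k would be swept from 100 to 1200, with each resulting codebook evaluated by cross-validation.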

4.5 Construction of the Bag of keypoints

Construction of the Bag of keypoints: each descriptor taken from the set is assigned to the nearest centroid, and for each category the number of descriptor-centroid assignments is counted. In this way a “histogram” of each class is built, representing how strongly the descriptors of every class are linked to each centroid.

Figure 4.10: Example of a class histogram (one bin per centroid: C1, C2, C3, ..., C998, C999, C1000)
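The histogram construction can be sketched as below. The centroids and descriptors are made-up two-dimensional values (the real ones have 130 components and K=1000 centroids), and the function names are hypothetical.

```python
def nearest_centroid(descriptor, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(c):
        return sum((d - ci) ** 2 for d, ci in zip(descriptor, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def bag_of_keypoints(descriptors, centroids):
    """Count how many descriptors are assigned to each centroid."""
    histogram = [0] * len(centroids)
    for d in descriptors:
        histogram[nearest_centroid(d, centroids)] += 1
    return histogram


centroids = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
descriptors = [(1.0, 1.0), (9.0, 11.0), (0.0, 2.0), (19.0, 1.0)]
histogram = bag_of_keypoints(descriptors, centroids)

# Normalization by the number of images, here assumed to be 2.
n_images = 2
normalized = [count / n_images for count in histogram]
```

The same two functions serve both phases: during training they are applied to all the descriptors of a class, and at test time to the descriptors of a single new image.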

4.6 Classification

To perform classification, the histograms computed in the previous phase are normalized, dividing each component by the number of images, so that they can be used as training vectors for the chosen classification algorithm: Support Vector Machines (SVM).


In practice, to classify a new image, a similar process is used to compute the features: Dense SIFT is applied, but because the centroids have already been computed the clustering phase is not necessary. The descriptors are associated directly to the nearest centroid, and the histogram of the image is computed by counting the number of descriptors assigned to each centroid, exactly as in the previous case. The image histogram is then given as feature vector to the SVM algorithm, which performs the classification.

4.6.1 The parameters of SVM

Two main versions of SVM have been tested, using two different functions of the Scikit-learn Python library [1]:

1. The first one, “LinearSVM”, implements the multi-class classification using a linear kernel and a one-vs-all approach: a number of models equal to the number of classes is trained, and the label is assigned to the class that classifies the test sample with the largest margin.
The only parameter for this model is c; tests have been run in 5-fold cross-validation trying different orders of magnitude of the parameter c, and the optimal value has been found at c=2e-07:

Figure 4.11: LinearSvm, test of parameter C


2. The second version, called “SVC”, implements the multi-class classification using a one-vs-one approach: if C is the set of classes,

|C| (|C| − 1) / 2

models are trained, one for every different pair of classes; then the label is assigned to the class that obtains the maximum number of assignments from the classifiers. This implementation of SVM allows the use of non-linear kernels; in particular, for the purpose of this thesis project the Radial Basis Function (RBF) kernel has been chosen. Its mathematical formulation is

exp(−γ ‖x − x′‖²)

As stated by Hsu, Chang and Lin [7], the RBF is a good choice of kernel because it can handle the case of a non-linear relation between class labels and attributes, and it requires the setting of only two parameters (fewer than the polynomial kernel) to give generally good results: the first is the penalty parameter c, the second is γ. A loose grid search using increasing values of the two parameters has been performed; the best rate of correct results (considering all the classes) is obtained with c set to 2e3 and γ set to 2e-07.


The results of the entire test on the parameters are shown in the next table:

C/Gamma 2,00E-15 2,00E-13 2,00E-11 2,00E-09 2,00E-07 2,00E-05 2,00E-03 2,00E-01 2,00E+01 2,00E+03 2,00E+052,00E-07 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,172,00E-05 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,172,00E-03 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,17 0,172,00E-01 0,17 0,17 0,17 0,17 0,37 0,17 0,17 0,17 0,17 0,17 0,172,00E+01 0,17 0,17 0,17 0,43 0,70 0,17 0,17 0,17 0,17 0,17 0,172,00E+03 0,17 0,17 0,44 0,65 0,70 0,17 0,17 0,17 0,17 0,17 0,172,00E+05 0,17 0,41 0,64 0,64 0,68 0,17 0,17 0,17 0,17 0,17 0,17

Table 4.1: Test on the parameters C and γ on the SVC with rbf kernel
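As a small worked illustration of the two formulas in the second version, the snippet below enumerates the class pairs of the one-vs-one scheme for the six categories of this thesis and evaluates the RBF kernel on two vectors. The function name `rbf_kernel` and the example vectors are assumptions made for the sketch; this is not the code used for the experiments.

```python
from itertools import combinations
from math import exp

classes = ["Lagoon landscape", "Townscape", "Art", "Folklore", "Food", "Varies"]
# One binary model per unordered pair of classes: 6 * 5 / 2 models in total.
pairs = list(combinations(classes, 2))


def rbf_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=2e-07)  # identical vectors
apart = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1)    # squared distance 25
```

With Scikit-learn, the configuration selected above would presumably correspond to something like `SVC(kernel="rbf", C=2000, gamma=2e-07, probability=True)`; this is an assumption, since the text does not show the actual call.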

4.6.2 The Varies class problem

A severe problem encountered is that a large share of the images retrieved from “Instagram” belongs to the “Varies” category: being composed of photos of the most diverse subjects, which could not be assigned to any other class, it is the class with the highest percentage of misclassifications; in fact, about half of the images that should belong to “Varies” are assigned to other classes.
Since the images of this class are basically useless in the general analysis, a method to remove them must be implemented. The chosen solution uses the class membership probability estimates, as explained by Chih and Jen Lin [6], to perform a modified classification process. The calculation of these probabilities in the Scikit-learn implementation [1] is strictly related to the “SVC” one-vs-one method and gives an estimate of the probability that a particular element belongs to each class (the class with the maximum probability may differ from the class with the maximum number of votes). The modified version multiplies the probability of the “Varies” category by a parameter called M, and the final label is then the class with the maximum resulting value.
In this way, many images that were misclassified because they did not have a strong “membership” in a particular class are now likely to be assigned to the “Varies” class. On the other hand, some images that were correctly assigned might now be misclassified as “Varies” and then removed; for the purpose of the analysis, however, it is better to maximize the share of correctly classified elements within the main categories, even if this means excluding a small portion of images that were previously classified correctly.
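The modified decision rule can be sketched as follows. The probabilities are those of Test 1 in Table 4.2; the unscaled “Varies” value of 0.138 is an assumption, obtained by dividing the 0.276 reported in that table by M=2, and the function name is hypothetical.

```python
def classify_with_m(probabilities, m, boosted="Varies"):
    """Scale the estimate of the `boosted` class by m, then take the argmax."""
    scaled = dict(probabilities)
    scaled[boosted] *= m
    return max(scaled, key=scaled.get), scaled


raw = {
    "Lagoon Landscape": 0.218,
    "Townscape": 0.177,
    "Art": 0.155,
    "Folklore": 0.279,
    "Food": 0.032,
    "Varies": 0.138,  # assumed unscaled value: 0.276 in Table 4.2 divided by M = 2
}
label, scaled = classify_with_m(raw, m=2)
```

With M=2 the scaled “Varies” value stays just below that of Folklore, so the image keeps its label; a larger M would push the same image into “Varies”, which shows how the parameter trades discarded images against cleaner main categories.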

To discover a good value for the M parameter, tests have been made on a dataset of 100 images for each class, for a total of 600 images, using the “SVC” method with parameters K=1000, C=2000, gamma=2e-07, kernel=’rbf’, type=’onevsone’. The optimal value found for the M parameter is 2; the following figure shows how the confusion matrix changes when moving from M=1 in the first case to M=2 in the second:

Figure 4.12: Comparison of the Confusion Matrix with parameter M=1 and M=2

In the first case, the confusion matrix of the classification with the parameter M set to 1 shows a classification rate of 0.68 when the “Varies” class elements are included, while removing its elements and considering only the sets belonging to the other five categories leads to a classification rate of 0.79. In the case with M=2, many images that belong to “Varies” but were misclassified into other classes are now correctly assigned; the inverse also happens, but only a negligible number of times. In fact the general rate (“Varies” class included) remains 0.68, while the rate considering only the elements of the five classes grows to 0.84.

4.6.3 Final considerations

Even if the performance of the one-vs-one classifier with RBF kernel is only slightly better than that of the one-vs-all linear one, it has been preferred and used as the final classifier in all the subsequent analyses because it supports the Varies elimination phase, which uses the membership probabilities to exclude that category of elements.


4.7 Display of the results

In this section we show the output of the classification algorithm on some particular pictures: first some correct results are presented, followed by some misclassification cases and particular results. All the displayed tests are performed with the one-vs-one version of the classification algorithm with an “rbf” kernel and parameters K=1000, C=2000, gamma=2e-07.

The image in the first test is the following:

Figure 4.13: Test 1, correctly classified Folklore image

The results of both votes and probabilities are in the following table:

Category       Lagoon Landscape  Townscape  Art    Folklore  Food   Varies
Votes          4                 3          2      5         0      1
Probabilities  0.218             0.177      0.155  0.279     0.032  0.276

Table 4.2: Votes and probabilities of Test 1

This picture contains elements of both the Folklore category (the masked people and the gondola) and the Lagoon Landscape category (a good amount of water), and that fact is reflected in the votes of the one-vs-one algorithm (Folklore has 5 votes, Lagoon Landscape has 4). Looking at the probabilities, on the other hand, it is important to notice that the second biggest class is now Varies, since it has been multiplied by the factor M, but the maximum one remains Folklore, which is the correct category.

The second test presented is:

Figure 4.14: Test 2, correctly classified Food image

The pictures belonging to the Food category are characterized by very standard patterns, usually a white plate with the food in the center, so it is one of the classes with the best classification rate:



From the table it can easily be seen that the class Food has the maximum number of votes and, at the same time, the probability of belonging to that class is 0.93, which means a really strong membership.


Another test of the performance of the classifier is the next Townscape picture, which also contains typical elements of the Lagoon Landscape class:

Figure 4.15: Test 3, correctly classified Townscape image



The presence of elements of both classes can be observed in both votes and probabilities: the membership of the picture is disputed between the categories Lagoon Landscape and Townscape, which both have high votes and probabilities. The picture is finally assigned to Townscape.


In the fourth test an error of the classification algorithm due to the M factor is shown, given as input the following picture:

Figure 4.16: Test 4, incorrectly classified Townscape image

The output given is:



It is important to notice in the table that the votes alone would have correctly labeled the image, assigning it to the category Townscape; however, since the probability of that class is not much bigger than the others, the image has been “wrongly” assigned to the Varies category, whose probability has been doubled by the M factor.


The last case presented is a typical misclassification error: the picture of a gondola, considered a member of the Folklore category, is wrongly assigned to the Lagoon Landscape category:

Figure 4.17: Test 5, incorrectly classified Folklore image



In this case the votes show the conflict between the categories Lagoon Landscape (5) and Folklore (4), which are the ones with the most votes; the probabilities, on the other hand, do not indicate the same, since the probability of the first class is much higher (0.583) than that of Folklore (0.08).


4.7.1 Considerations on the classifier results

Taking into account the type of dataset, some important observations can be made: the Art and Food categories have a higher rate of correct classification than the others, thanks to the fact that the pictures that fall into those categories are quite standard, while folklore pictures are more easily misclassified as Townscape or Lagoon Landscape, due to the presence in the picture of recurrent patches belonging to both categories, like a monument in the background of some masked people, or water in a folklore picture of a gondola.
For this reason a practical extension to this classifier is to allow the assignment of a picture to more than one class. Allowing multi-label categorization avoids the previously described problem; however, new issues arise, especially regarding the training phase, which must be modified to consider the multiple label assignment, and a new method to select the final class label or labels must be chosen wisely, since the algorithm must assign the picture to a number of categories that is not known a priori.

Chapter 5

Analysis of the results

5.1 The year analysis

The main purpose of this thesis is to make a touristic analysis of the images taken by people in Venice during the year and uploaded to social media. Pictures taken throughout the years 2014-2015 are retrieved by the algorithm every 5 days, classified, grouped by month and then assigned to the nearest Venice district; with this method a conspicuous dataset of about 90 000 images is finally created.
For the analysis three main methodologies have been followed. First, a general analysis of the images’ categories is performed by looking at how the distribution of the categories varies, in both absolute quantities and normalized rates, over the months of the year, without considering the district localization. The second analysis is focused on the Venetian districts, with the purpose of discovering patterns in the distribution of particular categories over the months. Lastly, an analysis of the two-year dataset is performed by means of “Heat Maps” showing the density of the data categories over the map of Venice.


5.1.1 General Quantitative Analysis

In the next tables the quantitative distribution of the categories over the months of the two years is shown:

Table 5.1: Quantitative data of the year 2014

CAT/MON           JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
LAGOON LANDSCAPE  160  169  189  195  206  200  238  232  255  267  252  184
TOWNSCAPE         489  386  452  596  598  707  693  618  633  696  750  612
ART               177  244  120  220  228  246  221  230  263  233  255  241
FOLKLORE          111  159  124  148  159  197  183  174  168  218  237  189
FOOD              159  169  178  208  215  237  207  213  230  267  249  197

Table 5.2: Quantitative data of the year 2015

CAT/MON           JAN  FEB  MAR  APR  MAY  JUN  JUL   AUG   SEP   OCT   NOV   DEC
LAGOON LANDSCAPE  225  245  215  249  259  311  345   294   303   292   310   221
TOWNSCAPE         682  765  750  921  914  951  1055  1047  1302  1441  1209  1313
ART               250  298  204  237  359  304  373   374   536   588   466   469
FOLKLORE          183  293  236  241  391  324  332   272   473   534   450   574
FOOD              227  231  251  270  330  336  284   284   468   396   313   367

Comparing just the quantity of photos obtained in the year 2014 (34102) with that of the year 2015 (56274), a substantial increase is evident. It is due to the fact that, while the method for image retrieval did not change, the number of global users of Instagram has more than doubled from 2014 to 2015 [3]. Adding the months to the quantitative analysis provides some interesting results. Ignoring at first the categorization and focusing only on the quantities of photos obtained monthly, it can be noticed that there is a relation between the quantity of photos taken and particular events of Venetian life. For example, the last week of the Carnival celebration takes place in February, when an increased number of pictures is taken, and the difference in the total quantity with respect to the months of January and March is remarkable. Another case regards the months of September and October and their considerable amount of related images: they are in fact the two months with the highest number of photos obtained. This case is strictly related to the “Mostra del Cinema” and “Biennale” events, which also take place in one of the periods with the highest touristic presence in Venice, as shown by the touristic statistical data, courtesy of “turismovenezia” [2]:

Touristic Affluence  JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC
2015                 313880  394867  484211  591777  666488  656027  728112  722032  700002  692113
2014                 299660  358883  503144  600901  644397  640148  681859  678655  624553  637994  413559  341556

Table 5.3: Touristic presence in Venice during the years 2014 and 2015

It can be highlighted that, taking into consideration particular events that lead to an increase in the daily rate of people’s pictures in particular periods, like the previously mentioned Carnival, Mostra del Cinema or Biennale, there is a connection between the number of tourists visiting Venice and the total number of photos retrieved.
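The monthly totals behind this comparison can be recomputed from the per-category counts. The sketch below sums the five classified categories of Table 5.1 (year 2014) month by month; the variable names are illustrative only.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Per-category monthly counts for 2014, copied from Table 5.1.
counts_2014 = {
    "LAGOON LANDSCAPE": [160, 169, 189, 195, 206, 200, 238, 232, 255, 267, 252, 184],
    "TOWNSCAPE": [489, 386, 452, 596, 598, 707, 693, 618, 633, 696, 750, 612],
    "ART": [177, 244, 120, 220, 228, 246, 221, 230, 263, 233, 255, 241],
    "FOLKLORE": [111, 159, 124, 148, 159, 197, 183, 174, 168, 218, 237, 189],
    "FOOD": [159, 169, 178, 208, 215, 237, 207, 213, 230, 267, 249, 197],
}

# Sum the five categories column by column to get one total per month.
monthly_totals = [sum(column) for column in zip(*counts_2014.values())]
totals_by_month = dict(zip(MONTHS, monthly_totals))
```

Plotting `totals_by_month` next to the corresponding row of Table 5.3 gives the kind of side-by-side comparison described in the text.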


To better understand this fact, a comparison between the graphs of the quantity of pictures retrieved and the statistical touristic presence for both the years 2014 and 2015 is presented:

(a) Quantity of image - 2014 (b) Touristic presence - 2014

(c) Quantity of image - 2015 (d) Touristic presence - 2015

Figure 5.1: Comparison: Images Retrieved and Touristic Affluence in Venice in the years 2014-2015


5.1.2 General Normalized Analysis

In the next picture the same graph of the last paragraph is shown, with the addition of the distinct categories discovered:

Figure 5.2: Quantitative representation of the categories over the months of the years 2014-2015

The first consideration is the large amount of Townscape pictures: since the field of analysis is the city area, it is obvious that the greater part of the images depict monuments, bridges or outdoor scenes, which fall into this category.

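However, for the analysis it is more relevant to look at the normalized rate of the categories, calculated by the following formula. Considering α a single element of the dataset and C = {1, 2, ..., 6} the set of categories,

$$N_i = \frac{\sum_{\alpha_j \in C_i} \alpha_j}{\sum \alpha}$$

where N_i is the resulting normalized rate of the category C_i, that is, the number of elements assigned to C_i divided by the total number of elements.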
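As a worked instance of the formula, the snippet below computes the normalized rates for January 2014 from the counts of Table 5.1; the function name is hypothetical. Rounded to two decimals, the results agree with the January column of Table 5.4.

```python
def normalized_rates(counts):
    """counts: dict category -> number of images; returns category -> share."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# January 2014 counts, copied from Table 5.1.
january_2014 = {
    "LAGOON LANDSCAPE": 160,
    "TOWNSCAPE": 489,
    "ART": 177,
    "FOLKLORE": 111,
    "FOOD": 159,
}
rates = normalized_rates(january_2014)
rounded = {category: round(rate, 2) for category, rate in rates.items()}
```

Only the five retained categories appear here, since the “Varies” images are removed before the analysis.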


The resulting normalized rates of Venice over the years 2014 and 2015 are displayed in the following tables and graph:

CAT/MON           JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
LAGOON LANDSCAPE  0.15  0.15  0.18  0.14  0.15  0.13  0.15  0.16  0.16  0.16  0.14  0.13
TOWNSCAPE         0.45  0.34  0.43  0.44  0.43  0.45  0.45  0.42  0.41  0.41  0.43  0.43
ART               0.16  0.22  0.11  0.16  0.16  0.16  0.14  0.16  0.17  0.14  0.15  0.17
FOLKLORE          0.10  0.14  0.12  0.11  0.11  0.12  0.12  0.12  0.11  0.13  0.14  0.13
FOOD              0.15  0.15  0.17  0.15  0.15  0.15  0.13  0.15  0.15  0.16  0.14  0.14

Table 5.4: Normalized distribution of the data over the months of the year 2014

CAT/MON           JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
LAGOON LANDSCAPE  0.14  0.13  0.13  0.13  0.11  0.14  0.14  0.13  0.10  0.09  0.11  0.08
TOWNSCAPE         0.44  0.42  0.45  0.48  0.41  0.43  0.44  0.46  0.42  0.44  0.44  0.45
ART               0.16  0.16  0.12  0.12  0.16  0.14  0.16  0.16  0.17  0.18  0.17  0.16
FOLKLORE          0.12  0.16  0.14  0.13  0.17  0.15  0.14  0.12  0.15  0.16  0.16  0.19
FOOD              0.14  0.13  0.15  0.14  0.15  0.15  0.12  0.13  0.15  0.12  0.11  0.12

Table 5.5: Normalized distribution of the data over the months of the year 2015

Figure 5.3: Normalized representation of the data categories over the months of the years 2014-2015


As expected, the dominant category is Townscape/Monument, always with around 40% of presence in both years, while the other categories have rates of 0.20 or less in nearly every month. To analyze the other categories with the highest rates a distinction must be made between the two years: in 2014 Art and Lagoon Landscape were the classes with the highest rates (excluding Townscape), with an exceptionally high peak in February for Art, while in 2015 far fewer Lagoon Landscape pictures were retrieved, and the new highest-rate categories are Art and Folklore.

5.1.3 District focused Analysis

This part of the analysis is centered on the Venetian districts and focuses in particular on discovering patterns in the photos’ categories over the months of the year. For this purpose the normalized data of the categories are analyzed to find particular trends with respect to the other categories. The normalized rates are preferred to the absolute counts because the information they carry is not influenced by the variation in the overall number of photos taken: for example, an increase in the absolute number of folklore pictures in the district of Cannaregio in the month of March does not correspond to an increase in its relative normalized rate if the absolute numbers of all the other categories increase as well; on the other hand, a change in the relative rate of one category is always meaningful information.
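The point about absolute counts versus normalized rates can be checked with a tiny numerical example. The district counts below are invented for illustration and are not taken from the dataset.

```python
def rate(counts, category):
    """Share of `category` among all the pictures of a district and month."""
    return counts[category] / sum(counts.values())


# Made-up district counts for two consecutive months: every category doubles.
february = {"Folklore": 10, "Townscape": 40, "Art": 15, "Food": 15, "Lagoon Landscape": 20}
march = {name: 2 * n for name, n in february.items()}

folklore_february = rate(february, "Folklore")
folklore_march = rate(march, "Folklore")
```

The number of Folklore pictures doubles from February to March, yet its normalized rate is unchanged, which is why the district analysis relies on rates rather than counts.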


The normalized category rates of all the Venice districts (Cannaregio, Castello, Dorsoduro, San Marco, San Polo, Santa Croce), divided by month, can be found in the “appendix”; in the next paragraphs the most interesting results discovered will be presented.

The first pattern discovered regards the Folklore category: by looking at its distribution in the years 2014 and 2015, some interesting details can be noticed:

Figure 5.4: Normalized representation of Folklore over the months of the years 2014-2015

In almost every month of both years, the district with the highest rate of Folklore pictures is San Marco. This is certainly due to the fact that one of the most famous touristic places, “Piazza San Marco”, is located in that district, where masks, gondole and other Venetian folklore elements can be found in every period of the year. A general increase can however be noticed in all the districts during the month of February, which corresponds to the last week of the Venetian Carnival, when the rate of folklore-related pictures is obviously higher than in the neighbouring months. Contrary to expectations, however, February is not the month with the highest rate in general: it is in fact surpassed by November in 2014 and by several months in 2015.


Another interesting consideration can be made about the category Lagoon Landscape: the expectation was to find a higher rate in the more external districts, and this is in fact confirmed by the results:

Figure 5.5: Normalized representation of Lagoon Landscape over the months of the years 2014-2015

The districts with the highest rate in both 2014 and 2015 are Castello and Dorsoduro, and both districts in fact border the lagoon (Castello is near Murano, Dorsoduro faces Giudecca and its canal), while the other districts, located in more central positions, have considerably lower rates of Lagoon pictures, with the extreme of Santa Croce in February with zero photos retrieved. In 2015 the highest rates of Lagoon Landscape were obtained in summer (with the maximum in July), but the same trend is not confirmed by the 2014 data, where the highest rates were found in the months of March and September. It is also important to notice that the mean rate of that category of pictures has decreased by 0.06 from 2014 to 2015, indicating a relevant decrease in the number of lagoon pictures taken by people. Further investigations are however required to find the cause.


The opposite of the lagoon landscape photos are obviously the Townscape ones: as stated before, it is the dominant category, comprising more or less half of the pictures obtained, but we should notice a decrease in its rate in the districts that were characterized by a higher rate of lagoon pictures:

Figure 5.6: Normalized representation of Townscape over the months of the years 2014-2015

As expected, the greatest rate of Townscape pictures in both 2014 and 2015 is found in the central districts of Santa Croce and San Polo, the ones with the minimum rates of lagoon pictures, confirming the presence of an inverse correspondence between the number of Lagoon and Townscape pictures.


Regarding the remaining categories of Art and Food, the correspondinggraphs are the following:

Figure 5.7: Normalized representation of Art over the months of the years 2014-2015

Figure 5.8: Normalized representation of Food over the months of the years 2014-2015

There are no clear trends in the monthly distribution of those two categories, since there is no district of Venice with a significant majority or minority rate of those kinds of pictures, but some considerations can be made on some outlier results. In both years a significant increase in the rate of Art pictures can be observed in February, followed by a decrease in the next month, moving from above to below the mean rate, which is around 0.17. Regarding the Food category, on the other hand, particular results are the rates of the months of September and October 2014 in the district of San Polo, which grow to a notable 0.27, significantly higher than the 2014 mean of Food of 0.15.


5.1.4 Analysis of categories densities

In this last section, heat maps built from the yearly data and the coordinates of the pictures of each category are used to show the correspondence between points of high density (highlighted in red in the heat maps) and actual places in the Venetian territory. Note, however, that since the amount of data obtained in 2015 is significantly greater than in the previous year and the two heat maps share the same scale, the 2015 maps will always show more high-density places than the 2014 ones.
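The effect of sharing one scale between two samples of different size can be illustrated with a small sketch. The thesis does not specify how the heat maps were computed, so the grid-count approach below (NumPy's `histogram2d`), the bounding box, the threshold and the synthetic coordinates are all assumptions made only for illustration.

```python
import numpy as np

# Hypothetical bounding box (lon_min, lon_max, lat_min, lat_max); not taken from the thesis.
BOUNDS = (12.30, 12.37, 45.42, 45.45)

def density_grid(lons, lats, bounds=BOUNDS, bins=40):
    """Count geotagged photos falling in each cell of a regular grid."""
    lon_min, lon_max, lat_min, lat_max = bounds
    counts, _, _ = np.histogram2d(
        lons, lats, bins=bins,
        range=[[lon_min, lon_max], [lat_min, lat_max]],
    )
    return counts

def synthetic_photos(rng, n, bounds=BOUNDS):
    """Synthetic coordinates: one hotspot near the box centre, clipped to the box."""
    lon_min, lon_max, lat_min, lat_max = bounds
    lons = rng.normal((lon_min + lon_max) / 2, 0.01, n).clip(lon_min, lon_max)
    lats = rng.normal((lat_min + lat_max) / 2, 0.005, n).clip(lat_min, lat_max)
    return lons, lats

rng = np.random.default_rng(0)
lons_a, lats_a = synthetic_photos(rng, 1000)          # smaller, "2014-like" sample
lons_extra, lats_extra = synthetic_photos(rng, 2000)  # additional photos
lons_b = np.concatenate([lons_a, lons_extra])         # larger, "2015-like" sample
lats_b = np.concatenate([lats_a, lats_extra])

grid_a = density_grid(lons_a, lats_a)
grid_b = density_grid(lons_b, lats_b)

threshold = 5  # same absolute cut-off for "high density" in both maps
hot_a = int((grid_a >= threshold).sum())
hot_b = int((grid_b >= threshold).sum())
print(hot_a, hot_b)
```

Because the larger sample contains the smaller one, every cell count can only grow, so the number of cells above the shared threshold cannot decrease. Normalizing each grid by its total before plotting would be one way to compare the two years on equal terms.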

Heat Maps of Lagoon Landscape

Figure 5.9: Heat Map of Lagoon Landscape category, years 2014-2015

As expected, for the Lagoon Landscape category the majority of high-density areas are found around the “Canale della Giudecca”, the “Canal Grande” and the nearby islands. In particular, some places that are typically pictured with a large amount of lagoon elements are highlighted and easily found on the map, such as Rialto, San Marco, Zattere, Isola della Giudecca and San Giorgio. By contrast, the remaining city area shows a significantly lower density of this category of photos, which supports the quality of the classifier.


Heat Maps of Townscape

Figure 5.10: Heat Map of Townscape category, years 2014-2015

The results for the Townscape category are easy to interpret: the points of highest density are spread all over the central city area of Venice, where most tourists are concentrated.

Heat Maps of Art

Figure 5.11: Heat Map of Art category, years 2014-2015

For the Art pictures it is interesting to look at the difference between the highest-density areas of 2014 and 2015. In the first year the only areas with a considerable density were those of the Biennale, the Arsenale and the Guggenheim, while in the 2015 data, in addition to the places previously found, many other interesting cultural locations emerge, such as Palazzo Grassi, San Giorgio or Ca’ Giustinian.

Heat Maps of Folklore

Figure 5.12: Heat Map of Folklore category, years 2014-2015

The Folklore category presents interesting results: in both heat maps, but in particular in the 2015 one, the majority of the pictures belonging to this category are located in the central and most touristic area of Venice. In addition, a “Folklore Line” can be noticed: a high density of Folklore photos comes from a set of places (squares) aligned along one of the most direct paths that lead tourists from Piazzale Roma to San Marco, passing through Campo San Polo, Campo San Silvestro and Rialto.

Heat Maps of Food

The last category analyzed is Food. The resulting heat maps show that the points with the highest density correspond to important restaurants or “osterie” typical of Venetian gastronomic life: notable examples are “Al timon”, “Tonolo” and “Al paradiso Perduto”.


Figure 5.13: Heat Map of Food category, years 2014-2015

It is also interesting to notice that some zones with a remarkable presence of notable restaurants were not discovered in the analysis because of their position in the city. The majority of the places discovered lie along touristic routes, since the analysis is mostly based on pictures taken by tourists, who may miss “hidden” places off the standard routes.


5.1.5 The distribution of Varies

Some interesting considerations can be made on the distribution of the pictures assigned to the Varies class and excluded from the previous analysis. For this purpose, the density of the Varies category is compared with the density of all the other categories taken together:

Figure 5.14: Heat map of Varies vs the other classes, 2014

Figure 5.15: Heat map of Varies vs the other classes, 2015

The density of the Varies pictures is distributed quite uniformly over the territory of Venice and closely follows the density of the other categories combined. There are, however, some areas where this relation does not hold: in both years, but more evidently in 2015, in the area around “Tronchetto” (on the left side of the images) the Varies density is significantly higher than the density of the other classes, meaning that in that zone a large part of the pictures concern subjects not usually found in the other parts of Venice. This is plausible, since that zone has no places of touristic interest, being mainly an open area with just piers and customs where people usually wait before boarding.
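One possible way to locate areas where the Varies density departs from that of the other classes is to compare, cell by cell, the share of each set of pictures that falls in the cell. The following is only a sketch under stated assumptions: the unit-square coordinates are synthetic, the extra cluster on the left edge merely imitates the situation described for Tronchetto, and the name `share_grid` is illustrative.

```python
import numpy as np

def share_grid(xs, ys, bins, extent):
    """Fraction of a set of photos that falls in each grid cell."""
    counts, _, _ = np.histogram2d(xs, ys, bins=bins, range=extent)
    return counts / counts.sum()

# Synthetic coordinates in the unit square; both sets are made up for illustration.
rng = np.random.default_rng(1)
extent = [[0.0, 1.0], [0.0, 1.0]]
others_x, others_y = rng.uniform(0, 1, 5000), rng.uniform(0, 1, 5000)

# "Varies" follows the same uniform spread, plus an extra cluster on the left edge.
varies_x = np.concatenate([rng.uniform(0, 1, 2000), rng.uniform(0.0, 0.1, 500)])
varies_y = np.concatenate([rng.uniform(0, 1, 2000), rng.uniform(0.4, 0.6, 500)])

bins = 10
excess = (share_grid(varies_x, varies_y, bins, extent)
          - share_grid(others_x, others_y, bins, extent))

# Cell where the Varies share most exceeds the share of the other classes.
ix, iy = np.unravel_index(np.argmax(excess), excess.shape)
print(ix, iy)
```

Working with shares rather than raw counts keeps the comparison meaningful when the two sets have very different sizes.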

5.2 A case study: Carnival 2015

In this section a specific case study is presented to give further evidence of the potential of this thesis work. Pictures from the two weeks of the Carnival (from 1/2/2015 to 14/2/2015) were retrieved and classified by the algorithm, and the same procedure was performed on the same days of the following month (from 1/3/2015 to 14/3/2015). The purpose of this analysis is to assess whether the algorithm can discover interesting differences in the category rates between two periods that are close in time (to exclude seasonal rate variation as far as possible) and differ only in the presence of the Carnival festivities.

In total, after the removal of the “Varies” images, 4650 pictures were analyzed for the Carnival period and 3451 for the subsequent period. This decrease is explained by the fact that tourists take more pictures during festive events, as confirmed by the graph of the annual quantity of images in the 2014-2015 analysis (the table of the quantities can be found in the appendix).

Relevant to the analysis are, however, the tables containing the normalized rates of both periods:

Category/Sestiere  Canareggio  Castello  Dorsoduro  San Marco  San Polo  Santa Croce  Mean rate
Lagoon             0.12        0.12      0.12       0.07       0.05      0.08         0.09
Townscape          0.46        0.35      0.49       0.40       0.46      0.49         0.44
Art                0.15        0.17      0.12       0.13       0.17      0.13         0.15
Folklore           0.13        0.20      0.13       0.22       0.14      0.19         0.17
Food               0.14        0.15      0.14       0.17       0.18      0.10         0.14

Table 5.6: Normalized category rates, Carnival


Category/Sestiere  Canareggio  Castello  Dorsoduro  San Marco  San Polo  Santa Croce  Mean rate
Lagoon             0.13        0.18      0.14       0.12       0.08      0.06         0.11
Townscape          0.44        0.39      0.38       0.45       0.46      0.53         0.44
Art                0.16        0.10      0.15       0.14       0.12      0.10         0.13
Folklore           0.14        0.15      0.19       0.15       0.13      0.11         0.14
Food               0.13        0.18      0.14       0.13       0.20      0.20         0.16

Table 5.7: Normalized category rates, after Carnival
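The normalized rates appear to be obtained by dividing each category count by the total number of pictures of the same district. This section does not state the formula explicitly, so it is treated here as an assumption and checked against the Carnival counts of Table A.1 in the appendix. A minimal Python sketch (the names `CARNIVAL_COUNTS` and `normalized_rates` are illustrative):

```python
DISTRICTS = ["Canareggio", "Castello", "Dorsoduro", "San Marco", "San Polo", "Santa Croce"]

# Picture counts for the Carnival period, copied from Table A.1.
CARNIVAL_COUNTS = {
    "Lagoon":    [140, 132, 109, 46, 26, 32],
    "Townscape": [515, 374, 452, 264, 230, 191],
    "Art":       [165, 181, 113, 88, 85, 52],
    "Folklore":  [150, 214, 115, 148, 72, 73],
    "Food":      [153, 162, 127, 112, 91, 38],
}

def normalized_rates(counts):
    """Divide each count by the total of its district (assumed normalization)."""
    totals = [sum(column) for column in zip(*counts.values())]
    return {
        category: [c / t for c, t in zip(row, totals)]
        for category, row in counts.items()
    }

rates = normalized_rates(CARNIVAL_COUNTS)
for category, row in rates.items():
    print(f"{category:<10}", "  ".join(f"{r:.2f}" for r in row))
```

Rounded to two decimals, these values agree with the per-district entries of Table 5.6 (for example 0.22 for Folklore in San Marco), which supports the assumed normalization.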

The focus of this Carnival case study is the Folklore category: a substantial increase in the rate of this type of picture was expected, and the results confirm it. The mean rate of Folklore during the Carnival (0.17) is noticeably higher than in the following period (0.14), and the difference is more evident when comparing the rates of the individual districts:

Figure 5.16: Folklore category, Carnival vs After Carnival
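The per-district comparison shown in Figure 5.16 can be reproduced from the Folklore rows of Tables 5.6 and 5.7. The short sketch below only restates those published values; the variable names are illustrative.

```python
DISTRICTS = ["Canareggio", "Castello", "Dorsoduro", "San Marco", "San Polo", "Santa Croce"]

# Folklore rows of Table 5.6 (Carnival) and Table 5.7 (after Carnival).
folklore_carnival = [0.13, 0.20, 0.13, 0.22, 0.14, 0.19]
folklore_after = [0.14, 0.15, 0.19, 0.15, 0.13, 0.11]

# Positive difference: the Folklore rate is higher during Carnival.
diffs = {
    district: round(carnival - after, 2)
    for district, carnival, after in zip(DISTRICTS, folklore_carnival, folklore_after)
}
higher_in_carnival = [d for d, x in diffs.items() if x > 0]
lower_in_carnival = [d for d, x in diffs.items() if x < 0]

print(diffs)
print("higher during Carnival:", higher_in_carnival)
print("lower during Carnival:", lower_in_carnival)
```

Four districts show a higher rate during Carnival, with the largest differences in Santa Croce (0.08) and San Marco (0.07), while Canareggio (-0.01) and Dorsoduro (-0.06) move in the opposite direction.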

Almost all the districts present significantly higher rates of Folklore pictures during the Carnival; in particular, San Marco grows from 0.15 in the non-Carnival period to 0.22. The exceptions to this trend are the districts of Dorsoduro and Canareggio: the former in particular goes against it, and its rate of Folklore photos retrieved by the algorithm is significantly higher in the non-Carnival period (0.19) than in the festive one (0.13).

The growth during Carnival is also confirmed by the heat maps showing the density of Folklore pictures in the two time periods:

Figure 5.17: Heat map of the Carnival vs After Carnival periods, 2015

It is evident that the density of Folklore is higher during Carnival and, exactly as in the annual analysis, the “Folklore line” appears in the heat maps, connecting the main places from Piazzale Roma and Ferrovia all the way to San Marco in a path of high density of Folklore pictures. In the period after Carnival the densities are instead distributed more uniformly over the territory of Venice rather than mainly along that line, showing that tourists look more frequently for folkloristic locations in less visited zones of Venice than they do during Carnival.

It is also important to notice the relative growth of the rate of Art pictures during the Carnival, which is a consequence of the many cultural activities organized in museums and art galleries in that period, leading to a considerable quantity of Art photos.


Considering the other categories, in the period after the Carnival the decline of Folklore and Art is offset by the growth of Lagoon Landscape and Food (the mean rate of Townscape remains the same):

Figure 5.18: Lagoon and Food, Carnival vs After Carnival

As the graph confirms, the rates of both Lagoon Landscape and Food pictures are higher in almost all the districts in the period after Carnival, showing a shift from folklore as the prevalent subject to other topics in non-holiday periods.

Chapter 6

Conclusions

This thesis has shown how the enormous set of data produced by users in social media can be successfully refined, using an established Computer Vision method such as the Bag of Visual Words model, to efficiently discover information on touristic flows and activities in a certain area. The information about trends and frequented places obtained with this method can then be used to calibrate funds and touristic investments, focusing on the activities actually performed by people while avoiding the waste of resources on areas of limited interest or visited by few people.

In this project the analysis focuses on the city area of Venice, but exactly the same technique and methodology can be applied to any other place in the world, by just changing the images used for training and, if needed, the number and type of categories.

6.1 Future work

An interesting extension of this project is the implementation of a mood analysis system: by taking into account the descriptions and comments attached to the pictures and applying a Natural Language Processing technique, much information about people’s opinions on the subject can be retrieved. With this layer added to the project, in addition to discovering what people are taking pictures of, we could also find out whether the mood about that subject is positive or negative.

Appendices

Appendix A

A.1 K-means

The K-means algorithm is one of the major feature-based clustering algorithms. The fundamental idea is to minimize the total intra-cluster variance: given a set of “observations” $x_1, \dots, x_n$, where each $x_i$ is an $m$-dimensional vector, and the chosen number of clusters $K$, the goal of the algorithm is to find an assignment of the observations to clusters and the centroid vectors $\mu_k$ ($k = 1, \dots, K$) of the clusters, such that the distance of the data from the centroid of the assigned cluster is minimum.

Given $R = (r_{ij})$, an $n \times K$ matrix such that $r_{ij} = 1$ if $x_i$ is assigned to cluster $j$ and $0$ otherwise, the K-means algorithm minimizes the distortion metric

$$J = \sum_{i} \sum_{k} r_{ik} \, \lVert x_i - \mu_k \rVert^2$$

iterating the following steps:

• Minimize $J$ with respect to $R$, keeping the centroids fixed:

$$r_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_k \lVert x_i - \mu_k \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

• Minimize $J$ with respect to $\mu_k$, keeping the assignments fixed:

$$\mu_k = \frac{\sum_i r_{ik} \, x_i}{\sum_i r_{ik}}$$
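The centroid update follows from setting the gradient of $J$ with respect to $\mu_k$ to zero while the assignments are kept fixed. A short derivation, using only the symbols already defined:

$$\frac{\partial J}{\partial \mu_k} = -2 \sum_i r_{ik} \, (x_i - \mu_k) = 0$$

$$\mu_k \sum_i r_{ik} = \sum_i r_{ik} \, x_i \quad\Longrightarrow\quad \mu_k = \frac{\sum_i r_{ik} \, x_i}{\sum_i r_{ik}}$$

In other words, $\mu_k$ is the mean of the observations currently assigned to cluster $k$, which is well defined as long as at least one observation is assigned to it. Since $J$ is a convex quadratic function of $\mu_k$, this stationary point is a minimum.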

The algorithm is usually stopped when no assignment changes between one iteration and the next, or when the distortion function changes by less than a chosen convergence threshold.

The main advantage of K-means is its fast convergence in the average case, which mitigates its disadvantages: the algorithm does not guarantee the optimal solution and it requires the number K of clusters as a parameter, but thanks to its speed it can be executed many times and the best solution found can then be chosen.
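The two alternating steps, the stopping criteria and the repeated-run strategy can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the thesis: the random initialization from data points, the tolerance, the number of restarts and the synthetic two-blob data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Assignment step: nearest centroid per point, plus the distortion J."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels].sum()

def kmeans(X, K, rng, max_iter=100, tol=1e-9):
    """One K-means run; returns (centroids, labels, J)."""
    # Assumed initialization: K distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels, J = assign(X, centroids)
    for _ in range(max_iter):
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(K):
            members = X[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
        new_labels, new_J = assign(X, centroids)
        # Stop when no assignment changed or J improved by less than tol.
        converged = np.array_equal(new_labels, labels) or J - new_J < tol
        labels, J = new_labels, new_J
        if converged:
            break
    return centroids, labels, J

def best_of(X, K, restarts=10, seed=0):
    """Run K-means several times and keep the run with the lowest J."""
    rng = np.random.default_rng(seed)
    return min((kmeans(X, K, rng) for _ in range(restarts)), key=lambda run: run[2])

# Synthetic data: two well-separated blobs of 50 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 0.1, (50, 2)), data_rng.normal(5.0, 0.1, (50, 2))])
centroids, labels, J = best_of(X, K=2)
print(centroids.round(2), J)
```

Keeping the run with the lowest distortion is the simplest way to exploit the speed of the algorithm against its dependence on the initial centroids.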

A.2 Normalized rate of the districts

(a) Norm. Rate Canareggio - 2014 (b) Norm. Rate Castello - 2014

(c) Norm. Rate Dorsoduro - 2014 (d) Norm. Rate San Marco - 2014

(e) Norm. Rate San Polo - 2014 (f) Norm. Rate Santa Croce - 2014

Figure A.1: Normalized Rate of the categories over the Venetian Districts, 2014

(a) Norm. Rate Canareggio - 2015 (b) Norm. Rate Castello - 2015

(c) Norm. Rate Dorsoduro - 2015 (d) Norm. Rate San Marco - 2015

(e) Norm. Rate San Polo - 2015 (f) Norm. Rate Santa Croce - 2015

Figure A.2: Normalized Rate of the categories over the Venetian Districts, 2015

A.3 Numerical results of the Carnival 2015 analysis

Category   Canareggio  Castello  Dorsoduro  San Marco  San Polo  Santa Croce
Lagoon     140         132       109        46         26        32
Townscape  515         374       452        264        230       191
Art        165         181       113        88         85        52
Folklore   150         214       115        148        72        73
Food       153         162       127        112        91        38

Table A.1: Category results Carnival 2015

Category   Canareggio  Castello  Dorsoduro  San Marco  San Polo  Santa Croce
Lagoon     99          159       87         66         32        19
Townscape  324         334       239        246        177       158
Art        116         90        95         78         46        30
Folklore   106         128       115        81         50        33
Food       97          152       85         72         77        60

Table A.2: Category results after Carnival 2015
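As a consistency check, the counts of Tables A.1 and A.2 can be summed and compared with the totals quoted in Section 5.2 (4650 and 3451 pictures). The sketch below only re-adds the published numbers; the relative drop it prints is derived here and is not a figure reported in the thesis.

```python
# Counts per category, in the district order of Tables A.1 and A.2.
carnival = {
    "Lagoon":    [140, 132, 109, 46, 26, 32],
    "Townscape": [515, 374, 452, 264, 230, 191],
    "Art":       [165, 181, 113, 88, 85, 52],
    "Folklore":  [150, 214, 115, 148, 72, 73],
    "Food":      [153, 162, 127, 112, 91, 38],
}
after_carnival = {
    "Lagoon":    [99, 159, 87, 66, 32, 19],
    "Townscape": [324, 334, 239, 246, 177, 158],
    "Art":       [116, 90, 95, 78, 46, 30],
    "Folklore":  [106, 128, 115, 81, 50, 33],
    "Food":      [97, 152, 85, 72, 77, 60],
}

def district_totals(counts):
    """Total number of pictures per district (column sums)."""
    return [sum(column) for column in zip(*counts.values())]

total_carnival = sum(district_totals(carnival))
total_after = sum(district_totals(after_carnival))
drop = (total_carnival - total_after) / total_carnival

print(total_carnival, total_after, f"{drop:.1%}")  # 4650 3451 25.8%
```

Both totals match the numbers given in the text, so the tables and the narrative are mutually consistent.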
