
OntoGen Extension for Exploring Image Collections

Nenad Tomašev∗, Blaž Fortuna∗, Dunja Mladenić∗

∗Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia

Email: [email protected], [email protected], [email protected]

Abstract—OntoGen is a semi-automatic and data-driven ontology editor focused on editing topic ontologies. It utilizes text mining tools to make ontology-related tasks simpler for the user. This exclusive focus on building ontologies from textual data is the gap we aim to bridge. We have successfully extended OntoGen to work with image data, allowing ontology construction and editing on collections of labeled or unlabeled images. Browsing large heterogeneous image collections efficiently is a challenging task, and we feel that semi-automatic ontology construction, as described in this paper, makes it easier.

I. INTRODUCTION

Ontologies are formal hierarchical knowledge representations. The individual entities are mapped onto concepts, which are connected by various semantic relations. The tree-like structure results from concepts being divided into sub-concepts, providing more specific information about the data if needed. The use of ontologies in digital applications is increasing, since they allow for multi-level views of the data and hence multi-level reasoning and inference. Domain ontologies are structural representations of specific knowledge domains.

OntoGen is a tool for semi-automatic ontology construction [1]–[6]. It relies on text mining to extract relevant concepts from textual data and to help domain experts create ontologies.

This paper proposes exploring image collections through semi-automatic construction of an ontology from the images themselves. We achieve this by extending OntoGen so that it can be used to create image collection ontologies.

II. ONTOGEN EXTENSION FOR IMAGE COLLECTIONS

Since OntoGen was built to work on textual data, we decided to extract data from images that could easily be transformed into a representation often used in text processing: the bag of words. When working with images, it is possible to compute typical local features which can be viewed as visual words, so we decided to base our approach on the bag-of-visual-words representation [9]. Essentially, this means representing an image as a fixed-length frequency histogram over a visual vocabulary.

A. Image preprocessing

Images contain two types of information which may be relevant for the task at hand. Some information is contained in the way the image is colored, and some in, loosely speaking, the depicted edges and textures. The former can help us differentiate all the blue cars from all the red cars, while the latter is useful in differentiating a blue car from a river.

We extract both types of information from each image. For color information, we calculate a simple color distribution histogram with some predefined number of bins. As local features we use SIFT features [8], since they are known to be quite robust and are frequently used in image mining applications. We compute SIFT features only at detected keypoints, rather than following the alternative approach where they are sampled from a grid imposed on the image.
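To make the preprocessing step concrete, the following Python sketch extracts both kinds of information from a single image. The paper does not name its implementation libraries, so OpenCV, the function name, and the color_bins parameter are our assumptions:

```python
import cv2
import numpy as np

def extract_features(image_path, color_bins=8):
    """Extract a color histogram and SIFT descriptors from one image."""
    img = cv2.imread(image_path)

    # Color information: a joint histogram over the three channels,
    # flattened to a single vector with color_bins^3 entries.
    color_hist = cv2.calcHist([img], [0, 1, 2], None,
                              [color_bins] * 3, [0, 256] * 3).flatten()

    # Local information: SIFT descriptors at detected keypoints
    # (not sampled on a dense grid), as described in the paper.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    return color_hist, descriptors  # descriptors: (n_keypoints, 128)
```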

After the SIFT features have been extracted for the entire image collection, we take a random sample S_sift of features from the pool of all the computed features. k-means clustering [7] is then performed on S_sift, and the cluster centroids are taken to represent visual words. This collection of representative features is called the codebook [9]. By mapping the individual features of each image onto the codebook, we obtain a fixed-length image representation. The size of the codebook is arbitrary, but we had good results with codebook sizes between 400 and 1000.
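A minimal sketch of the codebook construction, assuming scikit-learn for the k-means step (the paper cites the algorithm [7] but not a library; sample_size and the function names are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, codebook_size=500,
                   sample_size=100_000, seed=0):
    """Cluster a random sample S_sift of SIFT descriptors;
    the cluster centroids become the visual words (the codebook)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(all_descriptors),
                     size=min(sample_size, len(all_descriptors)),
                     replace=False)
    s_sift = all_descriptors[idx]
    return KMeans(n_clusters=codebook_size, random_state=seed).fit(s_sift)

def visual_word_histogram(codebook, descriptors):
    """Map one image's descriptors onto their nearest visual words and
    count occurrences, yielding a fixed-length frequency histogram."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters).astype(float)
```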

The second part of the hybrid representation, the color distribution histogram, is then appended to the codebook representation. Both parts are normalized separately so that the values in each part sum to one, i.e., each part forms a discrete probability distribution. This combined representation is then loaded into OntoGen in a bag-of-visual-words format.
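The combination step itself is simple; a sketch under the same assumptions (NumPy, hypothetical function name):

```python
import numpy as np

def hybrid_representation(visual_hist, color_hist):
    """Normalize each part separately to sum to one (a discrete
    probability distribution), then concatenate the two parts."""
    v = visual_hist / visual_hist.sum() if visual_hist.sum() > 0 else visual_hist
    c = color_hist / color_hist.sum() if color_hist.sum() > 0 else color_hist
    return np.concatenate([v, c])
```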

B. Use cases

Let us now look at the ontology creation process. When the collection is first loaded, there is only the generic root concept, which contains all the data. Each concept can be branched into several sub-concepts. Branching is performed by clustering the data of the current concept into a specified number of clusters, which become its child nodes; this can be seen in Figure 2. The tree diagram is interactive: any node can be selected and examined with a single click, and the most relevant features (visual words) of each concept are written inside its node.
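The branching operation reduces to running k-means on the vectors of the images assigned to the selected concept. A sketch, again assuming scikit-learn (OntoGen's internal clustering implementation may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def branch_concept(representations, n_subconcepts=3, seed=0):
    """Split a concept into sub-concepts by clustering its images.
    representations: (n_images, dim) matrix of hybrid vectors.
    Returns the fitted model and one index array per child concept."""
    km = KMeans(n_clusters=n_subconcepts, random_state=seed).fit(representations)
    children = [np.flatnonzero(km.labels_ == c) for c in range(n_subconcepts)]
    return km, children
```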

Several images closest to the concept centroid are taken as representative images, giving the user a quick look at what the concept represents, as shown in Figure 3. In this particular case, the concept clearly comprises images depicting skyscrapers.
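Selecting the representatives is a nearest-to-centroid query; a minimal sketch with hypothetical names:

```python
import numpy as np

def representative_images(representations, centroid, n=5):
    """Indices of the n images whose vectors are closest to the
    concept centroid; these are shown as concept representatives."""
    dists = np.linalg.norm(representations - centroid, axis=1)
    return np.argsort(dists)[:n]
```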


Fig. 1. The pipeline: OntoGen working on image data. Both the images themselves and their bag-of-visual-words representation are loaded.

Fig. 2. Branching concepts while creating an ontology

A more detailed view of each concept can be produced by invoking the Document Atlas component, which performs multidimensional scaling to project the data onto a plane while trying to preserve the similarities between points [4]. Each point can be selected and its image displayed on the screen. A density map is also drawn in the background. An example is shown in Figure 4. This feature is particularly useful, since it allows us to easily observe groups of similar images, which is very helpful when splitting nodes.
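Document Atlas implements its own projection; as a stand-in, the following sketch produces a comparable 2-D layout with scikit-learn's MDS (our substitution, not the component's actual code):

```python
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

def project_to_plane(representations, seed=0):
    """Project image vectors onto a plane while trying to preserve
    the pairwise dissimilarities between points."""
    dists = pairwise_distances(representations, metric="cosine")
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(dists)  # (n_images, 2) plane coordinates
```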

These UI features allow for fast and robust semi-automatic ontology construction from image collections. They form a basic framework which we wish to further extend and improve in the future.

Figures 2, 3 and 4 were generated by testing the system on a subset of ImageNet (http://www.image-net.org/), a large public collection of various image data sets.

Fig. 3. Representative concept data points

Fig. 4. Document Atlas used to visualize a collection of images

III. FUTURE WORK

Given that OntoGen was designed to work with textual data, we would like to combine the two modes of work into a single hybrid mode capable of handling multimedia data. The main challenge lies in designing an easy and intuitive UI.

ACKNOWLEDGMENT

This work was supported by the bilateral project between Slovenia and Romania "Understanding Human Behavior for Video Surveillance Applications," the Slovenian Research Agency, and the ICT Programme of the EC under PlanetData (ICT-NoE-257641).

REFERENCES

[1] B. Fortuna, M. Grobelnik, and D. Mladenić, "Semi-automatic construction of topic ontology," in Semantics, Web and Mining, Joint International Workshop, EWMF and KDO, 2005.

[2] ——, "Semi-automatic data-driven ontology construction system," in Proceedings of the 9th International Multi-Conference Information Society IS, 2006.

[3] ——, "System for semi-automatic ontology construction," demo at ESWC, 2006.

[4] ——, "System for semi-automatic ontology construction," poster at WWW, 2006.

[5] ——, "Visualization of text document corpus," Informatica, vol. 29, pp. 497–502, 2006.

[6] ——, "OntoGen: semi-automatic ontology editor," in Proceedings of the 2007 Conference on Human Interface: Part II. Springer-Verlag, 2007, pp. 309–318.

[7] J. Han, Data Mining: Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2005.

[8] D. Lowe, "Object recognition from local scale-invariant features," in ICCV, 1999, pp. 1150–1157.

[9] Z. Zhang and R. Zhang, Multimedia Data Mining: A Systematic Introduction to Concepts and Theory. Chapman and Hall, 2008.
