Research Article
Dual-Layer Density Estimation for Multiple Object Instance Detection
Qiang Zhang,1,2,3 Daokui Qu,1,2,3 Fang Xu,1,2,3 Kai Jia,1,2,3 and Xueying Sun2,4
1 State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, No. 114 Nanta Street, Shenhe District, Shenyang 110016, China
2 University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
3 SIASUN Robot & Automation Co., Ltd., No. 16 Jinhui Street, Hunnan New District, Shenyang 110168, China
4 Department of Information Service and Intelligent Control, Chinese Academy of Sciences, No. 114 Nanta Street, Shenhe District, Shenyang 110016, China
Correspondence should be addressed to Qiang Zhang; zhangqiang@sia.cn
Received 8 May 2016; Revised 19 July 2016; Accepted 1 August 2016
Academic Editor: Luis Paya
Copyright © 2016 Qiang Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper introduces a dual-layer density estimation-based architecture for multiple object instance detection in robot inventory management applications. The approach consists of raw scale-invariant feature transform (SIFT) feature matching and key point projection. The dominant scale ratio and a reference clustering threshold are estimated using the first layer of the density estimation. A cascade of filters is applied after feature template reconstruction and refined feature matching to eliminate false matches. Before the second layer of density estimation, the adaptive threshold is finalized by multiplying the reference value by an empirical coefficient, which is identified experimentally. Adaptive threshold-based grid voting is applied to find all candidate object instances. Erroneous detections are eliminated using a final geometric verification in accordance with Random Sample Consensus (RANSAC). The detection results of the proposed approach are evaluated on a self-built dataset collected in a supermarket. The results demonstrate that the approach provides high robustness and low latency for inventory management applications.
1. Introduction
With the development of robotics, humanoid robots have been introduced in innumerable applications. Among the available functionalities of the humanoid robot, specific object detection has attracted increasing attention in recent years. Inventory management, autosorting, and pick-and-place systems are typical applications. Unlike single-object detection, multiple-instance detection is a more challenging task. In this paper, we focus on the goal of multiple object instance detection for robot inventory management and propose an effective approach to achieve this goal.
Multiple object instance detection is a complex technology that encounters a variety of difficulties. First, the diversity of species, shapes, colors, and sizes of objects makes it difficult to accomplish the fixed goal. Moreover, target objects appear different in different environments; for example, changes in scale, orientation, and illumination increase uncertainty and ambiguity during identification. Additionally, multiple instances can affect the verification procedure.
There are two representative types of techniques for object instance detection: the training and learning-based approach and the template-based approach. The latter approach includes an extensive range of template forms, such as edge boxes [1], patches [2], and local features. Local feature matching-based object detection methods have received considerable attention from researchers because of their notable advantages in overcoming a portion of the deficiencies caused by scale, rotation, and illumination changes. The scale-invariant feature transform (SIFT) [3] was proposed by Lowe in 2004 and has been widely applied in many situations due to its robustness. A new approach called PCA-SIFT [4] was proposed to simplify the calculations and decrease storage space; the main concept of PCA-SIFT is dimension reduction. In 2005, Mikolajczyk and Schmid proposed the gradient location and orientation histogram (GLOH) [5]. The GLOH is a SIFT-like
Hindawi Publishing Corporation, Journal of Sensors, Volume 2016, Article ID 6937852, 12 pages, http://dx.doi.org/10.1155/2016/6937852
descriptor that uses a log-polar transformation hierarchy rather than four quadrants; the original high dimensionality of its descriptor can be reduced using PCA. In 2008, Bay et al. developed a prominent method known as speeded up robust features (SURF) [6], based on improvements in the construction of SIFT features. In [5, 7], the performances of local feature descriptors such as SIFT, PCA-SIFT, GLOH, and SURF were compared. According to [5, 7], PCA-SIFT and SURF have advantages in terms of speed and illumination changes, whereas SIFT and GLOH are invariant to rotation, scale changes, and affine transformations.
Feature matching is a basic procedure in object detection, typically performed by comparing the similarity of two feature descriptors. In practice, raw matches often contain a large number of mistakes; thus, false match elimination is necessary. The classical approaches are the ratio test [3], the bidirectional matching algorithm [8], and RANSAC [9]. In addition, a remarkable method based on scale restriction [10, 11] was proposed. This method first estimates a dominant scale ratio using statistics after prematching. Then features are re-extracted from the high-resolution image at a Gaussian smoothing parameter adjusted according to the dominant scale ratio. After refined matching, feature pairs that do not conform to a certain scale ratio restriction are rejected. This method is adopted in our work due to its high performance; in addition, we provide a new approach to obtaining the dominant scale ratio. In 2010, Arandjelovic and Zisserman [12] showed that the Hellinger kernel leads to superior matching results compared to Euclidean distance in SIFT feature matching.
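The Hellinger comparison of [12] is usually implemented as the "RootSIFT" mapping: L1-normalize each descriptor and take the element-wise square root, after which ordinary Euclidean distance between the mapped vectors equals the Hellinger distance between the originals. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Map SIFT descriptors so that Euclidean distance between the
    mapped vectors equals the Hellinger distance between the originals
    (the RootSIFT trick of Arandjelovic and Zisserman)."""
    d = np.asarray(descriptors, dtype=np.float64)
    d = d / (np.abs(d).sum(axis=1, keepdims=True) + eps)  # L1-normalize each row
    return np.sqrt(d)                                     # element-wise square root
```

Because each mapped row has (near-)unit L2 norm, any Euclidean-distance matcher can be reused unchanged on the mapped descriptors.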
Lin et al. [13] used a key point coordinate clustering method for duplicate object detection, in which regions of interest are detected using an adaptive window search. Wu et al. [14] reported an improved graph-based method to locate object instances. In [15], Collet et al. proposed a scalable approach known as MOPED; the framework first clusters matched features and generates hypothesis models, and potential instances are found after an iterative process of pose refinement. However, the key point coordinates obtained from the clustering results in [13–15] might be unreliable because the key points are sparsely distributed. Alternatively, approaches based on Hough voting were proposed and applied in [16–18]. The Hough voting-based approach locates possible instances according to feature mapping and density estimation; specifically, the method in [16] applies mean-shift in the voting step. Similarly, grid voting was adopted in [19]. Although Hough voting is an effective approach for multiple object instance detection, the clustering radius for mean-shift or grid voting must be preset by experience, which leads to low adaptability and accuracy.
In this paper, we present a new architecture that improves multiple object instance detection accuracy by considering the adaptive selection of the optimal clustering threshold and a cascade of filters for false feature match elimination. The contributions of our work are as follows:
(i) We propose an architecture for multiple object instance detection based on dual-layer density estimation. The first layer calculates an optimal clustering threshold for the second layer and applies a constraint for the subsequent scale restriction-based false match elimination. The second layer aims to detect all candidate object instances. The proposed strategy can reduce the possibility of mismatch and improve detection accuracy. Compared to traditional methods, which need to set the threshold manually, the proposed adaptive clustering threshold computation method leads to stronger environmental flexibility and higher robustness.
(ii) We introduce a new method to compute and verify the value of the dominant scale ratio between the training image and query image. Rather than using a histogram statistical method over matched features, the value is derived from the first layer of the density estimation. Then the value is tested against an approximate one obtained from the homography matrix. According to our experiments, the proposed method is more robust for dominant scale ratio estimation than the conventional methods.
The remainder of this paper is organized as follows. Section 2 describes the proposed architecture according to our particular application background. Details of the proposed method are discussed in Section 3. A variety of experiments are designed to evaluate our approach; the experimental methodology, results, and discussions are presented in Section 4. Finally, Section 5 summarizes our contributions and presents conclusions.
2. Framework Overview
In this section, we provide an introduction to the background of our work and briefly explain the proposed architecture.
Our work develops a service robot for a supermarket. The purpose of the robot is to count the goods before the start of business and provide feedback to the staff to ensure adequate supplies. Because no standard database exists for our specific application, we created a database of 70 types of man-made products to evaluate our algorithm.
The lighting conditions in the supermarket are generally uniform, and thus we collected the training images for each item under the same lighting conditions. One image was obtained from the front, and another 24 were captured from 24 different directions. The frontal object image serves for object recognition, and all 25 sequence images were used to build a sparse 3D model for recovering the pose of the identified object. All training images were captured at a distance approximately equal to the minimum safe distance between the robot and the shelves; this sampling method ensures that the training image retains more detail. To validate our architecture, the training database was divided into three sets based on the density of textures: the set with the highest density of textures contains 20 types of products, the set with a medium density of textures has 30 types, and the set with the lowest density of textures includes 20 types. For each object, there were 2 to 40 instances in the scene image.
Our proposed method is based on local features, which can provide information about scale and rotation; SIFT,
SURF, and PCA-SIFT are three alternatives. According to [5, 7], SIFT has better performance under scale and rotation change than SURF and PCA-SIFT; thus, SIFT is used in our work, although it is time-consuming. The proposed framework is based on SIFT feature extraction and feature matching, designed around the specific application background. The framework consists of two phases: the offline training phase and the online detection phase. A graphic illustration of the proposed approach is shown in Figure 1. To make our algorithm more explicit, we fix some terminology in advance. The term "key point" refers to a point with 2D coordinates, detected according to SIFT theory. The term "descriptor" represents a 128-dimensional SIFT feature vector. The term "feature" consists of a description vector together with the scale, orientation, and coordinate of the SIFT point.
In the offline phase, as shown in Figure 1(a), an initial value of the Gaussian smoothing parameter is given in advance. The SIFT features are extracted from the training images for certain objects. Reference vectors between all key points and the object center are computed to locate the object centroid. All features are stored in a retrieval structure to reduce the time overhead during detection. In addition, we created a sparse 3D model for each object with a standard Structure from Motion algorithm [20], and each 3D point was associated with a corresponding SIFT descriptor.
The online detection phase is a dual-layer density estimation-based method. The first layer exists for two purposes: to compute the dominant scale ratio between the training image and query image (Figures 1(b)–1(e)) and to calculate a reference clustering threshold for the second layer of density estimation (Figures 1(f)–1(i)). At the beginning of feature extraction for the query image, an initial value of the Gaussian smoothing parameter is given, the same as in the training phase. All descriptors extracted from the video footage are matched to their nearest neighbors in the database (Figure 1(b)), and the key points are projected to their reference centers (Figure 1(c)). A valid object center with a maximum density value can be found using kernel density estimation (Figure 1(d)). Considering that object instances in our applications have nearly the same scale, the dominant scale ratio and an effective clustering threshold are computed accordingly (Figure 1(e)). The second layer of density estimation detects all possible instances. First, the feature template is reconstructed based on the initial value of the base scale and the calculated dominant scale ratio (Figure 1(f)). The majority of false feature matches are removed by a cascade of filters based on the distance ratio test and scale restriction (Figure 1(g)). The key point projection and 2D clustering methods are applied to find all candidate object centers (Figure 1(h)). The final geometric verification procedure eliminates incorrect detection results and determines each instance's pose (Figure 1(i)).
3. Description of the Proposed Method
In this section, we introduce our work in detail, in accordance with the aforementioned architecture. The schematic diagram for the offline training phase and the flowchart of the online detection are shown in Figures 2 and 4, respectively.
3.1. Offline Training: Template Generation and Retrieval Structure Construction. Indeed, the proposed method can be applied in conjunction with any scale- and rotation-invariant features. As described in Section 2, SIFT is applied in our work for its robustness. To create templates for all types of object instances, frontal images of the targets must be captured. As noted in Section 2, the lighting conditions in our application are relatively invariant. In addition, we assume that all object instances face front outward; SIFT is able to work properly under these conditions. Thus, we collect one frontal image of each type of product for object recognition. In addition, for the subsequent object pose estimation, a sparse 3D model of each object was created (as shown in Figure 3), and thus 24 other images were captured at approximately equally spaced intervals in a circle around each object. According to SIFT theory, the Gaussian smoothing parameter should be given first. Suppose that the initial value is set to σ_TrainInit = σ_o. In this work, σ_o is a fixed value, as described in Section 4, and SIFT feature extraction then takes place.
We assume that the number of features for a specific object is n. Each SIFT feature descriptor is a 128-dimensional vector f_i, where i = 1, 2, ..., n. Similarly, the scale of the feature is s_i, the principal orientation is θ_i, and its coordinate is c_i(x_i, y_i). The coordinate difference v_io between each SIFT key point c_i(x_i, y_i) and the related object centroid c_o(x_o, y_o) is calculated according to the following:

\[ v_{io} = \begin{bmatrix} \Delta x_i \\ \Delta y_i \end{bmatrix} = \begin{bmatrix} x_i \\ y_i \end{bmatrix} - \begin{bmatrix} x_o \\ y_o \end{bmatrix}. \quad (1) \]
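The per-feature bookkeeping above, including the reference vectors of Eq. (1), can be sketched as follows. As an illustrative assumption (not stated in the paper), the object centroid is taken to be the center of the frontal training image:

```python
import numpy as np

def build_template(keypoints, scales, orientations, image_shape):
    """Per-feature records for the offline template.

    keypoints    : (n, 2) array of key point coordinates c_i = (x_i, y_i)
    scales       : (n,) SIFT scales s_i
    orientations : (n,) principal orientations theta_i (radians)
    image_shape  : (rows, cols) of the frontal training image

    The centroid c_o is assumed to be the image center, and the
    reference vector v_io = c_i - c_o is stored per feature (Eq. (1)).
    """
    rows, cols = image_shape
    centroid = np.array([cols / 2.0, rows / 2.0])   # (x_o, y_o), an assumption
    pts = np.asarray(keypoints, dtype=np.float64)
    v = pts - centroid                              # v_io = [dx_i, dy_i]
    return [{"scale": s, "theta": t, "v": vi}
            for s, t, vi in zip(scales, orientations, v)]
```

Each record would then be stored alongside its 128-dimensional descriptor in the retrieval structure described next.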
Feature matching is a subprocedure in our multiple object instance detection architecture. The process finds the most similar feature in the dataset based on a distance measurement; in our work, the Hellinger distance measurement is applied due to its robustness, following [12]. Feature matching is typically time-consuming, so constructing an effective retrieval structure is necessary to speed up the detection phase. Two types of effective retrieval methods are currently available: tree-based methods and hashing-based methods. The randomized kd-tree [21, 22], hierarchical k-means tree [21, 22], and vocabulary tree [23] are typical representatives of tree-based methods. Locality-sensitive hashing (LSH) [24, 25] and SSH [26] are two representative hashing-based methods. Of all the feasible methods, near-optimal hashing algorithms [27] have proven to be highly efficient and accurate, and this method was chosen for our work. Constructing multiple independent trees to form a forest is necessary to reduce the false negative and false positive rates.
3.2. Online Multiple Object Instance Detection
3.2.1. Feature Extraction for the Query Image and Feature Matching. During online detection, the system first obtains a newly captured video frame. SIFT key points are detected and descriptors are extracted in the same manner as in the first part of the offline procedure. The Gaussian smoothing parameter is also set to σ_Query = σ_o. Then the near-optimal
Figure 1: Overview of the proposed framework: (a) offline phase for constructing the retrieval structure; (b)–(e) first layer of density estimation: (b) local feature detection, (c) feature matching and key point mapping, (d) first layer of density estimation, and (e) intermediate results (effective training image, dominant scale ratio, clustering threshold); (f)–(i) second layer of density estimation: (f) feature template reconstruction, (g) false matching result elimination, (h) clustering for candidate instance detection, and (i) geometric verification.
Figure 2: Offline training procedure (frontal object images → key point detection and descriptor extraction at scale σ_o → reference vector calculation → retrieval structure construction → database of feature descriptors, scales, orientations, reference vectors, and original training images).
Figure 3: 3D sparse model of a packing box from 25 images.
hashing algorithm takes effect. During feature matching, low-discriminability matches are discarded based on the ratio test of distances between the nearest neighbor and the second nearest neighbor, which was proposed in [3].
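Lowe's ratio test can be sketched with a brute-force numpy matcher. This is a simplified stand-in: the paper uses Hellinger distance and a near-optimal hashing index, whereas plain Euclidean distance and exhaustive search are substituted here for brevity, and the 0.8 ratio is an illustrative default rather than the paper's setting:

```python
import numpy as np

def ratio_test_match(query_desc, db_desc, ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test.

    For each query descriptor, the match is kept only if the distance to
    the nearest database descriptor is less than `ratio` times the
    distance to the second nearest. Returns (query_idx, db_idx) pairs.
    """
    q = np.asarray(query_desc, dtype=np.float64)
    d = np.asarray(db_desc, dtype=np.float64)
    # pairwise Euclidean distances, shape (n_query, n_db)
    dists = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nn1, nn2 = np.argsort(row)[:2]
        if row[nn1] < ratio * row[nn2]:
            matches.append((i, nn1))
    return matches
```

Ambiguous descriptors (two database neighbors at nearly the same distance) are rejected, which is exactly the low-discriminability case the text refers to.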
3.2.2. Key Points Projection and Object Center Estimation. The principle of key point projection is illustrated in Figure 5, where the left part is the training image and the right part is the query image. In the middle part, the solid region is a matched patch from the query image, and the area formed by dotted lines is the ideal case in which there is only a similarity transform. Assume that the matching pair of features is f_i and f_j, where f_i is from the database and f_j is from the query image. The key points corresponding to these two features are p_i(x_i, y_i) and p'_j(x'_j, y'_j). For a planar object, the center c'_oj(x'_oj, y'_oj) related to f_j can be estimated according to (2)–(5). In the formulas, s'_j and θ'_j are the corresponding scale and orientation of feature f_j; similarly, s_i and θ_i are related to feature f_i in the training image. For each pair of matching features, there is a normalized deflection angle ε_j between the normal vector of the object surface and the camera optical axis. According to (5), the estimated centers are located in a small area around the real center when the training image is the exact image corresponding to the ordered object instance and ε_j has an extremely small value:

\[ \theta = \theta'_j - \theta_i. \quad (2) \]

As shown in Figure 5, reference centers are distributed in small areas. Then the problem of determining the center
Figure 4: Online detection flowchart (query image acquisition → feature extraction with scale setting σ = σ_o → feature matching against the database → key points projection → kernel density estimation → computation of the dominant scale ratio sr and the reference clustering threshold Tr → access to the valid training image → feature extraction with scale setting σ = sr × σ_o → feature matching → false match elimination based on sr → key points projection → clustering based on Tr → object-level false result elimination → result).
Figure 5: Key points projection principle diagram (training image, query image, matched features p_i and p'_j, centers c_o and c'_o, optical axis, deflection angle ε_j).
coordinates is converted into a density estimation problem. The first layer of density estimation aims to find one of the valid centers in the query image. Object center estimation is a crucial problem; a two-stage adaptive kernel density estimation procedure, elaborated in [28], is employed to improve the precision. To speed up the process, only the density values at the mapped key points are calculated. The point with the highest density value is saved. Although this point may not be the exact center, it is a close approximation; thus, the mapped point is identified as a valid center. Simultaneously, the exact training image can be obtained. As illustrated in Figure 6, the blue point is the obtained object center.
\[ \begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix} = \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \cos\varepsilon_j \quad (3) \]

\[ = \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left( 1 - \frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots \right) \quad (4) \]

\[ = \underbrace{\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix}}_{\text{Real Center}} + \underbrace{\frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left( -\frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots \right)}_{\text{Distribution Range}} \quad (5) \]
Figure 6: Reference clustering threshold calculation (rows and columns of the training image versus Tr in the query image).
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work can be derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, s'_m is the scale of the key point related to the estimated object center, and s_n is the scale of the matched key point in the training image:

\[ \mathrm{sr} = \frac{s'_m}{s_n}. \quad (6) \]
Once the valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix H_o for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves is large, meaning the camera on the robot is far from the targets, the actual homography is sufficiently close to an affine transformation. The dominant scale ratio sr' can then also be computed according to (8). Then sr' is used to verify sr: only if the value of sr is close to sr' is sr confirmed to be correct. We use (9) to assess the similarity between the two values:

\[ H_o = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \quad (7) \]

\[ \mathrm{sr}' = \sqrt{\left| h_{11} \times h_{22} \right| + \left| h_{12} \times h_{21} \right|} \quad (8) \]

\[ \left| \frac{\mathrm{sr} - \mathrm{sr}'}{\min(\mathrm{sr}, \mathrm{sr}')} \right| < 0.15. \quad (9) \]
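Equations (8)-(9) are straightforward to implement. The sketch below assumes `H` is the 3 × 3 homography normalized so that h33 = 1; the default tolerance mirrors the threshold of Eq. (9) but is configurable:

```python
import numpy as np

def scale_ratio_from_homography(H):
    """Approximate dominant scale ratio from a homography that is close
    to an affinity (Eq. (8)): sr' = sqrt(|h11*h22| + |h12*h21|)."""
    H = np.asarray(H, dtype=np.float64)
    return np.sqrt(abs(H[0, 0] * H[1, 1]) + abs(H[0, 1] * H[1, 0]))

def scale_ratio_consistent(sr, sr_prime, tol=0.15):
    """Eq. (9): accept sr only if it agrees with sr' within `tol`."""
    return abs(sr - sr_prime) / min(sr, sr_prime) < tol
```

For a pure similarity transform with scale s and rotation φ, h11 = s cos φ, h12 = −s sin φ, h21 = s sin φ, h22 = s cos φ, so Eq. (8) recovers exactly s, which is why the affine approximation is adequate at long camera-to-shelf distances.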
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected; then the features obtained from the query image above are matched to the new dataset. Due to the aforementioned preprocessing, the number of SIFT features in the newly constructed database is reduced compared to the offline training phase; thus, the time overhead of the matching process is greatly reduced:

\[ \sigma_{\mathrm{TrainAdjust}} = \mathrm{sr} \times \sigma_o. \quad (10) \]
The strategy for feature matching disambiguation here is a cascade of filters: the ratio test algorithm (proposed in [3]), the scale restriction-based method (presented in [11]), and the geometric verification-based approach. The ratio test and scale restriction methods are applied during the matching process, while geometric verification takes effect after clustering. After this series of filters, most false matches are eliminated.
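The scale-restriction stage of the cascade can be sketched as below: every surviving match must have an individual scale ratio s_j/s_i close to the dominant ratio sr. The relative tolerance here is an illustrative choice, not the paper's exact setting from [11]:

```python
def scale_restriction_filter(matches, sr, tol=0.25):
    """Drop matches whose individual scale ratio s_j/s_i deviates from
    the dominant ratio `sr` by more than the fraction `tol`.

    Each match is a dict with the query feature scale s_j and the
    matched template feature scale s_i (field names illustrative).
    """
    kept = []
    for m in matches:
        r = m["s_j"] / m["s_i"]
        if abs(r - sr) / sr <= tol:   # relative deviation from sr
            kept.append(m)
    return kept
```

Because the filter is a single pass over the match list, it adds negligible cost before the more expensive clustering and geometric verification stages.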
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. However, these methods share the disadvantage that the bandwidth must be set by experience. For example, in [16] the clustering threshold was set to a specific value, and in [19] the voting grid size was set to a value associated with the size of the query image; nevertheless, this approach may still lead to unreliable results. For our specific application, the clustering threshold can be estimated based on the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold should be computed automatically; here it is estimated according to (11), in which T_r is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns in the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids. Therefore, the clustering threshold Th is finalized in line with (12), in which k is a correction factor; based on the repeated experiments described in Section 4, we provide a recommended value for k. Candidate object instance detection is based on the second layer of density estimation, and grid voting is employed here due to its high precision and recall:

\[ T_r = \begin{cases} \mathrm{sr} \times \mathrm{rows}, & \text{if } \mathrm{rows} < \mathrm{cols} \\ \mathrm{sr} \times \mathrm{cols}, & \text{otherwise} \end{cases} \quad (11) \]

\[ \mathrm{Th} = k \times T_r. \quad (12) \]
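Equations (11)-(12) and a bare-bones version of the second-layer grid voting can be sketched as follows; the 25% overlap between adjacent grid cells used in the comparison experiments is omitted for brevity, and the threshold Th from Eq. (12) is used directly as the cell side:

```python
from collections import Counter

def adaptive_threshold(sr, rows, cols, k):
    """Eqs. (11)-(12): T_r = sr * min(rows, cols); Th = k * T_r."""
    return k * sr * min(rows, cols)

def grid_vote(centers, cell):
    """Accumulate projected centers into square cells of side `cell`
    and return the cells sorted by vote count (plain grid voting,
    without overlapping cells)."""
    votes = Counter((int(x // cell), int(y // cell)) for x, y in centers)
    return votes.most_common()
```

Cells whose vote count exceeds a minimum support would then be treated as candidate instances and passed to geometric verification.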
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. Then the four corners of the training image are projected onto four new coordinates, producing a convex quadrilateral from the four mapped corners. Here we provide a simple but effective way to assess whether the system has obtained correct object instances, so that erroneous detections are eliminated. The criterion is as follows:

\[ c_{\min} \le \frac{\mathrm{Area(Quadrilateral)}}{\mathrm{sr}^2 \times \mathrm{Area(TrainingImage)}} \le c_{\max}. \quad (13) \]
Figure 7: Examples of objects with different texture levels: (a) high texture; (b) medium texture; (c) low texture.
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately sr². The thresholds c_min and c_max should be set before verification.
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure. A noniterative method called EPnP [29] is employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for our specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with their fronts facing outward. All images were captured using a SONY RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first experiment verified whether the scale ratio calculation and false match elimination method were feasible. The second examined whether the proposed clustering threshold computation method was effective. The last comprehensively evaluated the performance of the proposed architecture. These three experiments were designed as follows:
(i) Experiment I: for each training image in the database, we acquired an image in which the object instance had the same scale as the training image. The captured images were then downsampled to 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios using the conventional histogram statistics and the proposed method separately, and then compared the accuracy of the two. The feature matching and key point projection results with and without false match elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14). Then we tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously. Here, an approximate nearest-neighbor searching method was employed to speed up mean-shift. Because the thresholds could not be directly compared across different experiments, we express each new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and k_MS and k_GV are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as k_MS = k_GV = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8:

\[ \mathrm{CR} = \frac{1}{2} \times k_{\mathrm{MS}} \times T_r \quad \text{(using mean-shift)}, \]
\[ \mathrm{GS} = k_{\mathrm{GV}} \times T_r \quad \text{(using grid voting)}. \quad (14) \]
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions of the conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid had a 25% overlap with adjacent grids. The performances of the proposed method and the conventional grid voting were expressed in terms of accuracy (precision and recall) and computation time.
In all the experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to σ_o = 1.6, and the default threshold on key point contrast was set to 0.1. In the verification procedure, the thresholds c_min and c_max were set to 0.8 and 1.2, respectively. All of the experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU (2.50 GHz) and 8 GB of RAM.
Figure 8: The first example of dominant scale ratio computation: (a) center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, 0.254); (b) dominant scale ratio computation by the conventional histogram statistic (sr = 0.99, 0.75, 0.47, 0.234).
Figure 9: The second example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, and 0.251). (b) Dominant scale ratio computation by the conventional histogram statistic (sr = 0.29, 0.21, 0.52, and 0.21); the histograms plot frequency against scale ratio.
4.2. Experimental Results and Analysis

4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratio. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios are 1.00, 0.75, 0.50, and 0.25 in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe, and the extracted features may have nearly the same scale ratio. The proposed method evaluates the dominant scale ratio from the distribution and relationship of key points; therefore, the result is more reliable.

Figure 10: Raw matching results: (a) training image; (b) feature matching; (c) key points projection.

Figure 11: Matching results with false match elimination: (a) training image; (b) feature matching; (c) key points projection.

Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results based on scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and elimination of false matches yield the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10–12 illustrate the effectiveness of the proposed filters.
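The conventional baseline of Figures 8(b) and 9(b) amounts to taking the mode of a histogram of per-match scale ratios; a sketch follows, with the bin count and range as illustrative choices.

```python
import numpy as np

# Conventional histogram statistic for the dominant scale ratio: histogram
# the per-match ratios s'_j / s_i and return the center of the peak bin.
# This is a sketch of the baseline, not the authors' exact code.
def dominant_scale_ratio_hist(scale_ratios, bins=50, r_max=2.0):
    hist, edges = np.histogram(scale_ratios, bins=bins, range=(0.0, r_max))
    i = int(np.argmax(hist))          # index of the most populated bin
    return 0.5 * (edges[i] + edges[i + 1])  # bin center as the estimate
```

As the text notes, this estimator breaks down when severe background noise concentrates many spurious matches near a single ratio.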
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)–14(b) show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. Points in both Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. In addition, the coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.

As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting is more accurate than mean-shift. According to Figures 13(a) and 13(b), the k_MS and k_GV values corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable. However, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than when using mean-shift, but the verification time becomes longer due to the clustering errors. According to the results of the feasibility validation, clustering radius coefficients k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
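A minimal grid-voting step consistent with the description above might look like this, with the cell size set to GS = k_GV · Tr; the vote threshold is an illustrative assumption.

```python
from collections import defaultdict

# Minimal grid-voting sketch: projected center estimates vote into square
# cells of side GS = k_GV * Tr; cells clearing a minimum vote count become
# candidate instances. min_votes is an illustrative parameter.
def grid_vote(centers, cell_size, min_votes=3):
    votes = defaultdict(list)
    for x, y in centers:
        # Quantize each projected center to its grid cell.
        votes[(int(x // cell_size), int(y // cell_size))].append((x, y))
    # Keep only cells with enough supporting votes.
    return [pts for pts in votes.values() if len(pts) >= min_votes]
```

Clustering this way is very cheap (one pass over the votes), which matches the short clustering times reported for grid voting in Figure 14(b); the cost is sensitivity to the cell size, which is why the adaptive threshold matters.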
4.2.3. Performance for Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different levels of texture using the proposed method and grid voting. The precision and recall were recorded. The computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different levels of texture density lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image; (b) feature matching; (c) key points projection.
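The scale-restriction filter behind Figures 11 and 12 can be sketched as keeping only matches whose scale ratio stays close to the dominant ratio sr; the tolerance value is an illustrative assumption.

```python
# Scale-restriction filter sketch (in the spirit of [10, 11]): keep a match
# only when its scale ratio lies within a relative tolerance band around the
# dominant scale ratio sr. The tolerance value is illustrative.
def filter_by_scale(matches, sr, tol=0.2):
    """matches: iterable of (s_train, s_query) scale pairs."""
    kept = []
    for s_i, s_j in matches:
        if abs(s_j / s_i - sr) <= tol * sr:
            kept.append((s_i, s_j))
    return kept
```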
Figure 13: Accuracy performance using mean-shift and grid voting. (a) Accuracy (precision in % versus recall in %) of mean-shift + RANSAC and grid voting + RANSAC; the points at k_MS = k_GV = 1.8 are circled. (b) True positive rate versus false positive rate (%) of mean-shift + RANSAC and grid voting + RANSAC.
Figure 14: Computational time statistics. (a) Computational time (ms) for mean-shift and (b) for grid voting, broken down into feature matching, clustering, and geometric verification, for k = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, and 0.8.
Figure 15: Results of two detection examples ((a) object A; (b) objects B, C, D, and E; (c) objects F, G, and H).
Table 1: Average results for different levels of texture using the proposed method and grid voting. Accuracy is given in percent; computational times are in milliseconds.

Texture level | Method      | Precision | Recall | Feature detection | Raw match | Density estimation | Rematch | Clustering | Geometric verification | Total
High          | Proposed    | 97.6      | 96.8   | 1027              | 379       | 479                | 526     | 3          | 522                    | 2936
High          | Grid voting | 96.2      | 96.3   | 1027              | 379       | 0                  | 0       | 4          | 2595                   | 4005
Medium        | Proposed    | 96.4      | 95.8   | 941               | 220       | 191                | 246     | 3          | 866                    | 2467
Medium        | Grid voting | 95.7      | 95.4   | 941               | 220       | 0                  | 0       | 4          | 2033                   | 3198
Low           | Proposed    | 92.1      | 93.6   | 586               | 94        | 72                 | 119     | 4          | 1054                   | 1929
Low           | Grid voting | 91.6      | 91.9   | 586               | 94        | 0                  | 0       | 3          | 1345                   | 2028
Precision and time overhead increase with the texture density. Although the first layer of density estimation and template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared to the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most of the object instances were detected. However, the objects marked "A" in Figure 15(a), "B," "C," and "D" in Figure 15(b), and "F," "H," and "G" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are the reflection of light (Figure 15(a)), high similarity of objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B," "C," and "D" in Figure 15(b)), and erroneous clustering results ("F," "G," and "H" in Figure 15(c)).
5. Conclusions

In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice by combining dominant scale ratio-based false match elimination with adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of its high accuracy and low latency.

Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper will no longer be valid. Furthermore, the accuracy of the proposed method will be greatly reduced when there is a dramatic change of illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method to solve such complex problems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Springer, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, ACM, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Springer, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, IEEE, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, IEEE, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, IEEE, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, IEEE, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
descriptor that uses a log-polar transformation hierarchy rather than four quadrants. The original high dimensionality of its descriptor can be reduced using PCA. In 2008, Bay et al. developed a prominent method known as speeded-up robust features (SURF) [6], based on improvements on the construction of SIFT features. In [5, 7], the performances of local feature descriptors such as SIFT, PCA-SIFT, GLOH, and SURF were compared. According to [5, 7], PCA-SIFT and SURF have advantages in terms of speed and illumination changes, whereas SIFT and GLOH are invariant to rotation, scale changes, and affine transformations.
Feature matching is a basic procedure in object detection. It is typically performed by comparing the similarity of two feature descriptors. In fact, raw matches often contain a large number of mistakes; thus, false match elimination is necessary. The classical approaches are the ratio test [3], the bidirectional matching algorithm [8], and RANSAC [9]. In addition, a remarkable method based on scale restriction [10, 11] was proposed. This method first estimates a dominant scale ratio using statistics after prematching. Then, features are reextracted from the high-resolution image at a Gaussian smoothing parameter adjusted according to the dominant scale ratio. After refined matching, feature pairs that do not conform to a certain scale ratio restriction are rejected. This method is adopted in our work due to its high performance; in addition, we provide a new approach to obtaining the dominant scale ratio. In 2012, Arandjelović and Zisserman [12] showed that the Hellinger kernel leads to superior matching results compared to the Euclidean distance in SIFT feature matching.
Lin et al. [13] used a key point coordinate clustering method for duplicate object detection; regions of interest are detected using an adaptive window search. Wu et al. [14] reported an improved graph-based method to locate object instances. In [15], Collet et al. proposed a scalable approach known as MOPED. The framework first clusters matched features and generates hypothesis models; potential instances can be found after an iterative process of pose refinement. However, the key point coordinates obtained from the clustering results in [13–15] might be unreliable because the key points are sparsely distributed. Alternatively, approaches based on Hough voting were proposed and applied in [16–18]. The Hough voting-based approach locates possible instances according to feature mapping and density estimation. Specifically, the method in [16] applies mean-shift in the voting step; similarly, grid voting was adopted in [19]. Although Hough voting is an effective approach for multiple object instance detection, the clustering radius for mean-shift or grid voting must be preset by experience, which leads to low adaptability and accuracy.
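For reference, the mean-shift voting step used in [16] amounts to the following mode-seeking iteration; the flat kernel and bandwidth here are illustrative, and the need to preset that bandwidth is exactly the adaptability problem noted above.

```python
import numpy as np

# Flat-kernel mean-shift mode seeking: repeatedly replace the current
# estimate with the mean of all points within the bandwidth. The bandwidth
# must be preset by experience, which limits adaptability.
def mean_shift_mode(points, start, bandwidth, max_iters=100):
    pts = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(max_iters):
        mask = np.linalg.norm(pts - x, axis=1) <= bandwidth
        if not mask.any():
            break
        new_x = pts[mask].mean(axis=0)
        if np.linalg.norm(new_x - x) < 1e-6:  # converged to a mode
            break
        x = new_x
    return x
```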
In this paper, we present a new architecture that improves multiple object instance detection accuracy by considering the adaptive selection of the optimal clustering threshold and a cascade of filters for false feature match elimination. The contributions of our work are as follows:

(i) We propose an architecture for multiple object instance detection based on dual-layer density estimation. The first layer calculates an optimal clustering threshold for the second layer and applies a constraint for the subsequent scale restriction-based false match elimination. The second layer aims to detect all candidate object instances. The proposed strategy can reduce the possibility of mismatch and improve detection accuracy. Compared to traditional methods, which need to set the threshold manually, the proposed adaptive clustering threshold computation method leads to stronger environmental flexibility and higher robustness.

(ii) We introduce a new method to compute and verify the value of the dominant scale ratio between the training image and the query image. Rather than using a histogram statistical method over matched features, the value is derived from the first layer of the density estimation. The value is then checked against an approximate one obtained from the homography matrix. According to our experiments, the proposed method is more robust for dominant scale ratio estimation than the conventional methods.
The remainder of this paper is organized as follows. Section 2 describes the proposed architecture according to our particular application background. Details of the proposed method are discussed in Section 3. A variety of experiments were designed to evaluate our approach; the experimental methodology, results, and discussions are presented in Section 4. Finally, Section 5 summarizes our contributions and presents conclusions.
2. Framework Overview

In this section, we provide an introduction to the background of our work and briefly explain the proposed architecture.
Our work develops a service robot for a supermarket. The purpose of the robot is to count the goods before the start of business and provide feedback to the staff to ensure adequate supplies. Because no standard database exists for our specific application, we created a database of 70 types of man-made products to evaluate our algorithm.
The lighting conditions in the supermarket are generally uniform, and thus we collected training images for each item under the same lighting conditions. One image was obtained from the front, and another 24 were captured from 24 different directions. The frontal object image serves for object recognition, and all 25 sequence images were used to build a sparse 3D model for recovering the pose of the identified object. All training images were captured at a distance approximately equal to the minimum safe distance between the robot and the shelves; this sampling method ensures that the training image retains more details. To validate our architecture, the training database was divided into three sets based on the density of textures: the set with the highest density of textures contains 20 types of products, the set with a medium density has 30 types, and the set with the lowest density includes 20 types. For each object, there were 2 to 40 instances in the scene image.
Our proposed method is based on local features, which can provide information about scale and rotation; SIFT, SURF, and PCA-SIFT are three alternatives. According to [5, 7], SIFT performs better under scale and rotation changes than SURF and PCA-SIFT; thus, SIFT is used in our work, although it is time-consuming. The proposed framework is based on SIFT feature extraction and feature matching, considering the specific application background. The framework consists of two phases: the offline training phase and the online detection phase. A graphic illustration of the proposed approach is shown in Figure 1. To make our algorithm more explicit, we make selected arrangements in advance. First, the term key point refers to a point with 2D coordinates, detected according to SIFT theory. The term descriptor represents a 128-dimensional SIFT feature vector. The term feature consists of a description vector and the scale, orientation, and coordinate of the SIFT point.
In the offline phase, as shown in Figure 1(a), an initial value of the Gaussian smoothing parameter is given in advance. The SIFT features are extracted from the training images for certain objects. Reference vectors between all key points and the object center are computed to locate the object centroid. All features are stored in a retrieval structure to reduce the time overhead during detection. In addition, we created a sparse 3D model for each object with a standard Structure from Motion algorithm [20], and each 3D point was associated with a corresponding SIFT descriptor.
The online detection phase is a dual-layer density estimation-based method. The first layer exists for two purposes: to compute the dominant scale ratio between the training image and the query image (Figures 1(b)–1(e)) and to calculate a reference clustering threshold for the second layer of density estimation (Figures 1(f)–1(i)). At the beginning of feature extraction for the query image, the initial value of the Gaussian smoothing parameter is the same as in the training phase. All descriptors extracted from the video footage are matched to their nearest neighbors in the database (Figure 1(b)), and the key points are projected to their reference centers (Figure 1(c)). A valid object center with a maximum density value can be found using kernel density estimation (Figure 1(d)). Considering that object instances in our applications have nearly the same scale, the dominant scale ratio and an effective clustering threshold are computed accordingly (Figure 1(e)). The second layer of density estimation detects all possible instances. First, the feature template is reconstructed based on the initial value of the base scale and the calculated dominant scale ratio (Figure 1(f)). The majority of false feature matches are removed by a cascade of filters based on the distance ratio test and scale restriction (Figure 1(g)). The key point projection and 2D clustering methods are applied to find all candidate object centers (Figure 1(h)). The final geometric verification procedure eliminates incorrect detection results and determines each instance's pose (Figure 1(i)).
3. Description of the Proposed Method

In this section, we introduce our work in detail in accordance with the aforementioned architecture. The schematic diagram for the offline training phase and the flowchart of the online detection are shown in Figures 2 and 4, respectively.
3.1. Offline Training: Template Generation and Retrieval Structure Construction. Indeed, the proposed method can be applied in conjunction with any scale- and rotation-invariant features. As described in Section 2, SIFT is applied in our work for its robustness. To create templates for all types of object instances, frontal images of the targets must be captured. As noted in Section 2, the lighting conditions in our application are relatively invariant. In addition, we assume that all object instances face front outward. SIFT is able to work properly under these conditions. Thus, we can collect one frontal image of each type of product for object recognition. In addition, for the subsequent object pose estimation, a sparse 3D model for each object was created (as shown in Figure 3), and thus 24 other images were captured at approximately equally spaced intervals in a circle around each object. According to SIFT theory, the Gaussian smoothing parameter should be given first. Suppose that the initial value is set to σ_TrainInit = σ_o. In this work, σ_o is a fixed value, as described in Section 4, and the SIFT feature extraction takes place.
We assume that the number of features for a specific object is n. Each SIFT feature descriptor is a 128-dimensional vector f_i, where i = 1, 2, ..., n. Similarly, the scale of the feature is s_i, the principal orientation is θ_i, and its coordinate is c_i(x_i, y_i). The coordinate difference v_io between each SIFT key point c_i(x_i, y_i) and the related object centroid c_o(x_o, y_o) is calculated according to the following:

    v_io = (Δx_i, Δy_i)^T = (x_i, y_i)^T − (x_o, y_o)^T.    (1)
Feature matching is a subprocedure in our multiple object instance detection architecture. The process is used to find the most similar feature in the dataset based on a distance measurement. In our work, the Hellinger distance measurement is applied due to its robustness, following [12]. Feature matching is typically a time-consuming process, so the construction of an effective retrieval structure is necessary to speed up the detection phase. Two types of effective retrieval methods are currently available: tree-based methods and hashing-based methods. The randomized kd-tree [21, 22], hierarchical k-means tree [21, 22], and vocabulary tree [23] are typical representatives of tree-based methods. Locality-sensitive hashing (LSH) [24, 25] and SSH [26] are two representative hashing-based methods. Among the feasible methods, near-optimal hashing algorithms [27] have proven to be highly efficient and accurate, and this method was chosen for our work. Construction of multiple independent tables to form a forest is necessary to reduce the false negative and false positive rates.
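A toy multi-table hashing index in the spirit of [24, 27] is sketched below (random-hyperplane signatures, several independent tables). This is a didactic sketch, not the near-optimal algorithm itself; all class and parameter names are illustrative.

```python
import numpy as np

# Toy LSH forest: each table hashes a descriptor by the signs of random
# hyperplane projections; querying unions the candidates of all tables and
# returns the nearest one by Euclidean distance.
class LSHForest:
    def __init__(self, dim=128, bits=16, tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.standard_normal((bits, dim)) for _ in range(tables)]
        self.tables = [dict() for _ in range(tables)]
        self.data = []

    def _key(self, t, v):
        # Bit signature: which side of each hyperplane the vector falls on.
        return tuple((self.planes[t] @ np.asarray(v, dtype=float) > 0).astype(int))

    def add(self, v):
        idx = len(self.data)
        self.data.append(np.asarray(v, dtype=float))
        for t in range(len(self.tables)):
            self.tables[t].setdefault(self._key(t, v), []).append(idx)

    def query(self, v):
        v = np.asarray(v, dtype=float)
        cand = set()
        for t in range(len(self.tables)):
            cand.update(self.tables[t].get(self._key(t, v), []))
        if not cand:
            return None
        return min(cand, key=lambda i: np.linalg.norm(self.data[i] - v))
```

Multiple independent tables raise the chance that a true neighbor shares a bucket with the query in at least one table, which is the forest idea mentioned above.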
3.2. Online Multiple Object Instance Detection

3.2.1. Feature Extraction for the Query Image and Feature Matching. During online detection, the system first obtains access to a newly captured video frame. SIFT key points are detected and descriptors are extracted in the same manner as in the first part of the offline procedure. The Gaussian smoothing parameter is also set to σ_Query = σ_o. Then, the near-optimal hashing algorithm takes effect. During feature matching, low-discriminability matches are discarded based on the ratio test of the distances to the nearest neighbor and the second nearest neighbor, which was proposed in [3].

Figure 1: Overview of the proposed framework: (a) offline phase for constructing the retrieval structure; (b)–(e) first layer of density estimation: (b) local feature detection, (c) feature matching and key point mapping, (d) first layer of density estimation, and (e) intermediate results (effective training image, dominant scale ratio, clustering threshold); (f)–(i) second layer of density estimation: (f) feature template reconstruction, (g) false matching result elimination, (h) clustering for candidate instance detection, and (i) geometric verification.

Figure 2: Offline training procedure (key point detection and descriptor extraction with σ_o; reference vector calculation; retrieval structure construction; the database stores feature descriptors, scales, orientations, reference vectors, and the original training images).

Figure 3: 3D sparse model of a packing box from 25 images.
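Combining the Hellinger-kernel comparison of Section 3.1 (via the RootSIFT mapping of [12]) with the ratio test of [3], the per-descriptor matching step can be sketched as follows; the 0.8 ratio is a common default used here for illustration, and the function names are ours.

```python
import numpy as np

# RootSIFT trick from [12]: L1-normalize the descriptor and take element-wise
# square roots; Euclidean distance on the result corresponds to the Hellinger
# kernel on the originals (SIFT descriptors are nonnegative).
def root_sift(d, eps=1e-12):
    d = np.asarray(d, dtype=float)
    return np.sqrt(d / (d.sum() + eps))

# Nearest-neighbor match with Lowe's ratio test [3]: accept only if the best
# distance is sufficiently smaller than the second best.
def match(query_desc, db_descs, ratio=0.8):
    q = root_sift(query_desc)
    dists = np.linalg.norm(np.array([root_sift(d) for d in db_descs]) - q, axis=1)
    order = np.argsort(dists)
    best, second = order[0], order[1]
    if dists[best] < ratio * dists[second]:
        return int(best)
    return None  # low-discriminability match discarded
```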
3.2.2. Key Points Projection and Object Center Estimation. The principle of key point projection is illustrated in Figure 5. In Figure 5, the left part is the training image and the right part is the query image. Regarding the middle part, the solid region is a matched patch from the query image, and the area formed by dotted lines is assumed to be the ideal case in which there is only a similarity transform. Assume that the matching pair of features is f_i and f_j, where f_i is from the database and f_j is from the query image. The key points corresponding to these two features are p_i(x_i, y_i) and p'_j(x'_j, y'_j). For a planar object, the center c'_oj(x'_oj, y'_oj) related to f_j can be estimated according to (2)–(5).

In the formulas, s'_j and θ'_j are the corresponding scale and orientation of feature f'_j. Similarly, s_i and θ_i are related to feature f_i in the training image. For each pair of matching features, there is a normalized deflection angle ε_j between the normal vector of the object surface and the camera optical axis. According to (5), the estimated centers are located in a small area around the real center when the training image is the exact image corresponding to the ordered object instance and ε_j has an extremely small value:

    θ = θ'_j − θ_i.    (2)
As shown in Figure 5, the reference centers are distributed in small areas. Then the problem of determining the center
Figure 4: Online detection flowchart. First pass: query image acquisition → feature extraction (scale setting σ = σ_o) → feature matching against the database → key point projection → kernel density estimation → computation of the dominant scale ratio sr and the reference clustering threshold Tr. Second pass: access the valid training image → scale setting σ = sr × σ_o → feature extraction → feature matching → false match elimination based on sr → key point projection → clustering based on Tr → object-level false result elimination → result.
Figure 5: Key point projection principle diagram, showing the training and query images, the matched features, the optical axis, the key points p_i and p'_j, the centers c_o and c'_o, and the deflection angle ε_j.
coordinates is converted into a density estimation problem. The first layer of density estimation aims to find one valid center in the query image. Object center estimation is a crucial problem; a two-stage adaptive kernel density estimation method elaborated in [28] is employed to improve the precision. To speed up the process, only the density values associated with the mapped key points are calculated. The point with the highest density value is saved. Although this point may not be the exact center, it is a typical approximation; thus the mapped point is identified as a valid center. Simultaneously, the exact training image can be obtained. As illustrated in Figure 6, the blue point is the obtained object center.
$$\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix} = \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \cos\varepsilon_j \quad (3)$$

$$= \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left( 1 - \frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots \right) \quad (4)$$

$$= \underbrace{\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix}}_{\text{Real Center}} + \underbrace{\frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left( - \frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots \right)}_{\text{Distribution Range}} \quad (5)$$
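As an illustration of (2) and (3), a single match can vote for a center as follows (a minimal sketch; all numeric values are invented, ε_j is taken as 0 so cos ε_j = 1, and v_ref denotes the training-image reference vector taken from the key point toward the object center):

```python
import math

def project_center(p_query, s_train, s_query, theta_train, theta_query, v_ref):
    """Vote for the object center in the query image from one feature match.
    v_ref points from the training key point to the object center; the match
    contributes p'_j + (s'_j / s_i) * R(theta) * v_ref."""
    theta = theta_query - theta_train            # orientation difference, per (2)
    c, s = math.cos(theta), math.sin(theta)
    scale = s_query / s_train                    # s'_j / s_i
    vx, vy = v_ref
    return (p_query[0] + scale * (c * vx - s * vy),
            p_query[1] + scale * (s * vx + c * vy))

# No rotation, query instance at half the training scale:
center = project_center((50.0, 50.0), s_train=2.0, s_query=1.0,
                        theta_train=0.4, theta_query=0.4, v_ref=(20.0, 10.0))
print(center)  # (60.0, 55.0)
```

With many matches, each vote lands near the true center, which is why the densest point found by the first layer of density estimation is a good approximation of it.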
Figure 6: Reference clustering threshold Tr calculation from the rows and columns of the training image and the query image.
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work is derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, s'_m is the scale of the key point related to the estimated object center, and s_n is the scale of the matched key point in the training image:

$$\mathrm{sr} = \frac{s'_m}{s_n}. \quad (6)$$
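A scale restriction filter in the spirit of [11] can then discard matches whose individual scale ratio deviates too far from the dominant one (a hedged sketch; the 30% tolerance and the match tuples are illustrative assumptions, not values from the paper):

```python
def scale_restriction_filter(matches, sr, tol=0.3):
    """Keep only matches whose query/training scale ratio is close to the
    dominant scale ratio sr. Each match is (s_query, s_train, payload)."""
    kept = []
    for s_q, s_t, payload in matches:
        ratio = s_q / s_t
        if abs(ratio - sr) <= tol * sr:
            kept.append(payload)
    return kept

matches = [(1.5, 1.0, "a"), (3.2, 1.0, "b"), (0.7, 0.5, "c")]
print(scale_restriction_filter(matches, sr=1.5))  # ['a', 'c']: "b" is an outlier
```

Because all instances in this application share nearly the same scale, a single dominant ratio is enough to reject a large fraction of false matches cheaply.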
Once the valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix H_o for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves is large enough, meaning that the camera on the robot is far from the targets, the actual homography is sufficiently close to an affine transformation. The dominant scale ratio sr' can then also be computed according to (8), and sr' is used to verify sr: only if the value of sr is close to sr' is it confirmed to be correct. We use (9) to assess the similarity between the two values:
$$H_o = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \quad (7)$$

$$\mathrm{sr}' = \sqrt{\left| h_{11} \times h_{22} \right| + \left| h_{12} \times h_{21} \right|} \quad (8)$$

$$\left| \frac{\mathrm{sr} - \mathrm{sr}'}{\min(\mathrm{sr}, \mathrm{sr}')} \right| < 0.15. \quad (9)$$
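For illustration, (8) and (9) can be checked numerically as follows (a hedged sketch; the homography entries are invented, and the tolerance mirrors the threshold in (9)):

```python
import math

def homography_scale_ratio(H):
    """Approximate scale ratio of a near-affine homography, per (8)."""
    return math.sqrt(abs(H[0][0] * H[1][1]) + abs(H[0][1] * H[1][0]))

def scale_ratio_consistent(sr, sr_prime, tol=0.15):
    """Relative-difference test between the two scale ratio estimates, per (9)."""
    return abs(sr - sr_prime) / min(sr, sr_prime) < tol

# Hypothetical homography close to a similarity transform with scale ~0.5
H = [[0.35, -0.35, 120.0],
     [0.35,  0.35,  80.0],
     [0.0,   0.0,    1.0]]
sr_prime = homography_scale_ratio(H)          # ~0.495
print(scale_ratio_consistent(0.5, sr_prime))  # True: the two estimates agree
```

Cross-checking the density-based estimate sr against the homography-based sr' guards against a valid-looking center that was supported by a wrong cluster of matches.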
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and is adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected, and the features obtained from the query image above are then matched against the new dataset. Owing to the aforementioned preprocessing, the number of SIFT features in the newly constructed database is reduced compared with the offline training phase; thus the time overhead of the matching process is greatly reduced:
$$\sigma_{\text{TrainAdjust}} = \mathrm{sr} \times \sigma_o. \quad (10)$$
The feature matching disambiguation strategy here is a cascade of filters: the ratio test algorithm (proposed in [3]), the scale restriction-based method (presented in [11]), and a geometric verification-based approach. The ratio test and scale restriction methods operate during the matching process, whereas geometric verification takes effect after clustering. After this series of filters, most false matches are eliminated.
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. However, these methods share the disadvantage that the bandwidth must be chosen from experience. For example, in [16] the clustering threshold was set to a specific value, and in [19] the voting grid size was set to a value associated with the size of the query image; nevertheless, this approach may still lead to unreliable results. For our specific application, the clustering threshold can be estimated based on the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold is computed automatically according to (11), in which T_r is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns in the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids; therefore, the clustering threshold Th is finalized in line with (12), in which k is a correction factor. Based on the repeated experiments described in Section 4, we provide a recommended value for k. Candidate object instance detection is based on the second layer of density estimation, and grid voting is employed here for its high precision and recall:
$$T_r = \begin{cases} \mathrm{sr} \times \text{rows}, & \text{if rows} < \text{cols} \\ \mathrm{sr} \times \text{cols}, & \text{otherwise} \end{cases} \quad (11)$$

$$\mathrm{Th} = k \times T_r. \quad (12)$$
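The adaptive threshold and a bare-bones voting grid might then look as follows (an illustrative sketch; k = 1.8 echoes the coefficient identified experimentally in Section 4, while the cell layout and the `min_votes` support count are assumptions of this sketch):

```python
from collections import defaultdict

def adaptive_threshold(sr, rows, cols, k=1.8):
    """Reference threshold T_r per (11), scaled by the correction factor k per (12)."""
    t_r = sr * min(rows, cols)
    return k * t_r

def grid_vote(points, cell, min_votes=3):
    """Accumulate projected centers into square cells of side `cell` and
    return the cells with enough support as candidate instances."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return [pts for pts in grid.values() if len(pts) >= min_votes]

th = adaptive_threshold(sr=0.5, rows=400, cols=300)   # 1.8 * 0.5 * 300
centers = [(100, 100), (105, 98), (102, 110), (900, 50)]
print(len(grid_vote(centers, cell=th)))  # 1 candidate cluster
```

Because the cell size is tied to the training-image size and the dominant scale ratio, the grid adapts to how large an instance actually appears in the query image instead of relying on a fixed, hand-tuned bandwidth.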
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. Then the four corners of the training image are projected onto four new coordinates, producing a convex quadrilateral from the four mapped corners. Here we provide a simple but effective way to assess whether the system has obtained correct object instances, so that error detections are eliminated. The criterion is as follows:
$$c_{\min} \le \frac{\text{Area(Quadrilateral)}}{\mathrm{sr}^2 \times \text{Area(TrainingImage)}} \le c_{\max}. \quad (13)$$
Figure 7: Examples of objects with different texture levels: (a) high texture, (b) medium texture, (c) low texture.
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately sr². The thresholds c_min and c_max must be set before verification.
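A worked check of (13) can be sketched as follows (an illustrative sketch; the homography, the image size, and the shoelace-area helper are assumptions of this sketch, with c_min = 0.8 and c_max = 1.2 as in Section 4):

```python
def project(H, x, y):
    """Apply a 3x3 homography to a point."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def polygon_area(pts):
    """Shoelace formula for a simple polygon."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def instance_valid(H, rows, cols, sr, c_min=0.8, c_max=1.2):
    """Criterion (13): quadrilateral area vs. sr^2-scaled training-image area."""
    corners = [(0, 0), (cols, 0), (cols, rows), (0, rows)]
    quad = [project(H, x, y) for (x, y) in corners]
    ratio = polygon_area(quad) / (sr * sr * rows * cols)
    return c_min <= ratio <= c_max

# A pure scaling homography with scale 0.5: the area shrinks by 0.25 = sr^2
H = [[0.5, 0.0, 10.0], [0.0, 0.5, 20.0], [0.0, 0.0, 1.0]]
print(instance_valid(H, rows=400, cols=300, sr=0.5))  # True
```

A cluster whose homography implies an area wildly inconsistent with the expected sr²-scaled footprint is rejected as an error detection.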
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure, and a noniterative method called EPnP [29] is employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for our specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with the front facing outward. All images were captured using a SONY RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first experiment verified whether the scale ratio calculation and false elimination method were feasible; the second examined whether the proposed clustering threshold computation method was effective; and the last comprehensively evaluated the performance of the proposed architecture. The three experiments were designed as follows.
(i) Experiment I: for each training image in the database, we acquired an image such that the object instance in it had the same scale as the training image. The captured images were then downsampled to 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios using the conventional histogram statistics and the proposed method separately, and compared the accuracy of both values. The feature matching and key point projection results with and without false elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14). We then tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously; an approximate nearest neighbor searching method was employed to speed up mean-shift. Because the thresholds could not be directly compared across experiments, we expressed the new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and k_MS and k_GV are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as k_MS = k_GV = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8:

$$\mathrm{CR} = \frac{1}{2} \times k_{\mathrm{MS}} \times T_r \quad \text{(using mean-shift)},$$
$$\mathrm{GS} = k_{\mathrm{GV}} \times T_r \quad \text{(using grid voting)}. \quad (14)$$
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions for conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid overlapped an adjacent grid by 25% of its size. The performance of the proposed method and of conventional grid voting was expressed in terms of accuracy (precision and recall) and computational time.
In all the experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to σ_o = 1.6, and the threshold on key point contrast was set to 0.1. In the verification procedure of our experiments, the thresholds c_min and c_max were set to 0.8 and 1.2, respectively. All experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU at 2.50 GHz and 8 GB RAM.
Figure 8: The first example of dominant scale ratio computation: (a) center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, 0.254); (b) dominant scale ratio computation by the conventional histogram statistic (sr = 0.99, 0.75, 0.47, 0.234).
Figure 9: The second example of dominant scale ratio computation: (a) center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, 0.251); (b) dominant scale ratio computation by the conventional histogram statistic (sr = 0.29, 0.21, 0.52, 0.21).
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratios. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios are 100%, 75%, 50%, and 25% in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe and the extracted features may have nearly the same
Figure 10: Raw matching results: (a) training image, (b) feature matching, (c) key point projection.
Figure 11: Matching results with false match elimination: (a) training image, (b) feature matching, (c) key point projection.
scale ratio. The proposed method evaluates the dominant scale ratio from the distribution and relationship of the key points; therefore, its result is more reliable.
Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results based on scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and elimination of false matches lead to the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10–12 illustrate the effectiveness of the proposed filters.
4.2.2. Results of Clustering Threshold Estimation. Figures 13 and 14 show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. Points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. In addition, the coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.
As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting also has better accuracy than mean-shift. According to Figures 13(a) and 13(b), the values of k_MS and k_GV corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than with mean-shift, but the verification time becomes longer due to clustering errors. According to the results of the feasibility validation, the clustering radius coefficients k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
4.2.3. Performance of Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different levels of texture using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different levels of texture density lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image, (b) feature matching, (c) key point projection.
Figure 13: Accuracy performance using mean-shift + RANSAC and grid voting + RANSAC: (a) precision versus recall; (b) true positive rate versus false positive rate. The points for k_MS = k_GV = 1.8 are circled.
Figure 14: Computational time statistics (feature matching, clustering, and geometric verification) as a function of the coefficient k: (a) computational time for mean-shift; (b) computational time for grid voting.
Figure 15: Results of two detection examples; undetected and falsely detected instances are labeled A–H.
Table 1: Average results for different levels of texture using the proposed method and grid voting.

| Texture level | Method | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| High | Proposed | 97.6 | 96.8 | 1027 | 379 | 479 | 526 | 3 | 522 | 2936 |
| High | Grid voting | 96.2 | 96.3 | 1027 | 379 | 0 | 0 | 4 | 2595 | 4005 |
| Medium | Proposed | 96.4 | 95.8 | 941 | 220 | 191 | 246 | 3 | 866 | 2467 |
| Medium | Grid voting | 95.7 | 95.4 | 941 | 220 | 0 | 0 | 4 | 2033 | 3198 |
| Low | Proposed | 92.1 | 93.6 | 586 | 94 | 72 | 119 | 4 | 1054 | 1929 |
| Low | Grid voting | 91.6 | 91.9 | 586 | 94 | 0 | 0 | 3 | 1345 | 2028 |
Precision and time overhead increase with the texture density. Although the first layer of density estimation and template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared with the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "G", and "H" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are the reflection of light (Figure 15(a)), high similarity of objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and erroneous clustering results ("F", "G", and "H" in Figure 15(c)).
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture to resolve this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice through dominant scale ratio-based false match elimination and adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of high accuracy and low latency.
Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper will no longer be valid. Furthermore, the accuracy of the proposed method is greatly reduced when there is a dramatic change of illumination or the target is occluded by other translucent objects. In our future work, we will focus on improving the method to solve such complex problems.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd., for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisbon, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney, et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
SURF, and PCA-SIFT are three alternatives. According to [5, 7], SIFT performs better under scale and rotation change than SURF and PCA-SIFT; thus SIFT is used in our work, although it is time-consuming. The proposed framework is based on SIFT feature extraction and feature matching, taking the specific application background into account. The framework consists of two phases: the offline training phase and the online detection phase. A graphic illustration of the proposed approach is shown in Figure 1. To make our algorithm more explicit, we make selected arrangements in advance. First, the term key point refers to a point with 2D coordinates, detected according to SIFT theory. The term descriptor represents a 128-dimensional SIFT feature vector. The term feature consists of a descriptor vector and the scale, orientation, and coordinates of the SIFT point.
In the offline phase, as shown in Figure 1(a), an initial value of the Gaussian smoothing parameter is given in advance, and SIFT features are extracted from the training images of certain objects. Reference vectors between all key points and the object center are computed to locate the object centroid. All features are stored in a retrieval structure to reduce the time overhead during detection. In addition, we created a sparse 3D model for each object with a standard Structure from Motion algorithm [20], and each 3D point was associated with a corresponding SIFT descriptor.
The online detection phase is a dual-layer density estimation-based method. The first layer serves two purposes: computing the dominant scale ratio between the training image and the query image (Figures 1(b)–1(e)) and calculating a reference clustering threshold for the second layer of density estimation (Figures 1(f)–1(i)). At the beginning of feature extraction for the query image, the initial value of the Gaussian smoothing parameter is set the same as in the training phase. All descriptors extracted from the video footage are matched to their nearest neighbors in the database (Figure 1(b)), and the key points are projected to their reference centers (Figure 1(c)). A valid object center with a maximum density value can be found using kernel density estimation (Figure 1(d)). Considering that object instances in our applications have nearly the same scale, the dominant scale ratio and an effective clustering threshold are computed accordingly (Figure 1(e)). The second layer of density estimation detects all possible instances. First, the feature template is reconstructed based on the initial value of the base scale and the calculated dominant scale ratio (Figure 1(f)). The majority of false feature matches are removed by a cascade of filters based on the distance ratio test and scale restriction (Figure 1(g)). The key point projection and 2D clustering methods are applied to find all candidate object centers (Figure 1(h)). The final geometric verification procedure eliminates incorrect detection results and determines each instance's pose (Figure 1(i)).
3. Description of the Proposed Method
In this section, we introduce our work in detail in accordance with the aforementioned architecture. The schematic diagram for the offline training phase and the flowchart of the online detection are shown in Figures 2 and 4, respectively.
3.1. Offline Training: Template Generation and Retrieval Structure Construction. Indeed, the proposed method can be applied in conjunction with any scale- and rotation-invariant features. As described in Section 2, SIFT is applied in our work for its robustness. To create templates for all types of object instances, frontal images of the targets must be captured. As noted in Section 2, the lighting conditions in our application are relatively invariant, and we assume that all object instances face front outward; SIFT works properly under these conditions. Thus, we can collect one frontal image for each type of product for object recognition. In addition, for the subsequent object pose estimation, a sparse 3D model of each object was created (as shown in Figure 3); to this end, 24 additional images were captured at approximately equally spaced intervals in a circle around each object. According to SIFT theory, the Gaussian smoothing parameter must be given first. Suppose that the initial value is set to $\sigma_{\text{TrainInit}} = \sigma_o$. In this work, $\sigma_o$ is a fixed value, as described in Section 4, and the SIFT feature extraction then takes place.
We assume that the number of features for a specific object is $n$. Each SIFT feature descriptor is a 128-dimensional vector $f_i$, where $i = 1, 2, \ldots, n$; the scale of the feature is $s_i$, its principal orientation is $\theta_i$, and its coordinate is $c_i(x_i, y_i)$. The coordinate difference $v_{io}$ between each SIFT key point $c_i(x_i, y_i)$ and the related object centroid $c_o(x_o, y_o)$ is calculated according to the following:

$$v_{io} = \begin{bmatrix} \Delta x_i \\ \Delta y_i \end{bmatrix} = \begin{bmatrix} x_i \\ y_i \end{bmatrix} - \begin{bmatrix} x_o \\ y_o \end{bmatrix}. \quad (1)$$
Feature matching is a subprocedure of our multiple object instance detection architecture: it finds the most similar feature in the dataset according to a distance measurement. In our work, the Hellinger distance is applied because of its robustness, following [12]. Feature matching is typically a time-consuming process, so constructing an effective retrieval structure is necessary to speed up the detection phase. Two types of effective retrieval methods are currently available: tree-based methods and hashing-based methods. The randomized k-d tree [21, 22], hierarchical k-means tree [21, 22], and vocabulary tree [23] are typical representatives of tree-based methods; locality-sensitive hashing (LSH) [24, 25] and SSH [26] are two representative hashing-based methods. Among the feasible methods, near-optimal hashing algorithms [27] have proven to be highly efficient and accurate, and this method was chosen for our work. Multiple independent trees are constructed to form a forest in order to reduce the false negative and false positive rates.
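The Hellinger comparison of [12] can be realized with the "RootSIFT" trick: L1-normalize each descriptor, take element-wise square roots, and compare the results with the Euclidean distance (which is, up to a constant factor, the Hellinger distance). A minimal sketch with toy 2-element descriptors (real SIFT descriptors are 128-dimensional and assumed non-negative with a non-zero sum):

```python
import math

def hellinger_distance(d1, d2):
    """Compare two descriptors as in the RootSIFT trick of [12]:
    L1-normalize, take element-wise square roots, then use the
    Euclidean distance (a constant multiple of the Hellinger
    distance). Descriptors are assumed non-negative and non-zero."""
    def root_normalize(d):
        total = sum(d)
        return [math.sqrt(v / total) for v in d]
    r1, r2 = root_normalize(d1), root_normalize(d2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
```

Identical descriptors yield distance 0, and fully disjoint ones yield the maximum value sqrt(2); thresholds from Euclidean SIFT matching therefore need rescaling under this measure.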
3.2. Online Multiple Object Instance Detection
3.2.1. Feature Extraction for the Query Image and Feature Matching. During online detection, the system first obtains a newly captured video frame. SIFT key points are detected and descriptors are extracted in the same manner as in the first part of the offline procedure, with the Gaussian smoothing parameter again set to $\sigma_{\text{Query}} = \sigma_o$. Then, the near-optimal
4 Journal of Sensors
Figure 1: Overview of the proposed framework: (a) offline phase for constructing the retrieval structure (database); (b)–(e) first layer of density estimation: (b) local feature detection, (c) feature matching and key point mapping, (d) first layer of density estimation, and (e) intermediate results (effective training image, dominant scale ratio, and clustering threshold); (f)–(i) second layer of density estimation: (f) feature template reconstruction, (g) false matching result elimination, (h) clustering for candidate instance detection, and (i) geometric verification.
Figure 2: Offline training procedure (key point detection and descriptor extraction from the frontal object images with initial scale $\sigma_o$, reference vector calculation, and retrieval structure construction; the database stores the feature descriptors, scales, orientations, reference vectors, and original training images).
Figure 3: 3D sparse model of a packing box, built from 25 images.
hashing algorithm takes effect. During feature matching, low-discriminability matches are discarded based on the ratio test of the distances to the nearest and second-nearest neighbors, as proposed in [3].
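The ratio test mentioned above can be sketched in a few lines; 0.8 is the threshold suggested in [3], and this fragment is illustrative rather than the paper's exact implementation:

```python
def ratio_test(d_nearest, d_second, ratio=0.8):
    """Lowe's ratio test [3]: keep a match only when the distance to
    the nearest neighbor is clearly smaller than the distance to the
    second-nearest neighbor. 0.8 is the threshold suggested in [3]."""
    return d_nearest < ratio * d_second
```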
3.2.2. Key Point Projection and Object Center Estimation. The principle of key point projection is illustrated in Figure 5, where the left part is the training image and the right part is the query image. In the middle part, the solid region is a matched patch from the query image, and the area formed by dotted lines is the ideal case in which there is only a similarity transform. Assume that the matching pair of features is $f_i$ and $f_j$, where $f_i$ is from the database and $f_j$ is from the query image, and that the key points corresponding to these two features are $p_i(x_i, y_i)$ and $p'_j(x'_j, y'_j)$. For a planar object, the center $c'_{oj}(x'_{oj}, y'_{oj})$ related to $f_j$ can be estimated according to (2)–(5). In the formulas, $s'_j$ and $\theta'_j$ are the corresponding scale and orientation of feature $f_j$; similarly, $s_i$ and $\theta_i$ are related to feature $f_i$ in the training image. For each pair of matched features, there is a normalized deflection angle $\varepsilon_j$ between the normal vector of the object surface and the camera optical axis. According to (5), the estimated centers are located in a small area around the real center when the training image is the exact image corresponding to the ordered object instance and $\varepsilon_j$ has an extremely small value:

$$\theta = \theta'_j - \theta_i. \quad (2)$$
As shown in Figure 5, the reference centers are distributed in small areas, so the problem of determining the center
Figure 4: Online detection flowchart (begin; query image acquisition; feature extraction with scale setting $\sigma = \sigma_o$; feature matching against the database; key point projection; kernel density estimation; computation of the dominant scale ratio sr and the reference clustering threshold $T_r$; access to the valid training image; feature extraction with scale setting $\sigma = \text{sr} \times \sigma_o$; feature matching; false match elimination based on sr; key point projection; clustering based on $T_r$; object-level false result elimination; result).
Figure 5: Key point projection principle diagram (training image at left, query image at right; matched features with key points $p_i$ and $p'_j$, object centers $c_o$ and $c'_o$, and deflection angle $\varepsilon_j$ relative to the optic axis).
coordinates is converted into a density estimation problem. The first layer of density estimation aims to find one of the valid centers in the query image. Object center estimation is a crucial problem; a two-stage adaptive kernel density estimation method, elaborated in [28], is employed to improve precision. Only the density values associated with the mapped key points are calculated, to speed up the process, and the point with the highest density value is saved. Although this point may not be the exact center, it is a good approximation, so the mapped point is identified as a valid center. Simultaneously, the exact training image is obtained. As illustrated in Figure 6, the blue point is the obtained object center.
$$\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix} = \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \cos\varepsilon_j \quad (3)$$

$$= \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left(1 - \frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots\right) \quad (4)$$

$$= \underbrace{\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix}}_{\text{Real center}} + \underbrace{\frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} v_i \left(-\frac{\varepsilon_j^2}{2!} + \frac{\varepsilon_j^4}{4!} - \cdots\right)}_{\text{Distribution range}}. \quad (5)$$
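The first-layer center search described in this subsection amounts to evaluating a kernel density over the projected center candidates and keeping the densest one. A minimal pure-Python sketch with a fixed Gaussian bandwidth and hypothetical values (the paper uses the two-stage adaptive estimator of [28], so this is a simplified stand-in):

```python
import math

def kde_peak(points, bandwidth=10.0):
    """Evaluate a Gaussian kernel density only at the projected center
    candidates themselves and return the densest one. This fixed
    bandwidth is a stand-in for the two-stage adaptive estimator of
    [28]; the value 10.0 is illustrative only."""
    def density(p):
        return sum(
            math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
                     / (2.0 * bandwidth ** 2))
            for q in points)
    return max(points, key=density)
```

Evaluating the density only at the candidates themselves, rather than over a dense grid, is what keeps this step cheap.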
Figure 6: Reference clustering threshold calculation ($T_r$ is derived from the rows and columns of the training image and mapped into the query image).
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work is derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, $s'_m$ is the scale of the key point related to the estimated object center, and $s_n$ is the scale of the matched key point in the training image:

$$\mathrm{sr} = \frac{s'_m}{s_n}. \quad (6)$$
Once the valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix $H_o$ for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves is large enough, meaning the camera on the robot is far from the targets, the actual homography is sufficiently close to an affine transformation; the dominant scale ratio $\mathrm{sr}'$ can then also be computed according to (8). $\mathrm{sr}'$ is used to verify sr: only if sr is close to $\mathrm{sr}'$ is the value of sr confirmed to be correct. We use (9) to assess the similarity between the two values:

$$H_o = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \quad (7)$$

$$\mathrm{sr}' = \sqrt{\left|h_{11} h_{22}\right| + \left|h_{12} h_{21}\right|} \quad (8)$$

$$\left|\frac{\mathrm{sr} - \mathrm{sr}'}{\min\left(\mathrm{sr}, \mathrm{sr}'\right)}\right| < 15\%. \quad (9)$$
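Equations (8) and (9) amount to a few lines of code. A sketch under the affine-homography assumption, with $H$ given as a 3×3 nested list:

```python
import math

def scale_ratio_from_homography(H):
    """sr' per (8), with H a 3x3 nested list assumed to be close to
    an affine transform."""
    return math.sqrt(abs(H[0][0] * H[1][1]) + abs(H[0][1] * H[1][0]))

def scale_ratio_consistent(sr, sr_prime, tol=0.15):
    """Relative-difference check of (9): accept sr only when it agrees
    with the homography-derived estimate sr'."""
    return abs(sr - sr_prime) / min(sr, sr_prime) < tol
```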
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected, and the features obtained from the query image are then matched against the new dataset. Owing to the preprocessing described above, the number of SIFT features in the newly constructed database is smaller than in the offline training phase; thus, the time overhead of the matching process is greatly reduced:

$$\sigma_{\text{TrainAdjust}} = \mathrm{sr} \times \sigma_o. \quad (10)$$
The feature matching disambiguation strategy is a cascade of filters: the ratio test (proposed in [3]), a scale restriction-based method (presented in [11]), and a geometric verification-based approach. The ratio test and scale restriction act during the matching process, while geometric verification takes effect after clustering. After this series of filters, most false matches are eliminated.
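As an illustration of the second filter, a minimal scale-restriction check might look as follows; the 25% tolerance is a hypothetical choice for illustration, not a value from the paper or from [11]:

```python
def scale_restricted(s_query, s_train, sr, tol=0.25):
    """Scale-restriction filter in the spirit of [11]: a match survives
    only when its individual scale ratio s_query / s_train agrees with
    the dominant ratio sr. The 25% tolerance is a hypothetical value."""
    return abs(s_query / s_train - sr) <= tol * sr
```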
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. These methods share the disadvantage that the bandwidth must be chosen by experience: in [16], the clustering threshold was set to a specific value, and in [19], the voting grid size was tied to the size of the query image, which may still lead to unreliable results. For our specific application, the clustering threshold can instead be estimated from the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold is computed automatically according to (11). In the formula, $T_r$ is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns in the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids; therefore, the clustering threshold Th is finalized in line with (12), in which $k$ is a correction factor. Based on the repeated experiments described in Section 4, we provide a recommended value for $k$. Candidate object instance detection is then based on the second layer of density estimation, for which grid voting is employed because of its high precision and recall:

$$T_r = \begin{cases} \mathrm{sr} \times \text{rows}, & \text{if rows} < \text{cols} \\ \mathrm{sr} \times \text{cols}, & \text{otherwise} \end{cases} \quad (11)$$

$$\text{Th} = k \times T_r. \quad (12)$$
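The threshold computation in (11)–(12) and the subsequent grid voting can be sketched as follows. This is a simplified, non-overlapping grid with a hypothetical minimum vote count; the paper's voting grids also overlap adjacent cells by 25%, and $k = 1.8$ is the coefficient that the experiments in Section 4 suggest:

```python
from collections import defaultdict

def clustering_threshold(sr, rows, cols, k=1.8):
    """T_r per (11) and Th per (12); k = 1.8 follows the experiments
    in Section 4 of the paper."""
    t_r = sr * (rows if rows < cols else cols)
    return k * t_r

def grid_vote(centers, grid_size, min_votes=3):
    """Bin projected centers into square cells of side grid_size and
    keep cells with enough votes as candidate instances. Simplified:
    no overlapping grids, and min_votes is a hypothetical value."""
    votes = defaultdict(list)
    for x, y in centers:
        votes[(int(x // grid_size), int(y // grid_size))].append((x, y))
    return [pts for pts in votes.values() if len(pts) >= min_votes]
```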
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. The four corners of the training image are then projected onto four new coordinates, producing a convex quadrilateral from the four mapped corners. We use a simple but effective criterion to assess whether the system has obtained a correct object instance, eliminating erroneous detections otherwise:

$$c_{\min} \le \frac{\text{Area}\left(\text{Quadrilateral}\right)}{\mathrm{sr}^2 \times \text{Area}\left(\text{TrainingImage}\right)} \le c_{\max}. \quad (13)$$
Figure 7: Examples of objects with different texture levels: (a) high texture; (b) medium texture; (c) low texture.
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately $\mathrm{sr}^2$. The thresholds $c_{\min}$ and $c_{\max}$ must be set before verification.
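The criterion in (13) can be sketched with the shoelace formula for the quadrilateral area; $c_{\min} = 0.8$ and $c_{\max} = 1.2$ are the values used in Section 4:

```python
def shoelace_area(quad):
    """Area of a convex quadrilateral given its four corners in order,
    via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(quad, quad[1:] + quad[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def pass_area_check(quad, sr, train_area, c_min=0.8, c_max=1.2):
    """Criterion (13): the projected-quadrilateral area, normalized by
    sr^2 times the training-image area, must lie in [c_min, c_max].
    0.8 and 1.2 are the thresholds reported in Section 4."""
    ratio = shoelace_area(quad) / (sr ** 2 * train_area)
    return c_min <= ratio <= c_max
```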
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure, and a noniterative method called EPnP [29] is employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for this specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with their fronts facing outward. All images were captured using a SONY RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first verifies whether the scale ratio calculation and false match elimination method are feasible; the second examines whether the proposed clustering threshold computation method is effective; and the last comprehensively evaluates the performance of the proposed architecture. The three experiments were designed as follows.
(i) Experiment I: for each training image in the database, we acquired an image in which the object instance had the same scale as in the training image. The captured images were then downsampled to 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios using the conventional histogram statistic and the proposed method separately and compared the accuracy of both. The feature matching and key point projection results with and without false match elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14) and then tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously. An approximate nearest-neighbor searching method was employed to speed up mean-shift. Because the thresholds cannot be compared directly across experiments, we express each new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and $k_{\text{MS}}$ and $k_{\text{GV}}$ are the coefficients. We chose an optimal threshold value according to the experimental results, sampling the threshold ratio parameters as $k_{\text{MS}} = k_{\text{GV}} = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8$.
$$\text{CR} = \frac{1}{2} \times k_{\text{MS}} \times T_r \quad \text{(using mean-shift)}, \qquad \text{GS} = k_{\text{GV}} \times T_r \quad \text{(using grid voting)}. \quad (14)$$
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions for conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid overlapped adjacent grids by 25% of its size. The performance of the proposed method and of conventional grid voting is reported in terms of accuracy (precision and recall) and computational time.
In all the experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to $\sigma_o = 1.6$, and the default threshold on key point contrast was set to 0.1. In the verification procedure in our experiments, the thresholds $c_{\min}$ and $c_{\max}$ were set to 0.8 and 1.2, respectively. All of the experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU (2.50 GHz) and 8 GB RAM.
Figure 8: The first example of dominant scale ratio computation: (a) center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, and 0.254); (b) dominant scale ratio computation by the conventional histogram statistic, shown as frequency histograms over the scale ratio (sr = 0.99, 0.75, 0.47, and 0.234).
Figure 9: The second example of dominant scale ratio computation: (a) center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, and 0.251); (b) dominant scale ratio computation by the conventional histogram statistic (sr = 0.29, 0.21, 0.52, and 0.21).
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratio. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios are 100%, 75%, 50%, and 25% in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe, and the extracted features may have nearly the same
Figure 10: Raw matching results: (a) training image; (b) feature matching; (c) key point projection.
Figure 11: Matching results with false match elimination: (a) training image; (b) feature matching; (c) key point projection.
scale ratio. The proposed method estimates the dominant scale ratio from the distribution and relationship of the key points; therefore, its result is more reliable.
Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results with scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and false match elimination lead to the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10–12 illustrate the effectiveness of the proposed filters.
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)–14(b) show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at the different clustering threshold ratios detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right, and the coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.

As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting also has better accuracy than mean-shift. According to Figures 13(a) and 13(b), the values of $k_{\text{MS}}$ and $k_{\text{GV}}$ corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification, because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than when using mean-shift, but the verification time becomes longer because of clustering errors. According to the results of this feasibility validation, the clustering coefficients $k_{\text{MS}} = 1.8$ for mean-shift and $k_{\text{GV}} = 1.8$ for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
4.2.3. Performance of Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for the different texture levels using the proposed method and grid voting. Precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different levels of texture density lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image; (b) feature matching; (c) key point projection.
Figure 13: Accuracy performance using mean-shift + RANSAC and grid voting + RANSAC: (a) precision (%) versus recall (%), with the points for $k_{\text{MS}} = 1.8$ and $k_{\text{GV}} = 1.8$ marked; (b) true positive rate (%) versus false positive rate (%), again with $k_{\text{MS}} = 1.8$ and $k_{\text{GV}} = 1.8$ marked.
Figure 14: Computational time statistics (feature matching, clustering, and geometric verification times, in ms, versus the coefficient $k$ = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8): (a) computational time for mean-shift; (b) computational time for grid voting.
Figure 15: Results of two detection examples; undetected or falsely detected instances are labeled "A" to "H" in panels (a) to (c).
Table 1: Average results for different levels of texture using the proposed method and grid voting.

| Texture level | Method | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms) |
| High | Proposed | 97.6 | 96.8 | 1027 | 379 | 479 | 526 | 3 | 522 | 2936 |
| High | Grid voting | 96.2 | 96.3 | 1027 | 379 | 0 | 0 | 4 | 2595 | 4005 |
| Medium | Proposed | 96.4 | 95.8 | 941 | 220 | 191 | 246 | 3 | 866 | 2467 |
| Medium | Grid voting | 95.7 | 95.4 | 941 | 220 | 0 | 0 | 4 | 2033 | 3198 |
| Low | Proposed | 92.1 | 93.6 | 586 | 94 | 72 | 119 | 4 | 1054 | 1929 |
| Low | Grid voting | 91.6 | 91.9 | 586 | 94 | 0 | 0 | 3 | 1345 | 2028 |
Precision and time overhead increase with the texture density. Although the first layer of density estimation and the template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared with the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "G", and "H" in Figure 15(c) were not detected, and the object marked "E" is a false detection. The reasons for these errors are the reflection of light (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and erroneous clustering results ("F", "G", and "H" in Figure 15(c)).
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice by combining dominant scale ratio-based false match elimination with adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of the proposed method in terms of its high accuracy and low latency.

Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper are no longer valid. Furthermore, the accuracy of the proposed method is greatly reduced when there is a dramatic change in illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method to handle such complex problems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program of China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Springer, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, ACM, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Springer, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, IEEE, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, IEEE, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, IEEE, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney, et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, IEEE, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
4 Journal of Sensors
Figure 1: Overview of the proposed framework. (a) Offline phase for constructing the retrieval structure; (b)–(e) first layer of density estimation: (b) local feature detection, (c) feature matching and key point mapping, (d) first layer of density estimation, and (e) intermediate results (effective training image, dominant scale ratio, and clustering threshold); (f)–(i) second layer of density estimation: (f) feature template reconstruction, (g) false matching result elimination, (h) clustering for candidate instance detection, and (i) geometric verification.
Figure 2: Offline training procedure (key point detection and descriptor extraction from frontal object images at scale $\sigma_o$, reference vector calculation, and retrieval structure construction; the database stores feature descriptors, scales, orientations, reference vectors, and the original training images).
Figure 3: 3D sparse model of a packing box from 25 images.
hashing algorithm takes effect. During feature matching, low-discriminability matches are discarded based on the ratio test on the distances to the nearest neighbor and the second nearest neighbor, which was proposed in [3].
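As a minimal sketch (not the authors' implementation), the ratio test from [3] can be written as a brute-force nearest-neighbor comparison over descriptor arrays; the function name and the 0.8 ratio (Lowe's suggested value) are illustrative choices:

```python
import numpy as np

def ratio_test(query_desc, train_desc, ratio=0.8):
    """Keep only matches whose nearest neighbor is clearly closer
    than the second nearest neighbor (Lowe's ratio test, [3])."""
    matches = []
    for qi, q in enumerate(query_desc):
        # Euclidean distance from this query descriptor to every
        # training descriptor.
        d = np.linalg.norm(train_desc - q, axis=1)
        nn, nn2 = np.argsort(d)[:2]
        if d[nn] < ratio * d[nn2]:
            matches.append((qi, nn))  # low-ambiguity match kept
    return matches
```

In practice a k-d tree or the hashing structure described above would replace the brute-force distance computation; the acceptance rule is the same.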
3.2.2. Key Point Projection and Object Center Estimation. The principle of key point projection is illustrated in Figure 5. In Figure 5, the left part is the training image and the right part is the query image. Regarding the middle part, the solid region is a matched patch from the query image, and the area formed by dotted lines is assumed to be the ideal case in which there is only a similarity transform. Assume that the matching pair of features is $f_i$ and $f_j$, where $f_i$ is from the database and $f_j$ is from the query image. The key points corresponding to these two features are $p_i(x_i, y_i)$ and $p'_j(x'_j, y'_j)$. For a planar object, the center $c'_{oj}(x'_{oj}, y'_{oj})$ related to $f_j$ can be estimated according to (2)–(5). In the formulas, $s'_j$ and $\theta'_j$ are the corresponding scale and orientation of feature $f'_j$; similarly, $s_i$ and $\theta_i$ are related to feature $f_i$ in the training image. For each pair of matched features, there is a normalized deflection angle $\varepsilon_j$ between the normal vector of the object surface and the camera optical axis. According to (5), the estimated centers will be located in a small area around the real center when the training image is the exact image corresponding to the ordered object instance and $\varepsilon_j$ has an extremely small value:

$$\theta = \theta'_j - \theta_i. \tag{2}$$

As shown in Figure 5, the reference centers are distributed in small areas. Then the problem of determining the center
Figure 4: Online detection flowchart (query image acquisition; feature extraction with scale setting $\sigma = \sigma_o$; feature matching against the database; key point projection; kernel density estimation yielding the dominant scale ratio sr and the reference clustering threshold $T_r$; access to the valid training image; feature extraction with scale setting $\sigma = \mathrm{sr} \times \sigma_o$; refined feature matching; false match elimination based on sr; key point projection; clustering based on $T_r$; and object-level false result elimination).
Figure 5: Key point projection principle diagram (training image on the left, query image on the right; matched key points $p_i$ and $p'_j$, centers $c_o$ and $c'_o$, and deflection angle $\varepsilon_j$ relative to the optical axis).
coordinates is converted into a density estimation problem. The first layer of density estimation aims to find one of the valid centers in the query image. Object center estimation is a crucial problem; a two-stage procedure-based adaptive kernel density estimation method, elaborated in [28], is employed to improve the precision. Only the density values associated with the mapped key points are calculated, to speed up the process. The point with the highest density value is saved. Although this point may not be the exact center, it is a typical approximation; thus, the mapped point is identified as a valid center. Simultaneously, the exact training image can be obtained. As illustrated in Figure 6, the blue point is the obtained object center.
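A rough sketch of this first density-estimation layer, under the simplifying assumptions of a fixed (rather than adaptive) Gaussian bandwidth `h` and an illustrative function name: the density is evaluated only at the mapped points themselves, and the densest one is taken as the valid center.

```python
import numpy as np

def densest_mapped_point(points, h=5.0):
    """Evaluate a Gaussian kernel density only at the mapped key
    points and return the densest one, taken as the valid object
    center (fixed-bandwidth stand-in for the adaptive KDE of [28])."""
    pts = np.asarray(points, dtype=float)
    # Pairwise squared distances between all mapped points.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    # Density at each point = sum of Gaussian kernels over all points.
    density = np.exp(-d2 / (2 * h * h)).sum(axis=1)
    return pts[np.argmax(density)]
```

Restricting evaluation to the mapped points keeps the cost at $O(n^2)$ kernel evaluations instead of scanning a dense grid, which matches the speed-up described above.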
$$\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix} = \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \times v_i \times \cos\varepsilon_j \tag{3}$$

$$= \begin{bmatrix} x'_j \\ y'_j \end{bmatrix} + \frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \times v_i \times \left(1 - \frac{\varepsilon_j^2}{2} + \frac{\varepsilon_j^4}{24} - \cdots\right) \tag{4}$$

$$= \underbrace{\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix}}_{\text{Real Center}} + \underbrace{\frac{s'_j}{s_i} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \times v_i \times \left(-\frac{\varepsilon_j^2}{2} + \frac{\varepsilon_j^4}{24} - \cdots\right)}_{\text{Distribution Range}} \tag{5}$$
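For the near-frontal case $\varepsilon_j \approx 0$, the center projection of (2)–(3) reduces to rotating and rescaling a stored reference vector. The sketch below illustrates this; the function name, the argument order, and the interpretation of `v_i` as the offline-stored vector from key point to object center are assumptions for illustration, not the authors' code:

```python
import numpy as np

def estimate_center(p_query, v_i, s_q, s_t, theta_q, theta_t, eps_j=0.0):
    """Project the object center into the query image from one matched
    key point pair, following (2)-(3): rotate the training reference
    vector v_i by the orientation difference, rescale it by the scale
    ratio, and shrink it by cos(eps_j)."""
    theta = theta_q - theta_t                         # (2)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])                   # 2D rotation
    return np.asarray(p_query) + (s_q / s_t) * (R @ np.asarray(v_i)) * np.cos(eps_j)
```

Running this for every surviving match yields the cloud of candidate centers that the density-estimation step above clusters.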
Figure 6: Reference clustering threshold calculation (the rows and columns of the training image determine $T_r$ in the query image).
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and the calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work is derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, $s'_m$ is the scale of the key point related to the estimated object center, and $s_n$ is the scale of the matched key point in the training image:

$$\mathrm{sr} = \frac{s'_m}{s_n}. \tag{6}$$
Once the valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix $H_o$ for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves is large, meaning that the camera on the robot is far from the targets, the actual homography is sufficiently close to an affine transformation. The dominant scale ratio $\mathrm{sr}'$ can therefore also be computed according to (8). $\mathrm{sr}'$ is then used to verify sr: only if sr is close to $\mathrm{sr}'$ is sr confirmed to be correct. We use (9) to assess the similarity between the two values:

$$H_o = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \tag{7}$$

$$\mathrm{sr}' = \sqrt{\left|h_{11} \times h_{22}\right| + \left|h_{12} \times h_{21}\right|} \tag{8}$$

$$\left|\frac{\mathrm{sr} - \mathrm{sr}'}{\min\left(\mathrm{sr}, \mathrm{sr}'\right)}\right| < 1.5. \tag{9}$$
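A small sketch of (8)–(9) on a plain 3 × 3 homography; the function names are illustrative, and the relative bound of 1.5 follows the reconstructed threshold in (9):

```python
import math

def homography_scale_ratio(H):
    """Affine-approximate scale ratio from a homography, as in (8)."""
    return math.sqrt(abs(H[0][0] * H[1][1]) + abs(H[0][1] * H[1][0]))

def scale_ratio_consistent(sr, sr_h, limit=1.5):
    """Accept sr only when it agrees with the homography-derived
    value within the relative bound of (9)."""
    return abs(sr - sr_h) / min(sr, sr_h) < limit
```

For a pure scaling homography the ratio in (8) reduces exactly to the scale factor, which is why the affine approximation is adequate at the camera-to-shelf distances assumed above.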
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected, and the features obtained from the query image above are then matched against the new dataset. Owing to the aforementioned preprocessing, the number of SIFT features in the newly constructed database is reduced compared to the offline training phase; thus, the time overhead of the matching process is greatly reduced:

$$\sigma_{\mathrm{TrainAdjust}} = \mathrm{sr} \times \sigma_o. \tag{10}$$
The feature matching disambiguation strategy here is a cascade of filters: the ratio test algorithm (proposed in [3]), the scale restriction-based method (presented in [11]), and a geometric verification-based approach. The ratio test and scale restriction filters are applied during the matching process that follows, while geometric verification takes effect after clustering. After this series of filters, most false matches are eliminated.
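The scale-restriction stage of the cascade can be sketched as follows; the match tuple layout and the 25% relative tolerance are illustrative assumptions (the paper does not state a specific tolerance here), with the dominant ratio sr coming from (6):

```python
def filter_by_scale(matches, sr, tol=0.25):
    """Discard matches whose per-match scale ratio deviates from the
    dominant scale ratio sr by more than a relative tolerance.
    Each match is assumed to be (query_scale, train_scale)."""
    kept = []
    for m in matches:
        r = m[0] / m[1]             # per-match scale ratio
        if abs(r - sr) <= tol * sr:
            kept.append(m)          # consistent with dominant ratio
    return kept
```

Matches that survive this filter feed the second density-estimation layer, so the tolerance trades recall against the number of outliers the later geometric verification must reject.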
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. However, these methods share the disadvantage that the bandwidth must be set by experience. For example, in [16] the clustering threshold was set to a specific value, and in [19] the voting grid size was set to a value associated with the size of the query image. Nevertheless, this approach may still lead to unreliable results. For our specific application, the clustering threshold can be estimated from the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold is computed automatically according to (11). In the formula, $T_r$ is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns in the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids; therefore, the clustering threshold Th can be finalized in line with (12), in which $k$ is a correction factor. Based on the repeated experiments described in Section 4, we provide a recommended value for $k$. Candidate object instance detection is based on the second layer of density estimation. Grid voting is employed here due to its high precision and recall:

$$T_r = \begin{cases} \mathrm{sr} \times \mathrm{rows}, & \text{if } \mathrm{rows} < \mathrm{cols} \\ \mathrm{sr} \times \mathrm{cols}, & \text{otherwise} \end{cases} \tag{11}$$

$$\mathrm{Th} = k \times T_r. \tag{12}$$
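The adaptive threshold of (11)–(12) combined with a grid-voting pass can be sketched as below. This is a simplified illustration: the function name is assumed, the default $k = 1.8$ is the value recommended by the experiments in Section 4, and the 25% grid overlap used in practice is omitted for brevity:

```python
from collections import defaultdict

def grid_vote(centers, sr, rows, cols, k=1.8):
    """Cluster projected centers with adaptive grid voting: cell size
    Th = k * T_r, where T_r = sr * min(rows, cols) as in (11)-(12).
    Each non-empty cell is one candidate object instance."""
    T_r = sr * min(rows, cols)    # (11): smaller training-image side
    Th = k * T_r                  # (12): corrected clustering threshold
    cells = defaultdict(list)
    for x, y in centers:
        cells[(int(x // Th), int(y // Th))].append((x, y))
    return dict(cells)
```

Because Th scales with both the training-image size and the observed scale ratio, the cell size adapts to how large each instance actually appears in the query image, unlike a grid tied to the query-image size as in [19].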
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. The four corners of the training image are then projected onto four new coordinates, producing a convex quadrilateral from the four mapped corners. Here we provide a simple but effective way to assess whether the system has obtained correct object instances, and error detections are eliminated. The criterion is as follows:

$$c_{\min} \le \frac{\operatorname{Area}(\mathrm{Quadrilateral})}{\mathrm{sr}^2 \times \operatorname{Area}(\mathrm{TrainingImage})} \le c_{\max}. \tag{13}$$
Figure 7: Examples of objects with different texture levels: (a) high texture, (b) medium texture, (c) low texture.
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately $\mathrm{sr}^2$. The thresholds $c_{\min}$ and $c_{\max}$ should be set before verification.
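The area-ratio check of (13) can be sketched with a shoelace-area helper; the function names are illustrative, and the defaults $c_{\min} = 0.8$ and $c_{\max} = 1.2$ are the values used in the experiments of Section 4:

```python
import numpy as np

def quad_area(corners):
    """Shoelace area of the projected convex quadrilateral."""
    x, y = np.asarray(corners, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def verify_instance(H, w, h, sr, c_min=0.8, c_max=1.2):
    """Project the four training-image corners through H and accept
    the detection only if the area ratio obeys (13)."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], float)
    proj = (H @ corners.T).T
    proj = proj[:, :2] / proj[:, 2:3]          # homogeneous -> Cartesian
    ratio = quad_area(proj) / (sr ** 2 * (w * h))
    return c_min <= ratio <= c_max
```

A grossly distorted or mis-scaled quadrilateral pushes the ratio outside $[c_{\min}, c_{\max}]$ and the candidate instance is discarded before pose estimation.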
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure. A noniterative method called EPnP [29] is employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for our specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with their fronts facing outward. All images were captured using a SONY RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first experiment verified whether the scale ratio calculation and false match elimination method were feasible. The second examined whether the proposed clustering threshold computation method was effective. The last comprehensively evaluated the performance of the proposed architecture. The three experiments were designed as follows.
(i) Experiment I: for each training image in the database, we acquired an image such that the object instance in the image had the same scale as the training image. The captured images were then downsampled to 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios based on the conventional histogram statistics and on the proposed method separately, and compared the accuracy of both values. The feature matching and key point projection results with and without false match elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14). We then tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously. An approximate nearest neighbor searching method was employed to speed up mean-shift. Because the thresholds could not be directly compared across experiments, we expressed each new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and $k_{\mathrm{MS}}$ and $k_{\mathrm{GV}}$ are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as $k_{\mathrm{MS}} = k_{\mathrm{GV}} = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8$:

$$\mathrm{CR} = \frac{1}{2} \times k_{\mathrm{MS}} \times T_r \quad \text{(using mean-shift)}, \qquad \mathrm{GS} = k_{\mathrm{GV}} \times T_r \quad \text{(using grid voting)}. \tag{14}$$
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions for conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid overlapped an adjacent grid by 25% of its size. The performance of the proposed method and conventional grid voting was expressed in terms of accuracy (precision and recall) and computational time.
In all experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to $\sigma_o = 1.6$, and the default threshold on key point contrast was set to 0.1. In the verification procedure, the thresholds $c_{\min}$ and $c_{\max}$ were set to 0.8 and 1.2, respectively. All experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU (2.50 GHz) and 8 GB of RAM.
Figure 8: The first example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, and 0.254). (b) Dominant scale ratio computation by the conventional histogram statistic (sr = 0.99, 0.75, 0.47, and 0.234).
Figure 9: The second example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, and 0.251). (b) Dominant scale ratio computation by the conventional histogram statistic (sr = 0.29, 0.21, 0.52, and 0.21).
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display two examples of computing the dominant scale ratios. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios in these figures are 100%, 75%, 50%, and 25%. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe, and the extracted features may have nearly the same
Figure 10: Raw matching results: (a) training image, (b) feature matching, (c) key point projection.
Figure 11: Matching results with false match elimination: (a) training image, (b) feature matching, (c) key point projection.
scale ratio. The proposed method evaluates the dominant scale ratio based on the distribution and relationship of key points; therefore, its result is more reliable.

Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results with scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and false match elimination lead to the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10–12 illustrate the effectiveness of the proposed filters.
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)–14(b) show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. The coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.

As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting outperforms mean-shift in accuracy. According to Figures 13(a) and 13(b), the values of $k_{\mathrm{MS}}$ and $k_{\mathrm{GV}}$ corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than that for mean-shift, but the verification time becomes longer due to clustering errors. According to the results of this feasibility validation, the clustering radius coefficients $k_{\mathrm{MS}} = 1.8$ for mean-shift and $k_{\mathrm{GV}} = 1.8$ for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
4.2.3. Performance of Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different texture levels using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different texture densities lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image, (b) feature matching, (c) key point projection.
Figure 13: Accuracy performance using mean-shift and grid voting (the points at $k_{\mathrm{MS}} = k_{\mathrm{GV}} = 1.8$ are circled). (a) Precision versus recall of mean-shift + RANSAC and grid voting + RANSAC. (b) True positive rate versus false positive rate of mean-shift + RANSAC and grid voting + RANSAC.
Figure 14: Computational time statistics for feature matching, clustering, and geometric verification versus the threshold ratio $k$. (a) Computational time for mean-shift. (b) Computational time for grid voting.
Figure 15: Results of two detection examples ((a)–(c); undetected or falsely detected instances are marked with letters "A"–"H").
Table 1: Average results for different texture levels using the proposed method and grid voting.

Texture level | Method      | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms)
--------------|-------------|---------------|------------|------------------------|----------------|-------------------------|--------------|-----------------|-----------------------------|-----------
High          | Proposed    | 97.6          | 96.8       | 1027                   | 379            | 479                     | 526          | 3               | 522                         | 2936
High          | Grid voting | 96.2          | 96.3       | 1027                   | 379            | 0                       | 0            | 4               | 2595                        | 4005
Medium        | Proposed    | 96.4          | 95.8       | 941                    | 220            | 191                     | 246          | 3               | 866                         | 2467
Medium        | Grid voting | 95.7          | 95.4       | 941                    | 220            | 0                       | 0            | 4               | 2033                        | 3198
Low           | Proposed    | 92.1          | 93.6       | 586                    | 94             | 72                      | 119          | 4               | 1054                        | 1929
Low           | Grid voting | 91.6          | 91.9       | 586                    | 94             | 0                       | 0            | 3               | 1345                        | 2028
Precision and time overhead increase with increasing texture density. Although the first layer of density estimation and template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared to the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most of the object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "H", and "G" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are the reflection of light (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and erroneous clustering results ("F", "G", and "H" in Figure 15(c)).
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice by combining dominant scale ratio-based false match elimination with adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of high accuracy and low latency.
Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales within the query image, the assumptions made in this paper are no longer valid. Furthermore, the accuracy of the proposed method is greatly reduced when there is a dramatic change in illumination or the target is occluded by other translucent objects. In future work, we will focus on improving the method to solve such complex problems.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd., for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollar, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), Zurich, Switzerland, September 2014, pp. 391–405, Springer, Cham, Switzerland, 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelovic and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[29] V Lepetit F Moreno-Noguer and P Fua ldquoEPnP An accurateO(n) solution to the PnP problemrdquo International Journal ofComputer Vision vol 81 no 2 pp 155ndash166 2009
Journal of Sensors 5
[Figure 4: Online detection flowchart. Begin → query image acquisition → feature extraction (scale setting σ = σ_o) → feature matching against the database → key point projection → kernel density estimation → computation of the dominant scale ratio sr and the reference clustering threshold T_r → access to the valid training image → feature extraction with scale setting σ = sr × σ_o → feature matching → false match elimination based on sr → key point projection → clustering based on T_r → object-level false result elimination → result.]
[Figure 5: Key point projection principle diagram, showing the optic axis, matched features p_i in the training image and p'_j in the query image, the object centers c_o and c'_o, and the projection error angle ε_j.]
coordinates is converted into a density estimation problem. The first layer of density estimation aims to find one of the valid centers in the query image. Object center estimation is a crucial problem; a two-stage adaptive kernel density estimation procedure, elaborated in [28], is employed to improve the precision. To speed up the process, only the density values associated with the mapped key points are calculated. The point with the highest density value is saved. Although this point may not be the exact center, it is a typical approximation; thus, the mapped point is identified as a valid center. Simultaneously, the exact training image can be obtained. As illustrated in Figure 6, the blue point is the obtained object center.
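The idea of evaluating the kernel density only at the mapped key points themselves (rather than on a dense grid) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a fixed Gaussian bandwidth `h` in place of the two-stage adaptive bandwidth of [28], and the function name is ours.

```python
import math

def densest_mapped_point(points, h):
    """Evaluate a Gaussian kernel density only at the mapped key points
    (no dense grid) and return the point of highest density, taken as
    the approximate object center."""
    best, best_density = None, -1.0
    for (px, py) in points:
        # Sum the Gaussian kernel contributions of all mapped points.
        density = sum(
            math.exp(-((px - qx) ** 2 + (py - qy) ** 2) / (2.0 * h * h))
            for (qx, qy) in points)
        if density > best_density:
            best, best_density = (px, py), density
    return best
```

For a set of mapped points clustered near the origin plus one stray vote, the returned point lies inside the cluster, which is exactly the "valid center" behavior described above.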
$$
\begin{bmatrix} x'_{oj} \\ y'_{oj} \end{bmatrix}
= \begin{bmatrix} x'_{j} \\ y'_{j} \end{bmatrix}
+ \frac{s'_{j}}{s_{i}}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
v_{i}\,\cos\varepsilon_{j} \quad (3)
$$

$$
= \begin{bmatrix} x'_{j} \\ y'_{j} \end{bmatrix}
+ \frac{s'_{j}}{s_{i}}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
v_{i}\left( 1 - \frac{\varepsilon_{j}^{2}}{2!} + \frac{\varepsilon_{j}^{4}}{4!} - \cdots \right) \quad (4)
$$

$$
= \underbrace{\begin{bmatrix} x'_{o} \\ y'_{o} \end{bmatrix}}_{\text{Real center}}
+ \underbrace{\frac{s'_{j}}{s_{i}}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
v_{i}\left( -\frac{\varepsilon_{j}^{2}}{2!} + \frac{\varepsilon_{j}^{4}}{4!} - \cdots \right)}_{\text{Distribution range}} \quad (5)
$$
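In the ideal case ε_j = 0, the vote in (3) reduces to the real-center term: each matched query key point proposes a center by scaling the training-image center vector v_i by s'_j/s_i and rotating it by θ. A small sketch of that ideal-case vote (the function name and argument layout are illustrative, not from the paper):

```python
import math

def vote_center(query_kp, query_scale, train_scale, theta, v):
    """Project one matched query key point to a candidate object center:
    the training-image vector v to the center is scaled by s'_j / s_i
    and rotated by theta (the ideal epsilon_j = 0 case of Eq. (3))."""
    s = query_scale / train_scale
    c, sn = math.cos(theta), math.sin(theta)
    vx, vy = v
    # Apply the 2x2 rotation matrix to the scaled center vector.
    return (query_kp[0] + s * (c * vx - sn * vy),
            query_kp[1] + s * (sn * vx + c * vy))
```

For example, a key point at (10, 10) with scale ratio 2 and θ = 0 whose training vector to the center is (3, 4) votes for the center (16, 18).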
[Figure 6: Reference clustering threshold calculation. The rows and columns of the training image determine the reference clustering threshold T_r in the query image.]
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and the calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work is derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, s'_m is the scale of the key point related to the estimated object center, and s_n is the scale of the matched key point in the training image:

$$
\mathrm{sr} = \frac{s'_{m}}{s_{n}} \quad (6)
$$
Once the valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix H_o for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves is large enough, meaning the camera on the robot is far from the targets, the actual homography is sufficiently close to an affine transformation. The dominant scale ratio sr' can then also be computed according to (8). Then sr' is used to verify sr: only if the value of sr is close to sr' is sr confirmed to be correct. We use (9) to assess the similarity between the two values:

$$
H_{o} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \quad (7)
$$

$$
\mathrm{sr}' = \sqrt{\left|h_{11} \times h_{22}\right| + \left|h_{12} \times h_{21}\right|} \quad (8)
$$

$$
\left|\frac{\mathrm{sr} - \mathrm{sr}'}{\min\left(\mathrm{sr}, \mathrm{sr}'\right)}\right| < 0.15 \quad (9)
$$
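The check in (8)–(9) is straightforward to implement. The sketch below assumes H is a 3×3 nested list; the 0.15 tolerance reflects our reading of the threshold in (9), whose decimal point was lost in extraction.

```python
import math

def scale_ratio_from_homography(H):
    """Approximate dominant scale ratio from a near-affine homography,
    as in Eq. (8): sr' = sqrt(|h11*h22| + |h12*h21|)."""
    return math.sqrt(abs(H[0][0] * H[1][1]) + abs(H[0][1] * H[1][0]))

def scale_ratios_consistent(sr, sr_prime, tol=0.15):
    """Accept sr only if it agrees with sr' within the relative
    tolerance of Eq. (9)."""
    return abs(sr - sr_prime) / min(sr, sr_prime) < tol
```

For a pure scaling homography with h11 = h22 = 0.5, (8) gives sr' = 0.5, so an estimated sr of 0.52 passes the check while sr = 1.0 is rejected.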
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and is adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected, and the features obtained from the query image above are matched to the new dataset. Due to the aforementioned preprocessing, the number of SIFT features in the newly constructed database is reduced compared with the offline training phase; thus, the time overhead of the matching process is greatly reduced:

$$
\sigma_{\mathrm{TrainAdjust}} = \mathrm{sr} \times \sigma_{o} \quad (10)
$$
The strategy for feature matching disambiguation here is a cascade of filters. These filters comprise the ratio test algorithm (proposed in [3]), the scale restriction-based method (presented in [11]), and a geometric verification-based approach. The ratio test and scale restriction methods are applied during the matching process, while the geometric verification takes effect after clustering. After this series of filters, most of the false matches are eliminated.
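The first two stages of the cascade can be sketched as a single pass over the candidate matches. This is an illustrative simplification: the 0.8 ratio threshold and the scale-restriction band `sr_band` are placeholder values of ours, and each match is reduced to the four numbers the two filters need.

```python
def cascade_filter(matches, dominant_sr, ratio=0.8, sr_band=0.3):
    """First two stages of the match-disambiguation cascade:
    Lowe's ratio test, then a scale-restriction check keeping only
    matches whose key point scale ratio is close to the dominant one.
    Each match is (best_dist, second_dist, query_scale, train_scale)."""
    kept = []
    for best, second, qs, ts in matches:
        if best >= ratio * second:
            continue                     # fails the ratio test [3]
        sr = qs / ts                     # per-match scale ratio
        if abs(sr - dominant_sr) > sr_band * dominant_sr:
            continue                     # fails the scale restriction [11]
        kept.append((best, second, qs, ts))
    return kept
```

A match with an ambiguous second-nearest neighbor, or a scale ratio far from the dominant one, is dropped; only matches passing both filters reach the geometric verification stage.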
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. However, these methods share the disadvantage that the bandwidth must be chosen empirically. For example, in [16] the clustering threshold was set to a specific value, and in [19] the voting grid size was set to a value associated with the size of the query image. Nevertheless, this approach may still lead to unreliable results. For our specific application, the clustering threshold can be estimated from the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold is computed automatically according to (11), where T_r is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns of the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids. Therefore, the clustering threshold Th can be finalized in line with (12), in which k is a correction factor; based on the repeated experiments described in Section 4, we provide a recommended value for k. Candidate object instance detection is based on the second layer of density estimation, and grid voting is employed here due to its high precision and recall:

$$
T_{r} = \begin{cases} \mathrm{sr} \times \mathrm{rows}, & \text{if } \mathrm{rows} < \mathrm{cols} \\ \mathrm{sr} \times \mathrm{cols}, & \text{otherwise} \end{cases} \quad (11)
$$

$$
\mathrm{Th} = k \times T_{r} \quad (12)
$$
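Grid voting with the adaptive cell size Th can be sketched as quantizing each projected center into a grid cell and keeping cells with enough support. This is a simplified, non-overlapping-grid illustration; the minimum vote count `min_votes` is an assumed parameter of ours, not a value from the paper.

```python
from collections import defaultdict

def grid_vote(centers, th):
    """Second-layer density estimation by grid voting: quantize each
    projected center into a square cell of side Th = k * T_r."""
    votes = defaultdict(list)
    for (x, y) in centers:
        votes[(int(x // th), int(y // th))].append((x, y))
    return votes

def candidate_instances(centers, th, min_votes=3):
    """Return the groups of projected centers whose grid cell gathered
    at least min_votes votes; each group is one candidate instance."""
    return [pts for pts in grid_vote(centers, th).values()
            if len(pts) >= min_votes]
```

With five center votes landing near (10, 10) and one stray vote, a cell size of 20 yields exactly one candidate instance, which would then be passed to the RANSAC-based geometric verification.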
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. Then the four corners of the training image are projected onto four new coordinates, producing a convex quadrilateral defined by the four mapped corners. Here we provide a simple but effective way to assess whether the system has obtained correct object instances, so that erroneous detections are eliminated. The criterion is as follows:

$$
c_{\min} \le \frac{\mathrm{Area}\left(\mathrm{Quadrilateral}\right)}{\mathrm{sr}^{2} \times \mathrm{Area}\left(\mathrm{TrainingImage}\right)} \le c_{\max} \quad (13)
$$
[Figure 7: Examples of objects with different texture levels: (a) high texture; (b) medium texture; (c) low texture.]
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately sr². The thresholds c_min and c_max should be set before verification.
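The criterion (13) can be sketched directly, with the quadrilateral area computed by the shoelace formula. The default bounds c_min = 0.8 and c_max = 1.2 are the values used later in the experimental setup; the function names are ours.

```python
def quad_area(corners):
    """Shoelace area of the quadrilateral formed by the four projected
    corners of the training image, given in order."""
    area = 0.0
    for i in range(4):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % 4]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def instance_valid(corners, train_w, train_h, sr, c_min=0.8, c_max=1.2):
    """Criterion (13): the projected area must be close to sr^2 times
    the training-image area."""
    ratio = quad_area(corners) / (sr ** 2 * train_w * train_h)
    return c_min <= ratio <= c_max
```

For a 100 × 50 training image with sr = 0.5, a projected 50 × 25 rectangle gives a ratio of exactly 1 and passes, while a projection twice that size is rejected.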
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure. A noniterative method called EPnP [29] is employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for this specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with their fronts facing outward. All images were captured using a SONY RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first experiment verified whether the scale ratio calculation and false match elimination method were feasible. The second examined whether the proposed clustering threshold computation method was effective. The last comprehensively evaluated the performance of the proposed architecture. These three experiments were designed as follows.
(i) Experiment I: for each training image in the database, we acquired an image such that the object instance in the image had the same scale as the training image. The captured images were then downsampled; the sizes of the resampled images were 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios based on the conventional histogram statistics and the proposed method separately and then compared the accuracy of the two values. The feature matching and key point projection results with and without false match elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14). We then tested the performance of the conventional methods (mean-shift and grid voting) by changing the clustering threshold continuously. Here, an approximate nearest neighbor (ANN) searching method was employed to speed up mean-shift. Because the thresholds could not be compared directly across experiments, we expressed each new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and k_MS and k_GV are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as k_MS = k_GV = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8.

$$
\mathrm{CR} = \frac{1}{2} \times k_{\mathrm{MS}} \times T_{r} \quad \text{(using mean-shift)}
$$
$$
\mathrm{GS} = k_{\mathrm{GV}} \times T_{r} \quad \text{(using grid voting)} \quad (14)
$$
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions of the conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid overlapped an adjacent grid by 25% of its size. The performance of the proposed method and the conventional grid voting was expressed in terms of accuracy (precision and recall) and computational time.
In all the experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to σ_o = 1.6, and the default threshold on key point contrast was set to 0.1. In the verification procedure in our experiments, the thresholds c_min and c_max were set to 0.8 and 1.2, respectively. All of the experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU (2.50 GHz) and 8 GB RAM.
[Figure 8: The first example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method: sr = 1.00, 0.74, 0.48, 0.254. (b) Dominant scale ratio computation by the conventional histogram statistics, shown as frequency histograms over the scale ratio: sr = 0.99, 0.75, 0.47, 0.234.]
[Figure 9: The second example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method: sr = 1.01, 0.75, 0.50, 0.251. (b) Dominant scale ratio computation by the conventional histogram statistics, shown as frequency histograms over the scale ratio: sr = 0.29, 0.21, 0.52, 0.21.]
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratios. Figures 8(a) and 9(a) are the results of the proposed method, whereas Figures 8(b) and 9(b) are the results of the conventional method. The reference scale ratios are 100%, 75%, 50%, and 25% in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe, and the extracted features may have nearly the same
[Figure 10: Raw matching results: (a) training image; (b) feature matching; (c) key point projection.]
[Figure 11: Matching results with false match elimination: (a) training image; (b) feature matching; (c) key point projection.]
scale ratio. The proposed method evaluates the dominant scale ratio based on the distribution of and the relationships among the key points; therefore, its result is more reliable.
Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results with scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and false match elimination lead to the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10-12 illustrate the effectiveness of the proposed filters.
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)-14(b) show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. In addition, the coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.
As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall overall, and Figure 13(b) indicates that grid voting also outperforms mean-shift in accuracy. According to Figures 13(a) and 13(b), the values of k_MS and k_GV corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than that for mean-shift, but the verification time becomes longer due to clustering errors. According to the results of the feasibility validation, the clustering radii k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
4.2.3. Performance for Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different levels of texture using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method. According to Table 1, different levels of texture density lead to different accuracies and computational times.
[Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image; (b) feature matching; (c) key point projection.]
[Figure 13: Accuracy performance using mean-shift + RANSAC and grid voting + RANSAC. (a) Precision (%) versus recall (%), with the points for k_MS = 1.8 and k_GV = 1.8 circled. (b) True positive rate (%) versus false positive rate (%), likewise marking k_MS = 1.8 and k_GV = 1.8.]
[Figure 14: Computational time statistics (ms) over k = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8, broken down into feature matching, clustering, and geometric verification. (a) Computational time for mean-shift. (b) Computational time for grid voting.]
[Figure 15: Results of two detection examples, with undetected or falsely detected objects marked "A"-"H" across panels (a)-(c).]
Table 1: Average results for different levels of texture using the proposed method and grid voting. Accuracy is given in % and computational times in ms.

Texture level | Method      | Precision | Recall | Feature detection | Raw match | Density estimation | Rematch | Clustering | Geometric verification | Total
High          | Proposed    | 97.6      | 96.8   | 1027              | 379       | 479                | 526     | 3          | 522                    | 2936
High          | Grid voting | 96.2      | 96.3   | 1027              | 379       | 0                  | 0       | 4          | 2595                   | 4005
Medium        | Proposed    | 96.4      | 95.8   | 941               | 220       | 191                | 246     | 3          | 866                    | 2467
Medium        | Grid voting | 95.7      | 95.4   | 941               | 220       | 0                  | 0       | 4          | 2033                   | 3198
Low           | Proposed    | 92.1      | 93.6   | 586               | 94        | 72                 | 119     | 4          | 1054                   | 1929
Low           | Grid voting | 91.6      | 91.9   | 586               | 94        | 0                  | 0       | 3          | 1345                   | 2028
Precision and time overhead increase with the texture density. Although the first layer of density estimation and the template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared with the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "H", and "G" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are light reflection (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and erroneous clustering results ("F", "G", and "H" in Figure 15(c)).
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture to address it. The proposed approach successfully handles the multiple object instance detection problem in practice by combining dominant scale ratio-based false match elimination with adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of its high accuracy and low latency.
Although the presented architecture performs well in these types of applications, the algorithm may fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper will no longer be valid. Furthermore, the accuracy of the proposed method will be greatly reduced when there is a dramatic change in illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method to solve such complex problems.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollar, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelovic and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, Toronto, Canada, May 2012.
[23] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney, et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
6 Journal of Sensors
Figure 6: Reference clustering threshold calculation (the threshold T_r is derived from the rows and columns of the training image and mapped into the query image).
3.2.3. Dominant Scale Ratio Estimation and Scale Restriction-Based False Match Elimination. The dominant scale ratio serves two purposes: false match elimination and calculation of a reference clustering radius for the second layer of density estimation. In contrast to the conventional methods in [10, 11], the dominant scale ratio in our work is derived according to (6), based on the assumption that the estimated center has a typical scale ratio value. In (6), sr is the oriented scale ratio, s'_m is the scale of the key point related to the estimated object center, and s_n is the scale of the matched key point in the training image:

sr = s'_m / s_n. (6)
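As a rough illustration of this first density-estimation layer, the dominant scale ratio can be taken as the mode of the per-match scale ratios. The sketch below is our own simplified illustration, not the authors' implementation: the function names, the Gaussian-kernel bandwidth, and the tolerance of the scale restriction filter are all assumptions.

```python
import numpy as np

def dominant_scale_ratio(train_scales, query_scales, bandwidth=0.05):
    """Estimate the dominant scale ratio between matched key points.

    Simplified sketch: the paper derives the ratio from center-supporting
    matches, whereas here we take the mode of all per-match scale ratios
    via a 1-D Gaussian kernel density estimate."""
    ratios = np.asarray(query_scales) / np.asarray(train_scales)
    grid = np.linspace(ratios.min(), ratios.max(), 512)
    # Density of ratios evaluated on the grid (first density-estimation layer)
    density = np.exp(-0.5 * ((grid[:, None] - ratios[None, :]) / bandwidth) ** 2).sum(axis=1)
    return grid[np.argmax(density)]

def scale_restriction_filter(matches, sr, tol=0.3):
    """Keep only matches whose individual scale ratio is close to the
    dominant ratio sr; `matches` holds (train_scale, query_scale) pairs."""
    return [(s_t, s_q) for (s_t, s_q) in matches if abs(s_q / s_t - sr) <= tol * sr]
```

For example, if most matches cluster around a ratio of 0.5 with a few outliers, the KDE mode recovers roughly 0.5 and the filter discards the outliers.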
Once a valid center is found, the points that support the center are recorded. These points are used to calculate the homography matrix H_o for the pattern, shown in (7). Because the minimum safe distance between the robot and the shelves keeps the camera far from the targets, the actual homography is sufficiently close to an affine transformation. The dominant scale ratio sr' can then also be computed according to (8). sr' is used to verify sr: the value of sr is confirmed to be correct only if it is close to sr'. We use (9) to assess the similarity between the two values:

H_o = | h11 h12 h13 |
      | h21 h22 h23 |
      | h31 h32  1  | , (7)

sr' = sqrt(|h11 × h22| + |h12 × h21|), (8)

|sr − sr'| / min(sr, sr') < 15%. (9)
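Under the near-affine assumption, the verification in (8) and (9) reduces to a few lines. The sketch below assumes a 3×3 NumPy homography; the function names are ours, and the formula in (8) is exact for a similarity transform (uniform scale plus rotation).

```python
import numpy as np

def affine_scale_from_homography(H):
    """Scale ratio sr' implied by a (near-affine) homography, as in (8):
    sr' = sqrt(|h11*h22| + |h12*h21|).
    For a similarity transform (h11 = s*cos t, h12 = -s*sin t, ...)
    this evaluates exactly to the scale s."""
    return np.sqrt(abs(H[0, 0] * H[1, 1]) + abs(H[0, 1] * H[1, 0]))

def scale_ratio_consistent(sr, sr_prime, tol=0.15):
    """Consistency check from (9): relative difference below 15%."""
    return abs(sr - sr_prime) / min(sr, sr_prime) < tol
```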
To find all possible object instances, a SIFT feature-based template of the ordered object must be reconstructed (see Figure 1(f)). The Gaussian smoothing factor is set based on the dominant scale ratio and adjusted in accordance with (10). A new retrieval structure is constructed after SIFT features are detected, and the features obtained from the query image are then matched against the new dataset. Owing to the aforementioned preprocessing, the number of SIFT features in the newly constructed database is reduced compared with the offline training phase; thus, the time overhead of the matching process is greatly reduced:

σ_TrainAdjust = sr × σ_o. (10)
The strategy for feature matching disambiguation here is a cascade of filters: the ratio test algorithm (proposed in [3]), the scale restriction-based method (presented in [11]), and a geometric verification-based approach. The ratio test and scale restriction methods are applied during the matching process, whereas geometric verification takes effect after clustering. After this series of filters, most false matches are eliminated.
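The first filter in the cascade, the ratio test of [3], can be sketched in a few lines. This is a minimal illustration under our own assumptions about the input layout (tuples of the two nearest-neighbor distances plus a match index); the 0.8 threshold follows the value suggested in [3].

```python
def ratio_test(knn_matches, ratio=0.8):
    """Lowe's ratio test: accept a match only when the nearest training
    descriptor is clearly closer than the second-nearest, i.e. d1 < ratio * d2.
    knn_matches: iterable of (d1, d2, train_index) tuples."""
    return [m for m in knn_matches if m[0] < ratio * m[1]]
```

For example, a match with distances (10, 20) passes (10 < 0.8 × 20), while an ambiguous match with distances (18, 20) is rejected.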
3.2.4. Reference Clustering Threshold Computation and Candidate Object Instance Detection. Traditional methods for detecting multiple object instances, such as mean-shift and grid voting, are based on density estimation. However, these methods share the same disadvantage: the bandwidth must be set empirically. For example, in [16] the clustering threshold was set to a specific value, and in [19] the voting grid size was tied to the size of the query image; nevertheless, this approach may still lead to unreliable results. For our specific application, the clustering threshold can be estimated from the size of the training image and the aforementioned dominant scale ratio. Before the clustering threshold is finally determined, a reference clustering threshold is computed automatically according to (11), where T_r is the reference clustering threshold, sr is the oriented scale ratio, and rows and cols are the numbers of rows and columns in the training image, respectively. As noted above, the mapped key points are located in small regions around the real centroids; therefore, the clustering threshold Th can be finalized in line with (12), in which k is a correction factor. Based on the repeated experiments described in Section 4, we provide a recommended value for k. Candidate object instance detection is based on the second layer of density estimation; grid voting is employed here due to its high precision and recall:

T_r = sr × rows, if rows < cols,
T_r = sr × cols, otherwise, (11)

Th = k × T_r. (12)
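A minimal sketch of the adaptive threshold and the grid-voting layer might look as follows. This is a hedged simplification: the function names are ours, the minimum-vote count is an assumed parameter, and the cells here are non-overlapping squares, whereas the grid voting of [19] used in the paper employs overlapping grids.

```python
def reference_threshold(sr, rows, cols):
    """Reference clustering threshold T_r from (11): the dominant scale
    ratio times the smaller training-image dimension."""
    return sr * min(rows, cols)

def grid_vote(center_points, cell, min_votes=3):
    """Second density-estimation layer (simplified): bin projected object
    centers into square cells of side `cell` (= Th = k * T_r) and report
    cells with enough votes as candidate object instances."""
    votes = {}
    for x, y in center_points:
        key = (int(x // cell), int(y // cell))
        votes.setdefault(key, []).append((x, y))
    return [pts for pts in votes.values() if len(pts) >= min_votes]
```

With sr = 0.5 and a 200 × 300 training image, T_r = 100; a tight cluster of projected centers then lands in one cell and is reported as a single candidate instance.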
3.3. Object-Level False Result Elimination. In the procedure for eliminating false detection results, we first calculate the homography matrix for each cluster. Then the four corners of the training image are projected onto four new coordinates, producing a convex quadrilateral from the four mapped corners. Here we provide a simple but effective way to assess whether the system has obtained correct object instances, so that error detections are eliminated. The criterion is as follows:

c_min ≤ Area(Quadrilateral) / (sr² × Area(TrainingImage)) ≤ c_max. (13)
Figure 7: Examples of objects with different texture levels: (a) high texture; (b) medium texture; (c) low texture.
In (13), Area(Quadrilateral) is the area of the convex quadrilateral derived from each candidate object instance, and Area(TrainingImage) is the area of the training image. According to (13), if the detection is accurate, the ratio between the area of the quadrilateral and that of the training image is approximately sr². The thresholds c_min and c_max should be set before verification.
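The object-level check in (13) can be illustrated as follows. This is a sketch under the assumption of a 3×3 NumPy homography; the helper names are ours, and the shoelace formula is used for the quadrilateral area.

```python
import numpy as np

def project_corners(H, w, h):
    """Project the four training-image corners through homography H."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], float).T
    p = H @ corners
    return (p[:2] / p[2]).T  # (4, 2) array of mapped corners

def polygon_area(pts):
    """Shoelace formula for the area of the mapped quadrilateral."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def instance_valid(H, sr, w, h, c_min=0.8, c_max=1.2):
    """Criterion (13): the quadrilateral-to-training-image area ratio
    should be close to sr**2 for a correct detection."""
    ratio = polygon_area(project_corners(H, w, h)) / (sr ** 2 * w * h)
    return c_min <= ratio <= c_max
```

For a pure scaling homography with factor 0.5, the mapped quadrilateral has exactly sr² times the training-image area, so the check passes with the c_min = 0.8 and c_max = 1.2 used in the experiments.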
Finally, for each cluster, the features are matched to the 3D sparse model created in the offline training procedure. A noniterative method called EPnP [29] was employed to estimate the pose of each object instance.
4. Experiments
4.1. Experimental Methodology. We are developing a service robot for the detection and manipulation of multiple object instances, and there is no standard database for our specific application. To validate our approach, we created a database of 70 types of products with different shapes, colors, and sizes in a supermarket. Objects to be detected were placed on shelves with the front facing outward. All images were captured using a Sony RGB camera with a resolution of 1240 × 780 pixels. To comprehensively evaluate the accuracy of the proposed architecture, the database was divided into three sets according to the texture level of the objects. Figure 7 shows examples of objects with different texture levels.
We designed three experiments to evaluate the proposed architecture. The first experiment was to verify whether the scale ratio calculation and false elimination method were feasible. The second was to examine whether the proposed clustering threshold computation method was effective. The last experiment was to comprehensively evaluate the performance of the proposed architecture. The three experiments were designed as follows.
(i) Experiment I: for each training image in the database, we acquired an image such that the object instance in the image had the same scale as the training image. The captured images were then downsampled to 100%, 75%, 50%, and 25% of the original size. We calculated the dominant scale ratios using the conventional histogram statistics and the proposed method separately, and then compared the accuracy of both values. The feature matching and key point projection results with and without false elimination were also recorded and compared.
(ii) Experiment II: we first calculated a clustering threshold according to (14). Then we tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously. An approximate nearest neighbor search was employed to speed up mean-shift. Because the thresholds could not be directly compared across experiments, we expressed each new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and k_MS and k_GV are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as k_MS = k_GV = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8:

CR = (1/2) × k_MS × T_r, using mean-shift,
GS = k_GV × T_r, using grid voting. (14)
(iii) Experiment III: we compared the proposed method with conventional grid voting on the three types of datasets. The experimental conditions for conventional grid voting were as follows: the width and height of the grid were 11/30 of the width and height of the query image, and each voting grid overlapped adjacent grids by 25% of its size. The performance of the proposed method and of conventional grid voting was expressed in terms of accuracy (precision and recall) and computational time.
In all experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to σ_o = 1.6, and the default threshold on key point contrast was set to 0.1. In the verification procedure, the thresholds c_min and c_max were set to 0.8 and 1.2, respectively. All experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU at 2.50 GHz and 8 GB of RAM.
Figure 8: The first example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, and 0.254). (b) Dominant scale ratio computation by the conventional histogram statistic (frequency versus scale ratio; sr = 0.99, 0.75, 0.47, and 0.234).
Figure 9: The second example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, and 0.251). (b) Dominant scale ratio computation by the conventional histogram statistic (frequency versus scale ratio; sr = 0.29, 0.21, 0.52, and 0.21).
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratios. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios are 100%, 75%, 50%, and 25% in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe, and the extracted features may have nearly the same
Figure 10: Raw matching results: (a) training image; (b) feature matching; (c) key point projection.
Figure 11: Matching results with false match elimination: (a) training image; (b) feature matching; (c) key point projection.
scale ratio. The proposed method estimates the dominant scale ratio from the distribution and relationships of the key points; therefore, its result is more reliable.

Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results based on scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and elimination of false matches lead to the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10-12 illustrate the effectiveness of the proposed filters.
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)-14(b) show the performance of the methods using mean-shift and grid voting. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right, and the coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.
As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting outperforms mean-shift in accuracy. According to Figures 13(a) and 13(b), the values of k_MS and k_GV corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than that using mean-shift, but the verification time becomes longer due to clustering errors. According to the results of the feasibility validation, clustering radii of k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
4.2.3. Performance of Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for the different texture levels using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different texture densities lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction: (a) training image; (b) feature matching; (c) key point projection.
Figure 13: Accuracy performance using mean-shift and grid voting (mean-shift + RANSAC versus grid voting + RANSAC; the points at k_MS = k_GV = 1.8 are circled). (a) Accuracy (precision versus recall, %) of mean-shift and grid voting. (b) True positive rate versus false positive rate (%) of mean-shift and grid voting.
Figure 14: Computational time statistics (computational time in ms versus threshold ratio k, for feature matching, clustering, and geometric verification). (a) Computational time for mean-shift. (b) Computational time for grid voting.
Figure 15: Results of two detection examples ("A" in panel (a); "B"-"E" in panel (b); "F"-"H" in panel (c) mark the detection errors discussed in the text).
Table 1: Average results for different levels of texture using the proposed method and grid voting.

Texture level | Method      | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms)
High          | Proposed    | 97.6          | 96.8       | 1027                   | 379            | 479                     | 526          | 3               | 522                         | 2936
High          | Grid voting | 96.2          | 96.3       | 1027                   | 379            | 0                       | 0            | 4               | 2595                        | 4005
Medium        | Proposed    | 96.4          | 95.8       | 941                    | 220            | 191                     | 246          | 3               | 866                         | 2467
Medium        | Grid voting | 95.7          | 95.4       | 941                    | 220            | 0                       | 0            | 4               | 2033                        | 3198
Low           | Proposed    | 92.1          | 93.6       | 586                    | 94             | 72                      | 119          | 4               | 1054                        | 1929
Low           | Grid voting | 91.6          | 91.9       | 586                    | 94             | 0                       | 0            | 3               | 1345                        | 2028
Precision and time overhead increase as the texture density increases. Although the first layer of density estimation and template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared with the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most of the object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "G", and "H" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are reflection of light (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and erroneous clustering results ("F", "G", and "H" in Figure 15(c)).
5. Conclusions

In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice through dominant scale ratio-based false match elimination and adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of high accuracy and low latency.

Although the presented architecture performs well in these types of applications, the algorithm may fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper no longer hold. Furthermore, the accuracy of the proposed method is greatly reduced when there is a dramatic change in illumination or when the target is occluded by other translucent objects. In future work, we will focus on improving the method to address such complex problems.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References

[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391-405, Springer, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1-8, IEEE, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506-II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143-152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179-1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97-175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346-348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177-182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911-2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021-1024, ACM, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom-up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969-972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284-1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20-25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274-1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600-614, Springer, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973-2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752-753, IEEE, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331-340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404-410, IEEE, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161-2168, IEEE, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111-1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130-2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424-3431, IEEE, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459-468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, "Density Estimation for Statistics and Data Analysis, Chapman & Hall, London-New York, 1986, 175 pp.," Biometrical Journal, vol. 30, pp. 876-877, 1988.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155-166, 2009.
Journal of Sensors 7
(a) (b) (c)
Figure 7 Examples of objects with different texture levels (a) high texture (b) medium texture (c) low texture
In (13) Area(Quadrilateral) is the area of the convexquadrilateral derived from each candidate object instanceArea(TrainingImage) is the area of the training imageAccording to (13) if the detection is accurate the ratiocoefficient between the area of the quadrilateral and thetraining image is approximate to sr2 The threshold 119888min and119888max should be set before verification
Finally for each cluster the features are matched to the3D sparse model created in the offline training procedureA noniterative method called EPnp [29] was employed toestimate pose for each object instance
4 Experiments
41 Experimental Methodology We are developing a servicerobot for the detection and manipulation of multiple objectinstances and there is no standard database for our specificapplication To validate our approach we created a databasefor 70 types of products with different shapes colors andsizes in a supermarket Objects to be detected were placedon shelves with the front outside All images were capturedusing a SONYRGB cameraThe resolution of the camera was1240 times 780 pixels To comprehensively evaluate the accuracyof the proposed architecture the database was divided intothree sets according to the texture level of the objects Figure 7shows examples of objects with different texture levels
We designed three experiments to evaluate the proposedarchitecture The first experiment was to verify whether thescale ratio calculation and false eliminationmethod were fea-sible The second one was to examine whether the proposedclustering threshold computation method was effective Thelast experiment was to comprehensively evaluate the perfor-mance of the proposed architectureThese three experimentswere designed as follows
(i) Experiment I for each training image in the databasewe acquired an image considering that the objectinstance in the image had the same scale as thetraining image Then the captured images weredownsampled The size of the resampled imageswere 100 75 50 and 25 of the original sizeWe calculated the dominant scale ratios based onthe conventional histogram statistics and proposedmethod separately Then the accuracy of both valueswas compared The feature matching and key point
projection results with and without false eliminationwere also recorded and compared
(ii) Experiment II: we first calculated a clustering threshold according to (14). We then tested the performance of the conventional methods (mean-shift and grid voting) while changing the clustering threshold continuously. Here, an approximate nearest neighbor searching method was employed to speed up mean-shift. Because the thresholds could not be directly compared across different experiments, we expressed the new value as a multiple of the computed threshold. In (14), CR is the bandwidth for mean-shift, GS is the grid size for grid voting, and k_MS and k_GV are the coefficients. We chose an optimal threshold value according to the experimental results. In the experiment, the threshold ratio parameters were sampled as k_MS = k_GV = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8:

    CR = (1/2) · k_MS · T_r   (using mean-shift),
    GS = k_GV · T_r           (using grid voting).        (14)
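As an illustration, the candidate thresholds enumerated above can be computed directly from the reference value; a minimal Python sketch (the reference threshold value of 40 px and the function name are illustrative assumptions, not taken from the paper):

```python
# Enumerate candidate clustering thresholds from a reference value T_r,
# following (14): CR = 0.5 * k_MS * T_r (mean-shift bandwidth) and
# GS = k_GV * T_r (grid-voting cell size).
K_SAMPLES = (2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, 0.8)

def candidate_thresholds(t_r, ks=K_SAMPLES):
    """Return (k, CR, GS) triples for every sampled coefficient."""
    return [(k, 0.5 * k * t_r, k * t_r) for k in ks]

# Example with a hypothetical reference threshold of 40 pixels:
for k, cr, gs in candidate_thresholds(40.0):
    print(f"k = {k}: CR = {cr:.1f} px, GS = {gs:.1f} px")
```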
(iii) Experiment III: we compared the proposed method with the conventional grid voting on the three types of datasets. The experimental conditions of the conventional grid voting were as follows: the width and height of the grid were 1/30 of the width and height of the query image, and each voting grid overlapped its adjacent grid by 25% of its size. The performances of the proposed method and the conventional grid voting were expressed in terms of accuracy (precision and recall) and computational time.
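For reference, a grid-voting baseline with overlapping cells of the kind described above can be sketched in pure Python (the 25% overlap follows the text, while the vote threshold, data layout, and function name are illustrative assumptions):

```python
from collections import defaultdict

def grid_vote(centers, cell, overlap=0.25, min_votes=3):
    """Vote projected object centers into overlapping grid cells.

    centers: iterable of (x, y) projected object-center estimates.
    cell: grid cell size in pixels (e.g. 1/30 of the query image size).
    overlap: fraction of a cell shared with the adjacent cell.
    Returns the cells (by index) whose vote count reaches min_votes.
    """
    step = cell * (1.0 - overlap)  # stride between overlapping cells
    votes = defaultdict(int)
    for x, y in centers:
        # A point can fall into every cell whose window covers it;
        # with 25% overlap, at most two adjacent cells per axis qualify.
        i0, j0 = int(x // step), int(y // step)
        for i in range(max(i0 - 1, 0), i0 + 1):
            for j in range(max(j0 - 1, 0), j0 + 1):
                if (i * step <= x < i * step + cell
                        and j * step <= y < j * step + cell):
                    votes[(i, j)] += 1
    return {c: v for c, v in votes.items() if v >= min_votes}
```

A cell only survives when enough center estimates agree, which is what makes an isolated false match (a single stray vote) harmless.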
In all the experiments, the parameters for SIFT feature extraction and the threshold for feature matching were set to the default values in [3]. In particular, the initial Gaussian smoothing parameter was set to σ_0 = 1.6, and the default threshold on key point contrast was set to 0.1. In the verification procedure of our experiments, the thresholds c_min and c_max were set to 0.8 and 1.2, respectively. All of the experiments were conducted on a Windows 7 PC with a Core i7-4710MQ CPU at 2.50 GHz and 8 GB of RAM.
8 Journal of Sensors
Figure 8: The first example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.00, 0.74, 0.48, and 0.254). (b) Dominant scale ratio computation by the conventional histogram statistic, plotted as frequency versus scale ratio (sr = 0.99, 0.75, 0.47, and 0.234).
Figure 9: The second example of dominant scale ratio computation. (a) Center estimation and dominant scale ratio computation by the proposed method (sr = 1.01, 0.75, 0.50, and 0.251). (b) Dominant scale ratio computation by the conventional histogram statistic, plotted as frequency versus scale ratio (sr = 0.29, 0.21, 0.52, and 0.21).
4.2. Experimental Results and Analysis
4.2.1. Results of the Dominant Scale Ratio Computation and Scale Restriction-Based False Match Elimination. Figures 8 and 9 display the results of two examples of computing the dominant scale ratios. Figures 8(a) and 9(a) show the results of the proposed method, whereas Figures 8(b) and 9(b) show the results of the conventional method. The reference scale ratios are 1.00, 0.75, 0.50, and 0.25 in these figures. In Figures 8(a), 8(b), and 9(a), the calculated results are close to the reference values. However, in Figure 9(b), the results obtained by the conventional method are not reliable. The reason for the error in Figure 9(b) is that the background noise is too severe and the extracted features may have nearly the same
Figure 10: Raw matching results. (a) Training image; (b) feature matching; (c) key points projection.
Figure 11: Matching results with false match elimination. (a) Training image; (b) feature matching; (c) key points projection.
scale ratio. The proposed method evaluates the dominant scale ratio based on the distribution of and relationships among the key points; therefore, its result is more reliable.
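One way to view this is as a one-dimensional kernel density estimate over the per-match scale ratios, whose mode is far less sensitive to clutter than a single histogram bin count; a minimal sketch assuming a Gaussian kernel and an illustrative bandwidth (the paper's exact estimator may differ):

```python
import math

def dominant_scale_ratio(ratios, bandwidth=0.05, grid=200):
    """Return the mode of a Gaussian kernel density estimate over scale ratios.

    ratios: per-match scale ratios (train key point scale / query key point scale).
    The density is evaluated on a uniform grid spanning the observed ratios.
    """
    lo, hi = min(ratios), max(ratios)
    best_x, best_d = lo, -1.0
    for i in range(grid + 1):
        x = lo + (hi - lo) * i / grid
        # Sum of Gaussian kernels centered at each observed ratio.
        d = sum(math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in ratios)
        if d > best_d:
            best_x, best_d = x, d
    return best_x
```

Because every observation contributes smoothly to the density, a handful of scattered background ratios cannot outvote a tight cluster of correct ones, unlike a coarse histogram where the outcome flips with the bin edges.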
Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results based on scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and elimination of false matches yield the best results (Figure 12): most of the false matches are eliminated, laying a good foundation for the subsequent clustering. Figures 10–12 illustrate the effectiveness of the proposed filters.
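A scale-constrained match filter of this kind can be sketched as follows (the 0.8–1.2 band mirrors the c_min/c_max thresholds given in the text; the match representation as scale pairs is an illustrative assumption):

```python
def filter_by_scale(matches, dominant_sr, c_min=0.8, c_max=1.2):
    """Keep matches whose scale ratio stays close to the dominant one.

    matches: iterable of (query_scale, train_scale) SIFT key point scales.
    A match survives when its train/query scale ratio lies within
    [c_min * dominant_sr, c_max * dominant_sr].
    """
    kept = []
    for q_scale, t_scale in matches:
        ratio = t_scale / q_scale
        if c_min * dominant_sr <= ratio <= c_max * dominant_sr:
            kept.append((q_scale, t_scale))
    return kept
```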
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)–14(b) show the performance of the mean-shift and grid voting methods. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. The coordinates surrounded by circles correspond to the precalculated threshold. Figures 14(a) and 14(b) show the average value and standard deviation of the computational time for mean-shift and grid voting at different thresholds.

As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall as a whole, and Figure 13(b) indicates that grid voting outperforms mean-shift in accuracy. According to Figures 13(a) and 13(b), the values of k_MS and k_GV corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than that for mean-shift, but the verification time becomes longer due to clustering errors. According to the results of the feasibility validation, the clustering radius coefficients k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
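The RANSAC-based geometric verification referred to above can be illustrated with a simplified model: the sketch below checks whether a cluster's correspondences agree on a single scale-plus-translation transform (the transform family, iteration count, and inlier tolerance are illustrative simplifications of a full RANSAC similarity/homography check):

```python
import random

def ransac_verify(pairs, iters=100, tol=5.0, min_inliers=4, seed=0):
    """Accept a cluster if enough correspondences fit one scale+translation model.

    pairs: list of ((qx, qy), (tx, ty)) matched query/target point coordinates.
    """
    rng = random.Random(seed)
    best = 0
    for _ in range(iters):
        (q1, t1), (q2, t2) = rng.sample(pairs, 2)
        dq = ((q2[0] - q1[0]) ** 2 + (q2[1] - q1[1]) ** 2) ** 0.5
        dt = ((t2[0] - t1[0]) ** 2 + (t2[1] - t1[1]) ** 2) ** 0.5
        if dq == 0:
            continue
        s = dt / dq  # hypothesized scale from the two sampled matches
        ox, oy = t1[0] - s * q1[0], t1[1] - s * q1[1]  # hypothesized translation
        inliers = sum(
            1 for (qx, qy), (tx, ty) in pairs
            if abs(s * qx + ox - tx) <= tol and abs(s * qy + oy - ty) <= tol
        )
        best = max(best, inliers)
    return best >= min_inliers
```

Clusters produced by an overly small threshold split one object's matches across several candidates, so each candidate supports fewer inliers and verification work grows, which matches the timing trend reported above.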
4.2.3. Performance of Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different levels of texture using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different levels of texture density lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction. (a) Training image; (b) feature matching; (c) key points projection.
Figure 13: Accuracy performance using mean-shift and grid voting. (a) Precision (%) versus recall (%) for mean-shift + RANSAC and grid voting + RANSAC; the points with k_MS = k_GV = 1.8 are circled. (b) True positive rate (%) versus false positive rate (%) of mean-shift and grid voting.
Figure 14: Computational time statistics over the threshold coefficients k = 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6, 1.4, 1.2, 1.0, and 0.8, broken down into feature matching, clustering, and geometric verification. (a) Computational time (ms) for mean-shift. (b) Computational time (ms) for grid voting.
Figure 15: Results of two detection examples. Marked objects: "A" in panel (a); "B," "C," "D," and "E" in panel (b); "F," "G," and "H" in panel (c).
Table 1: Average results for different levels of texture using the proposed method and grid voting.

| Texture level | Method | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| High | Proposed | 97.6 | 96.8 | 1027 | 379 | 479 | 526 | 3 | 522 | 2936 |
| High | Grid voting | 96.2 | 96.3 | 1027 | 379 | 0 | 0 | 4 | 2595 | 4005 |
| Medium | Proposed | 96.4 | 95.8 | 941 | 220 | 191 | 246 | 3 | 866 | 2467 |
| Medium | Grid voting | 95.7 | 95.4 | 941 | 220 | 0 | 0 | 4 | 2033 | 3198 |
| Low | Proposed | 92.1 | 93.6 | 586 | 94 | 72 | 119 | 4 | 1054 | 1929 |
| Low | Grid voting | 91.6 | 91.9 | 586 | 94 | 0 | 0 | 3 | 1345 | 2028 |
Precision and time overhead increase as the texture density increases. Although the first layer of density estimation and the template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared with the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most object instances were detected. However, the objects marked "A" in Figure 15(a), "B," "C," and "D" in Figure 15(b), and "F," "G," and "H" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are reflection of light (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B," "C," and "D" in Figure 15(b)), and erroneous clustering results ("F," "G," and "H" in Figure 15(c)).
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice through dominant scale ratio-based false match elimination and adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of its high accuracy and low latency.

Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper will no longer be valid. Furthermore, the accuracy of the proposed method will be greatly reduced when there is a dramatic change in illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method to solve such complex problems.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by The National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
[26] J Wang S Kumar and S-F Chang ldquoSemi-supervised hash-ing for scalable image retrievalrdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 3424ndash3431 IEEE San FranciscoCalif USA June 2010
[27] A Andoni and P Indyk ldquoNear-optimal hashing algorithmsfor approximate nearest neighbor in high dimensionsrdquo inProceedings of the 47th Annual IEEE Symposium on Foundationsof Computer Science (FOCS rsquo06) pp 459ndash468 Berkeley CalifUSA October 2006
[28] B W Silverman ldquoDensity Estimation for Statistics and DataAnalysis Chapman amp Hall LondonmdashNew York 1986 175 ppm12rdquo Biometrical Journal vol 30 pp 876ndash877 1988
[29] V Lepetit F Moreno-Noguer and P Fua ldquoEPnP An accurateO(n) solution to the PnP problemrdquo International Journal ofComputer Vision vol 81 no 2 pp 155ndash166 2009
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 9
Figure 10: Raw matching results. (a) Training image; (b) feature matching; (c) key points projection.
Figure 11: Matching results with false matches eliminated. (a) Training image; (b) feature matching; (c) key points projection.
scale ratio. The proposed method evaluates the dominant scale ratio based on the distribution of, and relationships among, the key points; the result is therefore more reliable.
Figure 10 shows that the raw matching results without scale-constrained filtering exhibit a large number of false matches. The matching results based on scale-constrained filtering are shown in Figure 11, with fewer outliers present. Scale restriction-based template reconstruction and elimination of false matches lead to the best results (Figure 12): most of the false matches are eliminated, which lays a good foundation for the subsequent clustering. Figures 10-12 illustrate the effectiveness of the proposed filters.
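The two steps behind this filtering, estimating the dominant scale ratio with the first layer of density estimation and then discarding matches inconsistent with it, can be sketched in Python roughly as follows. This is an illustrative reconstruction, not the authors' code: the Gaussian-kernel bandwidth and the ±30% tolerance band are assumed values, and the inputs here are bare arrays of SIFT key point scales.

```python
import numpy as np


def dominant_scale_ratio(train_scales, query_scales, bandwidth=0.05):
    """Estimate the dominant scale ratio of a match set with a 1-D
    Gaussian kernel density estimate over log scale ratios and take
    its mode (the first density-estimation layer, sketched)."""
    log_ratios = np.log(np.asarray(query_scales, float) /
                        np.asarray(train_scales, float))
    # Evaluate the kernel density on a coarse grid and pick the peak.
    grid = np.linspace(log_ratios.min(), log_ratios.max(), 256)
    density = np.exp(
        -0.5 * ((grid[:, None] - log_ratios[None, :]) / bandwidth) ** 2
    ).sum(axis=1)
    return float(np.exp(grid[np.argmax(density)]))


def filter_by_scale(matches, train_scales, query_scales, tol=0.3):
    """Scale-constrained filtering: keep only matches whose scale ratio
    lies within a (hypothetical) tolerance band around the dominant ratio."""
    r0 = dominant_scale_ratio(train_scales, query_scales)
    ratios = np.asarray(query_scales, float) / np.asarray(train_scales, float)
    keep = np.abs(ratios / r0 - 1.0) <= tol
    return [m for m, k in zip(matches, keep) if k]
```

A match whose key point scales disagree with the dominant ratio (for example, a false match onto a much larger instance of a similar texture) falls outside the band and is dropped before clustering.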
4.2.2. Results of Clustering Threshold Estimation. Figures 13(a)-14(b) show the performance of the mean-shift and grid-voting methods. The brown curve in Figure 13(a) describes the accuracy of grid voting, and the blue one describes the accuracy of mean-shift. Figure 13(b) illustrates the true positive rate versus the false positive rate of mean-shift and grid voting as the discrimination threshold changes. The points in Figures 13(a) and 13(b) were sampled at different clustering threshold ratios, as detailed in the experimental methodology; the threshold ratio values decrease gradually from left to right. In addition, the circled points correspond to the precalculated threshold. Figures 14(a) and 14(b) show the mean and standard deviation of the computational time for mean-shift and grid voting at different thresholds.
As shown in Figure 13(a), the precision decreases and the recall increases as the threshold is decreased. In Figure 13(b), both the true and false positive rates increase as the threshold is decreased. Figure 13(a) shows that grid voting outperforms mean-shift in recall overall, and Figure 13(b) indicates that grid voting also achieves better accuracy than mean-shift. According to Figures 13(a) and 13(b), the values of k_MS and k_GV corresponding to the inflection point are both 1.8. As shown in Figure 14(a), the time cost for feature matching and ANN-based mean-shift clustering remains relatively stable; however, a smaller threshold ratio leads to a higher time cost for geometric verification because the number of clusters increases. As shown in Figure 14(b), the computational time for clustering using grid voting is considerably shorter than when using mean-shift, but the verification time becomes longer due to clustering errors. According to the results of the feasibility validation, clustering radii of k_MS = 1.8 for mean-shift and k_GV = 1.8 for grid voting are the optimized preset parameters for the detection of multiple object instances in inventory management.
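Adaptive threshold-based grid voting can be illustrated with a minimal sketch: each refined match votes for an estimated object center, the votes are quantized into square cells, and any cell collecting enough votes becomes a candidate instance. The cell size stands in for the adaptive threshold (the empirical coefficient k times the density-estimated reference value); `min_votes` is a hypothetical parameter, and the paper's implementation may additionally merge neighboring cells, which this sketch omits.

```python
from collections import defaultdict


def grid_vote(centers, cell, min_votes=3):
    """Adaptive-threshold grid voting (sketch): quantize estimated object
    centers into square cells of side `cell` and return, for each cell
    with at least `min_votes` votes, the indices of its votes.

    `cell` plays the role of the adaptive clustering threshold, i.e. the
    empirical coefficient k times the reference value from the second
    density-estimation layer.
    """
    bins = defaultdict(list)
    for i, (x, y) in enumerate(centers):
        bins[(int(x // cell), int(y // cell))].append(i)
    # Each surviving cell is one candidate object instance, to be passed
    # on to geometric verification.
    return [idx for idx in bins.values() if len(idx) >= min_votes]
```

A cell size that is too small splits one instance across cells (hurting recall), while one that is too large merges neighboring instances, which is why the threshold is estimated adaptively rather than fixed.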
4.2.3. Performance for Different Object Instance Detection Based on the Proposed Architecture. Table 1 shows the average results for different texture levels using the proposed method and grid voting. The precision and recall were recorded, and the computational times for feature extraction, raw matching, density estimation, template reconstruction-based rematching, clustering, and geometric verification were documented separately. Figure 15 shows the results of two examples using the proposed method.

According to Table 1, different texture densities lead to different accuracies and computational times.
Figure 12: Matching results based on template reconstruction and scale restriction. (a) Training image; (b) feature matching; (c) key points projection.
Figure 13: Accuracy performance using mean-shift and grid voting (mean-shift + RANSAC versus grid voting + RANSAC). (a) Accuracy (precision versus recall) of mean-shift and grid voting; (b) true positive rate versus false positive rate of mean-shift and grid voting. The circled points mark k_MS = 1.8 and k_GV = 1.8.
Figure 14: Computational time statistics (feature matching, clustering, and geometric verification) versus the threshold ratio k. (a) Computational time for mean-shift; (b) computational time for grid voting.
Figure 15: Results of two detection examples. The labels "A"-"H" mark the detection errors discussed in the text.
Table 1: Average results for different texture levels using the proposed method and grid voting (computational times in ms).

Texture level | Method      | Precision (%) | Recall (%) | Feature detection | Raw match | Density estimation | Rematch | Clustering | Geometric verification | Total
High          | Proposed    | 97.6          | 96.8       | 1027              | 379       | 479                | 526     | 3          | 522                    | 2936
High          | Grid voting | 96.2          | 96.3       | 1027              | 379       | 0                  | 0       | 4          | 2595                   | 4005
Medium        | Proposed    | 96.4          | 95.8       | 941               | 220       | 191                | 246     | 3          | 866                    | 2467
Medium        | Grid voting | 95.7          | 95.4       | 941               | 220       | 0                  | 0       | 4          | 2033                   | 3198
Low           | Proposed    | 92.1          | 93.6       | 586               | 94        | 72                 | 119     | 4          | 1054                   | 1929
Low           | Grid voting | 91.6          | 91.9       | 586               | 94        | 0                  | 0       | 3          | 1345                   | 2028
Precision and time overhead increase with the texture density. Although the first layer of density estimation and template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared to the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most of the object instances were detected. However, the objects marked "A" in Figure 15(a), "B", "C", and "D" in Figure 15(b), and "F", "G", and "H" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are the reflection of light (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B", "C", and "D" in Figure 15(b)), and clustering errors ("F", "G", and "H" in Figure 15(c)).
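The final geometric verification can be approximated with a NumPy-only RANSAC sketch that fits an affine model to each candidate cluster's correspondences and accepts the cluster when enough matches are reprojection inliers. The affine model, the 3-pixel threshold, the iteration count, and the minimum inlier count are illustrative assumptions; the paper does not spell out its RANSAC parameters.

```python
import numpy as np


def ransac_verify(src, dst, n_iter=200, thresh=3.0, min_inliers=8, seed=0):
    """RANSAC geometric verification (sketch): repeatedly fit an affine
    transform to 3 random correspondences and count reprojection inliers.
    Returns (is_valid, best_inlier_mask) for one candidate cluster."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, bool)
    A_full = np.hstack([src, np.ones((n, 1))])      # homogeneous (n, 3)
    for _ in range(n_iter):
        idx = rng.choice(n, 3, replace=False)
        A = np.hstack([src[idx], np.ones((3, 1))])  # minimal sample (3, 3)
        try:
            M = np.linalg.solve(A, dst[idx])        # affine params (3, 2)
        except np.linalg.LinAlgError:
            continue                                # degenerate (collinear) sample
        err = np.linalg.norm(A_full @ M - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best.sum() >= min_inliers, best
```

Clusters produced by voting errors rarely admit a consistent transform, so they fail the inlier test and are discarded, which is how the final verification step eliminates error detections.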
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice through dominant scale ratio-based false match elimination and adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of its high accuracy and low latency.
Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper will no longer be valid. Furthermore, the accuracy of the proposed method will be greatly reduced when there is a dramatic change of illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method for solving such complex problems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
10 Journal of Sensors
(a) (b) (c)
Figure 12 Matching results based on template reconstruction and scale restriction (a) training image (b) feature matching (c) key pointsprojection
Mean-shift + RANSACGrid voting + RANSAC
Recall ()
90
92
94
96
98
100
Prec
ision
()
kMS = 18kGV = 18
1009590858075
(a) Accuracy of mean-shift and grid voting
Mean-shift + RANSAC
kMS = 18
kGV = 18
False positive rate ()
True
pos
itive
rate
()
Grid voting + RANSAC
100
95
90
85
80
750 10 20 30 40 50 60 70
(b) True positive rate versus false positive rate of mean-shift and gridvoting
Figure 13 Accuracy performance using mean-shift and grid voting
6000
5000
4000
3000
2000
1000
0
Com
puta
tiona
l tim
e (m
s)
k
Feature matchingClusteringGeometric verification
26 24 22 20 19 18 17 16 14 12 10 08
(a) Computational time for mean-shift
6000
5000
4000
3000
2000
1000
0
Com
puta
tiona
l tim
e (m
s)
k
Feature matchingClusteringGeometric verification
26 24 22 20 19 18 17 16 14 12 10 08
(b) Computational time for grid voting
Figure 14 Computational time statistics
Journal of Sensors 11
A
(a)
EDB C
(b)
H
F
G
(c)
Figure 15 Results of two detection examples
Table 1 Average results for different levels of texture using proposed method and grid voting
Texture level MethodsAccuracy () Computational time (ms)
Precision Recall Featuredetection Raw match Density
estimation Rematch Clustering Geometricverification Total
High Proposed 976 968 1027 379 479 526 3 522 2936Grid voting 962 963 1027 379 0 0 4 2595 4005
Medium Proposed 964 958 941 220 191 246 3 866 2467Grid voting 957 954 941 220 0 0 4 2033 3198
Low Proposed 921 936 586 94 72 119 4 1054 1929Grid voting 916 919 586 94 0 0 3 1345 2028
Precision and time overhead increase with increases in thetexture density Although the first layer of density esti-mation and template reconstruction-based rematching takesome computational time the geometric verification latencyis greatly reduced compared to the conventional methodbecause the adaptive threshold is more reasonable than thejudgment based simply on the size of the query image Table 1indicates that the proposed architecture can accurately detectand identify multiple identical objects with low latency Ascan be seen in Figure 15 most of object instances weredetected However objects marked as ldquoArdquo in Figure 15(a)ldquoBrdquo ldquoCrdquo and ldquoDrdquo in Figure 15(b) and ldquoFrdquo ldquoHrdquo and ldquoGrdquo inFigure 15(c) were not detected and objects marked as ldquoErdquowere a false detection result Reasons for these errors are thereflection of light (in Figure 15(a)) high similarity of objects(the short bottle marked as ldquoErdquo is similar to the high one inFigure 15(b)) translucent occlusion (three undetected yellowbottlesmarked as ldquoBrdquo ldquoCrdquo and ldquoDrdquo in Figure 15(b)) and errorclustering results (ldquoFrdquo ldquoGrdquo and ldquoHrdquo in Figure 15(c))
5 Conclusions
In this paper we introduced the problem of multiple objectinstance detection in robot inventory management and pro-posed a dual-layer density estimation-based architecture forresolving this issueThe proposed approach is able to success-fully address the multiple object instance detection problemin practice by considering dominant scale ratio-based falsematch elimination and adaptive clustering threshold-based
grid voting The experimental results illustrate the superiorperformance our proposed method in terms of its highaccuracy and low latency
Although the presented architecture performs well inthese types of applications the algorithm would fail whenapplied to more complex problems For example if objectinstances have different scales in the query image theassumptions made in this paper will be no longer validFurther more the accuracy of the proposed method willbe greatly reduced when there is a dramatic change ofillumination or the target is occluded by other translucentobjects In our future work we will focus on improving themethod for solving such complex problems
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The authors would like to thank Shenyang SIASUN RobotAutomation Co Ltd for funding this research The projectis supported byTheNational Key Technology RampD ProgramChina (no 2015BAF13B00)
References
[1] C L Zitnick and P Dollar ldquoEdge boxes locating object pro-posals from edgesrdquo in Proceedings of the European Conference
12 Journal of Sensors
on Computer Vision (ECCV rsquo14) Zurich Switzerland September2014 pp 391ndash405 Springer Cham Switzerland 2014
[2] SHinterstoisser S BenhimaneNNavab P Fua andV LepetitldquoOnline learning of patch perspective rectification for efficientobject detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo08) pp 1ndash8IEEE Anchorage Alaska USA June 2008
[3] D G Lowe ldquoDistinctive image features from scale-invariantkeypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004
[4] Y Ke and R Sukthankar ldquoPCA-SIFT a more distinctiverepresentation for local image descriptorsrdquo in Proceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo04) pp II506ndashII513 WashingtonDC USA July 2004
[5] K Mikolajczyk and C Schmid ldquoA performance evaluation oflocal descriptorsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 27 no 10 pp 1615ndash1630 2005
[6] H Bay A Ess T Tuytelaars and L Van Gool ldquoSpeeded-uprobust features (SURF)rdquo Computer Vision and Image Under-standing vol 110 no 3 pp 346ndash359 2008
[7] L Juan and O Gwun ldquoA comparison of SIFT PCA-SIFT andSURFrdquo International Journal of Image Processing vol 3 no 4pp 143ndash152 2009
[8] Q Sen and Z Jianying ldquoImproved SIFT-based bidirectionalimage matching algorithm Mechanical science and technologyfor aerospace engineeringrdquoMechanical Science and Technologyfor Aerospace Engineering vol 26 pp 1179ndash1182 2007
[9] J Wang and M F Cohen ldquoImage and video matting a surveyrdquoFoundations and Trends in Computer Graphics and Vision vol3 no 2 pp 97ndash175 2008
[10] Y Bastanlar A Temizel and Y Yardimci ldquoImproved SIFTmatching for image pairs with scale differencerdquo ElectronicsLetters vol 46 no 5 pp 346ndash348 2010
[11] J Zhang andH-S Sang ldquoSIFTmatchingmethod based on basescale transformationrdquo Journal of Infrared andMillimeter Wavesvol 33 no 2 pp 177ndash182 2014
[12] R Arandjelovic and A Zisserman ldquoThree things everyoneshould know to improve object retrievalrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 2911ndash2918 San Francisco Calif USA June 2012
[13] F-E Lin Y-H Kuo and W H Hsu ldquoMultiple object local-ization by context-aware adaptive window search and search-based object recognitionrdquo in Proceedings of the 19th ACMInternational Conference onMultimedia ACMMultimedia (MMrsquo11) pp 1021ndash1024 ACM Scottsdale Ariz USA December 2011
[14] C-C Wu Y-H Kuo and W Hsu ldquoLarge-scale simultaneousmulti-object recognition and localization via bottom up search-based approachrdquo in Proceedings of the 20th ACM InternationalConference on Multimedia (MM rsquo12) pp 969ndash972 Nara JapanNovember 2012
[15] AColletMMartinez and S S Srinivasa ldquoTheMOPED frame-work object recognition andpose estimation formanipulationrdquoThe International Journal of Robotics Research vol 30 no 10 pp1284ndash1306 2011
[16] S Zickler and M M Veloso ldquoDetection and localization ofmultiple objectsrdquo in Proceedings of the 6th IEEE-RAS Inter-national Conference on Humanoid Robots pp 20ndash25 GenovaItaly December 2006
[17] G Aragon-Camarasa and J P Siebert ldquoUnsupervised clusteringinHough space for recognition ofmultiple instances of the same
object in a cluttered scenerdquo Pattern Recognition Letters vol 31no 11 pp 1274ndash1284 2010
[18] R Bao K Higa and K Iwamoto ldquoLocal feature based multipleobject instance identification using scale and rotation invariantimplicit shape modelrdquo in Proceedings of the 12th Asian Confer-ence onComputer Vision (ACCV rsquo14) Singapore November 2014pp 600ndash614 Springer Cham Switzerland 2014
[19] K Higa K Iwamoto and T Nomura ldquoMultiple object iden-tification using grid voting of object center estimated fromkeypoint matchesrdquo in Proceedings of the 20th IEEE InternationalConference on Image Processing (ICIP rsquo13) pp 2973ndash2977Melbourne Australia September 2013
[20] R Szeliski and S B Kang ldquoRecovering 3D shape and motionfrom image streams using nonlinear least squaresrdquo in Proceed-ings of the IEEE Computer Society Conference on ComputerVision and Pattern Recognition (CVPR rsquo93) pp 752ndash753 IEEENew York NY USA June 1993
Journal of Sensors 11
Figure 15: Results of two detection examples. (Panels (a)–(c); the markers A–H indicate the individual objects discussed in the text.)
Table 1: Average results for different levels of texture using the proposed method and grid voting.

| Texture level | Method | Precision (%) | Recall (%) | Feature detection (ms) | Raw match (ms) | Density estimation (ms) | Rematch (ms) | Clustering (ms) | Geometric verification (ms) | Total (ms) |
| High | Proposed | 97.6 | 96.8 | 1027 | 379 | 479 | 526 | 3 | 522 | 2936 |
| High | Grid voting | 96.2 | 96.3 | 1027 | 379 | 0 | 0 | 4 | 2595 | 4005 |
| Medium | Proposed | 96.4 | 95.8 | 941 | 220 | 191 | 246 | 3 | 866 | 2467 |
| Medium | Grid voting | 95.7 | 95.4 | 941 | 220 | 0 | 0 | 4 | 2033 | 3198 |
| Low | Proposed | 92.1 | 93.6 | 586 | 94 | 72 | 119 | 4 | 1054 | 1929 |
| Low | Grid voting | 91.6 | 91.9 | 586 | 94 | 0 | 0 | 3 | 1345 | 2028 |
Precision and time overhead both increase with texture density. Although the first layer of density estimation and the template reconstruction-based rematching take some computational time, the geometric verification latency is greatly reduced compared to the conventional method, because the adaptive threshold is more reasonable than a judgment based simply on the size of the query image. Table 1 indicates that the proposed architecture can accurately detect and identify multiple identical objects with low latency. As can be seen in Figure 15, most object instances were detected. However, the objects marked "A" in Figure 15(a), "B," "C," and "D" in Figure 15(b), and "F," "H," and "G" in Figure 15(c) were not detected, and the object marked "E" was a false detection. The reasons for these errors are light reflection (Figure 15(a)), high similarity between objects (the short bottle marked "E" is similar to the tall one in Figure 15(b)), translucent occlusion (the three undetected yellow bottles marked "B," "C," and "D" in Figure 15(b)), and erroneous clustering results ("F," "G," and "H" in Figure 15(c)).
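The adaptive threshold-based grid voting step can be sketched as follows. This is a simplified illustration only: each keypoint match is assumed to have already been projected to a 2D object-center estimate, and the cell size, reference threshold, and empirical coefficient are hypothetical stand-ins for the values the paper derives from its density estimation layers.

```python
import numpy as np

def grid_vote(centers, cell_size, ref_threshold, coeff=1.0):
    """Vote projected object-center estimates into a 2D grid and return
    the cells whose vote count reaches the adaptive threshold
    (coeff * ref_threshold). Illustrative sketch, not the paper's exact
    voting and clustering procedure."""
    centers = np.asarray(centers, dtype=float)
    cells = np.floor(centers / cell_size).astype(int)  # grid cell per vote
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    keep = counts >= coeff * ref_threshold             # adaptive threshold
    return uniq[keep], counts[keep]                    # candidate cells + votes

# Toy data: six votes clustered near (12, 12) plus two stray votes.
votes = [(12.1, 12.2), (12.3, 11.8), (11.9, 12.0), (12.2, 12.1),
         (11.8, 11.9), (12.0, 12.3), (50.0, 50.0), (80.0, 20.0)]
cells, counts = grid_vote(votes, cell_size=5.0, ref_threshold=4)
# One candidate cell survives; the stray votes fall below the threshold.
```

Each surviving cell would then be passed to RANSAC-based geometric verification, so the threshold mainly controls how many spurious candidates reach that (comparatively expensive) stage.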
5. Conclusions
In this paper, we introduced the problem of multiple object instance detection in robot inventory management and proposed a dual-layer density estimation-based architecture for resolving this issue. The proposed approach successfully addresses the multiple object instance detection problem in practice through dominant scale ratio-based false match elimination and adaptive clustering threshold-based grid voting. The experimental results illustrate the superior performance of our proposed method in terms of its high accuracy and low latency.
Although the presented architecture performs well in these types of applications, the algorithm would fail when applied to more complex problems. For example, if object instances have different scales in the query image, the assumptions made in this paper are no longer valid. Furthermore, the accuracy of the proposed method is greatly reduced when there is a dramatic change in illumination or when the target is occluded by other translucent objects. In our future work, we will focus on improving the method to handle such complex problems.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The authors would like to thank Shenyang SIASUN Robot & Automation Co., Ltd. for funding this research. The project is supported by the National Key Technology R&D Program, China (no. 2015BAF13B00).
References
[1] C. L. Zitnick and P. Dollár, "Edge boxes: locating object proposals from edges," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 391–405, Springer, Zurich, Switzerland, September 2014.
[2] S. Hinterstoisser, S. Benhimane, N. Navab, P. Fua, and V. Lepetit, "Online learning of patch perspective rectification for efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.
[3] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), pp. II-506–II-513, Washington, DC, USA, July 2004.
[5] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
[6] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[7] L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, vol. 3, no. 4, pp. 143–152, 2009.
[8] Q. Sen and Z. Jianying, "Improved SIFT-based bidirectional image matching algorithm," Mechanical Science and Technology for Aerospace Engineering, vol. 26, pp. 1179–1182, 2007.
[9] J. Wang and M. F. Cohen, "Image and video matting: a survey," Foundations and Trends in Computer Graphics and Vision, vol. 3, no. 2, pp. 97–175, 2008.
[10] Y. Bastanlar, A. Temizel, and Y. Yardimci, "Improved SIFT matching for image pairs with scale difference," Electronics Letters, vol. 46, no. 5, pp. 346–348, 2010.
[11] J. Zhang and H.-S. Sang, "SIFT matching method based on base scale transformation," Journal of Infrared and Millimeter Waves, vol. 33, no. 2, pp. 177–182, 2014.
[12] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2911–2918, San Francisco, Calif, USA, June 2012.
[13] F.-E. Lin, Y.-H. Kuo, and W. H. Hsu, "Multiple object localization by context-aware adaptive window search and search-based object recognition," in Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pp. 1021–1024, ACM, Scottsdale, Ariz, USA, December 2011.
[14] C.-C. Wu, Y.-H. Kuo, and W. Hsu, "Large-scale simultaneous multi-object recognition and localization via bottom-up search-based approach," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 969–972, Nara, Japan, November 2012.
[15] A. Collet, M. Martinez, and S. S. Srinivasa, "The MOPED framework: object recognition and pose estimation for manipulation," The International Journal of Robotics Research, vol. 30, no. 10, pp. 1284–1306, 2011.
[16] S. Zickler and M. M. Veloso, "Detection and localization of multiple objects," in Proceedings of the 6th IEEE-RAS International Conference on Humanoid Robots, pp. 20–25, Genova, Italy, December 2006.
[17] G. Aragon-Camarasa and J. P. Siebert, "Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene," Pattern Recognition Letters, vol. 31, no. 11, pp. 1274–1284, 2010.
[18] R. Bao, K. Higa, and K. Iwamoto, "Local feature based multiple object instance identification using scale and rotation invariant implicit shape model," in Proceedings of the 12th Asian Conference on Computer Vision (ACCV '14), pp. 600–614, Springer, Singapore, November 2014.
[19] K. Higa, K. Iwamoto, and T. Nomura, "Multiple object identification using grid voting of object center estimated from keypoint matches," in Proceedings of the 20th IEEE International Conference on Image Processing (ICIP '13), pp. 2973–2977, Melbourne, Australia, September 2013.
[20] R. Szeliski and S. B. Kang, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '93), pp. 752–753, IEEE, New York, NY, USA, June 1993.
[21] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP '09), pp. 331–340, Lisboa, Portugal, February 2009.
[22] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proceedings of the 9th Conference on Computer and Robot Vision (CRV '12), pp. 404–410, IEEE, Toronto, Canada, May 2012.
[23] D. Nistér and H. Stewénius, "Scalable recognition with a vocabulary tree," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), vol. 2, pp. 2161–2168, IEEE, New York, NY, USA, June 2006.
[24] B. Matei, Y. Shan, H. S. Sawhney, et al., "Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006.
[25] B. Kulis and K. Grauman, "Kernelized locality-sensitive hashing for scalable image search," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 2130–2137, Kyoto, Japan, October 2009.
[26] J. Wang, S. Kumar, and S.-F. Chang, "Semi-supervised hashing for scalable image retrieval," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3424–3431, IEEE, San Francisco, Calif, USA, June 2010.
[27] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), pp. 459–468, Berkeley, Calif, USA, October 2006.
[28] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London, UK, 1986.
[29] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: an accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.