Mutual Localization: Two Camera Relative 6-DOF Pose ...jryde/...mutual_localization.pdf · Mutual...

Mutual Localization Two Camera Relative 6-DOF Pose Estimationfrom Reciprocal Fiducial Observation

Vikas Dhiman Julian Ryde Jason J Corso

AbstractmdashConcurrently estimating the 6-DOF pose of multiplecameras or robotsmdashcooperative localizationmdashis a core problemin contemporary robotics Current works focus on a set ofmutually observable world landmarks and often require inbuiltegomotion estimates situations in which both assumptions areviolated often arise for example robots with erroneous lowquality odometry and IMU exploring an unknown environmentIn contrast to these existing works in cooperative localization wepropose a cooperative localization method which we call mutuallocalization that uses reciprocal observations of camera-fiducialsto obviate the need for egomotion estimates and mutually ob-servable world landmarks We formulate and solve an algebraicformulation for the pose of the two camera mutual localizationsetup under these assumptions Our experiments demonstrate thecapabilities of our proposal egomotion-free cooperative localiza-tion method for example the method achieves 2cm range and 07degree accuracy at 2m sensing for 6-DOF pose To demonstratethe applicability of the proposed work we deploy our methodon Turtlebots and we compare our results with ARToolKit [1]and Bundler [2] over which our method achieves a tenfoldimprovement in translation estimation accuracy

I INTRODUCTION

Cooperative localization is the problem of finding the rel-ative 6-DOF pose between robots using sensors from morethan one robot Various strategies involving different sensorshave been used to solve this problem For example Cognettiet al [3] [4] use multiple bearning-only observations with amotion detector to solve for cooperative localization amongmultiple anonymous robots Trawny et al [5] and lately Zhouet al [6] [7] provide a comprehensive mathematical analysisof solving cooperative localization for different cases of sensordata availability Section II covers related literature in moredetail

To the best of our knowledge all other cooperative localiza-tion works (see Section II) require estimation of egomotionHowever a dependency on egomotion is a limitation for sys-tems that do not have gyroscopes or accelerometers which canprovide displacement between two successive observationsVisual egomotion like MonoSLAM [8] using distinctive im-age features estimates requires high quality correspondenceswhich remains a challenge in machine vision especially incases of non-textured environments Moreover visual egomo-tion techniques are only correct upto a scale factor Contempo-rary cooperative localization methods that use egomotion [5][6] [9] yield best results only with motion perpendicular tothe direction of mutual observation and fails to produce resultswhen either observer undergoes pure rotation or motion in

JJ Corso J Ryde and Vikas Dhiman are with Department of Com-puter Science and Engineering SUNY at Buffalo Buffalo NY USAjcorsojrydevikasdhibuffaloedu

s1p1q1

s2p2q2

p3s3q3

q3p1

p2

p4s4q4

q4Cp

M3

M4 M2

M1

Cq

Figure 1 Simplified diagram for the two-camera problemAssuming the length of respective rays to be s1 s2 s3 s4respectively each marker coordinates can be written in bothcoordinate frames p and q For example M1 is s1p1 inframe p and q1 in q where p1 unit vector parallel to p1

the direction of observation Consequently in simple robotslike Turtlebot this technique produces poor results becauseof absence of sideways motion that require omni-directionalwheels

To obviate the need for egomotion we propose a method forrelative pose estimation that leverages distance between fidu-cial markers mounted on robots for resolving scale ambiguityOur method which we call mutual localization depends uponthe simultaneous mutualreciprocal observation of bearing-only sensors Each sensor is outfitted with fiducial markers(Fig 1) whose position within the host sensor coordinate sys-tem is known in contrast to assumptions in earlier works thatmultiple world landmarks would be concurrently observableby each sensor [10] Since our method does not depend onegomotion hence it is instantaneous which means it is robustto false negatives and it do not suffers from the errors inegomotion estimation

The main contribution of our work is a generalizationof Perspective-3-Points (P3P) problem where observer andthe observed points are distributed in different referenceframes unlike conventional approach where observerrsquos refer-ence frame do not contain any observed points and vice versaIn this paper we present an algebraic derivation to solve forthe relative camera pose (rotation and translation) of the twobearing-only sensors in the case that each can observe twoknown fiducial points in the other sensor essentially givingan algebraic system to compute the relative pose from fourcorrespondences (only three are required in our algorithmbut we show how the fourth correspondence can be usedto generate a set of hypothesis solutions from which bestsolution can be chosen) Two fiducial points on each robot(providing four correspondences) are preferable to one on oneand two on the other as it allows extension to multi-robot(gt 2) systems ensuring that any pair of similarly equippedrobots can estimate their relative pose In this paper we focus

on only two robot case as an extension to multi-robot case aspairwise localization is trivial yet practically effective

Our derivation although inspired by the linear pose es-timation method of Quan and Lan [11] is novel since allrelevant past works we know on P3P problem [12] assume allobservations are made in one coordinate frame and observedpoints in the other In contrast our method makes no such as-sumption and concurrently solves the pose estimation problemfor landmarks sensed in camera-specific coordinates frames

We demonstrate the effectiveness of our method by analyz-ing its accuracy in both synthetic which affords quantitativeabsolute assessment and real localization situations by deploy-ment on Turtlebots We use 3D reconstruction experimentsto show the accuracy of our algorithm Our experimentsdemonstrate the effectiveness of the proposed approach

II RELATED WORK

Cooperative localization has been extensively studied andapplied to various applications One of the latest works in thisarea comes from Cognetti et al [3] [4] where they focuson the problem of cooperatively localizing multiple robotsanonymously They use multiple bearing-only observationsand a motion detector to localize the robots The robot detectoris a simple feature extractor that detects vertical cardboardsquares mounted atop each robot in the shadow zone of therange finder One of oldest works come from Karazume etal [13] where they focus on using cooperative localizationas a substitute to dead reckoning by suggesting a ldquodancerdquo inwhich robots act as mobile landmarks Although they do notuse egomotion but instead assume that position of two robotsare known while localizing the third robot Table I summarizesa few closely related works with emphasis on how our workis different different from each of them Rest of the sectiondiscusses those in detail

Howard et al [14] coined the CLAM (Cooperative Localiza-tion and Mapping) where they concluded that as an observerrobot observes the explorer robot it improves the localizationof robots by the new constraints of observer to explorerdistance Recognizing that odometry errors can cumulate overtime they suggest using constraints based on cooperativelocalization to refine the position estimates Their approachhowever do not utilizes the merits of mutual observation asthey propose that one robot explores the world and otherrobot watches We show in our experiments by comparisonto ARToolKit [1] and Bundler [2] that mutual observations ofrobots can be up to 10 times more accurate than observationsby single robot

A number of groups have considered cooperative visionand laser based mapping in outdoor environments [15] [16]and vision only [17] [18] Localization and mapping usingheterogeneous robot teams with sonar sensors is examinedextensively by [19] [20] Using more than one robot enableseasier identification of previously mapped locations simplify-ing the loop-closing problem [21]

Fox et al [22] propose cooperative localization based onMonte-Carlo localization technique The method uses odome-

Related work Tags NoEM BO NoSLAM MOMutual localization X X X XHoward et al [14] 7 X X XZou and Tan [10] X X 7 7Cognetti et al [3] 7 X X 7Trawny et al [5] 7 X X XZhou and Roumeliotis [6] [7] 7 X X XRoumeliotis et al [24] 7 7 7 X

where

Tag meaningNoEM Without Ego-Motion All those works that use egomo-

tion are marked as 7BO Localization using bearing only measurements No

depth measurements required All those works thatrequire depth measurements are marked with 7

NoSLAM SLAM like tight coupling Inaccuracy in mappingleads to cumulating interdependent errors in localiza-tion and mapping All those works that use SLAM likeapproach are marked with a 7

MO Utilizes mutual observation which is more accuratethan one-sided observations All those works that donot use mutual observation and depend on one-sidedobservations are marked as 7

Table I Comparison of related work with Mutual localization

try measurements for ego motion Chang et al [23] uses depthand visual sensors to localize Nao robots in the 2D groundplane Roumeliotis and Bekey [24] focus on sharing sensordata across robots employing as many sensors as possiblewhich include odometry and range sensors Rekleitis et al[25] provide a model of robots moving in 2D equipped withboth distance and bearing sensors

Zou and Tan [10] proposed a cooperative simultaneous lo-calization and mapping method CoSLAM in which multiplerobots concurrently observe the same scene Correspondencesin time (for each robot) and across robots are fed into anextended Kalman filter and used to simultaneously solve thelocalization and mapping problem However this and otherldquoco-slamrdquo approaches such as [26] remain limited due to theinterdependence of localization and mapping variables errorsin the map are propagated to localization and vice versa

Recently Zhou and Roumeliotis [6] [7] have publishedsolution of a set of 14 minimal solutions that covers a widerange of robot to robot measurements However they useegomotion for their derivation and they assume that observablefiducial markers coincide with the optical center of the cameraOur work does not make any of the two assumptions

III PROBLEM FORMULATION

We use the following notation in this paper see Fig 1Cp and Cq represent two robots each with a camera as asensor The corresponding coordinate frames are p and qrespectively with origin at the optical center of the cameraFiducial markers M1 and M2 are fixed on robot Cq andhence their positions are known in frame q as q1q2 isin R3Similarly p3p4 isin R3 are the positions of markers M3

and M4 in coordinate frame p Robots are positioned suchthat they can observe each others markers in their respectivecamera sensors The 2D image coordinates of the markers M1

and M2 in the image captured by the camera p are measuredas p1 p2 isin R2 and that of M3 and M4 is q3 q4 isin R2

in camera q Let KpKq isin R3times3 be the intrinsic cameramatrices of the respective camera sensors on robot Cp Cq Also note the superscript notation 2D image coordinates aredenoted by a bar example p Unit vectors that provide bearinginformation are denoted by a caret example p

Since the real life images are noisy the measured imagepositions pi and qi will differ from the actual positions pi0and qi0 by gaussian noise ηi

pi = pi0 + ηpi foralli isin 1 2 (1)qi = qi0 + ηqi foralli isin 3 4 (2)

The problem is to determine the rotation R isin R3times3 andtranslation t isin R3 from frame p to frame q such thatany point pi in frame p is related to its corresponding pointqi in frame q by the following equation

qi = Rpi + t (3)

The actual projections of markers Mi into the camera imageframes of the other robot are governed by following equations

pi0 = f(KpRminus1(qi minus t)) foralli isin 1 2 (4)

qi0 = f(Kq(Rpi + t)) foralli isin 3 4 (5)

where f is the projection function defined over a vectorv =

[vx vy vz

]gtas

f(v) =[ vxvzvyvz

]gt(6)

To minimize the effect of noise we must compute the optimaltransformation Rlowast and tlowast

(Rlowast tlowast) = arg min(Rt)

sumiisin12

pi minus f(KpRminus1(qi minus t))2

+sum

iisin34

qi minus f(Kq(Rpi + t))2 (7)

To solve this system of equations we start with exactequations that lead to a large number of polynomial rootsTo choose the best root among the set of roots we use theabove minimization criteria

Let pi qi isin R3 be the unit vectors drawn from the camerarsquosoptical center to the image projection of the markers Theunit vectors can be computed from the position of markers incamera images pi qi by the following equations

pi =Kminus1

p

[pgti 1

]gtKminus1

p

[pgti 1

]gtforalli isin 1 2 (8)

qi =Kminus1

q

[qgti 1

]gtKminus1

q

[qgti 1


Further let s1 s2 be the distances of markers M1 M2 fromthe optical center of the camera sensor in robot Cp And s3s4 be the distances of markers M3 M4 from the optical center

of camera sensor in robot Cq Then the points q1 q2 s3q3s4q4 in coordinate frame q correspond to the points s1p1s2p2 p3 p4 in coordinate frame p

q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

(10)

These four vector equations provide us 12 constraints (threefor each coordinate in 3D) for our 10 unknowns (3 for rotationR 3 for translation t and 4 for si) We first consider only thefirst three equations which allows an exact algebraic solutionof the nine unknowns from the nine constraints

Our approach to solving the system is inspired by the wellstudied problem of Perspective-3-points [12] also known asspace resection [11] However note that the method cannot bedirectly applied to our problem as known points are distributedin both coordinate frames as opposed to the space resectionproblem where all the known points are in the one coordinateframe

The basic flow steps of our approach are to first solve for thethree range factors s1 s2 and s3 (Section III-A) Then we setup a classical absolute orientation system on the rotation andtranslation (Section III-B) which is solved using establishedmethods such as Arun et al [27] or Horn [28] finally since ouralgebraic solution will give rise to many candidate roots wedevelop a root-filtering approach to determine the best solution(Section III-C)

A Solving for s1 s2 and s3

The first step is to solve the system for s1 s2 and s3 Weeliminate R and t by considering the inter-point distances inboth coordinate frames

s1p1 minus s2p2 = q1 minus q2s2p2 minus p3 = q2 minus s3q3p3 minus s1p1 = s3q3 minus q1

(11)

Squaring both sides and representing the vector norm asthe dot product gives the following system of polynomialequations

s21 + s22 minus 2s1s2pgt1 p2 minus q1 minus q22 = 0 (12a)

s22 minus s23 minus 2s2pgt2 p3 + 2s3q

gt2 q3 + p32 minus q22 = 0

(12b)


gt1 q3 + p32 minus q12 = 0

(12c)

This system has three quadratic equations implying a Bezoutbound of eight (23) solutions Using the Sylvester resultant wesequentially eliminate variables from each equation Rewriting

(12a) and (12b) as quadratics in terms of s2 gives

s22 + (minus2s1pgt1 p2)︸︷︷︸

a1

s2 + (s21 minus |q1 minus q2|2)︸︷︷︸a0

= 0

(13)

s22 + (minus2pgt2 p3)︸︷︷︸b1

s2 minus (s23 minus 2s3qgt2 q3 minus p32 + q22)︸︷︷︸

b0

= 0

(14)

The Sylvester determinant [29 p 123] of (13) and (14) is givenby the determinant of the matrix formed by the coefficients ofs2

r(s1 s3) =

∣∣∣∣∣∣∣∣1 a1 a0 00 1 a1 a01 b1 b0 00 1 b1 b0

∣∣∣∣∣∣∣∣ (15)

This determinant is a quartic function in s1 s3 By definitionof resultant the resultant is zero if and only if the parentequations have at least a common root [29] Thus we haveeliminated variable s2 from (12a) and (12b) We can repeatthe process for eliminating s3 by rewriting r(s1 s3) and (12c)as

r(s1 s3) = c4s43 + c3s

33 + c2s

23 + c1s3 + c0 = 0

minuss23 + (2qgt1 q3)︸︷︷︸d1

s3 + s21 minus 2s1pgt1 p3 + p32 minus q12︸︷︷︸

d0

= 0

(16)

The Sylvester determinant of (16) would be

r2(s1) =

∣∣∣∣∣∣∣∣∣∣∣∣

c4 c3 c2 c1 c0 00 c4 c3 c2 c1 c01 d1 d0 0 0 00 1 d1 d0 0 00 0 1 d1 d0 00 0 0 1 d1 d0

∣∣∣∣∣∣∣∣∣∣∣∣= 0 (17)

Solving (17) gives an 8 degree polynomial in s1 By Abel-Ruffini theorem [30 p 131] a closed-form solution of theabove polynomial does not exist

The numeric solution to (17) gives eight roots for s3 Wecompute s1 and s2 using (12c) and (12b) respectively Becausethe camera cannot see objects behind it only real positive rootsare maintained from the resultant solution set

B Solving for R and t

With the solutions for the scale factors s1 s2 s3 we cancompute the absolute location of the Markers M1M2M3in both the frames p and q

pi = sipi foralli isin 1 2qi = siqi foralli isin 3

These exact correspondences give rise to the classical problemof absolute orientation ie given three points in two coordinateframes find the relative rotation and translation between theframes For each positive root of s1 s2 s3 we use the methodin Arun et al [27] method (similar to Hornrsquos method [28]) tocompute the corresponding rotation R and translation value t

C Choosing the optimal root

Completing squares in (12) yields important informationabout redundant roots

(s1 + s2)2 minus 2s1s2(1 + pgt1 p2)minus q1 minus q22 = 0 (18a)

(s2 minus pgt2 p3)2 minus (s3 minus qgt2 q3)2

+ (p3 minus p2)gtp3 minus qgt2 (q2 minus q3) = 0

(18b)



(18c)

Equations (18) do not put any constraints on positivity ofterms (s2minuspgt2 p3) (s3minusqgt2 q3) (s1minuspgt1 p3) or (s3minusqgt1 q3)However all these terms are positive as long as the markersof the observed robot are farther from the camera than themarkers of the observing robot Also the distances si areassumed to be positive Assuming the above we filter the realroots by the following criteria

s1 ge p3 (19)s2 ge p3 (20)s3 ge max(q1 q2) (21)

These criteria not only reduce the number of roots signifi-cantly but also filter out certain degenerate cases

For all the filtered roots of (17) we compute the correspond-ing values of R and t choosing the best root that minimizesthe error function (7)

D Extension to more than three markers

Even though the system is solvable by only three markerswe choose to use four markers for symmetry We can fall backto the three marker solution in situations when one of themarkers is occluded Once we extend this system to 4 markerpoints we obtain 6 bivariate quadratic equations instead of thethree in (12) that can be reduced to three 8-degree univariatepolynomials The approach to finding the root with the leasterror is the same as described above

The problem of finding relative pose from five or moremarkers is better addressed by solving for the homographywhen two cameras observe the same set of points as done by[31]ndash[34] The difference for us is that the distance betweenthe points in both coordinate frames is known hence we canestimate the translation metrically which is not the case inclassical homography estimation Assuming the setup for fivepoints such that (10) becomes

q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

s5q5 = t+Rp5

(22)

Markers

Camera

Figure 2 The deployment of markers on Turtlebot that weused for our experiments

If the essential matrix is E the setup is the same as solvingfor

[q1q2 q3 q4 q5]gtE[p1 p2p3p4p5] = 0 (23)

The scale ambiguity of the problem can be resolved by one ofthe distance relations from (11) Please refer to [32] for solving(23) For more points refer to [35] for the widely known 7-point and linear 8-point algorithms

IV IMPLEMENTATION

We implement our algorithm on two Turtlebots with fiducialmarkers One of the Turtlebots with markers is shown inFig 2 We have implemented the algorithm in Python usingthe Sympy [36] OpenCV [37] and Numpy [38] libraries Asthe implementing software formulates and solves polynomialssymbolically it is generic enough to handle any reasonablenumber of points in two camera coordinate frames We havetested the solver for the following combination of points 0-31-2 2-2 where 1-2 means that 1 point is known in the firstcoordinate frame and 2 points are known in the second

We use blinking lights as fiducial markers on the robots andbarcode-like cylindrical markers as for the 3D reconstructionexperiment

The detection of blinking lights follows a simple thresh-olding strategy on the time differential of images This ap-proach coupled with decaying confidence tracking producessatisfactory results for simple motion of robots and relativelystatic backgrounds Fig 3 shows the cameras mounted withblinking lights as fiducial markers The robots shown in 3are also mounted with ARToolKit [1] fiducial markers for thecomparison experiments

V EXPERIMENTS

To assess the accuracy of our method we perform a lo-calization experiment in which we measure how accuratelyour method can determine the pose of the other camera We

Median Trans error Median Rotation errorARToolKit [1] 057m 92

Bundler [2] 020m 0016

Mutual Localization 0016m 033

Table II Table showing mean translation and rotation error forARToolKit Bundler and Mutual Localization

compare our localization results with the widely used fiducial-based pose estimation in ARToolKit [1] and visual egomotionand SfM framework Bundler [2] We also generate a semi-dense reconstruction to compare the mapping accuracy of ourmethod to that of Bundler A good quality reconstruction isa measure of the accuracy of mutual localization of the twocameras used in the reconstruction

A Localization Experiment

a) Setup Two turtlebots were set up to face each otherOne of the turtlebot was kept stationary and the other movedin 1 ft increments in an X-Z plane (Y-axis is down Z-axisis along the optical axis of the static camera and the X-axis is towards the right of the static camera) We calculatethe rotation error by extracting the rotation angle from thedifferential rotation RgtgtRest as follows

Eθ =180

πarccos

(Tr(RgtgtRest)minus 1

2

)(24)

where Rgt is the ground truth rotation matrix Rest is theestimated rotation matrix and Tr is the matrix trace Thetranslation error is simply the norm difference between twotranslation vectors

b) Results in comparison with ARToolKit [1] The AR-ToolKit is an open source library for detecting and determiningthe pose of fiducial markers from video We use a ROS [39]wrapper ndash ar_pose ndash over ARToolKit for our experimentsWe repeat the relative camera localization experiment withthe ARToolKit library and compare to our results The resultsshow a tenfold improvement in translation error over Bundler[2]

B Simulation experiments with noise

A simple scene was constructed in Blender to verify themathematical correctness of the method Two cameras wereset up in the blender scene along with a target object 1m fromthe static camera Camera images were rendered at a resolutionof 960 times 540 The markers were simulated as colored ballsthat were detected by simple hue based thresholding The twocameras in the simulated scene were rotated and translated tocover maximum range of motion After detection of the centerof the colored balls zero mean gaussian noise was added tothe detected positions to investigate the noise characteristicsof our method The experiment was repeated with differentvalues of noise covariance Fig 6 shows the translation androtation error in the experiment with variation in noise It canbe seen that our method is robust to noise as it deviates onlyby 5cm and 25 when tested with noise of up to 10 pixels

Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq

Figure 3 Diagram of the two camera setup for mutual localization 3D metric reconstruction along with images from eachcamera for two poses of the mobile camera Cameras have distinctive cylindrical barcode-like markers to aid detection in eachothers image frames Also depicted is the triangulation to two example feature points

minus10 minus05 00 05 10X (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)

ARTookitMutual LocalizationBundler

08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)

Figure 4 Translation error comparison between the ARToolKitand our mutual localization The translation error is plotted toground truth X and Z axis positions to show how error varieswith depth (Z) and lateral (X) movements We get better resultsin localization by a factor of ten Also note how the translationerror increases with Z-axis (inter-camera separation)


minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)

ARToolkitMutual LocalizationBundler

08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)

Figure 5 Rotation error comparison between the ARToolKitand Mutual localization Rotation error decreases with Z-axis(ground truth inter-camera separation) See (24) for computa-tion of rotation error

C 3D Reconstruction experiment

The position and orientation obtained from our methodis inputted into the patch based multi-view stereo (PMVS-2) library [40] to obtain a semi-dense reconstruction of an

2 4 6 8 10Noise (pixels)

000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)

Figure 6 Rotation and translation error as noise is incremen-tally added to the detection of markers

indoor environment Our reconstruction is less noisy whencompared to that obtained by Bundler [2] Fig 7 shows a side-by-side snapshot of the semi-dense map from Bundler-PMVSand our method Mutual Localization-PMVS To compare thereconstruction accuracy we captured the scene as a pointcloud with an RGB-D camera (Asus-Xtion) The Bundlerand Mutual Localization output point clouds were manuallyaligned (and scaled) to the Asus-Xtion point cloud We thencomputed the nearest neighbor distance from each point inthe BundlerMutual localization point clouds discarding pointswith nearest neighbors further than 1m as outliers With thismetric the mean nearest neighbor distance for our method was0176m while that for Bundler was 0331m

VI CONCLUSION

We have developed a method to cooperatively localize twocameras using fiducial markers on the cameras in sensor-specific coordinate frames obviating the common assumptionof sensor egomotion We have compared our results with theARToolKit showing that our method can localize significantlymore accurately with a tenfold error reduction observed in ourexperiments We have also demonstrated how the cooperativelocalization can be used as an input for 3D reconstruction ofunknown environments and find better accuracy (018m versus033m) than the visual egomotion-based Bundler method Weplan to build on this work and apply it to multiple robots forcooperative mapping Though we achieve reasonable accuracywe believe we can improve the accuracy of our method byimproving camera calibration and measurement of the fiducialmarker locations with respect to the camera optical center

We will release the source code (open-source) for our methodupon publication

ACKNOWLEDGMENTS

This material is based upon work partially supported by theFederal Highway Administration under Cooperative Agree-ment No DTFH61-07-H-00023 the Army Research Office(W911NF-11-1-0090) and the National Science FoundationCAREER grant (IIS-0845282) Any opinions findings con-clusions or recommendations are those of the authors and donot necessarily reflect the views of the FHWA ARO or NSF

REFERENCES

[1] H Kato and M Billinghurst ldquoMarker tracking and HMD calibration fora video-based augmented reality conferencing systemrdquo in Proceedingsof the 2nd IEEE and ACM International Workshop on Augmented Reality(IWAR 99) Oct 1999

[2] N Snavely S Seitz and R Szeliski ldquoPhoto tourism exploring photocollections in 3Drdquo in ACM Transactions on Graphics (TOG) vol 25no 3 ACM 2006 pp 835ndash846

[3] M Cognetti P Stegagno A Franchi G Oriolo and H Bulthoff ldquo3-Dmutual localization with anonymous bearing measurementsrdquo in Roboticsand Automation (ICRA) 2012 IEEE International Conference on may2012 pp 791 ndash798

[4] A Franchi G Oriolo and P Stegagno ldquoMutual localization in a multi-robot system with anonymous relative position measuresrdquo in IntelligentRobots and Systems 2009 IROS 2009 IEEERSJ International Confer-ence on IEEE 2009 pp 3974ndash3980

[5] N Trawny X Zhou K Zhou and S Roumeliotis ldquoInterrobot trans-formations in 3-Drdquo Robotics IEEE Transactions on vol 26 no 2 pp226ndash243 2010

[6] X S Zhou and S I Roumeliotis ldquoDetermining the robot-to-robot 3Drelative pose using combinations of range and bearing measurements14 minimal problems and closed-form solutions to three of themrdquo inIntelligent Robots and Systems (IROS) 2010 IEEERSJ InternationalConference on IEEE 2010 pp 2983ndash2990

[7] mdashmdash ldquoDetermining 3-D relative transformations for any combinationof range and bearing measurementsrdquo Robotics IEEE Transactions onvol PP no 99 pp 1ndash17 2012

[8] A J Davison I D Reid N D Molton and O Stasse ldquoMonoslamReal-time single camera slamrdquo Pattern Analysis and Machine Intelli-gence IEEE Transactions on vol 29 no 6 pp 1052ndash1067 2007

[9] A Martinelli ldquoVision and IMU data fusion Closed-form solutions forattitude speed absolute scale and bias determinationrdquo Robotics IEEETransactions on no 99 pp 1ndash17 2012

[10] D Zou and P Tan ldquoCoSLAM Collaborative visual SLAM in dynamicenvironmentsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence 2012

[11] L Quan and Z Lan ldquoLinear n-point camera pose determinationrdquo PatternAnalysis and Machine Intelligence IEEE Transactions on vol 21 no 8pp 774ndash780 1999

[12] B Haralick C Lee K Ottenberg and M Noumllle ldquoReview and analysisof solutions of the three point perspective pose estimation problemrdquoInternational Journal of Computer Vision vol 13 no 3 pp 331ndash3561994

[13] R Kurazume S Nagata and S Hirose ldquoCooperative positioning withmultiple robotsrdquo in Robotics and Automation 1994 Proceedings 1994IEEE International Conference on may 1994 pp 1250 ndash1257 vol2

[14] A Howard and L Kitchen ldquoCooperative localisation and mappingrdquoin International Conference on Field and Service Robotics (FSR99)Citeseer 1999 pp 92ndash97

[15] R Madhavan K Fregene and L Parker ldquoDistributed cooperativeoutdoor multirobot localization and mappingrdquo Autonomous Robotsvol 17 pp 23ndash39 2004

[16] J Ryde and H Hu ldquoMutual localization and 3D mapping by cooperativemobile robotsrdquo in Proceedings of International Conference on IntelligentAutonomous Systems (IAS) The University of Tokyo Tokyo Japan Mar2006

[17] J Little C Jennings and D Murray ldquoVision-based mapping withcooperative robotsrdquo in Sensor Fusion and Decentralized Control inRobotic Systems vol 3523 October 1998 pp 2ndash12

(a) Bundler-PMVS (b) Mutual Localization-PMVS (c) Actual scene

Figure 7 The semi-dense reconstruction produced by our method Mutual Localization is less noisy (018m) when comparedto that produced by Bundler (033m)

[18] R Rocha J Dias and A Carvalho ldquoCooperative multi-robot systems astudy of vision-based 3-D mapping using information theoryrdquo Roboticsand Autonomous Systems vol 53 pp 282ndash311 April 2005

[19] R Grabowski and P Khosla ldquoLocalization techniques for a team ofsmall robotsrdquo in Proceedings of the IEEERSJ International Conferenceon Intelligent Robots and Systems (IROS) 2001

[20] P Khosla R Grabowski and H Choset ldquoAn enhanced occupancy mapfor exploration via pose separationrdquo in Proceedings of the IEEERSJInternational Conference on Intelligent Robots and Systems (IROS)2003

[21] K Konolige and S Gutmann ldquoIncremental mapping of large cyclicenvironmentsrdquo International Symposium on Computer Intelligence inRobotics and Automation (CIRA) pp 318ndash325 2000

[22] D Fox W Burgard H Kruppa and S Thrun ldquoCollaborative multi-robot localizationrdquo in KI-99 Advances in Artificial Intelligence serLecture Notes in Computer Science W Burgard A Cremers andT Cristaller Eds Springer Berlin Heidelberg 1999 vol 1701 pp698ndash698

[23] C-H Chang S-C Wang and C-C Wang ldquoVision-based cooperativesimultaneous localization and trackingrdquo in Robotics and Automation(ICRA) 2011 IEEE International Conference on may 2011 pp 5191ndash5197

[24] S Roumeliotis and G Bekey ldquoDistributed multirobot localizationrdquoRobotics and Automation IEEE Transactions on vol 18 no 5 pp781 ndash 795 oct 2002

[25] L M Rekleitis G Dudek and E E Milios ldquoMulti-robot explorationof an unknown environment efficiently reducing the odometry errorrdquo inIn Proc of the International Joint Conference on Artificial Intelligence(IJCAI) 1997 pp 1340ndash1345

[26] G-H Kim J-S Kim and K-S Hong ldquoVision-based simultaneouslocalization and mapping with two camerasrdquo in Intelligent Robots andSystems 2005 (IROS 2005) 2005 IEEERSJ International Conferenceon aug 2005 pp 1671 ndash 1676

[27] K Arun T Huang and S Blostein ldquoLeast-squares fitting of two 3-Dpoint setsrdquo Pattern Analysis and Machine Intelligence IEEE Transac-tions on no 5 pp 698ndash700 1987

[28] B Horn ldquoClosed-form solution of absolute orientation using unitquaternionsrdquo JOSA A vol 4 no 4 pp 629ndash642 1987

[29] V Bykov A Kytmanov M Lazman and M Passare Eliminationmethods in polynomial computer algebra Kluwer Academic Pub 1998vol 448

[30] E Barbeau Polynomials ser Problem Books in Mathematics Springer2003

[31] H Steweacutenius C Engels and D Nisteacuter ldquoRecent developments on directrelative orientationrdquo ISPRS Journal of Photogrammetry and RemoteSensing vol 60 no 4 pp 284 ndash 294 2006 [Online] AvailablehttpwwwsciencedirectcomsciencearticlepiiS092427160600030X

[32] D Nister ldquoAn efficient solution to the five-point relative pose problemrdquoPattern Analysis and Machine Intelligence IEEE Transactions onvol 26 no 6 pp 756ndash770 2004

[33] J Philip ldquoA non-iterative algorithm for determining all essentialmatrices corresponding to five point pairsrdquo The Photogrammetric

Record vol 15 no 88 pp 589ndash599 1996 [Online] Availablehttpdxdoiorg1011110031-868X00066

[34] H Longuet-Higgins ldquoA computer algorithm for reconstructing a scenefrom two projectionsrdquo Readings in Computer Vision Issues ProblemsPrinciples and Paradigms MA Fischler and O Firschein eds pp 61ndash62 1987

[35] R Hartley and A Zisserman Multiple view geometry in computer visionCambridge Univ Press 2000 vol 2

[36] O Certik et al ldquoSympy python library for symbolicmathematicsrdquo Technical report (since 2006) httpcode googlecompsympy(accessed November 2009) Tech Rep 2008

[37] G Bradski ldquoThe opencv libraryrdquo Doctor Dobbs Journal vol 25 no 11pp 120ndash126 2000

[38] N Developers ldquoScientific computing tools for python-numpyrdquo 2010[39] M Quigley B Gerkey K Conley J Faust T Foote J Leibs E Berger

R Wheeler and A Ng ldquoROS an open-source robot operating systemrdquoin ICRA workshop on open source software vol 3 no 32 2009

[40] Y Furukawa and J Ponce ldquoAccurate dense and robust multiview stere-opsisrdquo Pattern Analysis and Machine Intelligence IEEE Transactionson vol 32 no 8 pp 1362ndash1376 2010

Introduction

Related Work

Problem Formulation

Solving for s1 s2 and s3

Solving for R and t

Choosing the optimal root

Extension to more than three markers

Implementation

Experiments

Localization Experiment

Simulation experiments with noise

3D Reconstruction experiment

Conclusion

References

on only two robot case as an extension to multi-robot case aspairwise localization is trivial yet practically effective

Our derivation although inspired by the linear pose es-timation method of Quan and Lan [11] is novel since allrelevant past works we know on P3P problem [12] assume allobservations are made in one coordinate frame and observedpoints in the other In contrast our method makes no such as-sumption and concurrently solves the pose estimation problemfor landmarks sensed in camera-specific coordinates frames

We demonstrate the effectiveness of our method by analyz-ing its accuracy in both synthetic which affords quantitativeabsolute assessment and real localization situations by deploy-ment on Turtlebots We use 3D reconstruction experimentsto show the accuracy of our algorithm Our experimentsdemonstrate the effectiveness of the proposed approach

II RELATED WORK

Cooperative localization has been extensively studied andapplied to various applications One of the latest works in thisarea comes from Cognetti et al [3] [4] where they focuson the problem of cooperatively localizing multiple robotsanonymously They use multiple bearing-only observationsand a motion detector to localize the robots The robot detectoris a simple feature extractor that detects vertical cardboardsquares mounted atop each robot in the shadow zone of therange finder One of oldest works come from Karazume etal [13] where they focus on using cooperative localizationas a substitute to dead reckoning by suggesting a ldquodancerdquo inwhich robots act as mobile landmarks Although they do notuse egomotion but instead assume that position of two robotsare known while localizing the third robot Table I summarizesa few closely related works with emphasis on how our workis different different from each of them Rest of the sectiondiscusses those in detail

Howard et al [14] coined the CLAM (Cooperative Localiza-tion and Mapping) where they concluded that as an observerrobot observes the explorer robot it improves the localizationof robots by the new constraints of observer to explorerdistance Recognizing that odometry errors can cumulate overtime they suggest using constraints based on cooperativelocalization to refine the position estimates Their approachhowever do not utilizes the merits of mutual observation asthey propose that one robot explores the world and otherrobot watches We show in our experiments by comparisonto ARToolKit [1] and Bundler [2] that mutual observations ofrobots can be up to 10 times more accurate than observationsby single robot

A number of groups have considered cooperative visionand laser based mapping in outdoor environments [15] [16]and vision only [17] [18] Localization and mapping usingheterogeneous robot teams with sonar sensors is examinedextensively by [19] [20] Using more than one robot enableseasier identification of previously mapped locations simplify-ing the loop-closing problem [21]

Fox et al [22] propose cooperative localization based onMonte-Carlo localization technique The method uses odome-

Related work Tags NoEM BO NoSLAM MOMutual localization X X X XHoward et al [14] 7 X X XZou and Tan [10] X X 7 7Cognetti et al [3] 7 X X 7Trawny et al [5] 7 X X XZhou and Roumeliotis [6] [7] 7 X X XRoumeliotis et al [24] 7 7 7 X

where

Tag meaningNoEM Without Ego-Motion All those works that use egomo-

tion are marked as 7BO Localization using bearing only measurements No

depth measurements required All those works thatrequire depth measurements are marked with 7

NoSLAM SLAM like tight coupling Inaccuracy in mappingleads to cumulating interdependent errors in localiza-tion and mapping All those works that use SLAM likeapproach are marked with a 7

MO Utilizes mutual observation which is more accuratethan one-sided observations All those works that donot use mutual observation and depend on one-sidedobservations are marked as 7

Table I Comparison of related work with Mutual localization

try measurements for ego motion Chang et al [23] uses depthand visual sensors to localize Nao robots in the 2D groundplane Roumeliotis and Bekey [24] focus on sharing sensordata across robots employing as many sensors as possiblewhich include odometry and range sensors Rekleitis et al[25] provide a model of robots moving in 2D equipped withboth distance and bearing sensors

Zou and Tan [10] proposed a cooperative simultaneous lo-calization and mapping method CoSLAM in which multiplerobots concurrently observe the same scene Correspondencesin time (for each robot) and across robots are fed into anextended Kalman filter and used to simultaneously solve thelocalization and mapping problem However this and otherldquoco-slamrdquo approaches such as [26] remain limited due to theinterdependence of localization and mapping variables errorsin the map are propagated to localization and vice versa

Recently Zhou and Roumeliotis [6] [7] have publishedsolution of a set of 14 minimal solutions that covers a widerange of robot to robot measurements However they useegomotion for their derivation and they assume that observablefiducial markers coincide with the optical center of the cameraOur work does not make any of the two assumptions

III PROBLEM FORMULATION

We use the following notation in this paper see Fig 1Cp and Cq represent two robots each with a camera as asensor The corresponding coordinate frames are p and qrespectively with origin at the optical center of the cameraFiducial markers M1 and M2 are fixed on robot Cq andhence their positions are known in frame q as q1q2 isin R3Similarly p3p4 isin R3 are the positions of markers M3

and M4 in coordinate frame p Robots are positioned suchthat they can observe each others markers in their respectivecamera sensors The 2D image coordinates of the markers M1






qi = Rpi + t (3)





[vx vy vz

]gtas

f(v) =[ vxvzvyvz

]gt(6)



sumiisin12


+sum

iisin34




pi =Kminus1

p

[pgti 1

]gtKminus1

p

[pgti 1


qi =Kminus1

q

[qgti 1

]gtKminus1

q

[qgti 1




q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

(10)







(11)




gt2 q3 + p32 minus q22 = 0

(12b)


gt1 q3 + p32 minus q12 = 0

(12c)




a1


= 0

(13)



b0

= 0

(14)


r(s1 s3) =

∣∣∣∣∣∣∣∣1 a1 a0 00 1 a1 a01 b1 b0 00 1 b1 b0

∣∣∣∣∣∣∣∣ (15)


r(s1 s3) = c4s43 + c3s

33 + c2s

23 + c1s3 + c0 = 0



d0

= 0

(16)


r2(s1) =

∣∣∣∣∣∣∣∣∣∣∣∣


∣∣∣∣∣∣∣∣∣∣∣∣= 0 (17)












(18b)



(18c)








q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

s5q5 = t+Rp5

(22)

Markers

Camera



[q1q2 q3 q4 q5]gtE[p1 p2p3p4p5] = 0 (23)


IV IMPLEMENTATION




V EXPERIMENTS



Bundler [2] 020m 0016






Eθ =180

πarccos


2

)(24)





Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq



00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)


08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)



minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)


08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)





000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References






qi = Rpi + t (3)





[vx vy vz

]gtas

f(v) =[ vxvzvyvz

]gt(6)



sumiisin12


+sum

iisin34




pi =Kminus1

p

[pgti 1

]gtKminus1

p

[pgti 1


qi =Kminus1

q

[qgti 1

]gtKminus1

q

[qgti 1




q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

(10)







(11)




gt2 q3 + p32 minus q22 = 0

(12b)


gt1 q3 + p32 minus q12 = 0

(12c)




a1


= 0

(13)



b0

= 0

(14)


r(s1 s3) =

∣∣∣∣∣∣∣∣1 a1 a0 00 1 a1 a01 b1 b0 00 1 b1 b0

∣∣∣∣∣∣∣∣ (15)


r(s1 s3) = c4s43 + c3s

33 + c2s

23 + c1s3 + c0 = 0



d0

= 0

(16)


r2(s1) =

∣∣∣∣∣∣∣∣∣∣∣∣


∣∣∣∣∣∣∣∣∣∣∣∣= 0 (17)












(18b)



(18c)








q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

s5q5 = t+Rp5

(22)

Markers

Camera



[q1q2 q3 q4 q5]gtE[p1 p2p3p4p5] = 0 (23)


IV IMPLEMENTATION




V EXPERIMENTS



Bundler [2] 020m 0016






Eθ =180

πarccos


2

)(24)





Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq



00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)


08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)



minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)


08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)





000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References



a1


= 0

(13)



b0

= 0

(14)


r(s1 s3) =

∣∣∣∣∣∣∣∣1 a1 a0 00 1 a1 a01 b1 b0 00 1 b1 b0

∣∣∣∣∣∣∣∣ (15)


r(s1 s3) = c4s43 + c3s

33 + c2s

23 + c1s3 + c0 = 0



d0

= 0

(16)


r2(s1) =

∣∣∣∣∣∣∣∣∣∣∣∣


∣∣∣∣∣∣∣∣∣∣∣∣= 0 (17)












(18b)



(18c)








q1 = t+ s1Rp1

q2 = t+ s2Rp2

s3q3 = t+Rp3

s4q4 = t+Rp4

s5q5 = t+Rp5

(22)

Markers

Camera



[q1q2 q3 q4 q5]gtE[p1 p2p3p4p5] = 0 (23)


IV IMPLEMENTATION




V EXPERIMENTS



Bundler [2] 020m 0016






Eθ =180

πarccos


2

)(24)





Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq



00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)


08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)



minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)


08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)





000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References

Markers

Camera



[q1q2 q3 q4 q5]gtE[p1 p2p3p4p5] = 0 (23)


IV IMPLEMENTATION




V EXPERIMENTS



Bundler [2] 020m 0016






Eθ =180

πarccos


2

)(24)





Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq



00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)


08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)



minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)


08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)





000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References

Static Camera

Cp1

Static Camera

Mobile Camera

Cp0

Static Camera

Mobile CameraCp0

Cp1

Cq Cq Cq



00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)


08 12 16 20 24 28Z (meters)

00

05

10

15

20

Tran

slat

ion

Err

or(m

eter

s)



minus10

0

10

20

30

40

50

60

Rot

atio

nE

rror

(deg

rees

)


08 12 16 20 24 28Z (meters)

minus10

0

10

20

30

40

50

60

70

Rot

atio

nE

rror

(deg

rees

)





000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References


000

001

002

003

004

005

006Tr

ansl

atio

nE

rror

(m)


00

05

10

15

20

25

30

Rot

atio

nE

rror

(deg

rees

)



VI CONCLUSION



ACKNOWLEDGMENTS


REFERENCES












































Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References



























Introduction

Related Work

Problem Formulation


Solving for R and t



Implementation

Experiments




Conclusion

References

Mutual Localization: Two Camera Relative 6-DOF Pose ...jryde/...mutual_localization.pdf · Mutual...

Documents

Transcript of Mutual Localization: Two Camera Relative 6-DOF Pose ...jryde/...mutual_localization.pdf · Mutual...