ICRA 2015 interactive presentation

PAUL STURGESS AND SUNANDO SENGUPTA OXFORD BROOKES UNIVERSITY ICRA 2015 Semantic Octree: Unifying Recognition, Reconstruction and Representation via an Octree Constrained Higher Order MRF *Joint First Author, {paul.sturgess.cv,sunando.sengupta}@gmail.com

Transcript of ICRA 2015 interactive presentation

Page 1: ICRA 2015 interactive presentation

PAUL STURGESS AND SUNANDO SENGUPTA

OXFORD BROOKES UNIVERSITY

ICRA 2015

Semantic Octree: Unifying Recognition, Reconstruction and Representation via an Octree Constrained Higher Order MRF

*Joint First Author, {paul.sturgess.cv,sunando.sengupta}@gmail.com

Page 2: ICRA 2015 interactive presentation

Semantic Octree

Recognition

Structured prediction is widely adopted in vision: AHRF [1].

Efficiency of the output structure is not the focus.

Reconstruction

Octrees are widely adopted in robotics: OctoMap [2].

Incorporating high-level semantic information is not the focus.

Unifying Representation

Complementary to recognition and reconstruction. Efficient for further manipulation of the underlying data.

Combine OctoMap and AHRF to get the best of both.

[1] P. Kohli et al., Robust Higher Order Potentials for Enforcing Label Consistency.

[2] A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.

Page 3: ICRA 2015 interactive presentation

Recognition

● AHRF - Associative Higher-order Random Fields Framework.

● Multi-resolution approach to semantic image segmentation.

● Efficient and bounded inference with alpha-expansion.

Page 4: ICRA 2015 interactive presentation

Reconstruction

The main elements of an occupancy-based scene reconstruction are:

● Occupied: objects present in the world.

● Free: required for collision avoidance and path planning.

● Unmapped: unknown areas in the scene that need to be avoided.

Page 5: ICRA 2015 interactive presentation

Representation

• Efficient access to, and manipulation of, 3D object models is at the heart of robotics.

o Point clouds, meshes: cannot map free and unknown areas.

o Stixels / height maps / 2.5D: one height value per cell of a 2D grid, and free area is not accurately mapped.

o Fixed-size grid of voxels: voxels are not indexed, which makes it inefficient.

• Octree-based volumetric representation

o Accurately represents 3D space, with efficient indexing of the volume.

Image courtesy: A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.

Page 6: ICRA 2015 interactive presentation

Semantic Octree - framework

Input stereo images

Chap 6, Sec 6.3

Page 7: ICRA 2015 interactive presentation

Semantic Octree - framework

Generate point clouds and class hypotheses for every pixel

Chap 6, Sec 6.3

Page 8: ICRA 2015 interactive presentation

Semantic Octree - framework

Fuse into an octree using the estimated camera poses

Octree: each volume is subdivided into 8 sub-volumes

Leaf nodes (xi) are the smallest voxels

Any internal node (xc) gives a natural grouping of 3D space

Chap 6, Sec 6.3
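
The subdivision described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `child_index` and `leaf_path`, and the convention of describing a cube by its centre and half-size, are assumptions made for the example.

```python
def child_index(point, center):
    """Return which of the 8 sub-volumes (0-7) of a cube contains `point`."""
    x, y, z = point
    cx, cy, cz = center
    # One bit per axis: the bit is set when the point lies in the upper half.
    return int(x >= cx) | (int(y >= cy) << 1) | (int(z >= cz) << 2)


def leaf_path(point, root_center, root_half_size, depth):
    """Descend `depth` levels from the root and record the child taken at each level.

    Every prefix of the returned path identifies an internal node (a grouping
    of leaf voxels); the full path identifies the leaf voxel itself.
    """
    cx, cy, cz = root_center
    half = root_half_size
    path = []
    for _ in range(depth):
        idx = child_index(point, (cx, cy, cz))
        path.append(idx)
        half /= 2.0
        # Move the centre to the centre of the chosen child cube.
        cx += half if idx & 1 else -half
        cy += half if idx & 2 else -half
        cz += half if idx & 4 else -half
    return path
```

Two points whose paths share a prefix lie under the same internal node, which is what makes internal nodes a natural grouping of nearby leaf voxels.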

Page 9: ICRA 2015 interactive presentation

Semantic Octree - framework

Perform inference over 3D voxels to give labelled scene.

Chap 6, Sec 6.3

Page 10: ICRA 2015 interactive presentation

CRF graph on Octree voxels

The octree divides space into sub-volumes indexed through a tree with nodes:

τint: internal nodes in the tree (xc)

τleaf: leaf-level voxels (xi)

A random variable is defined for every leaf voxel

Every internal node is associated with a set of leaf voxels, resulting in a clique

Label set defined as

Final energy:
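
One energy consistent with these definitions sums a unary cost over the leaf voxels and a higher-order cost over the cliques of the internal nodes. This is a sketch under that assumption; the symbols \psi_i and \psi_c are names chosen here, not taken from the slide.

```latex
E(\mathbf{x}) \;=\; \sum_{i \in \tau_{leaf}} \psi_i(x_i) \;+\; \sum_{c \in \tau_{int}} \psi_c(\mathbf{x}_c)
```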

Page 11: ICRA 2015 interactive presentation

Unary score for leaf voxels

Octree volume update

All voxels are initially set to unknown, with occupancy probability P(xi) = 0.5 and log odds of 0

For each 3D point (obtained from stereo pairs), the voxels' log odds are updated in a ray-casting manner

Log odds are updated for all 3D points of every stereo pair

Final occupancy probability obtained as

Chap 6, Sec 6.3.1
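
The log-odds update can be sketched as follows. This is a minimal illustration in the style of occupancy mapping, not the authors' code: the sensor-model values `P_HIT` and `P_MISS` are illustrative assumptions, and the ray traversal itself is assumed to be done elsewhere, so the function simply receives the voxels a ray passed through.

```python
import math

# Illustrative sensor-model values (assumptions, not taken from the slides).
P_HIT, P_MISS = 0.7, 0.4


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def probability(log_odds):
    """Convert log odds back to a probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


def update_ray(log_odds, ray_voxels, end_voxel):
    """Update a dict {voxel_key: log_odds} for one observed 3D point.

    Voxels traversed by the ray are observed as free; the voxel containing
    the point is observed as occupied. Unseen voxels start at log odds 0,
    which corresponds to P = 0.5.
    """
    for v in ray_voxels:
        log_odds[v] = log_odds.get(v, 0.0) + logit(P_MISS)
    log_odds[end_voxel] = log_odds.get(end_voxel, 0.0) + logit(P_HIT)
    return log_odds
```

Because updates are additive in log-odds space, repeated observations of the same voxel accumulate evidence without having to store the individual measurements.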

Page 12: ICRA 2015 interactive presentation

Unary score for leaf voxels

Each occupied voxel xi is associated with a set of 3D points

The corresponding image pixels are denoted as

Pixel scores are combined together

Given the initial occupancy P(xi), the unary is given as:

Thus, voxels initially estimated as occupied have a high cost for the free label, and vice versa

Chap 6, Sec 6.3.1
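
One plausible way to turn pixel scores and occupancy into a unary cost is sketched below. Everything in it is an assumption made for illustration: averaging as the way of combining pixel scores, the negative-log form of the cost, the `"free"` label name, and the function name `unary_costs`. It is not the formula from the slide.

```python
import math

EPS = 1e-9  # guards against log(0)


def unary_costs(pixel_scores, p_occupied, labels):
    """Return {label: cost} for one leaf voxel (hypothetical form).

    pixel_scores: list of {label: score} dicts, one per image pixel whose
                  3D point fell in the voxel; scores assumed to lie in [0, 1].
    p_occupied:   occupancy probability P(x_i) of the voxel.
    labels:       object class labels (the free label is added separately).
    """
    costs = {}
    for label in labels:
        if pixel_scores:
            mean = sum(s[label] for s in pixel_scores) / len(pixel_scores)
        else:
            mean = 1.0 / len(labels)  # no evidence: uniform over object classes
        costs[label] = -math.log(max(p_occupied * mean, EPS))
    costs["free"] = -math.log(max(1.0 - p_occupied, EPS))
    return costs
```

Under this form, the occupancy probability trades the object labels off against the free label, while the averaged pixel scores decide between object classes.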

Page 13: ICRA 2015 interactive presentation

Hierarchical tree potential

Robust P^N potential applied over hierarchical groupings of voxels

Penalises label inconsistency within each grouping of voxels

Takes the form

Maximum cost truncated to γmax

Groupings of voxels correspond to internal nodes in the octree

Chap 6, Sec 6.3.2
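
A truncated label-consistency penalty of this kind can be sketched as below. It follows the simple linear-then-truncated form of the Robust P^N model (cost grows with the number of voxels that disagree with the dominant label, up to a cap); the parameter names `gamma_max` and `q` and the function name are assumptions, and the exact parameterisation on the slide is not reproduced here.

```python
from collections import Counter


def robust_pn_cost(labels, gamma_max, q):
    """Truncated label-inconsistency cost for one clique of voxel labels.

    labels:    labels currently assigned to the leaf voxels in the clique.
    gamma_max: maximum (truncated) cost.
    q:         number of disagreeing voxels at which the cost saturates.
    """
    if not labels:
        return 0.0
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_inconsistent = len(labels) - dominant_count
    # Linear in the number of inconsistent voxels, truncated at gamma_max.
    return min(n_inconsistent * gamma_max / q, gamma_max)
```

Unlike a strict all-or-nothing consistency term, a few disagreeing voxels incur only a proportional cost, so a grouping can tolerate some label variation before paying the full penalty.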

Page 14: ICRA 2015 interactive presentation

Experiments

Octree defined with 16 levels

Smallest voxel resolution = (8 × 8 × 8) cm^3

Maximum mapped volume: (2^16 × 8 cm)^3 ≈ (5.24 km)^3

Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 are considered
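
The maximum extent quoted above follows directly from the tree depth and the leaf size, as this short check shows (variable names are chosen here for illustration):

```python
LEVELS = 16        # octree depth
LEAF_SIZE_CM = 8   # edge length of the smallest voxel

cells_per_side = 2 ** LEVELS               # leaf voxels along each axis
side_cm = cells_per_side * LEAF_SIZE_CM    # edge length of the root cube in cm
side_km = side_cm / 100 / 1000             # the same edge length in km
volume_km3 = side_km ** 3                  # mapped volume in cubic km

print(cells_per_side, side_km, round(volume_km3, 1))
```

So the root cube has an edge of about 5.24 km, which corresponds to a volume of roughly 144 cubic kilometres.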

Page 15: ICRA 2015 interactive presentation

Results

Hierarchical grouping during inference vs. leaf-level voxel labelling (much sparser)

Chap 6, Sec 6.4

Page 16: ICRA 2015 interactive presentation

Results

Quantitative evaluation: performed by projecting into the image domain

Observations

Small objects tend to get decimated due to octree quantization, while a mesh-based representation is better at representing surfaces.

[Results table comparing against [1] and [2]]

[1] Sengupta et al., “Urban 3D semantic modelling using stereo vision,” in ICRA, 2013

[2] Valentin et al., “Mesh based semantic modelling for indoor and outdoor scenes,” in CVPR, 2013

Page 17: ICRA 2015 interactive presentation

Occupancy mapping

Grouping voxels hierarchically increases the occupied volume, reducing the sparsity

Page 18: ICRA 2015 interactive presentation

Conclusion

● Proposed a method which performs reconstruction in an efficient representation aided by semantics of the scene

● Combined AHRF and Octomap to get best of both

● Some future applications

○ Scene interaction and manipulation.

○ Collision detection, with known object types.

○ Path planning with known affordances.