ICRA 2015 interactive presentation

PAUL STURGESS AND SUNANDO SENGUPTA, OXFORD BROOKES UNIVERSITY

ICRA 2015

Semantic Octree: Unifying Recognition, Reconstruction and Representation via an Octree Constrained Higher Order MRF

*Joint First Author, {paul.sturgess.cv,sunando.sengupta}@gmail.com

Semantic Octree

Recognition

Structured prediction is widely adopted in vision: AHRF [1].

Efficiency of the output structure is not the focus.

Reconstruction

Octrees are widely adopted in robotics: OctoMap [2].

Incorporating high-level semantic information is not the focus.

Unifying Representation

Complementary to recognition and reconstruction. Efficient for further manipulation of the underlying data.

Combine OctoMap and AHRF to get the best of both.

[1] P. Kohli et al., Robust Higher Order Potentials for Enforcing Label Consistency.
[2] A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.

Recognition

● AHRF - Associative Higher-order Random Fields Framework.

● Multi-resolution approach to semantic image segmentation.

● Efficient and bounded inference with alpha-expansion.

Reconstruction

The main elements of an occupancy-based scene reconstruction are:

● Occupied: objects present in the world.

● Free: required for collision avoidance and path planning.

● Unmapped: unknown areas in the scene, which need to be avoided.

Representation

• Efficient access to, and manipulation of, 3D object models are at the heart of robotics.
  o Point clouds, meshes: cannot map free and unknown areas.
  o Stixels/height maps/2.5D: one height value per cell of a 2D grid, and free area is not accurately mapped.
  o Fixed-size grid of voxels: voxels are not indexed, which makes it inefficient.

• Octree-based volumetric representation
  o Accurately represents 3D space, with efficient indexing of the volume.

Image courtesy: A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.

Semantic Octree - framework

Input stereo images

Chap 6, Sec 6.3


Generate point clouds and class hypotheses for every pixel


Fuse into an octree using the estimated camera poses

Octree: each volume is subdivided into 8 sub-volumes.
Leaf nodes (xi) are the smallest voxels.
Any internal node (xc) gives a natural grouping of 3D space.
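The subdivision into 8 sub-volumes, and the grouping that internal nodes give, can be illustrated with a small indexing sketch. It assumes a map centred at the origin, the 16-level depth and 8 cm leaf size quoted on the Experiments slide, and hypothetical helper names (`leaf_key`, `child_path`, `ancestor`); it is not the data structure used in the presented system.

```python
import math

DEPTH = 16   # tree depth (from the Experiments slide)
LEAF = 0.08  # leaf edge length in metres (8 cm, from the Experiments slide)

def leaf_key(point):
    """Integer (ix, iy, iz) leaf coordinates in [0, 2**DEPTH).

    Assumes the mapped cube is centred at the origin.
    """
    offset = 2 ** (DEPTH - 1)
    key = tuple(int(math.floor(c / LEAF)) + offset for c in point)
    if not all(0 <= v < 2 ** DEPTH for v in key):
        raise ValueError("point lies outside the mapped volume")
    return key

def child_path(key):
    """Child index (0..7) taken at each level, from the root down to the leaf."""
    path = []
    for level in range(DEPTH - 1, -1, -1):
        bx, by, bz = ((v >> level) & 1 for v in key)
        path.append(bx | (by << 1) | (bz << 2))
    return path

def ancestor(key, levels_up):
    """Coordinates of the internal node `levels_up` levels above a leaf.

    Leaves that share this value belong to the same grouping (xc).
    """
    return tuple(v >> levels_up for v in key)
```

Two leaves fall into the same internal-node grouping exactly when `ancestor` returns the same value for both, which is what makes the tree a natural source of voxel groupings.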


Perform inference over 3D voxels to give a labelled scene.

CRF graph on Octree voxels

The octree divides the space into sub-volumes indexed through a tree with nodes:
τint: internal nodes in the tree (xc)
τleaf: leaf-level voxels (xi)

A random variable is defined for every leaf voxel.
Every internal node is associated with a set of leaf voxels, resulting in a clique.

Label set defined as

Final energy:
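The label-set and energy formulas did not survive in this transcript. One form consistent with the surrounding slides (a unary term per leaf voxel and a higher-order term per internal-node clique) is sketched below. The symbols \mathcal{L}, \psi_i and \psi_c, and the assumption that "free" is one label alongside K object classes, are assumptions rather than text recovered from the slide.

```latex
\mathcal{L} = \{\text{free}\} \cup \{l_1, \dots, l_K\}

E(\mathbf{x}) = \sum_{i \in \tau_{leaf}} \psi_i(x_i)
              + \sum_{c \in \tau_{int}} \psi_c(\mathbf{x}_c)
```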

Unary score for leaf voxels

Chap 6, Sec 6.3.1

Octree volume update

All voxels are initially set to unknown, with occupancy probability P(xi) = 0.5, i.e. log odds of 0.

For each 3D point (obtained from stereo pairs), the voxels' log odds are updated in a ray-casting manner.

Log odds are updated for all 3D points of every stereo pair.

Final occupancy probability obtained as
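The octree volume update can be sketched with the standard log-odds occupancy formulation, shown here as an illustration rather than the presented implementation. The hit and miss probabilities (0.7 and 0.4), the flat dictionary of voxels and the dense-sampling ray traversal are all assumptions made to keep the example short; a real system would use an exact voxel traversal inside the octree.

```python
import math
from collections import defaultdict

VOXEL = 0.08                 # leaf size in metres (8 cm, from the Experiments slide)
L_HIT = math.log(0.7 / 0.3)  # assumed log-odds increment for the voxel containing a 3D point
L_MISS = math.log(0.4 / 0.6) # assumed log-odds increment for voxels the ray passes through

def key(p):
    """Integer voxel coordinates of a 3D point."""
    return tuple(int(math.floor(c / VOXEL)) for c in p)

def voxels_on_ray(origin, end):
    """Voxels crossed between origin and end (end voxel excluded).

    Approximates ray casting by sampling the segment every quarter voxel.
    """
    steps = max(1, int(math.dist(origin, end) / (VOXEL / 4)))
    end_key = key(end)
    crossed = set()
    for s in range(steps):
        t = s / steps
        k = key(tuple(o + t * (e - o) for o, e in zip(origin, end)))
        if k != end_key:
            crossed.add(k)
    return crossed

def integrate_point(log_odds, origin, point):
    """Update log odds for one 3D point seen from the camera at `origin`."""
    for k in voxels_on_ray(origin, point):
        log_odds[k] += L_MISS
    log_odds[key(point)] += L_HIT

def occupancy(l):
    """Convert log odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

log_odds = defaultdict(float)  # unknown voxels start at log odds 0, i.e. P = 0.5
integrate_point(log_odds, (0.0, 0.0, 0.0), (0.5, 0.0, 0.0))
```

Working in log odds turns each measurement into an addition, so points from every stereo pair can be accumulated in any order before the final conversion back to a probability.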

Unary score for leaf voxels

Each occupied voxel xi is associated with a set of 3D points.

The corresponding image pixels denoted as

Pixel scores combined together

Given the initial occupancy P(xi), the unary is given as:

Thus, voxels initially estimated as occupied have a low cost for the free label, and vice versa.

Hierarchical tree potential

Chap 6, Sec 6.3.2

Robust P^N potential applied over hierarchical groupings of voxels.

Penalises label inconsistency within each grouping of voxels.

Takes the form

Maximum cost truncated to γmax.

Groupings of voxels correspond to internal nodes in the octree.
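The exact formula on the slide was lost in extraction. As a hedged illustration, the basic robust P^N Potts form from Kohli et al. charges a cost that grows linearly with the number of voxels disagreeing with the dominant label and is truncated at γmax. The function name and the truncation parameter `q` (Q in that paper) are not taken from the slides, and any per-label weighting the authors may have used is omitted.

```python
from collections import Counter

def robust_pn_cost(labels, gamma_max, q):
    """Robust P^N cost for one grouping (clique) of leaf-voxel labels.

    The cost is 0 when all voxels agree, rises by gamma_max / q for each voxel
    that disagrees with the dominant label, and is truncated at gamma_max.
    """
    if not labels:
        return 0.0
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_disagree = len(labels) - dominant_count
    return min(gamma_max, n_disagree * gamma_max / q)
```

For a grouping of eight voxels with `gamma_max=1.0` and `q=4`, one dissenting voxel costs 0.25, while four or more dissenting voxels cost the full 1.0, so a few outliers are tolerated without forcing the whole grouping to one label.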

Experiments

Octree defined with 16 levels

Smallest voxel resolution = (8 x 8 x 8) cm^3

Maximum mapped volume = (2^16 x 8)^3 cm^3 ≈ (5.24 km)^3

Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 are considered
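The figures above can be checked with a few lines of arithmetic. The sketch assumes the root is level 0 and the 8 cm leaves sit at level 16, which is one reading of "16 levels"; under that assumption the groupings at levels 13-15 have edges of 64, 32 and 16 cm.

```python
LEVELS = 16   # octree depth quoted on the slide
LEAF_CM = 8   # leaf voxel edge length in centimetres

cells_per_side = 2 ** LEVELS         # number of leaf voxels along one axis
side_cm = cells_per_side * LEAF_CM   # edge length of the whole map in cm
side_km = side_cm / 100_000          # the same edge length in km

def node_edge_cm(level):
    """Edge length of a node at `level`, assuming leaves are at level LEVELS."""
    return LEAF_CM * 2 ** (LEVELS - level)
```

Here `side_km` evaluates to 5.24288, matching the roughly 5.24 km per side quoted for the maximum mapped volume.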

Results

Hierarchical grouping during inference vs. leaf-level voxel labelling (much sparser)

Chap 6, Sec 6.4

Quantitative evaluation: performed by projecting into the image domain.

Observations: Small objects tend to get decimated due to octree quantization, while a mesh-based representation is better at representing surfaces.

Results

Comparison with [1] and [2].

[1] Sengupta et al., "Urban 3D semantic modelling using stereo vision," in ICRA, 2013.
[2] Valentin et al., "Mesh based semantic modelling for indoor and outdoor scenes," in CVPR, 2013.

Occupancy mapping

Grouping voxels hierarchically increases the occupied volume, reducing sparsity.

Conclusion

● Proposed a method which performs reconstruction in an efficient representation, aided by the semantics of the scene.

● Combined AHRF and OctoMap to get the best of both.

● Some future applications:
  ○ Scene interaction and manipulation.
  ○ Collision detection, with known object types.
  ○ Path planning with known affordances.
