Challenges for Deep Scene Understanding - MIT...

Challenges for Deep Scene Understanding Bolei Zhou MIT Hang Zhao Sanja Fidler (UToronto) Adela Barriuso Antonio Torralba Aditya Khosla Aude Oliva Xavier Puig Bolei Zhou


Page 1: Challenges for Deep Scene Understanding - MIT CSAILpeople.csail.mit.edu/bzhou/publication/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao

Challenges for Deep Scene Understanding

Bolei Zhou

MIT

Hang Zhao, Sanja Fidler (UToronto), Adela Barriuso, Antonio Torralba, Aditya Khosla, Aude Oliva, Xavier Puig, Bolei Zhou


Objects in the Scene Context


Challenge 1: Scene Classification

Top1: street

Top2: residential neighborhood

Top3: crosswalk

Top4: apartment building

Top5: office building

Challenge 2: Scene Parsing (objects and stuff)

Deep scene understanding


• 8 million training images from 365 categories of the Places Database

• Test set: 900 images per category

Webpage: http://places2.csail.mit.edu
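As a quick check on the scale of the test set, here is simple arithmetic from the 365 categories and 900 images per category. The total is computed here and is not a number stated on the slide:

```python
# Figures taken from the slide above
NUM_CATEGORIES = 365
TEST_IMAGES_PER_CATEGORY = 900

# Total test images = categories x images per category
total_test_images = NUM_CATEGORIES * TEST_IMAGES_PER_CATEGORY
print(f"{total_test_images:,}")  # prints "328,500"
```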


Constructing Places Database

1. Collect scene names from a dictionary: ~1,000 scene names

2. Query and download images: 696 adjectives + scene names; ~90 million raw images downloaded

3. Annotate through Amazon Mechanical Turk: three rounds of annotations
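Step 2 combines adjectives with scene names to form search queries. The sketch below assumes each query is either a bare scene name or an "adjective scene-name" string. The slide does not specify the exact query format. The function name and word lists are hypothetical stand-ins for the 696 adjectives and ~1,000 scene names:

```python
from itertools import product

def build_queries(adjectives, scene_names):
    """Hypothetical query builder: bare scene names plus every adjective + scene pair."""
    queries = list(scene_names)  # assumption: plain scene names are also queried
    # One query per (adjective, scene name) combination
    queries += [f"{adj} {scene}" for adj, scene in product(adjectives, scene_names)]
    return queries

# Hypothetical toy word lists
adjectives = ["messy", "spacious"]
scene_names = ["bedroom", "kitchen"]
queries = build_queries(adjectives, scene_names)
print(len(queries))  # 6
print(queries)
```

With the full lists, the number of queries grows as (adjectives + 1) x scene names, which helps explain why the download step yields tens of millions of raw images.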


Example scene categories:

Urban: amusement park, arch, corral, windmill, train station platform, tower, swimming pool, street, soccer field

Indoor: elevator door, bar, cafeteria, veterinarians office, bedroom, conference center, staircase, shoe shop

Nature: field road, fishpond, watering hole, rainforest


Results

92 valid submissions from 27 teams (each team was allowed at most 5 submissions).

[Figure: top-5 errors of all 92 submissions, sorted; y-axis from 6% to 20%]

Baseline (single ResNet152): 14.9%


Results

Team Name        Top-5 Error
Hikvision        9.01%
MW               10.19%
Trimps-Soushen   10.30%
SIAT_MMLAB       10.43%
NTU-SC           10.85%

Hikvision: Qiaoyong Zhong, Chao Li, Yingying Zhang, Haiming Sun, Shicai Yang, Di Xie, Shiliang Pu (Hikvision Research Institute)

MW: Gang Sun and Jie Hu (Chinese Academy of Sciences and Peking University)

Trimps-Soushen: Jie Shao, Xiaoteng Zhang, Zhengyan Ding, Yixin Zhao, Yanjun Chen, Jianying Zhou, Wenfei Wang, Lin Mei, Chuanping Hu (The Third Research Institute of the Ministry of Public Security, China)

92 valid submissions from 27 teams.

Single-model baselines:
ResNet152   14.93%
VGG16       14.99%
AlexNet     17.25%
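The classification challenge ranks entries by top-5 error. An image counts as an error only if its label is missing from a model's five highest-ranked predictions. This leaves some room for scenes with several plausible labels. The following is a minimal sketch, assuming predictions arrive as ranked lists of category names. The function name and example labels are hypothetical:

```python
def top5_error(predictions, labels):
    """Fraction of images whose true label is not among the first five ranked predictions.

    predictions: one ranked list of category names per image (best first)
    labels: the ground-truth category name for each image, in the same order
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    misses = sum(1 for ranked, truth in zip(predictions, labels) if truth not in ranked[:5])
    return misses / len(labels)

# Hypothetical two-image example: the first is a hit, the second a miss
preds = [
    ["street", "residential neighborhood", "crosswalk", "apartment building", "office building"],
    ["restaurant", "ice cream parlor", "coffee shop", "pizzeria", "cafeteria"],
]
truth = ["crosswalk", "aquarium"]
print(top5_error(preds, truth))  # 0.5
```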


Ambiguous predictions: 1) unusual activity in a scene; 2) multiple scene parts

Label: aquarium; predictions: top-1 restaurant, top-2 ice cream parlor, top-3 coffee shop, top-4 pizzeria, top-5 cafeteria

Label: junkyard; predictions: top-1 campsite, top-2 sandbox, top-3 beer garden, top-4 market outdoor, top-5 flea market indoor

Label: lagoon; predictions: top-1 balcony interior, top-2 beach house, top-3 boardwalk, top-4 roof garden, top-5 restaurant patio

Label: construction site; predictions: top-1 martial arts gym, top-2 stable, top-3 boxing ring, top-4 locker room, top-5 basketball court


Scene parsing

• New challenge this year

• Each pixel of the image is assigned a semantic class label

[Figure: image, class labels, semantic mask]
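Whatever network produces the scores, the final step of scene parsing assigns each pixel the class with the highest score. The sketch below covers only that step. It assumes the model already outputs one score per class for each pixel. The scores and class names are hypothetical:

```python
def parse_scene(scores):
    """Convert per-pixel class scores into a semantic mask of class indices.

    scores[y][x] is a list with one score per class; each output pixel holds
    the index of its highest-scoring class (an argmax over classes).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# Hypothetical 2x2 image with 3 classes: 0 = sky, 1 = building, 2 = road
scores = [
    [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1]],
    [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7]],
]
print(parse_scene(scores))  # [[0, 0], [1, 2]]
```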


• 22,000 images for training and validation, 3,000 images for testing

• 150 classes of objects (car, person, table, etc.) and stuff (sky, road, ceiling, etc.)

[Figure: pixel frequency of each class in the training set; y-axis from 0 to 0.18]
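A pixel-frequency plot summarizes how often each class occurs across all training pixels. The following is a minimal way to compute such a histogram. It assumes index 0 marks unlabeled pixels, which is a hypothetical convention for this sketch, and that frequencies are normalized over labeled pixels only:

```python
from collections import Counter

def pixel_frequency(label_maps, num_classes):
    """Return, for classes 1..num_classes, the fraction of labeled pixels in each class.

    label_maps: list of 2-D label maps (lists of rows of integer class indices).
    Assumption: index 0 marks unlabeled pixels and is excluded from the totals.
    """
    counts = Counter()
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)  # count every pixel's class index
    counts.pop(0, None)  # drop unlabeled pixels before normalizing
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_classes
    return [counts[c] / total for c in range(1, num_classes + 1)]

# Hypothetical toy training set: two 2x2 label maps, 3 classes
maps = [
    [[1, 1], [2, 0]],
    [[1, 3], [3, 3]],
]
freqs = pixel_frequency(maps, num_classes=3)
print([round(f, 3) for f in freqs])  # [0.429, 0.143, 0.429]
```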


Constructing ADE Dataset

• Annotating each object instance in a scene

• A single expert annotator, over a few years of work

http://groups.csail.mit.edu/vision/datasets/ADE20K/

LabelMe annotation tool

Ms. Adela Barriuso


Results

75 valid submissions from 22 teams

[Figure: final score = (mean IoU + pixel accuracy) / 2 for all 75 submissions; y-axis from 0.3 to 0.6]

Baseline (VGG-based): 0.4567
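The final score averages two standard segmentation metrics: mean IoU and pixel accuracy. The sketch below shows how they combine. It makes two assumptions that are not details given on the slide. First, intersection and union are accumulated per class over all scored pixels. Second, label 0 marks unlabeled ground truth. The function name is hypothetical:

```python
def scene_parsing_score(pred, truth, num_classes):
    """Return (mean IoU, pixel accuracy, final score) for flat lists of class indices.

    Classes are 1..num_classes. Assumption: 0 in `truth` means unlabeled, and those
    pixels are ignored. Several images can be scored together by concatenating pixels.
    """
    intersection = [0] * (num_classes + 1)
    union = [0] * (num_classes + 1)
    correct = labeled = 0
    for p, t in zip(pred, truth):
        if t == 0:
            continue  # skip unlabeled ground-truth pixels
        labeled += 1
        if p == t:
            correct += 1
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both the true and the predicted class
            union[t] += 1
            if 0 < p <= num_classes:
                union[p] += 1
    pixel_acc = correct / labeled
    # Average IoU over classes present in the prediction or the ground truth
    ious = [intersection[c] / union[c] for c in range(1, num_classes + 1) if union[c] > 0]
    mean_iou = sum(ious) / len(ious)
    return mean_iou, pixel_acc, (mean_iou + pixel_acc) / 2

# Hypothetical 4-pixel example with 2 classes
truth = [1, 1, 2, 2]
pred = [1, 2, 2, 2]
mean_iou, pixel_acc, score = scene_parsing_score(pred, truth, num_classes=2)
print(round(mean_iou, 4), pixel_acc, round(score, 4))  # 0.5833 0.75 0.6667
```

Pixel accuracy is dominated by large "stuff" regions such as sky or floor. Mean IoU weights every class equally, so averaging the two rewards models that also get small objects right.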


Results

Team Name             Final Score
SenseCUSceneParsing   0.5721
Adelaide              0.5674
360+MCG-ICT-CAS_SP    0.5556
SegModel              0.5465
CASIA_IVA             0.5433

Final score = (mean IoU + pixel accuracy) / 2

SenseCUSceneParsing: Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Tong Xiao, Jiaya Jia (SenseTime and CUHK, Hong Kong)

Adelaide: Zifeng Wu, Chunhua Shen, Anton van den Hengel (University of Adelaide, Australia)

360+MCG-ICT-CAS_SP: Rui Zhang, Min Lin, Sheng Tang, Yu Li, YunPeng Chen, YongDong Zhang, JinTao Li, YuGang Han, ShuiCheng Yan (Qihoo 360; Multimedia Computing Group, Institute of Computing Technology, Chinese Academy of Sciences (MCG-ICT-CAS); National University of Singapore (NUS))

75 valid submissions from 22 teams

Single-model baselines:
DilatedNet   0.4567
FCN-8s       0.4480
SegNet       0.4079


Data Consistency and Human Performance

• 61 images from the validation set were re-annotated after 6 months; 82.4% of pixels got the same label.

[Figure: pixel accuracy bar chart, y-axis from 50% to 85%; bar values 64.03%, 64.77%, 65.41%, 73.67%, 74.49%, 74.73%, and 82.40% (the re-annotation consistency)]


Average image annotation mode (labels shown: floor, sky, wall, building): 20.30% pixel accuracy
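One reading of the "average image annotation mode" baseline is that it predicts, at every pixel position, the label that occurs most often there across the training annotations. The sketch below follows that reading. It assumes all label maps share one resolution. The function names and toy labels are hypothetical:

```python
from collections import Counter

def mode_label_map(train_maps):
    """Most frequent label at each pixel position across the training label maps.

    Assumption: all maps have already been resized to the same height and width.
    """
    height, width = len(train_maps[0]), len(train_maps[0][0])
    return [
        [Counter(m[y][x] for m in train_maps).most_common(1)[0][0] for x in range(width)]
        for y in range(height)
    ]

def pixel_accuracy(pred, truth):
    """Fraction of pixels where the predicted label equals the ground truth."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(p == t for p, t in pairs) / len(pairs)

# Hypothetical labels: 1 = sky, 2 = building, 3 = floor
train = [
    [[1, 1], [2, 2]],
    [[1, 2], [2, 3]],
    [[1, 1], [3, 3]],
]
baseline = mode_label_map(train)       # same prediction for every test image
test = [[1, 2], [2, 2]]
print(pixel_accuracy(baseline, test))  # 0.5
```

Because the baseline ignores the input image entirely, it measures how much of the score comes from dataset layout priors alone.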


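[Figure columns: Image, Ground-truth, SenseCU…, Adelaide, 360+MCG…, SegModel, CASIA_IVA; scores of each team's prediction per example image]

Example   SenseCU…   Adelaide   360+MCG…   SegModel   CASIA_IVA
1         96.55%     96.36%     96.27%     91.96%     91.04%
2         94.14%     95.63%     96.55%     94.19%     89.59%
3         93.71%     93.51%     94.84%     94.89%     94.54%
4         92.73%     91.88%     93.02%     92.62%     91.47%
5         92.25%     85.66%     90.47%     79.81%     84.85%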


[Figure columns: Image, Ground-truth, SenseCU…, Adelaide, 360+MCG…, SegModel, CASIA_IVA; classes highlighted: grandstand, boat, car, booth]

Example   SenseCU…   Adelaide   360+MCG…   SegModel   CASIA_IVA
1         62.89%     51.23%     50.25%     45.66%     67.91%
2         56.19%     42.86%     55.82%     81.53%     48.69%
3         54.08%     52.81%     53.70%     50.49%     38.97%
4         41.49%     40.72%     49.07%     42.28%     45.04%


Thanks to all the participants and audiences!

Hang Zhao, Sanja Fidler (UToronto), Adela Barriuso, Antonio Torralba, Aditya Khosla, Aude Oliva, Xavier Puig, Bolei Zhou

http://places2.csail.mit.edu

http://sceneparsing.csail.mit.edu