Video-Based Activity Forecasting for Construction Safety Monitoring Use Cases. Speaker: Shuai Tang, University of Illinois at Urbana-Champaign. GTC, San Jose, 2019. Contributors: Mani Golparvar Fard (University of Illinois), Milind Naphade (Nvidia), Murali Gopalakrishna (Nvidia), Amit Goel (Nvidia)

Page 1:

Video-Based Activity Forecasting

for Construction Safety Monitoring Use Cases

Speaker: Shuai Tang, University of Illinois at Urbana-Champaign

GTC, San Jose, 2019

Contributors: Mani Golparvar Fard (University of Illinois)

Milind Naphade (Nvidia), Murali Gopalakrishna (Nvidia), Amit Goel (Nvidia)

Page 2:

Worker Dies From Falling 50 Feet

References: California FACE Report #07CA009

Page 3:

14 Worker Deaths Every Day In The US

20.7% of all worker deaths were in construction

OSHA estimates that eliminating the top 4 hazards in construction would save 581 workers' lives

Falls: 381 deaths (39.2%)

Struck By Object: 80 deaths (8.2%)

Electrocution: 71 deaths (7.3%)

Caught-In/Between: 50 deaths (5.1%)

Page 4:

Page 5:

'Careless' Operator Crushes Worker With Backhoe

References: https://www.cdc.gov/niosh/topics/highwayworkzones/bad/pdfs/catreport2.pdf

Page 6:

Non-fatal Injuries In Construction

▪ Safety incidents
  ▪ 971 fatal cases
  ▪ 79,810 non-fatal cases involving days away from work

▪ $1.3 trillion construction expenditure each year

▪ Financial impact of safety
  ▪ Around $4 million cost per fatal case
  ▪ Over $42,000 average cost per non-fatal case

Page 7:

Motivation

Frequency

Safety inspections are typically conducted weekly.

Accuracy

50% of hazards are not recognized by workers.

Proactiveness

Safety measurements are often retrospective.

Image sources: Google Image

Page 8:

Overarching Goal: Visual-based activity forecasting towards predictive safety monitoring

Page 9:

Opportunity - Growth In Visual Data

▪ 200-1,000 pictures per day
▪ ~1,000 pictures per day
▪ Time-lapse pictures every 5-15 min
▪ ~2,000 images per week
▪ 1-5 scans per month
▪ 1-10 videos per day

Page 10:

Video sources: RAAMAC Lab

Page 11:

Video sources: Jun Yang, RAAMAC Lab

Page 12:

Video sources: RAAMAC Lab

Page 13:

Documentation, Intervention, Near-Miss Reporting

Near-miss Reporting

Right-time Intervention

Image sources:
Left: http://www.energysafetycanada.com/files/safety-alerts/Safety%20Alert%20-%2010.2018%20-%20Final.pdf
Right: http://www.energysafetycanada.com/files/safety-alerts/Safety%20Alert-13%202016.pdf

Page 14:

Big Picture - Computer Vision & Jobsite Cameras

Detect, Track, Model Worker Activities

Understand Work Context

Predict Next Sequence of Activities

Page 15:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

Vanilla LSTM (Graves, 2013)

[Figure: vanilla LSTM at T = 1 and T = 2; labels x1, y1, x2, y2]

Slide: Alexandre Alahi

Page 16:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

[Figure: Social LSTM at T = 1 and T = 2, top view. Details on social pooling for person 2 (in white)]

Slide: Alexandre Alahi

Page 17:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

[Figure: Social LSTM at T = 1 and T = 2, top view, with occupancy map. Details on social pooling for person 2 (in white)]

Slide: Alexandre Alahi

Page 18:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

[Figure: Social LSTM at T = 1 and T = 2, top view, with occupancy map and social tensor built from hidden states h1, h2, h3. Details on social pooling for person 2 (in white)]

Slide: Alexandre Alahi
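The occupancy map and social tensor named on these slides can be sketched roughly as follows. This is a minimal illustration based on the pooling described by Alahi et al. (2016), not the code behind the slides; the function name `social_pool`, the grid size, and the cell size are hypothetical choices.

```python
from math import floor

def social_pool(center, neighbors, hidden, grid=4, cell=1.0):
    """Pool neighbors of one person onto a grid x grid window around `center`.

    neighbors: {person_id: (x, y)}, hidden: {person_id: [hidden state values]}.
    Returns (occupancy map of counts, social tensor of summed hidden states).
    """
    dim = len(next(iter(hidden.values())))
    occupancy = [[0] * grid for _ in range(grid)]
    tensor = [[[0.0] * dim for _ in range(grid)] for _ in range(grid)]
    half = grid * cell / 2
    cx, cy = center
    for pid, (x, y) in neighbors.items():
        col = floor((x - cx + half) / cell)
        row = floor((y - cy + half) / cell)
        if 0 <= row < grid and 0 <= col < grid:   # ignore people outside the window
            occupancy[row][col] += 1
            tensor[row][col] = [a + b for a, b in zip(tensor[row][col], hidden[pid])]
    return occupancy, tensor

# Person 2 at the origin with neighbors 1 and 3 nearby and one far-away person.
occ, soc = social_pool(
    center=(0.0, 0.0),
    neighbors={1: (0.5, 0.5), 3: (0.7, 0.2), 9: (10.0, 10.0)},
    hidden={1: [1.0, 2.0], 3: [0.5, 0.5], 9: [9.0, 9.0]},
)
```

Neighbors that fall in the same cell share one slot: the occupancy map counts them, while the social tensor sums their hidden states.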

Page 19:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

Social LSTM learned to turn around a group

● Black line is the ground truth trajectory
● Gray line is the past
● Heatmap is the predicted distribution

Slide: Alexandre Alahi

Page 20:

Activity Forecasting - Computer Vision

Social LSTM (Alahi et al. 2016)

Social LSTM learned to turn around a group

● Black line is the ground truth trajectory
● Gray line is the past
● Heatmap is the predicted distribution

Slide: Alexandre Alahi

Page 21:

From Crowd Scenes To Construction Sites

Crowd scenes from UCY and ETH dataset

Example construction sites, Google Image

Page 22:

From Crowd Scenes To Construction Sites

Construction sites often change drastically

[Figure: site images labeled D1, D5, D10, D14, D19, D21]

Page 23:

Approach - data-driven, context-rich, and sequence-to-sequence models

Page 24:

Model Architecture (Social LSTM)

For the i'th trajectory at time t, predict i's location at t+1.

[Diagram: $(x_t, y_t)$ and the Social Feature at t each pass through an Embed Layer; after Concatenation they feed the LSTM Decoder and the Mixture Density Network (MDN); the MDN output at t+1 gives $(x_{t+1}, y_{t+1})$. Legend: Tensors, Model parameters]

j'th bivariate Gaussian: $[\pi^{j}_{t+1}, \mu^{j}_{t+1}, \sigma^{j}_{t+1}, \rho^{j}_{t+1}]$
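For reference, these four parameters usually define a mixture of bivariate Gaussians. The block below sketches that density, assuming the standard parameterization from Graves (2013); the component count $M$ and the subscripts $x$, $y$ on $\mu$ and $\sigma$ are notation introduced here, not taken from the slide.

```latex
p(x, y) = \sum_{j=1}^{M} \pi^{j}_{t+1}\,
  \mathcal{N}\!\left((x, y) \mid \mu^{j}_{t+1}, \sigma^{j}_{t+1}, \rho^{j}_{t+1}\right),
  \qquad \sum_{j=1}^{M} \pi^{j}_{t+1} = 1

\mathcal{N}\!\left((x, y) \mid \mu, \sigma, \rho\right)
  = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}
    \exp\!\left(-\frac{Z}{2(1 - \rho^2)}\right)

Z = \frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2}
    - \frac{2\rho\,(x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y}
```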

Page 25:

Model Architecture (Social LSTM)

For the i'th trajectory at time t, predict i's location at t+1.

[Diagram: the one-step model is applied repeatedly. $(x_t, y_t)$ and the Social Feature at t pass through Embed Layers, Concatenation, the LSTM Decoder and the Mixture Density Network (MDN) to give the MDN output at t+1 and $(x_{t+1}, y_{t+1})$. That location and the Social Feature at t+1 pass through the same layers to give the MDN output at t+2 and $(x_{t+2}, y_{t+2})$, and so on up to $(x_{t+S}, y_{t+S})$. Legend: Tensors, Model parameters]
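The step-by-step scheme on this slide, where each predicted location becomes the next input, can be sketched as below. The `constant_velocity` step function is a hypothetical stand-in for the trained one-step model, used only so the loop runs.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def rollout(step: Callable[[List[Point]], Point],
            history: List[Point], n_steps: int) -> List[Point]:
    """Forecast n_steps ahead by feeding each prediction back in as input."""
    track = list(history)
    preds = []
    for _ in range(n_steps):
        nxt = step(track)     # one-step-ahead prediction
        preds.append(nxt)
        track.append(nxt)     # the prediction, not an observation, is the next input
    return preds

def constant_velocity(track: List[Point]) -> Point:
    """Toy one-step model: repeat the last observed displacement."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

preds = rollout(constant_velocity, [(0.0, 0.0), (1.0, 0.5)], 3)
```

Because every step consumes the previous prediction, any error in an early step is carried into all later ones.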

Page 26:

Model Architecture (Ours)

For the i'th trajectory at time t, predict i's location at $\{t+s_1, t+s_2, \ldots, t+s_k\}$.

[Diagram: $(x_t, y_t)$ passes through the LSTM Encoder; its output is combined by Concatenation with the OccuMap at t, the Trajectory feature at t, and the Object Class of i; the result feeds the LSTM Decoder and the MDN, which produces one MDN output per horizon (at $t+s_1$, $t+s_2$, ..., $t+s_k$) and the locations $(x_{t+s_1}, y_{t+s_1})$, $(x_{t+s_2}, y_{t+s_2})$, ..., $(x_{t+s_k}, y_{t+s_k})$. Legend: Tensors, Model parameters]
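The key difference from the step-by-step scheme is that all k horizons are predicted directly from the state at time t, without feeding predictions back in. A toy sketch of that idea follows; the velocity-based "state" is a hypothetical placeholder for the encoder/decoder output, and the default horizons are chosen to match the steps (10, 20, 40) reported later in the deck.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def direct_forecast(history: List[Point],
                    horizons: Tuple[int, ...] = (10, 20, 40)) -> Dict[int, Point]:
    """Predict the location at every horizon from the same state at time t."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0            # toy stand-in for the shared encoded state
    # One "output head" per horizon; none of them reads another head's prediction.
    return {s: (x1 + s * vx, y1 + s * vy) for s in horizons}

forecast = direct_forecast([(0.0, 0.0), (1.0, 2.0)])
```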

Page 27:

Model Architecture (Ours) - Occupancy Map

Page 28:

Model Architecture (Ours)

Trajectory Features From Common Trajectories

Color Code and Movement

▪ Blue: South West to North East
▪ Lime: North East to South West
▪ Red: East to West
▪ Yellow: North to South

Length: Average length of all trajectories belonging to the cluster

Thickness: Cluster size (number of trajectories in the cluster)

Page 29:

Model Architecture (Ours)

Iteratively Using Predicted Locations As Inputs Leads to Large Deviations

Page 30:

Model Architecture (Ours)

For the i'th trajectory at time t, predict i's location at $\{t+s_1, t+s_2, \ldots, t+s_k\}$.

[Diagram: $(x_t, y_t)$ passes through the LSTM Encoder; its output is combined by Concatenation with the OccuMap at t, the Trajectory feature at t, and the Object Class of i; the result feeds the LSTM Decoder and the MDN, which produces one MDN output per horizon (at $t+s_1$, $t+s_2$, ..., $t+s_k$) and the locations $(x_{t+s_1}, y_{t+s_1})$, $(x_{t+s_2}, y_{t+s_2})$, ..., $(x_{t+s_k}, y_{t+s_k})$. Legend: Tensors, Model parameters]

MDN parameters at t+s

The j'th Gaussian parameters: $[\pi^{j}_{t+s}, \mu^{j}_{t+s}, \sigma^{j}_{t+s}, \rho^{j}_{t+s}]$

Training time

Negative Log-likelihood (NLL) over all Gaussians of all trajectories

Inference time

$I_{t+s} = \arg\max_{j} \pi^{j}_{t+s}$

$(x_{t+s}, y_{t+s}) = \mu^{I_{t+s}}_{t+s}$
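The training loss and the inference rule on this slide can be sketched for a single target point as follows. This assumes the standard bivariate Gaussian density from Graves (2013); the function names and the small constant added inside the logarithm are illustrative choices, and a full training loss would sum this value over all horizons and trajectories.

```python
import math

def bivariate_pdf(x, y, mu, sigma, rho):
    """Density of one bivariate Gaussian with means mu, std devs sigma, correlation rho."""
    zx = (x - mu[0]) / sigma[0]
    zy = (y - mu[1]) / sigma[1]
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = 2 * math.pi * sigma[0] * sigma[1] * math.sqrt(1 - rho * rho)
    return math.exp(-q / 2) / norm

def mdn_nll(target, components):
    """Negative log-likelihood of `target` under a mixture.

    components: list of (pi, mu, sigma, rho) with the pi values summing to 1.
    """
    p = sum(pi * bivariate_pdf(target[0], target[1], mu, sigma, rho)
            for pi, mu, sigma, rho in components)
    return -math.log(p + 1e-12)   # small constant guards against log(0)

def mdn_point_estimate(components):
    """Inference rule from the slide: mean of the component with the largest weight."""
    return max(components, key=lambda c: c[0])[1]

mixture = [(0.7, (1.0, 2.0), (1.0, 1.0), 0.0),
           (0.3, (5.0, 5.0), (1.0, 1.0), 0.0)]
loss = mdn_nll((1.0, 2.0), mixture)
location = mdn_point_estimate(mixture)
```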

Page 31:

Case Study At Nvidia Voyager Site

270 m (887 ft.) by 34 m (110 ft.)

Image courtesy of Berni de Nina

Page 32:

Experiment Setup

▪ Voyager dataset:
  • 1,464 mins (24.4 hrs) of 1080p videos
  • Trainval set (from 76 clips): person 1630, vehicle 1752
  • Test set (from 29 clips): person 143, vehicle 161
  • Traj. duration: [30, 2000] steps, endpoint distance > 50 pixels

▪ TrajNet dataset:
  • 58 scenes from UCY, ETH and SDD datasets
  • 11,448 pedestrian trajectories
  • 20 steps per trajectory, world coordinates in meters
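The Voyager trajectory criteria above (30 to 2000 steps, endpoints more than 50 pixels apart) amount to a simple filter. A minimal sketch, assuming each trajectory is a list of (x, y) pixel positions with one entry per step; the function name is hypothetical.

```python
import math

def keep_trajectory(points, min_steps=30, max_steps=2000, min_endpoint_dist=50.0):
    """Return True if a trajectory meets the duration and endpoint-distance criteria."""
    if not (min_steps <= len(points) <= max_steps):
        return False
    (x0, y0), (x1, y1) = points[0], points[-1]
    # Straight-line distance between the first and last position, in pixels.
    return math.hypot(x1 - x0, y1 - y0) > min_endpoint_dist

walking = [(i * 2.0, 0.0) for i in range(40)]     # 40 steps, endpoints 78 px apart
loitering = [(0.0, 0.0)] * 100                    # long enough, but never moves
too_short = [(i * 10.0, 0.0) for i in range(10)]  # moves far, but only 10 steps
```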

Page 33:

Implementation Details

▪ Running on one RTX 2080 Ti GPU with Nvidia docker image

▪ Optimization tricks:
  • gradient clipping to 50% gradient norm
  • Adam optimizer, lr = 0.005, lr decay to 50%

▪ Dynamic length batches

▪ Pre-computed features to accelerate training

▪ Training time:
  • Voyager: 1 hr for 1000 epochs with 3 MDN output heads
  • TrajNet: ~30 mins for 1700 epochs with 12 MDN output heads
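The two optimization tricks can be illustrated with the generic sketch below. The slide does not say what the 50% clipping threshold is measured against or how often the learning rate is halved, so `max_norm` and `every` are free parameters here, not values from the talk.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def decayed_lr(step, base_lr=0.005, decay=0.5, every=100):
    """Step decay: multiply the learning rate by `decay` every `every` steps (assumed schedule)."""
    return base_lr * decay ** (step // every)

clipped = clip_by_norm([3.0, 4.0], max_norm=2.5)   # norm 5.0 is scaled down to 2.5
lr_late = decayed_lr(250)                          # halved twice
```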

Page 34:

Experimental Results - Voyager dataset

Experiment results and ablation study (error in pixels)

| Group | ID | Method | RMSE@10 | RMSE@20 | RMSE@40 |
|---|---|---|---|---|---|
| Baselines | 1 | Linear Reg (p = 1) | 62.47 | 68.59 | 82.51 |
| Baselines | 2 | VAR (p = 5) | 46.85 | 90.27 | 163.02 |
| Baselines | 3 | MLP + Reg | 14.17 | 27.08 | 50.16 |
| Baselines | 4 | LSTM+Reg | 8.67 | 14.65 | 27.39 |
| Ours | 5 | LSTM+MDN | 7.42 | 13.26 | 25.25 |
| Ours | 6 | LSTM+MDN (single output) | 7.51 (0.22)* | 13.30 (0.34) | 25.20 (0.45) |
| Ours | 7 | LSTM+MDN+OccuMap | 7.24 (0.02) | 12.70 (0.008) | 24.30 (0.01) |
| Ours | 8 | LSTM+MDN+Attribute | 7.22 (0.0003) | 12.95 (0.01) | 24.74 (0.02) |
| Ours | 9 | LSTM+Traj. Feature | 7.39 (0.03) | 12.89 (0.05) | 24.45 (0.03) |
| Ours | 10 | LSTM+MDN+OccuMap+Attribute | 7.30 (0.09) | 12.71 (0.005) | 24.22 (0.004) |
| Ours | 11 | LSTM+MDN+OccuMap+Attribute+Traj. Feature | 7.36 (0.04) | 13.06 (0.03) | 24.54 (0.008) |

* Values in parentheses are p-values against method 5 (LSTM+MDN); p < 0.05 means the two results differ with statistical significance.
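The slides do not define RMSE@k. One common reading, assumed here, is the root of the mean squared Euclidean distance between forecasted and actual pixel locations k steps ahead, taken over all test trajectories.

```python
import math

def rmse_at(preds, truths):
    """RMSE of the Euclidean error over paired (x, y) locations at one horizon."""
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(preds, truths)]
    return math.sqrt(sum(squared) / len(squared))

# Two trajectories at one horizon: errors of 5 px and 0 px.
score = rmse_at([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```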

Page 35:

Experimental Results - TrajNet dataset

Tentative comparison between Social LSTM and Ours (error in meters)

| Group | ID | Method | Average error | Final error | Mean error |
|---|---|---|---|---|---|
| Social LSTM* | 9 | Occupancy LSTM | 2.1105 | 3.12 | 1.101 |
| Social LSTM* | 10 | Social LSTM | 1.3865 | 2.098 | 0.675 |
| Ours** | 4 | LSTM+Reg | 1.039 | 1.382 | 0.696 |
| Ours** | 5 | LSTM+MDN | 1.036 | 1.377 | 0.694 |
| Ours** | 7 | LSTM+MDN+OccuMap | 1.028 | 1.370 | 0.686 |

* Unofficial implementation from https://github.com/quancore/social-lstm

** Cross-validation result on the train set because the evaluation server was not available
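The three error columns can be sketched as below, assuming the usual TrajNet-style definitions: mean error is the displacement averaged over all predicted steps, final error is the displacement at the last step, and average error is the mean of those two. The last assumption agrees with the rows of the table, for example (2.098 + 0.675) / 2 = 1.3865.

```python
import math

def displacement_errors(pred, truth):
    """Mean, final, and overall average displacement error for one trajectory."""
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    mean_err = sum(dists) / len(dists)
    final_err = dists[-1]
    return {"mean": mean_err, "final": final_err,
            "average": (mean_err + final_err) / 2}

errors = displacement_errors(
    pred=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    truth=[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
)
```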

Page 36:

Qualitative Results - Easy Example

[Figure: locations at t, t+10, t+20, t+40; legend: Forecasted, Actual, Whole Traj.; axes: x, y]

Page 37:

Qualitative Results - Easy Example

[Figure: locations at t, t+10, t+20, t+40; legend: Forecasted, Actual, Whole Traj.; axes: x, y]

Page 38:

Qualitative Results - Intermediate Difficulty

[Figure: locations at t, t+10, t+20, t+40; legend: Forecasted, Actual, Whole Traj.; axes: x, y]

Page 39:

Qualitative Results - Intermediate Difficulty

[Figure: locations at t, t+10, t+20, t+40; legend: Forecasted, Actual, Whole Traj.; axes: x, y]

Page 40:

Qualitative Results - Hard Example

[Figure: locations at t, t+10, t+20, t+40; legend: Forecasted, Actual, Whole Traj.; axes: x, y]

Page 41:

A Safety Application Prototype

Task: Forecast Entering Excavation Zone Events

[Diagram: Input video, Object Detection + Object Tracking]

1. Use the trajectory forecasting model to predict a person's or vehicle's future locations 0.5/1.0/2.0 seconds ahead.

2. Match predictions to human-defined excavation zones.

Object detection + tracking:
• Mask RCNN (Resnet-101 backbone, Caffe2 Model zoo) for Person & Vehicle
• SORT for tracking Person & Vehicle objects
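Step 2, matching forecasted locations to human-defined zones, can be sketched with a standard ray-casting point-in-polygon test. The slides do not say how the prototype performs the match, so the polygon representation, the function names, and the zone name below are assumptions for illustration.

```python
def point_in_polygon(pt, polygon):
    """Ray casting: count how many polygon edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                        # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zone_alerts(forecasts, zones):
    """Return (horizon, zone name) pairs where a forecasted location falls inside a zone.

    forecasts: {horizon in seconds: (x, y)}, zones: {name: list of polygon vertices}.
    """
    return [(h, name)
            for h, pt in sorted(forecasts.items())
            for name, poly in zones.items()
            if point_in_polygon(pt, poly)]

zones = {"excavation_A": [(100, 100), (200, 100), (200, 200), (100, 200)]}
forecasts = {0.5: (80.0, 150.0), 1.0: (95.0, 150.0), 2.0: (120.0, 150.0)}
alerts = zone_alerts(forecasts, zones)
```

In this example the person is still outside the zone at 0.5 and 1.0 seconds, and only the 2.0 second forecast raises an alert.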

Page 42:

A Safety Application Prototype

Admin panel to modify regions of interest

Page 43:

A Safety Application Prototype

Viewer panel

Page 44:

A Safety Application Prototype

Demo Video

Page 45:

Summary

▪ Improving construction safety requires more frequent, accurate and proactive inspections.

▪ We show that detection, tracking, and trajectory forecasting models are promising ways to improve predictive construction safety management.

Page 46:

Video-Based Activity Forecasting

for Construction Safety Monitoring Use Cases

Shuai Tang [email protected]

GTC, San Jose, 2019