
Viewpoint Invariant Person Classification in RGB-D Data

    Alisha Rege ([email protected])

    Purpose

    Artificial intelligence can play a key role in healthcare; however, patient confidentiality rules (HIPAA) prevent this information from being processed without safeguards. Here the safeguard comes in the form of RGB-D data, which hides faces and other personally identifying characteristics in video. This project attempts to detect a person from any viewpoint in Stanford Health's RGB-D data. The goal is a detection system that can identify a person from any viewpoint, which would allow nurses and doctors to sense problems such as a patient suddenly falling or not moving for days. A 6-layer CNN classifier performs the classification.

    Dataset

    • Dataset from Stanford’s Lucile Packard Children’s Hospital

    • RGB-D data from 3 different viewpoints

    • Hand-labelled data

    • Previous project created bounding boxes for objects

    • Large variations in viewpoints, object appearance and pose, and object scale

    Convolutional Neural Network Implementation


    6-Layer CNN w/ Dropout [1][2]:

    • Input (56 × 56 × 1)

    • Convolutional layer (5×5 convolution, 32 filters) & ReLU

    • Pooling layer (2× downsampling)

    • Convolutional layer (5×5 convolution, 64 filters) & ReLU

    • Pooling layer (2× downsampling)

    • Fully-connected layer (14×14×64 inputs -> 1024 outputs)

    • Output (1024 inputs -> 2 classes)
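    The layer list above corresponds to a standard LeNet-style network. A minimal Keras sketch of that architecture follows; the 32/64 filter counts, 5×5 kernels, 2× pooling, 1024-unit fully-connected layer, and 2-class output come from the poster, while the dropout rate and placement, "same" padding, optimizer, and loss are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the 6-layer CNN described above. Dropout rate/placement,
# padding, optimizer, and loss are assumptions, not from the poster.
model = models.Sequential([
    layers.Conv2D(32, (5, 5), padding="same", activation="relu",
                  input_shape=(56, 56, 1)),       # input: 56 x 56 x 1
    layers.MaxPooling2D((2, 2)),                  # 56 -> 28
    layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                  # 28 -> 14
    layers.Flatten(),                             # 14 * 14 * 64 inputs
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),                          # assumed rate
    layers.Dense(2, activation="softmax"),        # person / not person
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```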

    Dataset Preprocessing

    • Translating video into frames

    • Cropping annotations to feed into the network

    • Resizing all images to 56×56×1

    • Batch size: 50

    • Cleaning data
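    A minimal sketch of the crop-and-resize step, assuming OpenCV, an (x, y, w, h) bounding-box annotation format, and [0, 1] scaling (all assumptions, as the poster does not specify them):

```python
import cv2

def preprocess(frame, box):
    """Crop one labelled bounding box from a video frame and resize it
    for the network. box = (x, y, w, h) is an assumed annotation format."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)  # single channel
    crop = cv2.resize(crop, (56, 56))              # network input size
    return crop.reshape(56, 56, 1) / 255.0         # scaling is an assumption
```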

    Acknowledgement
    Thank you to the Stanford Health Center and the Stanford Artificial Intelligence Vision Lab for the dataset.

    Mean Average Precision (mAP)

    $$\mathrm{mAP} = \frac{1}{\text{num entries}} \sum_{c=1}^{\text{labels}} AP(c)$$

    $$AP(c) = \frac{\sum_{k=1}^{n} P(k)\,\mathrm{rel}(k)}{\sum_{k=1}^{n} \mathrm{rel}(k)}$$

    $$\%\,\mathrm{accuracy} = \left(1 - \frac{Label_{\mathrm{correct}} - Label_{\mathrm{predicted}}}{Label_{\mathrm{correct}}}\right) \times 100$$
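    A small NumPy sketch of the AP/mAP formulas above, assuming each class contributes a ranked list of binary relevance labels (1 = correct detection):

```python
import numpy as np

def average_precision(rel):
    """AP for one class from a ranked list of binary relevance labels
    (1 = correct), per the formula: sum_k P(k)*rel(k) / sum_k rel(k)."""
    rel = np.asarray(rel, dtype=float)
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(rel_per_class):
    """mAP: mean of AP over the per-class relevance lists."""
    return float(np.mean([average_precision(r) for r in rel_per_class]))

print(average_precision([1, 0, 1, 1, 0]))  # ~0.806
```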

    Discussion

    References

    [1] Yann LeCun et al. LeNet-5, convolutional neural networks. http://yann.lecun.com/exdb/lenet/. Retrieved April 22, 2015.

    [2] Martín Abadi, Ashish Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

    Baseline

    • SVM w/ HOG descriptors

    • HOG descriptors: slide a window over the image and compute a histogram of gradient orientations

    • Uses NMS (non-maximum suppression): keeps only one major object within a given pixel range

    • Skimage implementation (see the sketch below)
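    A minimal sketch of this baseline using skimage's hog and a linear SVM; the HOG cell/block parameters and the regularization constant are assumptions, and the random arrays stand in for the labelled 56×56 crops:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(images):
    # 9 orientations, 8x8-pixel cells, 2x2-cell blocks: assumed parameters
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# Stand-in data; in the project these are the labelled 56x56 crops.
rng = np.random.default_rng(0)
train_images = rng.random((20, 56, 56))
train_labels = rng.integers(0, 2, 20)    # 1 = person, 0 = not

clf = LinearSVC(C=1.0)                   # C is an assumed hyperparameter
clf.fit(hog_features(train_images), train_labels)
preds = clf.predict(hog_features(train_images))
```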

    Convolutional Layer:

    $$x_{ij}^{\ell} = \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} \omega_{ab}\; y_{(i+a)(j+b)}^{\ell-1}$$

    $$y_{ij}^{\ell} = \sigma\!\left(x_{ij}^{\ell}\right) \quad \text{(nonlinearity)}$$

    • $m \times m$ is the convolutional filter size

    • $x_{ij}^{\ell}$ is each selected pixel

    SVM training objective (regularized hinge loss):

    $$\min_{w}\; \frac{1}{2}\,\lVert w \rVert_2^2 \;+\; \lambda \sum_{i} \max\!\left(0,\; 1 - y_i\, w^{\top} x_i\right)$$


    Future Work

    • Try different camera types to distinguish doctor/nurse/etc.

    • Use the information to detect anomalies


    Results

    Classifier                       | Training mAP | Training Accuracy | Testing mAP | Testing Accuracy
    SVM w/ HOG descriptors           | .702         | .609              | .542        | .532
    CNN over cropped images          | .977         | .971              | .693        | .590
    CNN over entire image            | .723         | .624              | .560        | .540
    CNN with double representation   | .985         | .981              | .705        | .650
    CNN with unequal representation  | .985         | .904              | .693        | .608
    CNN with Histogram Equalization  | .988         | .982              | .715        | .667
    CNN on precise cropped images    | .999         | .999              | .912        | .890

    [Figure: "Epoch vs Mean Average Precision" — mAP (0.6–1.0) plotted against epoch (0–800), with training and testing accuracy curves.]

    The maximum value occurs at epoch 410.

    Histogram Equalization:
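    The poster illustrated this step with example images. As a minimal sketch, OpenCV's equalizeHist performs the operation used in the "CNN with Histogram Equalization" experiment; the uint8 scaling of the input frame is an assumption:

```python
import cv2
import numpy as np

# Stand-in 8-bit frame; in the project this is a depth/gray crop
# already scaled to uint8 (an assumption).
frame = (np.random.rand(56, 56) * 255).astype(np.uint8)
equalized = cv2.equalizeHist(frame)  # spreads intensities across 0-255
```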

    Effect of Training Size on Output:

    Viewpoint     | Number of Training Images (Y = person / N = no person) | Testing mAP
    Top-Down      | 544Y / 377N                                            | .71
    Mid-top Wall  | 767Y / 383N                                            | .81
    Hallway       | 374Y / 284N                                            | .64