Post on 17-Jul-2020

DeepPose & Convolutional Pose Machines

Main Concepts
1. CNN with a regressor head.
2. Object localization.
3. Bigger-to-smaller or smaller-to-bigger view.
4. Deep supervision to prevent the vanishing-gradient problem in deep nets.

CNN with Regressor Head (slides: Andrej et al.)

CNN with Regressor Head, e.g. code in Keras:

from keras.layers import Input, Conv2D, Dense, Flatten
from keras.models import Model

frame_in = Input(shape=(96, 96, 1))
conv = Conv2D(64, (10, 10))(frame_in)
flat = Flatten()(conv)
left_eye = Dense(2, activation='linear', name='left_eye')(flat)    # (x, y) of left eye
right_eye = Dense(2, activation='linear', name='right_eye')(flat)  # (x, y) of right eye
model = Model(inputs=frame_in, outputs=[left_eye, right_eye])
model.compile(loss='mse', optimizer='adam')  # mse = L2 loss (mean squared error)

Object Localization

Human pose estimation can be framed as object localization, but here the object is non-rigid.

Bigger-to-Smaller or Smaller-to-Bigger (View)
1. The first paper (DeepPose) takes a coarse view of the full image first, then moves to magnified views of parts.
2. The second paper (CPM) starts with a small receptive field; as depth increases, the viewed image region grows.

Deep Supervision

What is Deep Supervision? Applying the loss not only at the final output but also at intermediate stages of the network.

Why is it needed? The vanishing gradient problem:

Generally |w_j| < 1, so the gradient shrinks multiplicatively as depth increases.
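As a toy illustration (my own sketch, assuming a chain of layers where the backpropagated gradient is scaled by each layer's weight magnitude), the shrinkage compounds quickly:

```python
# Toy sketch: backpropagating through 50 layers whose weights all have
# magnitude 0.9 scales the gradient by 0.9**50.
grad = 1.0
w = 0.9  # |w_j| < 1 at every layer
for _ in range(50):
    grad *= w
print(round(grad, 5))  # 0.00515 -- effectively vanished
```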

Are there other solutions?
1. Yes: proper weight initialization.
2. What LSTMs did for RNNs, ResNets did for feed-forward NNs.
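The ResNet fix can be sketched in a few lines (a toy numpy version of a single residual block, assumptions mine): the skip connection y = x + F(x) gives gradients an identity path that does not shrink even when the layer's weights are small.

```python
import numpy as np

def residual_block(x, W):
    # y = x + F(x): the identity path gives gradients a route that
    # does not shrink when |W| < 1.
    return x + np.tanh(W @ x)

x = np.ones(3)
W = 0.01 * np.eye(3)  # a deliberately "weak" layer
y = residual_block(x, W)
print(y)  # stays close to x: the skip connection preserves the signal
```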

Paper 1: DeepPose

Why is it Hard?

Background
1. Pictorial structures: parts, with tree-based relations between them derived from priors.
2. Non-tree models: the first layer acts as a discriminative, independent body-part classifier. The second layer takes the estimated class distributions of the first into account and can thereby predict joint locations by modeling the interdependence and co-occurrence of the parts.

Problem Specification

Encode the locations of all k body joints in a pose vector y = (..., y_i, ...), i ∈ {1, ..., k}, where y_i contains the x and y coordinates of the i-th joint.

Bounding Box Trick
Define a bounding box b = (b_c, b_w, b_h) in absolute image coordinates, where
b_c = center of the box,
b_w = width of the box,
b_h = height of the box.

Transforming a point from absolute image coordinates to box coordinates

Any point y_i is mapped from absolute coordinates into the coordinate system of box b by shifting by the center and scaling by the box size: N(y_i; b) = ((y_i,x − b_c,x) / b_w, (y_i,y − b_c,y) / b_h).

Example
y_i = (4, 4) in the absolute frame.
Box b = (b_c = (2, 2), b_w = 4, b_h = 4).
y_i in box coordinates = (1/2, 1/2).
Similarly, y = (0, 0) becomes (−1/2, −1/2).

Why box coordinates? The network finds it much easier to work in this normalized environment, since it removes the variance across images (large, small, etc.).

We can always take the inverse transform to recover absolute coordinates w.r.t. the image.
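The worked example above can be reproduced with a tiny sketch (the helper names `normalize`/`denormalize` are mine, not from the paper):

```python
def normalize(y, bc, bw, bh):
    # Absolute image coords -> box coords: shift by the center, scale by size.
    return ((y[0] - bc[0]) / bw, (y[1] - bc[1]) / bh)

def denormalize(y, bc, bw, bh):
    # Inverse transform: box coords -> absolute image coords.
    return (y[0] * bw + bc[0], y[1] * bh + bc[1])

bc, bw, bh = (2, 2), 4, 4
print(normalize((4, 4), bc, bw, bh))        # (0.5, 0.5)
print(normalize((0, 0), bc, bw, bh))        # (-0.5, -0.5)
print(denormalize((0.5, 0.5), bc, bw, bh))  # back to (4.0, 4.0)
```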

Some Assumptions
From now on we work in the normalized box coordinate system:
1. Our network gives us predictions in normalized box coordinates.
2. Our loss function (L2) also works in the box frame.
3. We take the inverse transform to recover absolute coordinates.

Algorithm: we work in stages to refine the prediction. Basically AlexNet with k regressor heads.

Stage 1
y^1 = N^-1(ψ(N(x; b_0); θ_1); b_0),
where N(x; b) is the transformation into the box-b coordinate system,
b_0 is the full image (or the box from a person detector),
and ψ is the neural net with parameters θ_1.

Stage 2 onwards
Note: from stage 2 on we predict only a displacement (refinement) over the previous stage's prediction:
y_i^s = y_i^(s-1) + N^-1(ψ(N(x; b); θ_s); b), with b = b_i^(s-1).
Note: the bounding box is re-estimated from the previous stage's prediction:
b_i^s = (y_i^s, σ·diam(y^s), σ·diam(y^s)),
where σ is a hyperparameter and diam(y) is the distance between opposite joints of the human torso, e.g. left shoulder to right hip.
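A rough sketch of the stage-wise refinement loop (`stub_net` and all the numbers are made up for illustration; a real implementation would crop box b from the image and run the stage network in box coordinates):

```python
def stub_net(estimate):
    # Hypothetical stand-in for the stage-s regressor: moves the estimate
    # halfway toward a hidden "true" joint, mimicking learned refinement.
    true_joint = (40.0, 60.0)
    return tuple(0.5 * (t - e) for t, e in zip(true_joint, estimate))

sigma, diam = 1.0, 30.0
y = (10.0, 10.0)                # stage-1 estimate (absolute coordinates)
for stage in range(2, 5):       # stages 2..4 predict displacements only
    bw = bh = sigma * diam      # box side chosen from the torso diameter
    dx, dy = stub_net(y)        # displacement for box b = (y, bw, bh)
    y = (y[0] + dx, y[1] + dy)  # y^s = y^(s-1) + displacement
print(y)  # (36.25, 53.75): each stage halves the remaining error
```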

Loss function: everything happens in the box frame; the L2 loss is computed between predicted and ground-truth joint positions in normalized box coordinates.

Dataset Setup
FLIC (Hollywood): used a face-based body detector to get the initial bounding box.
LSP (sports): used directly, since humans are tightly cropped in the images.

Dataset Validation
FLIC (Hollywood): scalar σ = 1.0.
LSP (sports): scalar σ = 2.0.

Number of stages: S = 3; adding stages was stopped once the algorithm stopped improving on the validation set.

Metric: Percent of Detected Joints (PDJ)
A joint is considered detected if the distance between the predicted and the true joint is within a certain fraction of the torso diameter. By varying this fraction, detection rates are obtained for varying degrees of localization precision.
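A minimal PDJ sketch (the `pdj` helper and the sample joints are my own, assuming joints given as (x, y) tuples):

```python
import math

def pdj(pred, truth, torso_diam, fraction):
    # Fraction of joints whose predicted position lies within
    # `fraction * torso_diam` of the ground truth.
    hits = sum(math.dist(p, t) <= fraction * torso_diam
               for p, t in zip(pred, truth))
    return hits / len(truth)

truth = [(0, 0), (10, 0), (0, 10)]
pred = [(1, 0), (10, 6), (0, 10)]
print(pdj(pred, truth, torso_diam=10, fraction=0.2))  # 2 of 3 joints detected
```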

Metric: Percent of Correct Parts (PCP)
A candidate body part is labeled correct if its segment endpoints lie within 50% of the length of the ground-truth segment from the annotated endpoints. This heavily penalizes shorter limbs, which are already harder to detect.

Some Results

Effects of iterative refinement


Takeaways
1. DNN regression can be used for accurate object localization.
2. Very easy to implement.
3. Iterative refinement works for non-rigid objects as well.

Paper 2: Convolutional Pose Machines (slides: Shih-En Wei et al.)

Here each stage's predictor g_t is a multi-class classifier that produces a belief map for each of the k parts (plus background).
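The training target for each part's belief map is commonly an ideal map with a Gaussian peak at the true joint location; a small numpy sketch (the sizes and σ are arbitrary, the helper name is mine):

```python
import numpy as np

def belief_map(h, w, joint, sigma=1.0):
    # Ideal belief map: a Gaussian peak at the true joint position (x, y).
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - joint[0]) ** 2 + (ys - joint[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

bmap = belief_map(8, 8, joint=(3, 5))
print(np.unravel_index(bmap.argmax(), bmap.shape))  # peak at row 5, col 3
```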

Stage 1 (signature: layers.conv2d(input, filters, kernel_size, stride); layers.max_pool2d(input, kernel_size, stride)):

conv1_stage1 = layers.conv2d(image, 64, 9, 1)
pool1_stage1 = layers.max_pool2d(conv1_stage1, 2, 2)
conv2_stage1 = layers.conv2d(pool1_stage1, 64, 9, 1)
pool2_stage1 = layers.max_pool2d(conv2_stage1, 2, 2)
conv3_stage1 = layers.conv2d(pool2_stage1, 64, 9, 1)
pool3_stage1 = layers.max_pool2d(conv3_stage1, 2, 2)
conv4_stage1 = layers.conv2d(pool3_stage1, 64, 5, 1)
conv5_stage1 = layers.conv2d(conv4_stage1, 64, 5, 1)
conv6_stage1 = layers.conv2d(conv5_stage1, 64, 1, 1)
conv7_stage1 = layers.conv2d(conv6_stage1, 10, 1, 1)  # 10 = p + 1 belief maps (p parts + background)

Stage 2:

conv4_stage2 = layers.conv2d(pool3_stage1, 64, 5, 1)  # reuses the shared pool3 image features
concat_stage2 = tf.concat([conv7_stage1, conv4_stage2], axis=3)  # stage-1 beliefs + features
Mconv1_stage2 = layers.conv2d(concat_stage2, 64, 11, 1)  # large kernel: wide context over the belief maps
conv6_stage2 = layers.conv2d(Mconv1_stage2, 10, 1, 1)  # p + 1 belief maps again
model = Model(inputs=image, outputs=[conv6_stage2, conv7_stage1])
model.compile(loss='mse')  # L2 loss on every stage's belief maps (intermediate supervision)

Note: feature sharing from stage 2 onwards

Training styles

Dataset Cropping
Train:
1. Roughly resize images so they share the same scale.
2. Crop or pad each image around the person's center position to make it 368×368.
3. Rough scale estimates are provided; if not, estimate them from the joints.

Testing: the same resizing and cropping (padding) is applied.
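The crop-or-pad step can be sketched as follows (a hypothetical helper, not the authors' code; it zero-pads wherever the 368×368 window runs off the image):

```python
import numpy as np

def crop_or_pad(img, cy, cx, size=368):
    # Take a size x size window centered on (cy, cx), zero-padding
    # wherever the window extends beyond the image borders.
    h, w = img.shape[:2]
    out = np.zeros((size, size) + img.shape[2:], img.dtype)
    top, left = cy - size // 2, cx - size // 2
    y0, y1 = max(top, 0), min(top + size, h)    # overlap with the image, rows
    x0, x1 = max(left, 0), min(left + size, w)  # overlap with the image, cols
    out[y0 - top:y1 - top, x0 - left:x1 - left] = img[y0:y1, x0:x1]
    return out

patch = crop_or_pad(np.ones((100, 120, 3)), cy=50, cx=60)
print(patch.shape)  # (368, 368, 3)
```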

MPII dataset

MPII dataset with different viewpoints

FLIC: beats the previous paper (DeepPose); the black curve sits well above the blue one.

Summary
1. Iterative refinement is the key.
2. For vanishing gradients, use deep supervision.
3. This architecture can be extended to multiple persons.
4. I don't know why the authors didn't use ResNet.
5. Future work: try to model a 3D human model from a single image.