ASL2TXT Converting sign language gestures from digital images to text George Corser.

Post on 16-Dec-2015


ASL2TXT

Converting sign language gestures from digital images to text

George Corser

Presentation Overview

• Concept
• Foundation: Barkoky & Charkari (2011)
  – Segmentation
  – Thinning
• My Contribution: Corser (2012)
  – Segmentation (similar to Barkoky)
  – CED: Canny Edge Dilation (Minus Errors)
  – Assumption: User trains his own phone

Concept

• Deaf and hearing people talking on the phone, each using their natural language

• Sign-activated commands, analogous to voice-activated commands

Situation: Drive Thru Window

1. Deaf person signs order
2. Phone speaks order
3. Confirmation on screen

Think: Stephen Hawking

Process Flow

• Requires several conversion processes
• Many have been accomplished
• Remaining: ASL2TXT

Goal: Find an Algorithm

• Find an image processing algorithm that recognizes ASL alphabet

[Figure: hand-sign image = A (Web site)]

Barkoky: Segmentation & Thinning

Barkoky counts endpoints to determine the sign (doesn’t work for ASL)

Barkoky Process

Segmentation
1. Capture RGB image
2. Rescale
3. Extract using colors
4. Reduce noise
5. Crop at wrist
6. Result: hand segment

Thinning
7. Input: hand segment
8. Apply thinning
9. Find endpoints, joints
10. Calculate lengths
11. Clean short lengths
12. Identify gesture by counting endpoints
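Steps 9 and 12 can be sketched as follows. This is a minimal illustration, not Barkoky's code: the 7x7 skeleton `skel` is a hypothetical hand-drawn input, and an endpoint is assumed to be a skeleton pixel with exactly one 8-connected neighbor. Only base functions are used; with the Image Processing Toolbox, `bwmorph` provides 'thin' and 'endpoints' operations that could serve the same role.

```matlab
% Hypothetical 7x7 skeleton shaped like a "Y": one stem, two branches.
skel = zeros(7,7);
skel(4:7,4) = 1;               % stem: rows 4..7 of column 4
skel(3,3) = 1; skel(2,2) = 1;  % left branch
skel(3,5) = 1; skel(2,6) = 1;  % right branch

% Count the 8-connected neighbors of every pixel.
kernel = [1 1 1; 1 0 1; 1 1 1];
neighbors = conv2(skel, kernel, 'same');

% Assumed definition: an endpoint is a skeleton pixel with one neighbor.
endpoints = (skel == 1) & (neighbors == 1);
numEndpoints = sum(endpoints(:));
disp(numEndpoints);
```

The stem tip and the two branch tips are the only pixels with a single neighbor, so the count for this skeleton is 3.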

1. Capture RGB Image
2. Rescale

% ---------- 1. Capture RGB image
a = imread('DSC04926.JPG');
figure('Name','RGB image'), imshow(a);

% ---------- 2. Rescale image to 205x154
a10 = imresize(a, 0.1);
figure('Name','Rescaled image'), imshow(a10);

3. Extract Hand Using Colors

% ---------- 3. Extract hand using color
abw10 = zeros(205,154,1);
for i=1:205,
  for j=1:154,
    if a10(i,j,2)<140 && a10(i,j,3)<100,
      abw10(i,j,1)=255;
    end;
  end;
end;
figure('Name','Extracted'), imshow(abw10);

Note: Color threshold code differs from Barkoky
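The same green/blue threshold can also be written without loops. A minimal sketch, assuming a tiny hypothetical 2x2 image `img` in place of `a10`; the thresholds 140 and 100 are taken from the slide code, everything else is illustrative.

```matlab
% Hypothetical 2x2 RGB test image (uint8), standing in for a10.
img = zeros(2,2,3,'uint8');
img(1,1,:) = reshape([200 120  80], 1,1,3);  % green<140, blue<100: kept
img(1,2,:) = reshape([200 150  80], 1,1,3);  % green too high: rejected
img(2,1,:) = reshape([200 120 120], 1,1,3);  % blue too high: rejected
img(2,2,:) = reshape([ 30  20  10], 1,1,3);  % dark pixel: also kept

% Vectorized form of the loop: 255 where both tests pass, else 0.
mask = 255 * double(img(:,:,2) < 140 & img(:,:,3) < 100);
disp(mask);
```

Because the test has no red condition and no lower bounds, dark pixels pass it too, as the last pixel shows.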

Colors: Training Set Histograms

Colors: Training Set (2)

[Figure: Excel charts of the Red, Green, and Blue channels]

Colors: Test Set Histograms

4. Reduce Noise

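% ---------- 4. Reduce noise
for i=2:204,
  for j=1:154,
    % clear a pixel whose upper and lower neighbors are both black
    if abw10(i-1,j,1)==0
      if abw10(i+1,j,1)==0, abw10(i,j,1)=0; end;
    end;
    % fill a pixel whose upper and lower neighbors are both white
    if abw10(i-1,j,1)==255
      if abw10(i+1,j,1)==255, abw10(i,j,1)=255; end;
    end;
  end;
end;
abw10 = imfill(abw10,'holes');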

5. Identify Wrist Position

% ---------- 5. Identify wrist position
for i=204:-1:1,
  for j=1:154,
    if abw10(i,j,1)==255, break; end;
  end;
  if j ~= 154 && abw10(i+1,j,1)~=255,
    wristi=i+1; wristj=j+1; break;
  end;
end;

Wrist Detection

• Algorithm searches the image from bottom to top
• Finds the leftmost white pixel that sits directly above a black pixel
• Sets the wrist position one pixel southeast (down and right) of that white pixel
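The wrist search can be tried on a small mask. A minimal sketch, assuming a hypothetical 6x5 mask in place of the 205x154 `abw10`; it uses `find` for the leftmost white pixel, which differs slightly from the slide code (a white pixel in the last column is not skipped).

```matlab
% Hypothetical 6x5 mask: a white blob whose lowest row is row 4.
mask = zeros(6,5);
mask(2:4, 3:4) = 255;
mask(4,2) = 255;               % leftmost white pixel of the lowest row

wristi = []; wristj = [];
for i = size(mask,1)-1:-1:1              % scan bottom to top
    j = find(mask(i,:) == 255, 1);       % leftmost white pixel in row i
    if ~isempty(j) && mask(i+1,j) ~= 255 % black pixel directly below
        wristi = i + 1;                  % one row down
        wristj = j + 1;                  % one column right
        break;
    end
end
disp([wristi wristj]);
```

For this mask the first hit is the white pixel at row 4, column 2, so the wrist position is set to row 5, column 3.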

Corser: Segmentation & CED

• Segmentation (similar to Barkoky)
  – Color threshold technique slightly different
  – American Sign Language (ASL) alphabet, not Persian Sign Language (PSL) numbers
• Image Comparison: Tried Several Methods
  – Full Threshold (Minus Errors)
  – Diced Segments (Minus Errors)
  – Endpoint Count Difference
  – CED: Canny Edge Dilation

ASL Training Set

Hit-or-miss: 23% Barkoky: 8%

ASL Test Set

[Figure: MATLAB-generated hand-sign images labeled A through Z]

Hybrid Algorithm Example

% ---------- MATLAB Code -------------------
matchtotal = 0;
if abs(x10range - x20range) < 20,
  matchtotal = matchtotal + 10;
end;
if abs(y10range - y20range) < 20,
  matchtotal = matchtotal + 11;
end;
matchtotal = matchtotal - abs(h10 - h20);
% ----- h10, h20 are vector magnitudes -----

Erosion Subtraction

Canny Edge

Canny Edge Dilation Code

% ---------- MATLAB Code -------------------

se = strel('disk',5);
a10 = edge(a10,'canny');
a20 = edge(a20,'canny');
a10 = imdilate(a10,se);
a20 = imdilate(a20,se);

% ----- Then calculate matches minus errors
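The slides do not show the "matches minus errors" step, so the following is only one plausible reading. Assumptions: a match is a pixel set in both dilated edge maps, an error is a pixel set in exactly one of them, and the score is their difference. The 5x5 edge maps `e1` and `e2` are hypothetical, and dilation is done with a 3x3 square via `conv2` as a toolbox-free stand-in for `imdilate` with a disk.

```matlab
% Two hypothetical 5x5 binary edge maps standing in for a10 and a20.
e1 = zeros(5,5); e1(3,2:4) = 1;   % horizontal stroke
e2 = zeros(5,5); e2(2:4,3) = 1;   % vertical stroke

% Dilate with a 3x3 square (stand-in for imdilate with strel('disk',5)).
d1 = conv2(e1, ones(3), 'same') > 0;
d2 = conv2(e2, ones(3), 'same') > 0;

% Assumed scoring: overlap counts for, disagreement counts against.
matches = sum(d1(:) & d2(:));
errors  = sum(xor(d1(:), d2(:)));
score   = matches - errors;
disp([matches errors score]);
```

Here each dilated stroke covers 15 pixels and they overlap in a 3x3 block, giving 9 matches, 12 errors, and a score of -3. A candidate letter would be chosen by taking the training image with the highest score.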

Experimental Results

Technique                             Correct
Full Threshold (Minus Errors)         19% (27%)
Diced Segments (Minus Errors)         23% (27%)
Barkoky Endpoint Count Diff.          8%
Hybrid - Height/Width/Endpoints       19%
Erosion Subtraction                   15%
Canny Edge Dilation (Minus Errors)    12% (35%)

Disadvantages

• Dependent on lighting conditions
• Fails with flesh-tone backgrounds
• Requires calibration to a specific user
• Limited applications: text messaging, activation (“sign” similar to voice activation)
• ASL numbers (A=10, D=1, O=0, V=2, W=6)
• Alphabet is a tiny portion of full translation: complete translation may be many years away

Future Work

• Barkoky claims flesh tones can be detected, but I have yet to replicate (even Barkoky changed his color detection scheme)

• Could write letter-by-letter algorithm
• Could use range camera to compute distance of finger instead of shape of hand
• Motion analysis or edge count
• Many possibilities… we’ve only just begun!

Cue: music http://www.youtube.com/watch?v=__VQX2Xn7tI

The End