Face and Speech Identification System (FASIS) George Liao, Andrew Au, Ching-Hsin Chen.


Face and Speech Identification System (FASIS)

George Liao, Andrew Au, Ching-Hsin Chen

Overview

- Project Overview
- Ztitch Solutions Team
- Motivation
- Design Solution
- Design Alternatives
- Software Design
- Hardware Design
- Finance
- Schedule
- Future Work
- What We Learned
- Conclusion
- Acknowledgements / Questions
- Demo overview

Ztitch Solutions Team

Andrew Au (Team Leader):
- 5th-year computer engineering student
- 16 months of development experience at Nokia & Sierra Wireless
- 4 months as an NSERC research assistant for Dr. Jie Liang
- Freelance mobile developer; published the “Ztitch” app for Windows Phone 7

George Liao:
- 5th-year electronics engineering student
- Experience in MATLAB image processing
- Software debugging and testing
- Audio processing

Ching-Hsin (Danny) Chen:
- 5th-year electronics engineering student
- 12 months of research experience at Broadcom
- Hardware designer
- QA and debugging

Motivation

Number of smartphones worldwide: ~200M [Parks Associates, April 2010]

Mobile internet usage will exceed fixed-line internet usage by 2014 [Morgan Stanley]

Steady growth in demand for mobile applications. Market value estimated at ~$14.5B USD by 2012 [CNET]

Motivation

Despite high smartphone demand, there hasn’t been much innovation in mobile log-in and security
- The username/password scheme is difficult to use on a phone
- Example: [email protected] / enter123

Any process or method that lets the user execute a task faster is highly desirable. Examples:
- PayPal: fast payment system
- Google: efficient search engine
- SMS: fast messaging protocol

It’s all about speed and efficiency

Motivation (cont’d)

Our goal:

Implement a new method of secure mobile log-in

Eliminate the need for tedious typing on tiny touch screens or keypads

Secure, fast, and efficient

Design Solution

Face recognition
- Ease of access
- It’s quick to snap a photo
- But we need a secondary solution to make it more secure...

Voice recognition
- Providing a spoken phrase is also quick

Design Solution (cont’d)

We combine face and voice recognition as follows.

Note: our original goal was to use a mug shot to grant access to the server, but we still have concerns about the security of that approach. As an alternative, here are the steps that we use:

1) User snaps a picture of their face using the phone, which relays the image to the server via a cellular internet connection

2) Server recognizes the face and requests the voice password

3) User speaks a specific keyword as a password into the phone, which relays the speech data to the server via VoIP
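The three steps can be read as a small server-side state machine. The following is only a sketch of that idea: the class and method names are hypothetical, and the face and speech recognizers are assumed to have already turned the uploaded data into an identifier and a text phrase.

```python
from dataclasses import dataclass


@dataclass
class Account:
    face_id: str      # identifier the face recognizer is assumed to return
    passphrase: str   # keyword the speech recognizer is assumed to return


class LoginServer:
    """Hypothetical two-step session: face first, then voice."""

    def __init__(self, accounts):
        self.accounts = {a.face_id: a for a in accounts}
        self.pending = None  # account that is waiting for its voice password

    def submit_face(self, recognized_face_id):
        # Step 2: a recognized face selects the account and triggers the voice prompt.
        self.pending = self.accounts.get(recognized_face_id)
        return "REQUEST_VOICE" if self.pending else "DENY"

    def submit_voice(self, recognized_phrase):
        # Step 3: the keyword is only checked after a successful face match,
        # and each face match allows exactly one voice attempt.
        account, self.pending = self.pending, None
        if account and recognized_phrase.strip().lower() == account.passphrase:
            return "GRANT"
        return "DENY"


server = LoginServer([Account("andrew", "enter123")])
first = server.submit_face("andrew")      # face matches a stored account
second = server.submit_voice("Enter123")  # keyword matches, ignoring case
```

Clearing the pending account after every voice attempt means a wrong keyword forces the user to start again with a new photo.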

Design Solution (cont’d)

Processing will be done remotely on the server as an online service

Reasons:
- More secure than client-side processing
- Independent of the phone’s processing power
- Easier to apply software upgrades (updating a server vs. updating thousands of users’ phones)

[Diagram steps: 1; 2 (send voice and face image); 3 (grant access)]

Design Solution (cont’d)

That was a simplified model

There are many other details to be considered, e.g.:
- Image compression
- Key encryption / decryption
- Reducing ambient noise during voice recognition
- Face localization
- Handling multiple failed attempts
- Image and voice data

For the proof of concept, we did not have time to do all of this; we implemented only some of these, plus the basic model

Design Solution (cont’d)

In our model, the face is the identifier, replacing username

The spoken phrase replaces the password

“enter123”

Design Alternatives

Besides face and voice recognition, the alternatives are:

1) Conventional typed username/password
- Slow and tedious, as mentioned before

2) Fingerprint
- Requires hardware modification to existing phones
- Our system requires only software, but the demo prototype has a hardware modification to allow 3rd-party control of the phone during the demo

3) Eye iris
- Complex and requires a special camera

Design Alternatives (cont’d)

Besides server-side processing, the alternative is client-side processing

Client-side processing executes the face and voice recognition on the phone rather than on the server

Main disadvantage: the identifier and password are stored on the phone and are therefore vulnerable to phone theft

Software Design

The software is divided into three parts:

1) Face Localization

2) Face Recognition

3) Voice Recognition

Software Design #1 Face Localization

The first step of the software is face localization: finding where the face is in the image

Where is my face in this image? The computer does not know!

Software Design #1 Face Localization

There are a few different methods of face localization, but some of them require additional equipment such as two cameras (stereo). There are many research papers in this area.

Our method is simple and fast, and can run in real time.

First, we define the range of colors that counts as human skin color

Software Design #1 Face Localization

Second, we filter the image, keeping only the pixels that match our definition of skin color

Filter
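A minimal sketch of the first two steps, assuming the skin-color definition is a simple per-channel RGB range. The slides do not give the actual range or color space, so the numbers below are placeholders, and the prototype itself ran in MATLAB rather than Python.

```python
# Hypothetical skin-color range in RGB; the real values are not given in the slides.
SKIN_MIN = (95, 40, 20)
SKIN_MAX = (255, 220, 180)


def is_skin(pixel):
    """True when every channel of an (R, G, B) pixel lies inside the assumed range."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, SKIN_MIN, SKIN_MAX))


def skin_mask(image):
    """Turn a 2-D grid of RGB pixels into a binary mask (1 = skin-colored)."""
    return [[1 if is_skin(p) else 0 for p in row] for row in image]


image = [
    [(10, 10, 10), (200, 150, 120)],
    [(210, 160, 130), (0, 0, 255)],
]
mask = skin_mask(image)  # dark and pure-blue pixels are rejected
```

The output is a binary image, which is what the later noise-removal and dilation steps operate on.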

Software Design #1 Face Localization

Third, we remove the noise from the filtered image.

Noise removal

Software Design #1 Face Localization

Unfortunately, removing the noise also removes some of the face data

So, fourth, we expand the remaining regions with dilation

Expand
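The slides do not say which noise-removal filter was used. One common pairing, shown here purely as an assumption, is erosion followed by dilation with a 3x3 window: erosion deletes isolated specks but also shrinks the face region, and dilation grows it back.

```python
def _window(mask, r, c):
    """Values in the 3x3 window centered on (r, c); cells outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
    ]


def erode(mask):
    """Keep a pixel only if its whole 3x3 window is set (removes small specks)."""
    return [[1 if all(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if anything in its 3x3 window is set (grows regions back)."""
    return [[1 if any(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


noisy = [
    [0, 0, 0, 0, 1],  # the lone 1 in the corner is noise
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cleaned = dilate(erode(noisy))
```

In this toy mask the 3x3 blob survives the round trip unchanged while the corner speck disappears.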

Software Design #1 Face Localization

Finally, we have a face “blob”, and we can determine the center of this blob in x-y coordinates by stacking

Stack up (sum) the pixels along each of the two axes
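One way to read "stacking" is as column and row sums of the binary mask, with the blob center taken as the weighted average position along each axis. Whether the prototype used the average or the peak of each stack is not stated, so treat this as one plausible version.

```python
def blob_center(mask):
    """Estimate the (x, y) center of a binary blob from its axis projections."""
    col_sums = [sum(col) for col in zip(*mask)]  # pixels stacked onto the x-axis
    row_sums = [sum(row) for row in mask]        # pixels stacked onto the y-axis
    total = sum(row_sums)
    if total == 0:
        return None  # no skin-colored pixels, so no face candidate
    x = sum(i * s for i, s in enumerate(col_sums)) / total
    y = sum(i * s for i, s in enumerate(row_sums)) / total
    return x, y


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
center = blob_center(mask)  # blob spans columns 2-4 and rows 1-2
```

Because it needs only two passes of additions over the mask, this step is cheap enough for the real-time use mentioned earlier.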

Software Design #1 Face Localization

Now the problem of face localization is solved, and the computer knows where my face is

However, this is a simple case only...

Crop

Software Design #1 Face Localization

What if there are multiple faces in the image?

We can use the same steps as before, but replace pixel stacking with Hough circle detection

Software Design #1 Face Localization

Same as before:

Software Design #1 Face Localization

An algorithm called the Circular Hough Transform is used:
- Detects the edge points that lie along the outline of a circle
- Can be generalized to detect arbitrary shapes
- Slower than the previous method, but covers more scenarios
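The voting idea behind the Circular Hough Transform can be sketched as follows, under the simplifying assumption that the circle radius is known in advance (a full implementation would also search over radii). Each edge point votes for every center that would put it on such a circle, and real centers collect the most votes. The helper `circle_points` only generates synthetic test data.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, angle_steps=72):
    """Accumulate votes for circle centers at one fixed, assumed radius."""
    votes = Counter()
    for x, y in edge_points:
        seen = set()  # at most one vote per candidate center per edge point
        for k in range(angle_steps):
            theta = 2 * math.pi * k / angle_steps
            center = (round(x - radius * math.cos(theta)),
                      round(y - radius * math.sin(theta)))
            if center not in seen:
                seen.add(center)
                votes[center] += 1
    return votes


def circle_points(cx, cy, radius, n=36):
    """Synthetic edge pixels on a circle (stand-in for a face outline)."""
    return [(round(cx + radius * math.cos(2 * math.pi * k / n)),
             round(cy + radius * math.sin(2 * math.pi * k / n))) for k in range(n)]


# Two "faces" in one image, which pixel stacking alone could not separate.
edges = circle_points(20, 15, 8) + circle_points(45, 30, 8)
votes = hough_circle_centers(edges, radius=8)
best_two = {center for center, _ in votes.most_common(2)}
```

The nested loop over edge points and angles is what makes this slower than pixel stacking.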

Software Design #2 Face Recognition

We chose a face recognition method called Eigenface

Easy to implement and fits our tight development schedule

Can be upgraded to Eigenfeatures for higher accuracy (as part of our future work)

Software Design #2 Face Recognition

First, add a set of images of the user’s face to the database

Usually 5 or more images with slight variations in angle and lighting conditions

We add our first image:

Compute mean face

Image #1

Mean face

Software Design #2 Face Recognition

Now, we add a second image to the database

Compute mean face

[Figure: Image #1, Image #2, mean face]

Software Design #2 Face Recognition

Now, we add a third image

Compute mean face

[Figure: Image #1, Image #2, Image #3, mean face]
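Recomputing the mean face each time an image is added can be done incrementally. The sketch below assumes each image has been flattened into a list of pixel intensities of equal length; the four-pixel "images" are toy data, not values from the project.

```python
def update_mean(mean, count, image):
    """Fold one more flattened image into the running mean of `count` images."""
    new_count = count + 1
    new_mean = [m + (p - m) / new_count for m, p in zip(mean, image)]
    return new_mean, new_count


# Toy 4-pixel "faces" standing in for Image #1, #2 and #3.
faces = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [50, 60, 70, 80],
]

mean, count = [0.0] * 4, 0
for face in faces:
    mean, count = update_mean(mean, count, face)
```

After the third image the running mean equals the plain average of all three, without the earlier images having to be reloaded.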

Software Design #2 Face Recognition

We can add a few more images until we finally have our database, a.k.a. the training set

Now, we execute face recognition as follows:
- Compare the input image with the mean face, and compute two values: the difference and the difference from face space
- If the error is above a certain threshold: recognition fails
- If the error is below a certain threshold: recognition succeeds

Software Design #2 Face Recognition

Calculate two values: the difference, and the difference from face space

In this example:
- Difference = 4418.3
- Difference from face space = 316.4081

normalize((input-mean)-projection)

Mean face (from database)

Input image
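A sketch of how the two values could be computed, assuming the eigenfaces are available as orthonormal vectors and that "difference" means the distance between the input and the mean face. The slides only give the expression (input-mean)-projection for the second value and do not define the normalization, so none is applied here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def face_distances(image, mean, eigenfaces):
    """Return (difference, difference_from_face_space) for a flattened image.

    `eigenfaces` is assumed to be a list of orthonormal vectors.
    """
    centered = [p - m for p, m in zip(image, mean)]
    # Coordinates of the centered image in face space.
    weights = [dot(centered, e) for e in eigenfaces]
    # Rebuild the part of the image that face space can explain.
    projection = [sum(w * e[i] for w, e in zip(weights, eigenfaces))
                  for i in range(len(centered))]
    residual = [c - p for c, p in zip(centered, projection)]
    return math.sqrt(dot(centered, centered)), math.sqrt(dot(residual, residual))


# Toy 3-pixel example with a 2-dimensional face space.
mean = [0.0, 0.0, 0.0]
eigenfaces = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
difference, dffs = face_distances([3.0, 4.0, 12.0], mean, eigenfaces)
```

The residual is whatever face space cannot reconstruct, which is why a large value suggests that the input is not a face at all.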

Software Design #2 Face Recognition

We set our threshold values via trial and error. From our tests:

When the input face image is the real owner:
- Difference < 500
- Difference from face space < 5000

When the input face image is NOT the real owner:
- 500 < Difference < 2000
- 5000 < Difference from face space < 10000

When the input image is not a face:
- 2000 < Difference
- 10000 < Difference from face space
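The three ranges translate directly into a decision function. The threshold numbers are the ones stated on this slide; how a mixed case is handled (one value in the owner range, the other not) is not specified, so requiring both values to pass is an assumption.

```python
def classify(difference, dffs):
    """Map the two Eigenface distances to one of three outcomes."""
    if difference < 500 and dffs < 5000:
        return "owner"
    if difference < 2000 and dffs < 10000:
        return "not owner"
    return "not a face"


results = [
    classify(300, 4000),    # both values inside the owner range
    classify(1200, 7000),   # a face, but not the owner's
    classify(2500, 12000),  # not a face at all
    classify(300, 7000),    # mixed case: treated conservatively
]
```

Treating mixed cases as a failed match errs on the side of denying access, which suits a log-in system.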

Software Design #2 Face Recognition

Our results parallel the results from other Eigenface recognition researchers

The following is from cnx.org [Rice University]

Software Design #3 Voice Recognition

The brain of our voice recognition is the Microsoft Speech SDK, which is free and comprehensive:
- Does not require the developer to have extensive knowledge of voice pattern science
- Provides a high-level application programming interface (API) for third-party developers to use speech recognition in their applications

[Diagram: FASIS, Speech SDK 5.1]

Hardware Design

We stress that the final commercialized product requires no hardware modification

We need to modify the hardware in the prototype to control the phone OS:
- We do not have the underlying permission to control the phone’s functionality, such as sending an image automatically or signalling it to lock/unlock

Hardware Design

In this prototype, the hardware consists of:
- The phone itself
- Hardware board to relay the image to the server
- The PC acting as the server
- Microcontroller to control the phone

Summary:
[Diagram labels: Transceiver, Microcontroller]

Hardware Design

The phone: Nokia N96
- Non-touch screen; 320x240 resolution; front-facing camera

The transceiver:
- RS232 serial connection
- The board uses the MAX3222 IC
  - Low power consumption and high data rate
  - Requires four 0.1 μF external charge pump capacitors
  - Guaranteed 120 kbps while maintaining standard RS232 levels
  - 2 receivers and 2 drivers
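As a rough feel for what a 120 kbps serial link means for image transfer, the arithmetic below assumes 8N1 framing (10 bits on the wire per byte) and an uncompressed 8-bit grayscale 320x240 image. Neither the framing nor the image format is stated in the slides, and the actual baud rate used may have been lower than the transceiver's guaranteed maximum.

```python
def transfer_seconds(payload_bytes, bits_per_second=120_000, wire_bits_per_byte=10):
    """Time to push a payload over an asynchronous serial link.

    10 wire bits per byte assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    """
    return payload_bytes * wire_bits_per_byte / bits_per_second


raw_gray_image = 320 * 240            # assumed 1 byte per pixel, no compression
seconds = transfer_seconds(raw_gray_image)
print(seconds)                        # 6.4
```

Several seconds per raw frame is one reason image compression appears on the list of details to consider.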

Hardware Design

The server:
- Executes the software design
- Runs mainly in MATLAB
- Also needs drivers to talk to the transceiver and the MCU
- Alerts the owner of intruders via email (sends a picture attachment)
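The intruder alert can be illustrated with Python's standard `email` package, standing in for whatever mechanism the MATLAB-based server actually used. The addresses, subject line, and file name below are placeholders, and the message is only built here, not sent.

```python
from email.message import EmailMessage


def build_intruder_alert(owner_address, jpeg_bytes, sender="fasis@example.com"):
    """Build (but do not send) an alert email with the captured picture attached."""
    msg = EmailMessage()
    msg["Subject"] = "FASIS alert: unrecognized log-in attempt"
    msg["From"] = sender
    msg["To"] = owner_address
    msg.set_content(
        "Someone who is not the registered owner tried to log in. "
        "The captured picture is attached."
    )
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename="intruder.jpg")
    return msg


# Placeholder bytes stand in for the real camera image.
alert = build_intruder_alert("owner@example.com", b"\xff\xd8placeholder")
```

Sending the message would additionally need an SMTP server, for example through `smtplib.SMTP(...).send_message(alert)`.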

Hardware Design

The BOE kit:
- Parallax Board of Education (BOE) kit
- Comes with an MCU and a breadboard for our custom circuit that wires the phone to the MCU
- USB connection for programming and communication at run-time
- We wired the phone’s buttons to the MCU so that we can control those buttons from the PC

Hardware Design

BASIC Stamp 2 module:
- Processor speed: 20 MHz
- RAM size: 32 bytes
- Number of I/O pins: 16 + 2 dedicated serial
- PBASIC commands: 42
- Package: 24-pin DIP

Hardware Design

When a switch is on, its metal spring makes contact with two wires, allowing current to flow.

The phone’s internal MCU cycles through the input lines B0 – B3. If B1 is pulled high (and the others low) while a key on that line is pressed, the voltage is transferred to the corresponding row wire; so if A1 reads high, you know that button SW1 is pressed.
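The scanning scheme can be simulated in a few lines. This is a behavioral model only: the row names other than A1 are assumed, and closed switches are represented as (input line, row wire) pairs rather than by their SW numbers.

```python
COLUMNS = ("B0", "B1", "B2", "B3")   # input lines the phone's MCU drives in turn
ROWS = ("A0", "A1", "A2", "A3")      # row wires it reads back (names assumed)


def read_rows(driven_column, closed_switches):
    """Row levels seen while exactly one input line is pulled high."""
    return {row: (driven_column, row) in closed_switches for row in ROWS}


def scan_keypad(closed_switches):
    """Cycle through the input lines and report every closed switch found."""
    detected = []
    for column in COLUMNS:
        levels = read_rows(column, closed_switches)
        detected.extend((column, row) for row, high in levels.items() if high)
    return detected


# A key sitting between line B1 and row A1 is held down.
pressed = scan_keypad({("B1", "A1")})
```

Closing the same connection electrically from the microcontroller, in place of the metal spring, is what lets the prototype press keys without a finger.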

Hardware Design

We can create a current to simulate the key press as follows:
- General-purpose I/O pins P0-P15: each can sink 25 mA and source 20 mA.
- The HIGH command sets the specified pin to 1 (a +5 volt level) and then sets its mode to output.
- Example: HIGH 14

Hardware Design

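Main:
  HIGH 0       ' set pin P0 high (+5 V) and make it an output
  PAUSE 500    ' wait 500 ms
  LOW 0        ' set pin P0 low (0 V)
  PAUSE 500    ' wait 500 ms
  END          ' stop the program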

Hardware Design

The integrated hardware:

Finance

The cost of this project was substantially reduced because Nokia provided us with the N96

Many software tools are also free for students via DreamSpark (e.g. Visual Studio 2010)

Total cost came to about $250 CAD

Finance

The overhead cost for commercializing this product is low because it is entirely software based

We can charge users either a one-time fee or an annual subscription fee

Most of the expense comes from hosting dedicated servers to execute the software algorithms and store users’ training sets (face images)

There are many dedicated hosting services available for ~$100 per month, allowing us to rent this expensive equipment, which is located elsewhere

Schedule

Key notes:
- Project began last semester
- Research took the longest, then development
- Documentation cost a lot of time, but was well worth it

Future Work

Improve the algorithms
- Eigenfeatures: combines facial metrics (measuring the distance between facial features) with the Eigenface approach
- Further enhance the localization methods

Collaborate with Symbian to get low-level OS access
- Symbian is the Nokia phone’s operating system, and FASIS needs permission from the company in order to become a reality

Set up our dedicated servers
- This demo uses a laptop, but the final product requires commercial-grade servers to handle thousands of users

Future Work

Generalize the system for other brands, not just Nokia

What We Learned

Professional documentation (ENSC305)

Group dynamics and team management

How to create a product from scratch
- From research to commercialization

Programming
- Low-level (microcontroller, C)
- High-level (SAPI, C#, .NET)
- Scripting (batch files, MATLAB)

Conclusion

The Face and Speech Identification System (FASIS) fills the need for a rapid, secure mobile log-in solution that eliminates tedious typing on small touchscreens/keypads

Efficient while maintaining a level of security

With further improvements, we firmly believe that FASIS could become a marketable product considering the current trend in the mobile industry...

Conclusion

There are 200 million smart phones in the world, and this number is rising rapidly...

...even if we capture only 1% of the market, our business can become huge

Acknowledgements

Ali & Carlyn
- Excellent feedback and comments in our marked documents

Dr. Rawicz & Mike
- Excellent feedback during oral progress reports
- The idea for voice recognition

Nokia Vancouver
- Simon Wong, who provided us with the phone

Microsoft
- Free software tools for students via the DreamSpark program

Questions?


Thank you

-Ztitch Solutions

Live Demonstration

Overview:

1. Face localization

2. FASIS: Try to authenticate real owner (Andrew Au)

3. FASIS: Try to authenticate non-face object (hand)

4. FASIS: Try to authenticate an audience member