Face and Speech Identification System (FASIS)

George Liao, Andrew Au, Ching-Hsin Chen

Overview

Project Overview
Ztitch Solutions Team
Motivation
Design Solution
Design Alternatives
Software Design
Hardware Design
Finance
Schedule
Future Work
What we learned
Conclusion
Acknowledgements / Questions
Demo overview

Ztitch Solutions Team

Andrew Au (Team Leader):
  5th year computer engineering student
  16 months of development experience at Nokia & Sierra Wireless
  4 months as an NSERC research assistant for Dr. Jie Liang
  Freelance mobile developer; published “Ztitch” app for Windows Phone 7

George Liao:
  5th year electronics engineering student
  Experience in MATLAB image processing
  Software debug and test
  Audio processing

Ching-Hsin (Danny) Chen:
  5th year electronics engineering student
  12 months of research experience at Broadcom
  Hardware designer
  QA and debugging

Motivation

Number of smart phones worldwide: ~200M [Parks Associates, April 2010]

Mobile internet usage will exceed fixed-line internet usage by 2014 [Morgan Stanley]

Steady growth in demand for mobile applications. Market value estimated at ~$14.5B USD by 2012 [CNET]

Motivation

Despite high smart phone demand, there hasn’t been much innovation in mobile log-in and security
  Username/password scheme is difficult on a phone
  Example: [email protected] / enter123

Any process or method that lets the user execute a task faster is highly desirable. Examples:
  PayPal – fast payment system
  Google – efficient search engine
  SMS – fast messaging protocol

It’s all about speed and efficiency

Motivation (cont’d)

Our goal:

Implement a new method of secure mobile log-in

Eliminate the need for tedious typing on tiny touch screens or keypads

Secure, fast, and efficient

Design Solution

Face recognition
  Ease of access
  It’s quick to snap a photo
  But we need a secondary solution to make it more secure...

Voice recognition
  Providing a spoken phrase is also quick

Design Solution (cont’d)

We combine face and voice recognition as follows.

Note: our original goal was to grant access to the server using a face photo alone, but we still have concerns about the security of that approach. Instead, the steps are:

1) User snaps a picture of their face using the phone, which relays the image to the server via a cellular internet connection

2) Server recognizes the face, and requests the voice password

3) User speaks a specific keyword as a password to the phone, which relays the speech data to the server via VoIP
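The three steps can be sketched as a small server-side flow. This is only an illustration in Python, not the team's implementation: `LoginSession`, `handle_face`, `handle_voice`, and the reply strings are hypothetical names, and the face recognizer and speech transcriber are passed in as placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LoginSession:
    """Tracks one log-in attempt on the server (hypothetical structure)."""
    user_id: Optional[str] = None   # set once the face has been recognized
    granted: bool = False


def handle_face(session: LoginSession, image: bytes,
                recognize_face: Callable[[bytes], Optional[str]]) -> str:
    """Steps 1-2: identify the user from the photo, then ask for the voice password."""
    session.user_id = recognize_face(image)
    return "REQUEST_VOICE" if session.user_id else "DENY"


def handle_voice(session: LoginSession, audio: bytes,
                 transcribe: Callable[[bytes], str],
                 passphrases: Dict[str, str]) -> str:
    """Step 3: compare the spoken keyword with the one stored for that user."""
    if session.user_id is None:
        return "DENY"  # voice is only accepted after a recognized face
    spoken = transcribe(audio).strip().lower()
    session.granted = spoken == passphrases.get(session.user_id)
    return "GRANT" if session.granted else "DENY"
```

The face acts as the identifier and selects which stored phrase the spoken keyword is checked against, so a voice sample on its own is never enough to log in.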


Processing will be done remotely on the server as an online service

Reasons:
  More secure than client-side processing
  Independent of the phone’s processing power
  Easier to apply software upgrades (updating a server vs. updating thousands of users’ phones)

[Diagram: phone and server exchange in three steps: 1, 2 (send voice and face image), 3 (grant access)]

Design solution (cont’d)

That was a simplified model

There are many other details to be considered, e.g.:
  Image compression
  Key encryption / decryption
  Reducing ambient noise during voice recognition
  Face localization
  Handling multiple failed attempts
  Image and voice data

For the proof-of-concept, we don’t have time to do all of this; we implement only some of these details plus the basic model


In our model, the face is the identifier, replacing the username

The spoken phrase replaces the password

“enter123”

Design Alternatives

Besides face and voice recognition, the alternatives are:

1) Conventional typed username/password
  Slow and tedious, as mentioned before

2) Fingerprint
  Requires hardware modification to existing phones
  Our system requires only software; the demo prototype has a hardware modification solely for third-party control of the phone during the demo

3) Iris recognition
  Complex and requires a special camera

Design Alternatives (cont’d)

Besides server-side processing, the alternative is client-side processing

Client-side processing means executing the face and voice recognition on the phone rather than on the server

Main disadvantage:
  Identifier and password are stored on the phone and are therefore exposed if the phone is stolen

Software Design


The software is divided into three parts:

1) Face Localization

2) Face Recognition

3) Voice Recognition

Software Design #1 Face Localization

The first step of the software is face localization, that is, finding where the face is in the image

Where is my face in this image? The computer does not know!


There are several methods of face localization, and many research papers in this area. Some methods require additional equipment, such as two cameras (stereo).

Our method is simple and fast, and can be done in real time.

First, we define the range of colors that counts as human skin color


Second, we filter the image, keeping only the pixels whose color matches our definition of skin color

Filter
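A minimal sketch of the skin-color filter, in Python rather than the MATLAB the server ran. The RGB thresholds in `is_skin` are illustrative assumptions, not the range the team defined, and both function names are hypothetical.

```python
def is_skin(r, g, b):
    """Illustrative RGB skin rule; the thresholds are assumptions, not FASIS values."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(image):
    """image: rows of (r, g, b) tuples -> rows of 0/1, where 1 marks a skin-colored pixel."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]
```

The result is a binary image: every later step (noise removal, dilation, locating the blob) works on this mask rather than on the original colors.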


Third, we remove the noise in the new image.

Noise removal


Unfortunately, removing the noise also removes some data

So fourth, we expand the remaining regions with dilation

Expand
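The noise removal and expansion steps can be sketched on a binary mask as follows. The slides name only the dilation; treating the noise removal as a morphological erosion with a 3x3 window is an assumption made for this illustration.

```python
def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1)]


def erode(mask):
    """Keep a pixel only if its whole 3x3 window is set; removes isolated specks."""
    return [[int(all(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is set; grows the surviving blob back."""
    return [[int(any(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]
```

Calling `dilate(erode(mask))` drops specks smaller than the window while restoring roughly the original extent of the larger face region.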


Finally we have a face “blob”, and we can determine the center of this blob in x-y coordinates by stacking, that is, counting the blob pixels along each of the two axes

Stack up the pixels along the two axes
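One way to read the stacking step is as a pair of projection histograms: count the blob pixels in every column and every row, then take the weighted average position along each axis. The sketch below assumes that reading (the team may instead have taken the peak of each histogram); `blob_center` is a hypothetical name.

```python
def blob_center(mask):
    """Return (x, y) of the blob center from column and row pixel counts, or None."""
    col_sums = [sum(col) for col in zip(*mask)]   # pixels stacked along the x axis
    row_sums = [sum(row) for row in mask]         # pixels stacked along the y axis
    total = sum(row_sums)
    if total == 0:
        return None  # no skin-colored pixels survived the earlier steps
    x = sum(i * c for i, c in enumerate(col_sums)) / total
    y = sum(j * r for j, r in enumerate(row_sums)) / total
    return x, y
```

The returned coordinates can then be used as the center of the crop window around the face.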



Now the problem of face localization is solved, and the computer knows where my face is

However, this is a simple case only...

Crop


What if there are multiple faces in the image?

We can use the same steps as before, but replace pixel stacking with Hough circle detection



Same as before:


An algorithm called the Circular Hough Transform is used
  Detects the edge points that lie along the outline of a circle
  We can generalize this method to detect arbitrary shapes
  Slower than the previous method, but covers more scenarios
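The voting idea behind the Circular Hough Transform can be sketched for a single, known radius. This is a simplified illustration: the fixed radius, the 5-degree angular step, and the function name are assumptions, and a practical detector would also search over a range of radii.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, n_angles=72):
    """Each edge point votes once for every grid cell that lies `radius` away from it."""
    votes = Counter()
    for x, y in edge_points:
        cells = set()
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        votes.update(cells)   # the set ensures one vote per cell per edge point
    return votes
```

Cells that collect votes from many edge points are likely circle centers, so several faces in one image show up as several separate peaks in `votes`.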

Software Design #2 Face Recognition

We chose a face recognition method called Eigenface

Easy to implement and fits our tight development schedule

Can be upgraded to Eigenfeatures for higher accuracy (as part of our future work)


First, add a set of images of the user’s face to the database

Usually 5 or more images with slight variations in angle and lighting conditions

We add our first image and compute the mean face:

[Figure: Image #1 and the resulting mean face]


Now, we add a second image to the database and compute the mean face again

[Figure: Image #1, Image #2, and the resulting mean face]


Now, we add a third image and compute the mean face again

[Figure: Image #1, Image #2, Image #3, and the resulting mean face]
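Recomputing the mean face each time an image is added can be done incrementally. The sketch below assumes every image has been flattened into a list of pixel values of equal length; `update_mean` is a hypothetical helper, not part of the team's code.

```python
def update_mean(mean, count, image):
    """Fold one more flattened face image into the running mean; returns (mean, count)."""
    count += 1
    if mean is None:
        return list(image), count           # first image: the mean is the image itself
    return [m + (p - m) / count for m, p in zip(mean, image)], count
```

Starting from `mean, count = None, 0` and calling `update_mean` once per training image gives the same mean face as averaging the whole training set at once.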


We can add a few more images until we finally have our database, a.k.a. the training set

Now, we execute face recognition as follows:
  Compare the input image with the mean face, and find two values: the difference, and the difference from face space
  If the error is above a certain threshold: recognition fails
  If the error is below a certain threshold: recognition succeeds


Calculate two values: difference, and difference from face space

In this example:
  Difference = 4418.3
  Difference from face space = 316.4081

[Figure: mean face (from database), input image, and normalize((input-mean)-projection)]
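A sketch of how the two values might be computed, under stated assumptions: "difference" is taken to be the length of (input - mean), and "difference from face space" the length of (input - mean) minus its projection onto face space. Instead of computing eigenfaces, the sketch builds an orthonormal basis of the mean-centered training images with Gram-Schmidt; this spans the same subspace as the full set of eigenfaces, whereas a real Eigenface system would normally keep only the leading few. All function names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def face_space_basis(training, mean):
    """Orthonormal basis (Gram-Schmidt) spanning the mean-centered training images."""
    basis = []
    for img in training:
        v = [p - m for p, m in zip(img, mean)]
        for b in basis:
            c = _dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = math.sqrt(_dot(v, v))
        if norm > 1e-9:                      # skip images that add no new direction
            basis.append([x / norm for x in v])
    return basis


def face_distances(image, mean, basis):
    """Return (difference, difference_from_face_space) for one flattened input image."""
    centered = [p - m for p, m in zip(image, mean)]
    projection = [0.0] * len(centered)
    for b in basis:
        c = _dot(centered, b)
        projection = [p + c * y for p, y in zip(projection, b)]
    residual = [x - p for x, p in zip(centered, projection)]
    return math.sqrt(_dot(centered, centered)), math.sqrt(_dot(residual, residual))
```

With these definitions the difference from face space can never exceed the difference, because the residual is what remains of (input - mean) after its face-space component is removed.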


We set our threshold values via trial and error. From our tests:

When the input face image is the real owner:
  Difference < 500
  Difference from face space < 5000

When the input face image is NOT the real owner:
  500 < Difference < 2000
  5000 < Difference from face space < 10000

When the input image is not a face:
  2000 < Difference
  10000 < Difference from face space
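The three threshold bands translate directly into a small decision function. The slides do not say what happens when the two measures fall into different bands; letting the worse band win is an assumption of this sketch, as is the function name.

```python
def classify(difference, dffs):
    """Map the two distances to a verdict using the trial-and-error thresholds above."""
    def band(value, low, high):
        return 0 if value < low else 1 if value < high else 2

    # Assumption: when the two measures disagree, the worse band wins.
    worst = max(band(difference, 500, 2000), band(dffs, 5000, 10000))
    return ("owner", "other person", "not a face")[worst]
```

Only the "owner" verdict would let the log-in continue to the voice password step.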



Our results parallel the results from other Eigenface recognition researchers

The following is from cnx.org [Rice University]

Software Design #3 Voice Recognition

The brain of our voice recognition is the Microsoft Speech SDK, which is free and comprehensive
  Does not require the developer to have extensive knowledge of voice pattern science
  Provides a high-level application programming interface (API) for third-party developers to use speech recognition in their applications

[Diagram: FASIS, Speech SDK 5.1]

Hardware Design

We stress that the final commercialized product requires no hardware modification

We need to modify the hardware in the prototype to control the phone OS
  We do not have the underlying permission to control the phone’s functionality, such as sending an image automatically, or signalling it to lock/unlock

Hardware Design

In this prototype, the hardware consists of:
  The phone itself
  Hardware board to relay the image to the server
  The PC acting as server
  Microcontroller to control the phone

Summary:

[Diagram: Transceiver, Microcontroller]

Hardware Design

The phone: Nokia N96
  Non-touch screen
  320x240 resolution front-facing camera

The transceiver: RS232 serial connection
  The board uses the MAX3222 IC
  Low power consumption and high data rate
  Requires four 0.1μF external charge pump capacitors
  Guaranteed 120kbps while maintaining standard RS232 levels
  2 receivers and 2 drivers

Hardware Design

The Server:
  Executes the software design
  Runs mainly on MATLAB
  Also needs drivers to talk to the transceiver and MCU
  Alerts the owner of intruders via email (sends picture attachment)
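The intruder alert could look something like the following, shown in Python with the standard `email` package rather than the MATLAB the server ran. The addresses, subject line, file name, and the assumption that the photo is a JPEG are all placeholders for illustration.

```python
from email.message import EmailMessage


def build_intruder_alert(owner_email, image_bytes, sender="fasis@example.com"):
    """Build (but do not send) an alert e-mail with the intruder's photo attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = owner_email
    msg["Subject"] = "FASIS alert: unrecognized log-in attempt"
    msg.set_content("Someone who was not recognized tried to log in. Photo attached.")
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename="intruder.jpg")
    return msg
```

The message would then be handed to a mail server, for example with `smtplib.SMTP(...).send_message(msg)`.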

Hardware Design

The BOE kit
  Parallax Board of Education (BOE) kit
  Comes with an MCU and a breadboard, on which we built our custom circuit to wire the phone to the MCU
  USB connection for programming and for communication during run-time
  We wired the phone’s buttons to the MCU so that we can control those buttons using the PC

Hardware Design

BASIC Stamp 2 Module:
  Processor Speed: 20 MHz
  RAM Size: 32 Bytes
  Number of I/O Pins: 16 + 2 dedicated serial
  PBASIC Commands: 42
  Package: 24-pin DIP

Hardware Design

When a switch is on, its metal spring makes contact with two wires, allowing current to flow.

The phone’s internal MCU cycles the input lines B0 – B3:
  If B1 is pulled high (and the others low) while a key on that line is pressed, the voltage is transferred to the corresponding row wire. So if A1 goes high at the output, you know that button SW1 is pressed.
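The scanning described above can be simulated in a few lines. This is a software model only: the four sense lines and the (b, a) naming of a switch that bridges line Bb and line Aa are assumptions made for the illustration.

```python
def scan_keypad(closed_switches, n_drive=4, n_sense=4):
    """Simulate a matrix scan: drive one B line high at a time and read the A lines.

    closed_switches: set of (b, a) pairs whose switch bridges line Bb and line Aa.
    Returns the (b, a) pairs detected as pressed, in scan order.
    """
    pressed = []
    for b in range(n_drive):
        # Drive line Bb high and every other B line low, then sample each A line.
        sense_levels = [(b, a) in closed_switches for a in range(n_sense)]
        # A closed switch on the driven line passes the voltage to its A line.
        pressed.extend((b, a) for a, high in enumerate(sense_levels) if high)
    return pressed
```

Because only one B line is high at any moment, a high level on an A line identifies exactly one switch, which is why an external circuit that bridges the right pair of lines looks like a key press to the phone.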

Hardware Design

We can create a current to simulate the key press as follows:
  General-purpose I/O pins P0-P15: each can sink 25 mA and source 20 mA
  The HIGH command sets the specified pin to 1 (a +5 volt level) and then sets its mode to output

  HIGH 14

Hardware Design

Main:
  HIGH 0      ' drive pin P0 high (+5 V)
  PAUSE 500   ' wait 500 ms
  LOW 0       ' drive pin P0 low (0 V)
  PAUSE 500   ' wait 500 ms
  END

Hardware Design


The integrated hardware:

Finance

The cost of this project was substantially reduced because Nokia provided us with the N96

Many software tools are also free for students via DreamSpark (e.g. Visual Studio 2010)

Total cost came to about $250 CAD

Finance

The overhead cost for commercializing this product is low because it is entirely software based

We can charge users either a one-time fee or an annual subscription fee

Most of the expense comes from hosting dedicated servers to execute the software algorithms and store users’ training sets (face images)

Many dedicated hosting services are available for about $100 per month, allowing us to rent this expensive equipment, located elsewhere, rather than buy it

Schedule

Key notes:
  Project began last semester
  Research took the longest, then development
  Documentation cost a lot of time, but was well worth it

Future Work

Improve the algorithms
  Eigenfeatures: combines facial metrics (measuring the distance between facial features) with the Eigenface approach
  Further enhance localization methods

Collaborate with Symbian to get low-level OS access
  Symbian is the Nokia phone’s operating system, and FASIS needs permission from the company in order to become a reality

Set up our dedicated servers
  This demo uses a laptop, but the final product requires commercial-grade servers to handle thousands of users

Future Work


Generalize the system for other brands, not just Nokia

What We Learned

Professional documentation (ENSC305)

Group dynamics and team management

How to create a product from scratch
  From research to commercialization

Programming
  Low-level (Microcontroller, C)
  High-level (SAPI, C#, .NET)
  Scripting (Batch files, MATLAB)

Conclusion

The Face and Speech Identification System (FASIS) fills the need for a rapid, secure mobile log-in solution that eliminates tedious typing on small touchscreens/keypads

Efficient while maintaining a level of security

With further improvements, we firmly believe that FASIS could become a marketable product considering the current trend in the mobile industry...

Conclusion


There are 200 million smart phones in the world, and this number is rising rapidly...

...even if we capture only 1% of the market, our business can become huge

Acknowledgements

Ali & Carlyn
  Excellent feedback and comments on our marked documents

Dr. Rawicz & Mike
  Excellent feedback during oral progress reports
  The idea for voice recognition

Nokia Vancouver
  Simon Wong, who provided us with the phone

Microsoft
  Free software tools for students via the DreamSpark program

Questions


?


Thank you

-Ztitch Solutions

Live Demonstration


Overview:

1. Face localization

2. FASIS: Try to authenticate real owner (Andrew Au)

3. FASIS: Try to authenticate non-face object (hand)

4. FASIS: Try to authenticate an audience member
