
Video-Based Face Spoofing Detection through Visual Rhythm Analysis

Allan S. Pinto¹, Helio Pedrini¹, William Schwartz², Anderson Rocha¹

¹Institute of Computing, University of Campinas

²Department of Computer Science, Universidade Federal de Minas Gerais

XXV SIBGRAPI - Conference on Graphics, Patterns and Images

Summary

1 Introduction and Motivation

2 Contributions

3 Related Work

4 Proposed Method

5 Experiments

6 Results

7 Conclusion and Future Work

8 Acknowledgment


Introduction and Motivation

What is biometrics?

Technology to recognize humans in an automatic and unique manner
Fingerprint, hand geometry and veins, face, iris, voice, etc.

Recent advances in pattern recognition have been applied to face recognition

Access control, surveillance, criminal identification, etc.


Introduction and Motivation

However, several attack techniques have been developed to deceive biometric systems

Attacks can occur:

By manipulating the scores of the recognition system
When a person tries to masquerade as someone else by falsifying the biometric data captured by the acquisition sensor

Spoofing Attack


Introduction and Motivation

But in practice, what is easier: manipulating the scores or presenting fake biometric data to the acquisition sensor?

showing a photograph of a valid user
showing a video of a valid user
showing a 3D facial model of a valid user

The face is the most exposed biometric trait

Downloadable from Facebook (photos), YouTube (videos), personal websites (photos)


Contributions

First method proposed for video-based spoofing attack detection

Creation of a dataset (available upon acceptance [a]) composed of 700 videos

100 videos of valid access
600 videos of fake access attempts
All videos with 640 × 480 pixel resolution at 25 fps

[a] http://www.ic.unicamp.br/~rocha/pub/communications.html


Contributions

Creation of a robust and simple method that can be easily embedded in an operational biometric system

Can run in parallel with the recognition system, requiring less time to validate an access attempt


Related Work

There are many works addressing photo-based spoofing attack detection

These methods seek differences between real and fake biometric data

Based on image attributes such as texture, color, light reflection, and optical flow analysis, among others
A topic that has been explored extensively

Competition on counter measures to 2-D facial spoofing attacks

In this competition, our group ranked second in the world, with only one misclassification [a]

[a] W. R. Schwartz, A. Rocha, and H. Pedrini, "Face Spoofing Detection through Partial Least Squares and Low-Level Descriptors," in Intl. Joint Conference on Biometrics, Oct. 2011, pp. 1-8.


Related Work

We can categorize current anti-spoofing methods into four non-disjoint groups

Data-driven characterization
User behavior modeling
Need for user interaction
Presence of additional devices

Non-intrusive methods without extra devices and human involvement may be preferable

They could be easily integrated into an existing biometric system, where usually only a generic webcam is deployed


Proposed Method

Motivation

Artifacts are added to the biometric samples when the videos are played back on display devices

Distortion, flickering, moiré patterns, among others

Noise signatures are added during the recapture process

Our hypothesis is that this noise and these artifacts are sufficient to detect face liveness


Overview

(figure: pipeline overview of the proposed method)


Step One

First, we compute the noise residual video (V_noise) of every video in the training set

Filtering Process

$$V_{\text{noise}}^{(t)} = V^{(t)} - f\big(V^{(t)}\big), \quad \forall\, t \in \{1, 2, \ldots, T\} \tag{1}$$

where $V^{(t)}$ is the $t$-th frame of $V$, $T$ is the number of frames, and $f$ is a filtering operation.
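A minimal sketch of this filtering step, assuming OpenCV and NumPy; the grayscale conversion and the signed-difference bookkeeping are our assumptions, not details stated in the deck.

```python
import cv2
import numpy as np

def noise_residual_video(frames, filter_fn):
    """Compute V_noise^(t) = V^(t) - f(V^(t)) for every frame (Eq. 1)."""
    residuals = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        filtered = filter_fn(gray)
        # Signed difference between the frame and its filtered copy
        residuals.append(gray.astype(np.int16) - filtered.astype(np.int16))
    return residuals
```

Any smoothing filter can be plugged in as filter_fn; the two configurations actually used are given in the Experiments section.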


Step Two

Next, for every noise residual video (V_noise), we compute the Fourier spectrum on a logarithmic scale and with the origin shifted to the center of the frame

2D Discrete Fourier Transform

$$\mathcal{F}(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} V_{\text{noise}}(x, y)\, e^{-j 2\pi (ux/M + vy/N)} \tag{2}$$

Fourier Spectrum

$$|\mathcal{F}(u, v)| = \sqrt{R(u, v)^2 + I(u, v)^2}$$

$$S(u, v) = \log\big(1 + |\mathcal{F}(u, v)|\big) \tag{3}$$

where $R$ and $I$ are the real and imaginary parts of $\mathcal{F}$.
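A minimal NumPy sketch of Equations 2 and 3, assuming a single grayscale noise-residual frame; np.fft.fftshift performs the shift of the origin to the center of the frame described above.

```python
import numpy as np

def log_fourier_spectrum(noise_frame):
    """S(u, v) = log(1 + |F(u, v)|) with the origin at the frame center."""
    F = np.fft.fft2(noise_frame)   # 2D discrete Fourier transform (Eq. 2)
    F = np.fft.fftshift(F)         # move the zero-frequency origin to the center
    magnitude = np.abs(F)          # sqrt(R^2 + I^2)
    return np.log1p(magnitude)     # log(1 + |F|) (Eq. 3)
```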


Step Two

Example of Fourier spectrum video frames

(a) Valid video
(b) Attack video considering a Gaussian filter
(c) Attack video considering a Median filter


Step Three

We compute visual rhythms of each Fourier spectrum video

Visual rhythm is a technique that captures temporal information and summarizes the video content in a single image

Considering a video V in the 2D + t domain, with t frames of dimension M × N pixels, the visual rhythm is a simplification of the video V

Rows or columns of each frame are sampled and concatenated to form a new image, called the visual rhythm


Step Three

Example of a visual rhythm


Step Three

Visual Rhythm

Two types of visual rhythm are generated for each video (see the sketch below):

Vertical visual rhythm, formed by the central vertical lines
Horizontal visual rhythm, formed by the central horizontal lines
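A sketch of the visual rhythm construction under these definitions, assuming the Fourier spectrum frames are NumPy arrays; the strip width k and the stacking layout for the vertical rhythm are our assumptions (the experiments later use 30 central rows/columns).

```python
import numpy as np

def visual_rhythm(spectrum_frames, orientation="horizontal", k=30):
    """Concatenate the k central rows (horizontal) or columns (vertical)
    of each frame into a single visual rhythm image."""
    strips = []
    for frame in spectrum_frames:
        M, N = frame.shape
        if orientation == "horizontal":
            strips.append(frame[M // 2 - k // 2 : M // 2 + k // 2, :])
        else:
            # Central columns, transposed so the strips stack frame by frame
            strips.append(frame[:, N // 2 - k // 2 : N // 2 + k // 2].T)
    return np.vstack(strips)
```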


Step Three

Example of horizontal visual rhythms (rotated by 90 degrees)

(d) Valid video
(e) Attack attempt video


Step Three

Example of vertical visual rhythms

(f) Valid video (g) Attack attempt video


Step Four

Visual Rhythm as a Texture Map

Gray-level co-occurrence matrices (GLCM) are used to extract textural information from the visual rhythm
A GLCM is a structure that describes the frequency of occurrence of gray-level pairs of pixels at a distance d = 1 in a given orientation θ ∈ {0°, 45°, 90°, 135°}

We extract 12 measures summarizing textural information from the four matrices

angular second moment, contrast, correlation, sum of squares, inverse difference moment, ...



Step Four

angular second moment: $\displaystyle\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} p(i, j)^2$

correlation: $\displaystyle\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} \frac{ij\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}$

contrast: $\displaystyle\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (i - j)^2\, p(i, j)$

...

where $p$ is the normalized $h_{d,\theta}$ matrix and $G$ is the number of gray levels.
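A hedged sketch of the texture extraction using scikit-image, whose graycomatrix/graycoprops pair implements the co-occurrence counting and several of the measures above (homogeneity is skimage's name for the inverse difference moment); the full set of 12 measures would require implementing the remaining formulas by hand, and the quantization of the rhythm to 8-bit gray levels is our assumption.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(rhythm):
    """GLCM measures at distance d = 1, orientations 0, 45, 90, 135 degrees."""
    img = rhythm.astype(np.uint8)  # assumes the rhythm is already scaled to [0, 255]
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(img, distances=[1], angles=angles,
                        levels=256, normed=True)
    feats = []
    for prop in ("ASM", "contrast", "correlation", "homogeneity"):
        feats.extend(graycoprops(glcm, prop).ravel())  # one value per orientation
    return np.array(feats)
```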


Step Five

Machine Learning

We use two machine learning techniques to classify the patterns extracted from the visual rhythms with the GLCM texture descriptor (see the sketch below):

Partial Least Squares (PLS)
Support Vector Machine (SVM)
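A sketch of the classification stage with scikit-learn, assuming X holds the GLCM feature vectors and y the valid/attack labels; using PLSRegression as a classifier by thresholding its continuous output is a common convention, not necessarily the authors' exact setup.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVC

def train_classifiers(X, y, n_factors=10):
    """X: (n_videos, n_features) GLCM descriptors; y: 0 = valid, 1 = attack."""
    pls = PLSRegression(n_components=n_factors).fit(X, y)
    svm = SVC(kernel="rbf").fit(X, y)
    return pls, svm

def pls_predict(pls, X, threshold=0.5):
    # PLS regresses a continuous score; threshold it to obtain a label
    return (pls.predict(X).ravel() >= threshold).astype(int)
```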


Dataset Creation

Extension of the Print-Attack dataset

200 videos of valid access
200 videos of spoof attacks using printed photographs
All videos with 320 × 240 pixel resolution

Creation of the attack attempt videos

All videos representing a valid access were upsampled to 640 × 480 pixel resolution
Displayed on 6 monitors and recaptured with a Sony CyberShot digital camera


Dataset Partitioning

(figure: partitioning of the dataset into Real 1/Fake 1 and Real 2/Fake 2 groups)


What is the Influence of the Monitors?

To verify the influence of the monitors on our method, we performed the experiments as follows:

Train with the Real 1 and Fake 1 groups and test with the Real 2 and Fake 2 groups
Train with the Real 2 and Fake 2 groups and test with the Real 1 and Fake 1 groups

Finally, we compute the average and standard deviation


Analysis of the Filtering Process and Visual Rhythm

Filtering process analysis

We use either a Gaussian or a median filter (a linear and a non-linear filter, respectively) in the filtering process, as sketched below:

Median filter of size 3 × 3
Gaussian filter with σ = 2 and size 3 × 3
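As a hedged sketch, these two filter configurations could be instantiated with OpenCV as follows (border handling and exact kernel normalization are OpenCV defaults, not details stated in the deck):

```python
import cv2

# Non-linear filter: 3x3 median
median_filter = lambda img: cv2.medianBlur(img, 3)

# Linear filter: 3x3 Gaussian with sigma = 2
gaussian_filter = lambda img: cv2.GaussianBlur(img, (3, 3), 2)
```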


Analysis of the Filtering Process and Visual Rhythm

Visual rhythm analysis

The visual rhythms were computed from the first 2 seconds (50 frames)

Vertical visual rhythm: 30 columns of pixels
Horizontal visual rhythm: 30 rows of pixels

We ran experiments using the horizontal and vertical visual rhythms both separately and combined
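As a consistency check on Table 1 below, the raw pixel-intensity dimensionalities follow directly from these settings: the horizontal rhythm yields 50 frames × 30 rows × 640 pixels = 960,000 values, the vertical rhythm 50 × 30 × 480 = 720,000, and their combination 1,680,000, while GLCM reduces each rhythm to 12 measures × 4 orientations = 48 features.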

Table 1: Number of features (dimensions) using either the direct pixel intensities as features or the GLCM-based texture information features.

Descriptor         Horizontal   Vertical   Horizontal + Vertical
Pixel Intensity       960,000    720,000               1,680,000
GLCM                       48         48                      96


Classification Techniques

Partial Least Squares (PLS)

We ran experiments considering different numbers of factors (the only parameter of this method)

Support Vector Machine (SVM)

$K(x_i, x_j) = x_i^T x_j$ (linear kernel)

$K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}, \ \gamma > 0$ (RBF kernel)

Grid search for tuning the parameters C and γ (a sketch follows below)
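A sketch of the C/γ grid search with scikit-learn's GridSearchCV; the logarithmic grid ranges are illustrative, since the deck does not report the values searched.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C":     [2.0 ** e for e in range(-5, 16, 2)],   # illustrative range
    "gamma": [2.0 ** e for e in range(-15, 4, 2)],   # illustrative range
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="roc_auc", cv=5)
# search.fit(X_train, y_train)  # X_train, y_train: GLCM features and labels
```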


Results

Table 2: Results in terms of area under the receiver operating characteristic curve (AUC) for the SVM classifier with the Gaussian filter, reported as mean ± standard deviation. SVM was not able to compute a classification hyperplane when using direct pixel intensities as features.

Type of Visual          SVM Linear                  SVM RBF
Rhythm                  Intensity   GLCM            Intensity   GLCM
Vertical                –           98.4% ± 1.60%   –           99.9% ± 0.10%
Horizontal              –           99.6% ± 0.50%   –           99.7% ± 0.10%
Horizontal + Vertical   –           100.0% ± 0.0%   –           100.0% ± 0.0%


Results

Table 3: Results in terms of AUC for the SVM classifier with the Median filter, reported as mean ± standard deviation. SVM was not able to compute a classification hyperplane when using direct pixel intensities as features.

Type of Visual          SVM Linear                  SVM RBF
Rhythm                  Intensity   GLCM            Intensity   GLCM
Vertical                –           99.7% ± 0.20%   –           99.6% ± 0.10%
Horizontal              –           99.9% ± 0.10%   –           100.0% ± 0.0%
Horizontal + Vertical   –           100.0% ± 0.0%   –           100.0% ± 0.0%


Results

Table 4: Results in terms of AUC for the PLS classifier with the Gaussian filter, reported as mean ± standard deviation.

Type of Visual          PLS
Rhythm                  Intensity       GLCM
Vertical                99.9% ± 0.20%   98.2% ± 0.40%
Horizontal              100.0% ± 0.0%   98.9% ± 1.50%
Horizontal + Vertical   100.0% ± 0.0%   99.9% ± 0.10%


Results

Table 5: Results in terms of AUC for the PLS classifier with the Median filter, reported as mean ± standard deviation.

Type of Visual          PLS
Rhythm                  Intensity       GLCM
Vertical                100.0% ± 0.0%   99.5% ± 0.70%
Horizontal              100.0% ± 0.0%   99.9% ± 0.10%
Horizontal + Vertical   100.0% ± 0.0%   100.0% ± 0.0%


Summary

The visual rhythm is computed on the logarithmic-scale Fourier spectrum

It is an effective alternative for summarizing videos and an important forensic signature for detecting video-based spoofing

The choice of filter in the filtering process does not influence our method

The results obtained with the Median and Gaussian filters are statistically comparable


Summary

The monitors do not influence our method

Although the standard deviations shown in Table 2 are 1.60% and 0.50% for the vertical and horizontal visual rhythms, respectively

The combination of these features resulted in a perfect classification (100.0% ± 0.0%)


Summary

The monitors do not influence our method

Although the standard deviations shown in Table 4 are 0.40% and 1.50% for the vertical and horizontal visual rhythms, respectively

The combination of these features resulted in a nearly perfect classification (99.9% ± 0.10%)


Conclusion and Future Work

The Fourier spectrum of video noise signatures and the use of visual rhythms

Able to properly capture discriminative information to distinguish between valid and fake users in video-based spoofing

The extraction of feature descriptors with GLCM provided a compact representation while preserving the method's discriminability

Many classification techniques have memory allocation problems when dealing with high-dimensional feature spaces


Conclusion and Future Work

Finally, directions for future work include:

The exploration of new video summarization approaches, as well as the use of more monitors and real videos
Additional tests considering tablets and smartphones
The investigation of illumination influences on the proposed method
New experiments on a new dataset (videos in Full High Definition quality)


Acknowledgment