CHAPTER 5 RESULTS AND DISCUSSIONS 5.1....


CHAPTER 5

RESULTS AND DISCUSSIONS

5.1. INTRODUCTION

This chapter presents the sequence in which a video is retrieved based

on different combinations of query input for video search.

5.2. SCHEMATIC SEQUENCE OF VIDEO RETRIEVAL

Fig. 5.1. Training ANN

Table 5.1. Features used for training ANN

Input to ANN (features 1-38):
  Image: features 1-8
  Text:  features 9-28
  Audio: features 29-38
Target output to ANN: Video number, Frame number

Feature   Description
1         Mean (Red color map)
2         Mean (Green color map)
3         Mean (Blue color map)
4         Number of objects matching templates
5         Contrast
6         Correlation
7         Energy
8         Homogeneity
9-28      Characters of word
29-38     Cepstrum values of portion of audio
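The layout of Table 5.1 can be illustrated with a minimal sketch that concatenates the three feature groups into one 38-element input. The function name and the character encoding (ASCII code scaled to the range 0 to 1, zero padded to 20 characters) are assumptions made for illustration only; the encoding actually used in this work is not stated here.

```python
def build_feature_vector(image_features, word, cepstrum):
    """Concatenate image (8), text (20) and audio (10) features into 38 inputs."""
    if len(image_features) != 8:
        raise ValueError("expected 8 image features (inputs 1-8)")
    if len(cepstrum) != 10:
        raise ValueError("expected 10 cepstrum values (inputs 29-38)")
    # Assumption: each ASCII character is encoded as its code scaled to [0, 1];
    # words shorter than 20 characters are zero padded, longer ones truncated.
    chars = [ord(c) / 255.0 for c in word[:20]]
    chars += [0.0] * (20 - len(chars))
    return list(image_features) + chars + list(cepstrum)


# Placeholder values only; they are not measurements from the thesis.
vector = build_feature_vector([0.4, 0.5, 0.3, 2, 0.1, 0.9, 0.2, 0.8],
                              "Parrot1", [0.0] * 10)
print(len(vector))
```

Keeping the three groups at fixed positions means a query with a missing modality could, for example, fill its slots with zeros, although that choice is also an assumption.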

5.3. IMAGE AS A QUERY

Step 1: A set of images is available in a folder. An image is browsed and input as the query to the proposed modules.

Step 2: The query image is enhanced and its intensity is adjusted. The various objects in the image are labeled using ‘BWLABEL’. The region properties for each object in the segmented image are obtained. The properties are as follows:

1. Area

2. MajorAxisLength

3. MinorAxisLength

4. Eccentricity

5. Orientation

6. Convex Area


7. Filled Area

8. Euler Number

9. EquivDiameter

10. Solidity

11. Extent

12. Perimeter

13. Centroid

14. Bounding Box
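The labeling and measurement step can be sketched as follows. This is a pure-Python illustration, not the MATLAB routine referred to above: it labels 8-connected regions of a small binary image and computes four of the listed properties (Area, Bounding Box, Centroid, Extent). The function names and the toy image are assumptions, and the coordinate conventions are simplified (zero-based pixel indices).

```python
from collections import deque


def label_objects(binary):
    """Label 8-connected foreground regions; return (label image, object count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one object
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def region_properties(labels, count):
    """Compute Area, BoundingBox (left, top, width, height), Centroid, Extent."""
    props = []
    for k in range(1, count + 1):
        pixels = [(r, c) for r, row in enumerate(labels)
                  for c, v in enumerate(row) if v == k]
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        area = len(pixels)
        top, left = min(ys), min(xs)
        height, width = max(ys) - top + 1, max(xs) - left + 1
        props.append({
            "Area": area,
            "BoundingBox": (left, top, width, height),
            "Centroid": (sum(xs) / area, sum(ys) / area),
            "Extent": area / (width * height),
        })
    return props


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
labels, count = label_objects(image)
props = region_properties(labels, count)
for p in props:
    print(p)
```

The bounding box of each labeled object is what would then be cropped and compared with the stored templates.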

Objects can be extracted from an image if closed boundaries for the objects are present.

Objects cannot be extracted if the image contains content such as clouds, water, or textured lawns. In such cases, gray level co-occurrence matrix properties such as contrast, energy, and homogeneity are obtained from the image.
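A minimal sketch of the gray level co-occurrence matrix (GLCM) properties is given below. It assumes a single horizontal offset (each pixel paired with its right neighbour), a non-symmetric normalized matrix, and the commonly used definitions of contrast, energy, homogeneity, and correlation. The offset, the number of gray levels, and the function names are assumptions for illustration.

```python
import math


def glcm(image, levels, dy=0, dx=1):
    """Normalized co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_properties(p):
    """Contrast, energy, homogeneity and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    # Correlation is undefined for a constant image; report NaN in that case.
    if sd_i == 0 or sd_j == 0:
        correlation = float("nan")
    else:
        correlation = sum((i - mu_i) * (j - mu_j) * v
                          for i, j, v in cells) / (sd_i * sd_j)
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity}


# Toy two-level image with vertical stripes (every neighbour pair differs).
stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]
texture = glcm_properties(glcm(stripes, levels=2))
print(texture)
```

For the striped image every horizontal neighbour pair changes gray level, so the contrast is high and the correlation is strongly negative, which is the kind of signature that separates textures when no closed object boundary exists.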

When objects are present in the given image, the contents of each object within its bounding box are compared with the templates stored in the indexing file. Hence, if circles or irregular shapes are present in the image, they can be compared with the templates.

A separate template file is created for each video. Fifty videos have been used. Features of the videos are extracted and used as inputs to the ANN for training. The contents of each video template are presented in Table 5.2.

Table 5.2. Contents of template

Template: Video 1-50

Contents:
1. Numerical values of the plots presented in column 2 of Table 3.1
2. Words presented in column 2 of Table 3.2
3. Numerical values given in the Figure 3.1

Columns: Category, Video name, 3 different frames (images), Frame number, Sequence of frame numbers in increasing order, Audio name, Words, Gold wave screen shot (images). Only the text columns are listed below, in the order: Frame number | Sequence of frame numbers in increasing order | Audio name | Words.

Video: America's Got Talent ‐ Echo of Animal Gardens V19 (Category: BIRDS)
  953  | 1.jpg | 1.wav | Parrot1
  1019 | 2.jpg | 2.wav | Parrot2
  1830 | 3.jpg | 3.wav | Parrot3

Video: Genius Bird (1)v51
  315  | 4.jpg | 4.wav | crows
  679  | 5.jpg | 5.wav | Crow2
  1584 | 6.jpg | 6.wav | Crow3

Video: How to Cycle Downhill V38
  1002 | 7.jpg | 7.wav | Cycle1

Video: Dramatic 747 Take Off From Bournemouth Airport V32
  30   | 10.jpg | 10.wav | Plane1

Video: Cookie the Little Penguin V2
  13   | 13.jpg | 13.wav | Peng1
  666  | 14.jpg | 14.wav | Penguin2
  1466 | 15.jpg | 14.wav | Penguin3

Video: Deer Attacks dog V4
  417  | 16.jpg | 15.wav | Deer1

Video: African Lion Attack! 51
  113  | 19.jpg | 18.wav | Lion1

Video: Sachin Tendulkar on Frankly Speaking v53
  1017 | 22.jpg | 21.wav | Sachin1
  1072 | 23.jpg | 22.wav | Sachin2
  1582 | 24.jpg | 23.wav | sachin

Video: Ravichandran Ashwin to Umar V42
  9    | 25.jpg | 24.wav | Umar1

Video: Massive Diamond Planet Orbits Neutron Sta V17 (Category: Astronomy)
  07   | 28.jpg | 27.wav | Annetta1

Video: Tossbow returning BOOMERANGv54
  24   | 31.jpg | 31.wav | Boom1

Video: Meet the new Windows Phone_ v55 (Category: Phone)
  289  | 34.jpg | 34.wav | Phone1

Video: MK Gandhi's Speech v56
  1342 | 37.jpg | 37.wav | Gandhi1

Video: How To 'Panna' Football Lessonv58 (Category: Sports)
  365  | 40.jpg | 40.wav | Football1

Video: Huey Helicopter taking off at the Ulster Airshow v59 (Category: Aircraft)
  304  | 43.jpg | 43.wav | Helicopter1

Video: Still Don't believe in UFO's_ v60
  158  | 46.jpg | 46.wav | Ufo1

Video: cruise ship almost tips 61 (Category: Ship)
  9    | 49.jpg | 49.wav | Ship1

Video: Russia's New Killer Robots62 (Category: Robot)
  9    | 52.jpg | 52.wav | Robot1

Video: marine fish feeding_ reef aquarium63 (Category: animals)
  37   | 55.jpg | 55.wav | Fish1

Video: Hurricane Sandy_ Super storm's Path v40 (Category: news)
  2499 | 58.jpg | 58.wav | Storm1

5.4. DYNAMIC TIME WARPING

The key frame identification procedure is presented in this thesis. Four persons were video recorded in a normal environment while reading the statement “in today’s 2011 match between India versus West Indies, India won by six wickets”. The persons (Table 5.4) are ‘Prasanna’, the author (‘Pra’), ‘Purushothaman’ (‘Pur’), ‘Rajeswari’ (‘Raj’), and ‘Shwetha’ (‘Shwe’). This short video is embedded at different frame locations of different videos.

The statement was read twice, at different instances, by ‘Prasanna’ and stored as ‘Pra1’ and ‘Pra2’. The recordings of ‘Purushothaman’, ‘Rajeswari’, and ‘Shwetha’ were stored as ‘Pur1’, ‘Raj1’, and ‘Shwe’, respectively. The recordings are in 16-bit stereo format. Table 5.5 shows the number of speech recordings made for each person at different instances.

Table 5.4. Speech combination matrix

Person name | ‘Pra1’ | ‘Pra2’ | ‘Pur1’ | ‘Raj1’ | ‘Shwe’
‘Pra1’      |   √    |   √    |   √    |   √    |   √
‘Pra2’      |   √    |   √    |   √    |   √    |   √
‘Pur1’      |   √    |   √    |   √    |   √    |   √
‘Raj1’      |   √    |   √    |   √    |   √    |   √
‘Shwe’      |   √    |   √    |   √    |   √    |   √

Table 5.5. Recordings of speech at different instances

Person name     | Speech1 | Speech2
‘Prasanna’      |    √    |    √
‘Purushothaman’ |    √    |
‘Rajeswari’     |    √    |
‘Shwetha’       |    √    |

Table 5.6. Plotting combinations

Person name | Comparison of matching score | DTW error plot
Pra1        |              √               |       √
Pra2        |              √               |       √
Pur1        |              √               |       √
Raj1        |              √               |       √
Shwe        |              √               |       √

In Row 1 of Table 5.6, ‘Pra1’ has been kept as the reference, which is available in the recorded video. The speech of ‘Pra2’, ‘Pur1’, ‘Raj1’, and ‘Shwe’ has been tested to retrieve ‘Pra1’.

In Row 2 of Table 5.6, ‘Pra2’ has been kept as the reference, which is available in the recorded video. The speech of ‘Pra1’, ‘Pur1’, ‘Raj1’, and ‘Shwe’ has been tested to retrieve ‘Pra2’.

In Row 3 of Table 5.6, ‘Pur1’ has been kept as the reference, which is available in the recorded video. The speech of ‘Pra1’, ‘Pra2’, ‘Raj1’, and ‘Shwe’ has been tested to retrieve ‘Pur1’.

In Row 4 of Table 5.6, ‘Raj1’ has been kept as the reference, which is available in the recorded video. The speech of ‘Pra1’, ‘Pra2’, ‘Pur1’, and ‘Shwe’ has been tested to retrieve ‘Raj1’.

In Row 5 of Table 5.6, ‘Shwe’ has been kept as the reference, which is available in the recorded video. The speech of ‘Pra1’, ‘Pra2’, ‘Pur1’, and ‘Raj1’ has been tested to retrieve ‘Shwe’.

[Plot: amplitude versus samples (x 10^4); legend: Prasanna1, Prasanna2, Puru1, Rajeswari1, Shwetha1]

Fig. 5.2. Speech of four candidates

Figure 5.2 shows the speech waveforms of all four candidates.

[Plot: Prasanna vs Prasanna, same recording]

Fig. 5.3. DTW matching score (‘Pra1’-‘Pra1’)

Figure 5.3 presents the matching score of ‘Prasanna’ speech 1 against the same speech, to show that the matching score is perfect in this case.
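The behaviour of the matching score can be illustrated with a minimal dynamic time warping (DTW) sketch. It assumes one scalar feature per frame and an absolute-difference local cost; the features and cost actually used in this work may differ, and the function name and toy sequences are illustrative only.

```python
def dtw(a, b):
    """Return (total alignment cost, warping path) for two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # diagonal step
    # Backtrack from the end to recover the alignment path
    # (ties are resolved in favour of the diagonal step).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


reference = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, spoken more slowly
different = [3, 3, 0, 0, 3, 3, 0]

same_cost, same_path = dtw(reference, reference)
slow_cost, slow_path = dtw(reference, stretched)
diff_cost, diff_path = dtw(reference, different)
print(same_cost, same_path)
print(slow_cost, slow_path)
print(diff_cost)
```

When a sequence is matched against itself the path lies on the diagonal with zero cost, which corresponds to the perfect matching score described above; any other recording bends the path away from the diagonal or raises the cost.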

[Plot: Prasanna1 vs Prasanna2, different recording]

Fig. 5.4. DTW matching score

If speech 2 of ‘Prasanna’ is used to retrieve the ‘Prasanna’ frame from the videos, the matching score deviates, as shown in Figure 5.4.

[Plot: Prasanna1 vs Puru1, different recording]

If speech 1 of ‘Purushothaman’ is used to retrieve the ‘Prasanna’ frame from the videos, the matching score deviates, as shown in Figure 5.5.

[Plot: Prasanna1 vs Rajeswari1, different recording]

If speech 1 of ‘Rajeswari’ is used to retrieve the ‘Prasanna’ frame from the videos, the matching score again deviates.


[Plot: Shwetha1 vs Prasanna1, different recording]

If speech 1 of ‘Shwetha’ is used to retrieve the ‘Prasanna’ frame from the videos, the matching score again deviates.


5.4.1. Retrieval of ‘Prasanna’ with ‘Prasanna-1’ speech versus other candidates

[Plot: WaveFile-2 versus WaveFile-1; legend: Prasanna1-Prasanna1, Prasanna1-Prasanna2, Prasanna1-Puru1, Prasanna1-Raj1, Prasanna1-Shwe]

Fig. 5.8. Comparisons of matching score

Figure 5.8 presents the matching scores of all four candidates. The blue line indicates a perfect match when speech 1 of ‘Prasanna’ is used for frame retrieval. The matching scores deviate when the speeches of the other three persons are used to retrieve speech 1 of ‘Prasanna’.

[Plot: error versus frames]
Fig. 5.9. DTW Error plot

Figure 5.9 presents the amount of deviation in the speech of all four candidates when compared with the first candidate's speech. The X-axis represents frames of 512 samples each, and the Y-axis represents the deviation relative to the reference value ‘0’. There is a large deviation for ‘Prasanna–Rajeswari’ and ‘Prasanna–Shwetha’. The speech utterances of ‘Rajeswari’ and ‘Shwetha’ cannot be used to retrieve the frames of ‘Prasanna’ because the matching is not within the limit. However, the speech of ‘Purushothaman’ can be used to retrieve the frames of ‘Prasanna’.
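One possible reading of the error plot is the deviation of the warping path from the diagonal, measured per 512-sample frame. The sketch below follows that reading: it splits a signal into frames and, for each reference frame, reports how far the matched frame index lies from the reference index. The exact error definition used for Figure 5.9 is not stated, so this definition, the function names, and the hand-written paths are assumptions.

```python
def split_into_frames(samples, frame_len=512):
    """Split a sample list into consecutive frames, dropping an incomplete tail."""
    return [samples[k:k + frame_len]
            for k in range(0, len(samples) - frame_len + 1, frame_len)]


def path_deviation(path):
    """For each reference frame i, return (mean matched frame index) - i."""
    matches = {}
    for i, j in path:
        matches.setdefault(i, []).append(j)
    return [sum(js) / len(js) - i for i, js in sorted(matches.items())]


frames = split_into_frames(list(range(2000)))
print(len(frames), len(frames[0]))

# Hypothetical warping paths given as (reference frame, test frame) pairs.
same = [(0, 0), (1, 1), (2, 2), (3, 3)]
warped = [(0, 0), (1, 2), (2, 3), (2, 4), (3, 5)]
same_error = path_deviation(same)
warped_error = path_deviation(warped)
print(same_error)
print(warped_error)
```

Under this reading a reference matched with itself stays at zero for every frame, while a poorly matching speaker drifts away from zero, and a limit on the deviation decides whether that speech can be used for retrieval.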

5.4.2. Retrieval of ‘Prasanna’ with ‘Prasanna-2’ speech versus other candidates

[Plot: WaveFile-2 versus WaveFile-1]

Fig. 5.10. Comparisons of matching score for ‘Prasanna-2’ and other candidates

Figure 5.10 presents the matching scores of all four candidates. The green line indicates a perfect match when speech 2 of ‘Prasanna’ is used for frame retrieval. The matching scores deviate when the speeches of the other three persons are used to retrieve speech 2 of ‘Prasanna’.


Fig. 5.11. DTW error plot for ‘Prasanna-2’ and other candidates

Figure 5.11 presents the amount of deviation in the speech of all four candidates when compared with the ‘Prasanna-2’ speech.

5.4.3. Training RBF

[Plot: percentage of words recognized versus number of centres in RBF]

Fig. 5.12. Impact of number of centers in training the audio files

Figure 5.12 shows the percentage of words recognized for different numbers of centers. Each center corresponds to a word pattern.
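The effect of the number of centers can be sketched with a small radial basis function (RBF) network. The sketch assumes Gaussian hidden units, centers taken as the first k training patterns, output weights fitted by least squares (normal equations), and a word counted as recognized when the rounded output equals its target index. The toy patterns, the width, and all function names are assumptions and are not the data or settings of this work.

```python
import math


def gaussian(x, c, sigma):
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2 * sigma ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def train_rbf(patterns, targets, centers, sigma=1.0):
    """Fit output weights by least squares: (phi^T phi) w = phi^T y."""
    phi = [[gaussian(x, c, sigma) for c in centers] for x in patterns]
    k = len(centers)
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)]
           for i in range(k)]
    aty = [sum(row[i] * t for row, t in zip(phi, targets)) for i in range(k)]
    return solve(ata, aty)


def predict(x, centers, weights, sigma=1.0):
    return sum(w * gaussian(x, c, sigma) for w, c in zip(weights, centers))


# Toy word patterns (one per word) and their word indices as targets.
patterns = [[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3], [3, 3, 3]]
targets = [1, 2, 3, 4, 5]


def recognition_rate(num_centers):
    centers = patterns[:num_centers]
    weights = train_rbf(patterns, targets, centers)
    hits = sum(round(predict(x, centers, weights)) == t
               for x, t in zip(patterns, targets))
    return 100.0 * hits / len(patterns)


for k in range(1, len(patterns) + 1):
    print(k, recognition_rate(k))
```

With one center per word pattern the network can reproduce every target, whereas fewer centers leave some words without a nearby basis function.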


5.4.4. Testing RBF

[Plot: output of RBF versus words; legend: RBF target, RBF output, Error]

Fig. 5.13. Performance of RBF for word matching

Figure 5.13 presents the performance of RBF in matching the audio. The blue dotted line shows the matching of the relevant words. The black line shows the error between the targets and the outputs obtained.

5.5. VIDEO RETRIEVAL

5.5.1. Only image is used as input query

[Plot: expected video retrieved versus random similar frame from each video; legend: BPA, RBF]

Fig. 5.14. Video retrieved for given random image

Figure 5.14 shows the performance of RBF and BPA for retrieval of a video given a random image frame not used for training the ANN.

Fig. 5.15. Video retrieved for the frame used in training

Figure 5.15 shows the performance of RBF and BPA for retrieval of a video given the same image frame used for training the ANN.

5.5.2. Only plain text is used as input query

Fig. 5.16. Video retrieved for given random text in the video


Figure 5.16 shows the performance of RBF and BPA for retrieval of a video given a random text not used for training the ANN.

Fig. 5.17. Video retrieved for the text used in training


Figure 5.17 shows the performance of RBF and BPA for retrieval of a video given the same text used for training the ANN.

5.5.3. Only audio is used as input query

Fig. 5.18. Video retrieved for given random audio in the video


Figure 5.18 shows the performance of RBF and BPA for retrieval of a video given a random audio not used for training the ANN.

Fig. 5.19. Video retrieved for the audio used in training


Figure 5.19 shows the performance of RBF and BPA for retrieval of a video given the same audio used for training the ANN.

5.5.4. Multimodal approach

Fig. 5.20. Video retrieved for the combination of image + text + audio used in training

Figure 5.20 shows the performance of RBF and BPA for retrieval of a video given a combination of image + text + audio used for training the ANN.

5.6. SUMMARY

This chapter has presented the performance of the ANN algorithms in retrieving selected videos using image, text, and audio as inputs, each separately. The performance of the ANN algorithms has also been presented for a combined image, text, and audio input query for retrieving an expected video.
