10.1007_s11042-006-7715-8

Gradual shot boundary detection using localized edge blocks

Hun-Woo Yoo & Han-Jin Ryoo & Dong-Sik Jang

© Springer Science + Business Media, LLC 2006

Abstract A new algorithm for gradual shot boundary detection is proposed in this paper. The proposed algorithm is based on the observation that most gradual transition curves can be characterized by the variance distribution of edge information in the frame sequence. An average edge frame sequence is obtained by performing Sobel edge detection. Features are extracted by comparing the whole-frame variance with the variances of local blocks in the average edge frames. These features are further processed by the opening operation to obtain smoothed variance curves. The lowest variance in the local frame sequence is chosen as a gradual detection point. Experimental results show that the proposed method achieves 87.0% precision and 86.3% recall on six selected videos.

Keywords Gradual shot detection · AGI (Average Gradient Image) · Variance · Parabolic curve · Local blocks · Opening

Multimed Tools Appl (2006) 28: 283–300
DOI 10.1007/s11042-006-7715-8

H.-W. Yoo (*)
Center for Cognitive Science, Yonsei University, 134 Shinchon-Dong, Seodaemun-Ku, Seoul 120–749, South Korea
e-mail: [email protected]

H.-J. Ryoo
Department of Electronics and Computer Engineering, Korea University, Sungbuk-gu Anam-Dong 5 Ga 1, Seoul 136–701, South Korea
e-mail: [email protected]

D.-S. Jang
Department of Industrial Systems and Information Engineering, Korea University, Sungbuk-gu Anam-Dong 5 Ga 1, Seoul 136–701, South Korea
e-mail: [email protected]

Springer

1. Introduction

Digital video is becoming an increasingly common data type in the new generation of multimedia databases. Many broadcasters are switching to digital formats for broadcasting, and some of them already have a significant amount of video material available in digital formats for previewing. Improved compression technologies and increased Internet bandwidth have made webcasting a real possibility. The ever-growing amount of digital video poses new challenges for both storage and access, as vast repositories are being built at an increasing pace.

A key step in managing a large video database is to segment the video sequences into shots. Video segmentation makes video data more manageable by imposing a hierarchy on it. It is also the first step toward understanding video content, since it divides the video into shots on which content analysis can be performed.

This segmentation process is generally referred to as shot boundary detection. A shot is a sequence of frames generated during a continuous camera operation and represents a continuous action in time and space. Video editing procedures produce abrupt and gradual shot transitions. A cut is an abrupt shot change that occurs in a single frame. A gradual change occurs over multiple frames and is the product of fade-ins, fade-outs, or dissolves (where two shots are superposed). Figures 1 and 2 show examples of abrupt and gradual changes.

A large body of work on shot boundary detection has been reported in recent years [1–16, 18–25]. Earlier work concentrated mainly on abrupt cuts; consequently, more recent work has turned toward gradual shot boundary detection. Detecting gradual changes is more difficult than detecting abrupt cuts: for a cut, the peak in the frame-difference sequence is well separated in time, whereas for a gradual change the difference is spread over many frames and no single point of the sequence stands out.

Fig. 1 Abrupt shot changes (cuts)

Fig. 2 Gradual shot changes (a) fade-in; (b) fade-out; (c) dissolve


In this paper, a new gradual scene detection algorithm is proposed. The proposed algorithm is based on the observation that most gradual transitions can be characterized by the variance distribution of edge information in the frame sequence, which takes a parabolic shape during a dissolve. We first obtain an average edge frame sequence by applying a Sobel operator to the original frame sequence. We then extract a feature sequence that shows a distinct parabolic variance curve by comparing the full-frame variance sequence with those of nine local sub-blocks in the average edge frames. This feature sequence is further processed by the opening operation to obtain a smoothed curve. The local minimum within a sliding window of fixed size is chosen as a gradual detection point.

Our contributions in this paper are threefold. First, in an ideal gradual (dissolve) transition, the frame variances over time follow a parabolic curve. In practice, however, noise and motion in a video make the variance graph much less pronounced. Our method approximates the variance sequence to the ideal curve by taking the most distinct parabolic sequence extracted from local regions of the video frames. Second, the proposed algorithm achieves robust detection by smoothing the saw-like variance sequence with a morphological opening operation and by a time-local analysis over a fixed-size sliding window. Third, tests on video data show that its performance is more accurate and reliable than that of two commonly used algorithms, DCD and twin comparison.

2. Related research on gradual changes

• Twin comparison method [24]. This was the first attempt to detect and classify both abrupt and gradual changes. In this approach, two threshold values are applied to the difference of intensity histograms: a higher one, Th, for detecting cuts and a lower one, T1, for detecting gradual transitions. First, Th is used to detect high discontinuity values corresponding to cuts, and then T1 is applied to the remaining discontinuity values. If a discontinuity value is higher than T1, it is considered the start of a gradual transition. From that point, consecutive discontinuity values are summed until the cumulative sum exceeds Th; the end of the gradual transition is then set at the last discontinuity value included in the sum. However, a major problem with this approach is that many false positives can be generated when the thresholds are not properly assigned.

• Plateau detection [22]. Yeo and Liu noted that comparisons based solely on successive frames are not adequate for detecting gradual transitions. They instead used the difference between the current frame and the kth following frame, first obtaining the sequence of delayed inter-frame distances $\{D_i^k\} = \{d(X_i, X_{i+k})\}$. If k is chosen greater than the length of the gradual transition, the sequence $\{D_i^k\}$ exhibits a plateau of maximal width. A significant plateau at location i is characterized by a sequence of similar values $D_j^k$, $j = i-s, \ldots, i+s$, which are consistently higher than the preceding or successive values. The value of s is proportional to the difference between k and the transition length. The method applies to both linear and nonlinear gradual transitions; only the shape of the rises and falls at the plateau boundaries differs.


• Algorithm by Meng et al. [14]. In the compressed domain, the intensity variance of successive frames is used to detect gradual changes. This method exploits DCT DC coefficients and motion vectors. Since, in theory, most dissolves show a parabolic variance curve, the authors used the depth and width of that curve. In actual cases, however, noise and motion in a video make the curve much less pronounced.

• Algorithm by Song et al. [18]. A chromatic video edit model for gradual transitions is built on the assumption that the discontinuity values belonging to such a transition form a pattern consisting of two piecewise linear functions of time, one decreasing and one increasing. Such linearity does not hold outside the transition area. The authors search for close-to-linear segments in the series of discontinuity values by investigating the first and second derivatives of the slope in time. A close-to-linear segment is found if the second derivative is less than a pre-specified percentage of the first derivative.

• Feature-based detection [23]. This algorithm is based on the edge change fraction computed in the temporal domain. During a cut or a dissolve, new intensity edges appear far from the locations of old edges. Edge pixels that appear (disappear) far from existing edge pixels are considered entering (exiting) edge pixels. Cuts, fades, and dissolves can be detected by counting the entering and exiting edge pixels, while wipes can be detected by looking at their spatial distribution. The algorithm consists of the following steps:

1. Frames Ft and Ft+1 are aligned using a global motion compensation algorithm.

2. Edges are computed by applying the Canny algorithm to a smoothed version of the frames.

3. The binary edge maps are dilated by radius r, so that the condition on the mutual distance of edge pixels can be easily verified by set intersection.

4. The fractions of entering edge pixels, $\rho_{in}$, and exiting edge pixels, $\rho_{out}$, are computed. Shot changes are detected by looking at the edge change fraction $\rho = \max(\rho_{in}, \rho_{out})$. A cut leads to a single isolated high value of ρ, while the other scene breaks lead to an interval where ρ is high. During a fade-in, $\rho_{in}$ is much higher than $\rho_{out}$; the reverse happens for fade-outs. A dissolve is characterized by a predominance of $\rho_{in}$ during the first phase and of $\rho_{out}$ during the second phase. The technique also works properly on heavily compressed image sequences. This approach is highly accurate, but it requires a large amount of computation time.

• Algorithm by Truong et al. [20]. This method improves cut detection accuracy by using an adaptive threshold computed from a local window on the luminance histogram difference curve. In addition, based on mathematical models for producing ideal fades and dissolves, the existence of these effects is examined. In that procedure, constraints on the characteristics of the frame luminance mean and variance curves are derived to eliminate false positives caused by camera and object motion during gradual transitions.

• Detection based on the spatio-temporal distribution of macro block types [9]. This method detects dissolves based on the spatio-temporal distribution of macro block types in MPEG-compressed video. The ratio of forward macro blocks in B-type frames and the spatial distribution of forward/backward macro blocks are used to detect dissolves. After such a sequence of frames is found, two heuristic rules are applied:

1. The global color distributions of the frames at which the dissolve starts and terminates are very different.

2. The duration of a dissolve transition is typically more than 0.3 s.

• Machine learning approach [12]. A dissolve detection algorithm using machine learning and a multi-resolution concept was proposed. The approach is less concerned with the actual features used for dissolve detection and more with a general framework for recognizing gradual transitions. First, a large number of dissolve examples are created from a given video database using a dissolve synthesizer. These examples are then used to train a heuristically optimal classifier, which is employed in a multi-resolution search for dissolves of various durations.

• DCD method [13]. Variance, gradient magnitude, and the double chromatic difference (DCD) of the image sequence were used for dissolve detection. The first step of the DCD detector segments the video into non-overlapping “potential dissolve” and “non-dissolve” categories using edge-based or pixel-based statistics. The second step uses this segmentation to define one synthetic dissolve per potential-dissolve segment, beginning and ending at the first and last frames of the segment, respectively. From these starting and ending frames, the center frame of a synthetic dissolve is formed and compared with the intervening footage. If the comparison error over time is parabolic in shape, the potential-dissolve segment is accepted.
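The plateau test of Yeo and Liu [22] described above can be illustrated in a few lines. This is a minimal sketch, not their implementation: it assumes uncompressed grayscale frames, takes d(·,·) to be the mean absolute pixel difference, and requires the window of similar values to exceed the samples on both sides. `delayed_distances`, `find_plateaus`, and the tolerance `tol` are hypothetical names and parameters.

```python
import numpy as np


def delayed_distances(frames, k):
    """D_i^k = d(X_i, X_{i+k}); here d is the mean absolute pixel difference."""
    x = np.asarray(frames, dtype=float)
    return np.array([np.abs(x[i + k] - x[i]).mean() for i in range(len(x) - k)])


def find_plateaus(d, s, tol=0.05):
    """Centres i where D_{i-s}, ..., D_{i+s} are similar (spread <= tol * max)
    and all of them exceed the samples just before and just after the window."""
    d = np.asarray(d, dtype=float)
    centres = []
    for i in range(s + 1, len(d) - s - 1):
        w = d[i - s:i + s + 1]
        flat = w.max() - w.min() <= tol * w.max()
        higher = w.min() > d[i - s - 1] and w.min() > d[i + s + 1]
        if flat and higher:
            centres.append(i)
    return centres
```

For example, a 10-frame linear dissolve between two flat shots, examined with k = 15, produces a plateau 15 − 10 = 5 samples wide, so `find_plateaus(d, s=2)` reports its centre; a larger k widens the plateau, in line with s growing with the difference between k and the transition length.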

Other algorithms related to gradual transition detection can be found in [3, 4], and good surveys can be found in [1, 2, 5–7, 11, 19]. Table 1 summarizes the existing and proposed algorithms.
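Of the methods reviewed above, the edge change fraction of feature-based detection [23] is closest in spirit to the edge features used in this paper, so a minimal sketch of it may help. It assumes binary edge maps are already available (the motion compensation and Canny steps are omitted) and uses a square structuring element as a stand-in for a disk of radius r, so "far from" here means a Chebyshev distance greater than `radius`. `dilate_binary` and `edge_change_fraction` are hypothetical names.

```python
import numpy as np


def dilate_binary(mask, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square."""
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    padded = np.pad(m, radius, mode="constant", constant_values=False)
    out = np.zeros_like(m)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def edge_change_fraction(edges_prev, edges_next, radius=2):
    """Return (rho, rho_in, rho_out) for two binary edge maps.

    rho_in : share of edge pixels in the next frame lying farther than `radius`
             from every edge pixel of the previous frame (entering edges).
    rho_out: share of edge pixels in the previous frame lying farther than
             `radius` from every edge pixel of the next frame (exiting edges).
    """
    e0 = np.asarray(edges_prev, dtype=bool)
    e1 = np.asarray(edges_next, dtype=bool)
    n0, n1 = e0.sum(), e1.sum()
    rho_in = (e1 & ~dilate_binary(e0, radius)).sum() / n1 if n1 else 0.0
    rho_out = (e0 & ~dilate_binary(e1, radius)).sum() / n0 if n0 else 0.0
    return max(rho_in, rho_out), rho_in, rho_out
```

An edge that moves by less than `radius` pixels contributes to neither fraction, which is what makes the measure tolerant of small motion; a fade-in from an empty edge map yields only entering edges.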

3. Abrupt shot boundary detection

In order to detect gradual shot boundaries, abrupt cuts are detected first, and the existence of gradual shot changes is then examined between neighboring cuts. Fully decoded MPEG video sequences are used to achieve more accurate detection. In this paper, we use a correlation metric on pixel intensities to detect cuts, as follows.

Let $m_k$ and $\sigma_k$ be the mean and standard deviation of the pixel intensities in the kth frame. The inter-frame correlation between two consecutive frames k and k+1 is then defined as

$$\mathrm{Cor}(k, k+1) = \frac{1}{WH\,\sigma_k\,\sigma_{k+1}}\sum_{i=0}^{H-1}\sum_{j=0}^{W-1}\bigl(X_k[i][j] - m_k\bigr)\bigl(X_{k+1}[i][j] - m_{k+1}\bigr), \qquad -1 \le \mathrm{Cor}(k, k+1) \le 1 \tag{1}$$

where W and H are the width and height of a frame, and $X_k[i][j]$ is the pixel intensity at coordinate (i, j) in the kth frame.

If Cor(k, k+1) falls below a certain threshold (i.e., the correlation is low), frame k+1 is declared a cut. This is the opposite of a conventional frame-difference metric, for which a cut is declared when the difference exceeds a threshold (i.e., the difference is high).
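Eq. (1) is the Pearson correlation coefficient of two frames' pixel intensities, so a minimal sketch can compute Σab / √(Σa² Σb²) on mean-centred pixels. It assumes grayscale frames as NumPy arrays; the default threshold 0.82 is the value reported in Section 5, while the treatment of constant frames (for which Eq. (1) is undefined) is an assumption of this sketch. `frame_correlation` and `detect_cuts` are hypothetical names.

```python
import numpy as np


def frame_correlation(x_k, x_k1):
    """Cor(k, k+1): Pearson correlation of the pixel intensities of two frames."""
    a = np.asarray(x_k, dtype=float).ravel()
    b = np.asarray(x_k1, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # Undefined for a constant frame: treat two constant frames as a match,
        # a constant frame next to a textured one as uncorrelated (assumption).
        return 1.0 if (a * a).sum() == (b * b).sum() == 0.0 else 0.0
    return float((a * b).sum() / denom)


def detect_cuts(frames, threshold=0.82):
    """Indices k+1 at which Cor(k, k+1) drops below the threshold."""
    return [k + 1 for k in range(len(frames) - 1)
            if frame_correlation(frames[k], frames[k + 1]) < threshold]
```

Because the coefficient is invariant to adding a constant to a frame, a global brightness shift alone does not trigger a cut; only a change in the spatial pattern of intensities does.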



4. Gradual shot boundary detection

The gradual shot boundary tends to have a high correlation between consecutive frames, but small changes accumulate over multiple frames and eventually result in a different shot, i.e., a gradual change occurs. There is no distinct characteristic between any two successive frames. In an ideal dissolve, the variance of pixel intensities in each frame is distributed over the frames as in figure 3(a) [14]; it looks like a parabolic curve. A fade-in, in which a new scene gradually appears as pixel intensities increase, may look like figure 3(b), and a fade-out, in which a scene gradually disappears as pixel intensities decrease, may look like figure 3(c).

Table 1 Summary of existing and proposed gradual shot boundary detection algorithms

| Name of method | Approach |
|---|---|
| Twin comparison [24] | A higher threshold Th for cuts and a lower threshold T1 for gradual boundaries are used. Many false positives can be generated when the thresholds are not properly assigned. |
| Plateau detection [22] | Plateau detection is performed on the k-interval difference sequence $D_i^k = d(X_i, X_{i+k})$. |
| Algorithm by Meng et al. [14] | Assuming that the intensity variance of successive frames shows a parabolic shape, it tries to detect the depth and width of the curve. However, in actual cases, the ideal parabolic curve is not pronounced. |
| Algorithm by Song et al. [18] | It searches for close-to-linear segments in the series of discontinuity values by investigating the first and second derivatives of the slope in time. A close-to-linear segment is found if the second derivative is less than a pre-specified percentage of the first derivative. |
| Feature-based detection [23] | Gradual changes are detected by examining the fractions of entering and exiting edge pixels, $\rho = \max(\rho_{in}, \rho_{out})$. It is highly accurate, but takes a large amount of computation time. |
| Algorithm by Truong et al. [20] | Based on mathematical models for producing ideal fades and dissolves, the existence of gradual changes is examined. However, such ideal cases do not occur in actual video. |
| Detection based on the spatio-temporal distribution of macro block types [9] | The ratio of forward macro blocks in B-type frames and the spatial distribution of forward/backward macro blocks are examined. Two heuristic rules are applied: (1) the global color distributions of the frames at which the dissolve starts and terminates are very different; (2) the duration of a dissolve transition is typically more than 0.3 s. |
| Machine learning method [12] | Many dissolve examples are used to train a heuristically optimal classifier. |
| DCD method [13] | It is based on the fact that the variance, gradient magnitude, and double chromatic difference (DCD) of the sequence show a parabolic-like shape. However, in actual cases, the ideal parabolic curve is not pronounced. |
| Proposed method | It approximates the variance sequence to an ideal curve by obtaining the most distinct parabolic sequence extracted from local frame regions. It smooths the saw-like variance sequence through morphological opening and a local sliding window to achieve robust detection. |
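The parabolic shape of figure 3(a) follows from a simple model. If a dissolve is modeled as $X_\alpha = (1-\alpha)A + \alpha B$ with α rising linearly from 0 to 1, then $\operatorname{Var}(X_\alpha) = (1-\alpha)^2\sigma_A^2 + \alpha^2\sigma_B^2 + 2\alpha(1-\alpha)\operatorname{Cov}(A, B)$, a quadratic in α whose minimum lies inside the transition when $\operatorname{Cov}(A, B) < \min(\sigma_A^2, \sigma_B^2)$. A fade-out is the special case in which B is a constant frame, leaving $(1-\alpha)^2\sigma_A^2$, a monotone decrease as in figure 3(c). The sketch below checks this numerically on synthetic frames; it assumes static content with no motion or noise, which is exactly the idealization real video violates. `dissolve_variances` is a hypothetical name.

```python
import numpy as np


def dissolve_variances(shot_a, shot_b, n_steps):
    """Variance of the blended frame (1 - alpha) * A + alpha * B for alpha in [0, 1]."""
    a = np.asarray(shot_a, dtype=float)
    b = np.asarray(shot_b, dtype=float)
    alphas = np.linspace(0.0, 1.0, n_steps)
    return alphas, np.array([((1 - t) * a + t * b).var() for t in alphas])


rng = np.random.default_rng(1)
a = rng.normal(100, 20, (32, 32))      # synthetic "last frame of shot A"
b = rng.normal(120, 25, (32, 32))      # synthetic "first frame of shot B"
alphas, v = dissolve_variances(a, b, 31)
coeffs = np.polyfit(alphas, v, 2)      # an exact quadratic fits the curve
```

A degree-2 fit reproduces the variance curve to rounding error, its leading coefficient is positive, and the minimum falls strictly inside the dissolve.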

Some earlier gradual detection methods operate in the compressed domain. These methods have the advantage of fast detection, since full decoding is not necessary. However, the information lost by using compressed data yields a distorted parabolic shape, which is an obstacle to robust detection. Other methods have used the mean of pixel intensities in fully decoded frames, since this yields a less distorted sequence. However, such methods can locate the transition point only when the mean frame sequence differs strongly between neighboring frames. In practice, video sequences generally do not follow the ideal case of figure 3, due to noise and camera motion. Hence, in this paper, we try to approximate the frame difference sequence to an ideal curve by using “average edge frames” to obtain more robust gradual detection.

4.1. Average edge image

An average edge image is an image reconstructed using only the pixels whose intensities exceed the average intensity of a Sobel edge-detected image [17]. A sequence of these images has a more distinct and smoother variance distribution than a sequence of gray images, as shown in figure 4. This is somewhat similar to the effective average gradient (EAG) in [13]. The average edge image is extracted by the following steps.

Step 1: Convert a color image (frame) to a gray image (frame).

$$Y\,(\text{Luminance}) = 0.299R + 0.587G + 0.114B \tag{2}$$

Fig. 3 Variance distribution over gradual changes (a) dissolve; (b) fade-in; (c) fade-out



where Y is the intensity in the gray image and R, G, and B are the red, green, and blue components of an RGB color image.

Step 2: Obtain the edge image by applying a Sobel edge mask to the original image with threshold 100. The edge detection procedure is explained in detail in figure 5.

$$f_{\text{Gradient}}(x, y) = \begin{cases} f(x, y), & \text{if } f(x, y) \ge 100 \\ 0, & \text{if } f(x, y) < 100 \end{cases} \tag{3}$$

where f(x, y) is the gray value of the Sobel edge image at coordinate (x, y), and $f_{\text{Gradient}}(x, y)$ is the gray value at (x, y) after the threshold is applied.

Step 3: Compute an AG (Average Gradient).

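$$AG = \frac{\sum_{x,y} f_{\text{Gradient}}(x, y)}{\sum_{x,y} p(x, y)} \tag{4}$$

where

$$p(x, y) = \begin{cases} 1, & \text{if } f_{\text{Gradient}}(x, y) \ne 0 \\ 0, & \text{if } f_{\text{Gradient}}(x, y) = 0 \end{cases}$$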

Fig. 4 Variance distribution in the video source Missing_You.mpg: (a) distribution of a gray image; (b) distribution of an average edge image



Step 4: Extract the average edge image using the average gradient (AG) as a new threshold.

$$f_{AG}(x, y) = \begin{cases} f_{\text{Gradient}}(x, y), & \text{if } f_{\text{Gradient}}(x, y) \ge AG \\ 0, & \text{if } f_{\text{Gradient}}(x, y) < AG \end{cases} \tag{5}$$
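Steps 1–4 can be sketched with NumPy alone. This is an illustration under stated assumptions: the Sobel response is taken as $\sqrt{G_x^2 + G_y^2}$ (the text does not say which magnitude is used; |Gx| + |Gy| is another common choice), the one-pixel image border is set to zero, and the threshold 100 is applied to that magnitude. `to_gray`, `sobel_magnitude`, and `average_edge_image` are hypothetical names.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def to_gray(rgb):
    """Step 1, Eq. (2): luminance of an H x W x 3 RGB array."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def sobel_magnitude(gray):
    """Sobel magnitude sqrt(Gx^2 + Gy^2); the one-pixel border is left at 0."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            patch = g[dy:dy + h - 2, dx:dx + w - 2]
            gx += SOBEL_X[dy, dx] * patch
            gy += SOBEL_Y[dy, dx] * patch
    mag = np.zeros_like(g)
    mag[1:-1, 1:-1] = np.hypot(gx, gy)
    return mag


def average_edge_image(rgb, edge_threshold=100.0):
    """Steps 1-4 for one RGB frame; returns (f_AG, AG)."""
    f = sobel_magnitude(to_gray(rgb))                      # Steps 1-2
    f_grad = np.where(f >= edge_threshold, f, 0.0)         # Eq. (3)
    nonzero = f_grad != 0
    ag = f_grad[nonzero].mean() if nonzero.any() else 0.0  # Eq. (4): mean over edge pixels
    f_ag = np.where(f_grad >= ag, f_grad, 0.0)             # Eq. (5)
    return f_ag, ag
```

For a 10 × 10 frame whose right half is white, only the two columns straddling the boundary survive, each with a Sobel response of 4 × 255 = 1020, which is also the resulting AG.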

4.2. Feature extraction

To make the properties of a gradual transition as pronounced as possible, we extract nine variances from nine equal-sized, non-overlapping blocks (see figure 6) of the average edge image. The reason for computing the variances of localized blocks is to obtain a new sequence that shows gradual change properties more distinctly than the overall frame variance does. The complexity of the content (edge information, in our research) differs according to spatial location within a frame. Hence, we search for the blocks that maximize the depth and width of the parabolic curve. For example, figure 7 shows the variance sequences of the overall frame and of three sub-blocks. The distinct gradual sequence is obtained by first computing the maximum and minimum difference sequences between the overall frame and each block using Eq. (6), and then intersecting the two sequences using Eq. (7).

Fig. 5 Sobel edge detection: (a) Sobel mask operation; (b) applying direction of the mask on an original image; (c) Sobel mask for x–y directions

Fig. 6 Sub-block image

$$S_{\max} = \max_i\bigl(|T_k - S_k^i|\bigr), \qquad S_{\min} = \min_i\bigl(|T_k - S_k^i|\bigr) \tag{6}$$

$$\mathrm{AGI}(T, S) = \min(S_{\max}, S_{\min}) \tag{7}$$

where $T_k$ is the variance of the kth overall frame and $S_k^i$ is the variance of the ith block in the kth frame. Equations (6) and (7) try to find the block that maximizes the depth and width of the parabolic curve in each frame and take the variance of the corresponding block for computing the gradual point. We refer to the resulting sequence as the AGI (Average Gradient Image) sequence; it is shown in figure 8. This sequence shows a more distinct parabolic shape than the variance sequence of the whole frames.
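A literal transcription of Eqs. (6) and (7) for a 3 × 3 grid follows, assuming grayscale average edge frames whose sides are multiples of 3 (any leftover rows or columns are ignored). As the equations are written, $\min(S_{\max}, S_{\min})$ equals $S_{\min}$; the code keeps both terms so that it mirrors the printed formulas. `block_variances` and `agi_sequence` are hypothetical names.

```python
import numpy as np


def block_variances(frame, grid=3):
    """Variances S_k^i of the grid x grid non-overlapping blocks of one frame."""
    f = np.asarray(frame, dtype=float)
    bh, bw = f.shape[0] // grid, f.shape[1] // grid   # leftover rows/columns ignored
    return np.array([f[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].var()
                     for r in range(grid) for c in range(grid)])


def agi_sequence(edge_frames, grid=3):
    """One AGI value per frame, following Eqs. (6) and (7) as printed."""
    values = []
    for frame in edge_frames:
        t_k = np.asarray(frame, dtype=float).var()          # T_k: whole-frame variance
        diffs = np.abs(t_k - block_variances(frame, grid))  # |T_k - S_k^i| for each block i
        s_max, s_min = diffs.max(), diffs.min()              # Eq. (6)
        values.append(min(s_max, s_min))                     # Eq. (7)
    return np.array(values)
```

For a 9 × 9 frame holding 0, 1, …, 80 row by row, every 3 × 3 block has variance 492/9 and the whole frame has variance 6560/12, so each block differs from the frame by exactly 492.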

Fig. 7 Variance distribution of the average edge image and the three sub-blocks (Missing_You.mpg)

Fig. 8 Sequence of AGI (Missing_You.mpg)



4.3. Computation of local variance

We now have to pick out a gradual point using the AGI sequence. To investigate the amount of change over the AGI frame sequence, we compute the variance over every 30 frames using Eq. (8). Since gradual changes generally proceed over 30–60 frames (1–2 s), we chose 30 frames as the size of the sliding local window in which the existence of a gradual change is examined. For a sequence within a single shot, the variance is almost constant, while for a gradual sequence it shows nearly parabolic characteristics, i.e., it gradually decreases from the starting frame of the change and gradually increases as the new shot appears (in the dissolve case).

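$$\mathrm{var}(i) = \frac{1}{L-1}\sum_{k=i}^{i+L-1}\bigl(\mathrm{AGI}(k) - \mathrm{mean}(i)\bigr)^2, \qquad \mathrm{mean}(i) = \frac{1}{L}\sum_{k=i}^{i+L-1}\mathrm{AGI}(k) \tag{8}$$

where i = 1, 2, ..., n − L + 1 is the frame number, n is the number of frames in the sequence, L is the number of frames in a window (30 frames), and AGI(k) is the value of the AGI sequence at the kth frame.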
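Eq. (8) is a sliding-window sample variance with the 1/(L − 1) normalization. A minimal NumPy sketch follows; it uses 0-based indices, so entry i covers the L values AGI[i], …, AGI[i+L−1], and it relies on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later). `local_variance` is a hypothetical name.

```python
import numpy as np


def local_variance(agi, window=30):
    """Sample variance (ddof=1) of every run of `window` consecutive AGI values."""
    x = np.asarray(agi, dtype=float)
    if len(x) < window:
        return np.empty(0)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)  # shape (n-L+1, L)
    return windows.var(axis=1, ddof=1)
```

For 40 frames and L = 30 there are 11 windows; on a linear ramp such as AGI(k) = k, every window has the same sample variance, L(L + 1)/12 = 77.5.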

4.4. Filtering

Even though the feature variance sequence approximates the ideal variance curve, an additional filtering step that reduces distortions within the curve is needed to obtain more accurate detection. In this paper, we smooth the curve by applying a morphological opening operation. The AGI sequence after opening shows a smoother curve, as in figure 9. The opening is performed by the following equation.

$$\mathrm{Opening}(n) = \bigl[(f \ominus B) \oplus B\bigr](n) \tag{9}$$

where $(f \ominus B)(n) = \min[f(n), f(n \pm 1), f(n \pm 2)]$ is the erosion, $(f \oplus B)(n) = \max[f(n), f(n \pm 1), f(n \pm 2)]$ is the dilation, n = 1, 2, ..., m is the frame number, f(n) is the variance in the nth frame, and B is a one-dimensional structuring element (window size 5).
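The opening in Eq. (9) is a 1-D grey-scale morphological filter. The sketch below implements it as erosion (sliding minimum) followed by dilation (sliding maximum) over a 5-sample window, padding the ends by repeating the edge values; the boundary handling is an assumption, since the text does not specify it. Function names are hypothetical.

```python
import numpy as np


def _sliding(f, half, reduce):
    """Apply `reduce` (np.min or np.max) over a centred window of 2*half+1 samples."""
    x = np.asarray(f, dtype=float)
    padded = np.pad(x, half, mode="edge")               # repeat end values at the borders
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    return reduce(windows, axis=1)


def erode_1d(f, half=2):
    return _sliding(f, half, np.min)                    # (f ⊖ B)(n): sliding minimum


def dilate_1d(f, half=2):
    return _sliding(f, half, np.max)                    # (f ⊕ B)(n): sliding maximum


def opening_1d(f, half=2):
    """Erosion followed by dilation with a flat structuring element of 2*half+1 = 5 samples."""
    return dilate_1d(erode_1d(f, half), half)
```

Opening never raises a value. It removes peaks narrower than the window (a one-frame spike disappears) while keeping wider plateaus and leaving narrow valleys, where gradual points are sought, at their original depth.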

Fig. 9 Sequence after the opening operation on the AGI sequence in figure 8 (Missing_You.mpg)



4.5. Detection of gradual change frames

Within a gradual change, one frame has the minimum value of the local parabolic sequence. We detect a gradual change point based on the width and depth of the sequence, i.e., the frame interval and the variance difference (see figure 9). A gradual change point is declared if Eq. (10) is satisfied. The two thresholds, 30 for width and 0.03 (normalized value) for depth, were chosen heuristically.

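$$D_{\text{variance}} = \bigl|\sigma_{\text{localmax}}[i-1] - \sigma_{\text{localmin}}[i]\bigr| \ge 0.03, \qquad D_{\text{frame}} = \bigl|\mathrm{Frm}_{\text{localmax}}[i-1] - \mathrm{Frm}_{\text{localmax}}[i]\bigr| \ge 30 \tag{10}$$

where i = 1, 2, ..., n indexes the frames that have a local minimum, $\sigma_{\text{localmin}}[i]$ and $\sigma_{\text{localmax}}[i]$ are the variances of the ith frames having a local minimum and a local maximum, respectively, and $\mathrm{Frm}_{\text{localmax}}[i]$ is the frame number of the ith local maximum.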
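A sketch of the decision rule follows, assuming that both comparisons in Eq. (10) are lower bounds (depth of at least 0.03 on a normalized sequence, width of at least 30 frames) and that the two local maxima flanking each local minimum are used. Flat runs, which are common after opening, are collapsed so that each valley or peak is counted once. `local_extrema` and `detect_gradual` are hypothetical names.

```python
import numpy as np


def local_extrema(x):
    """First indices of strict local minima and maxima, after collapsing flat runs."""
    x = np.asarray(x, dtype=float)
    starts = [0] + [i for i in range(1, len(x)) if x[i] != x[i - 1]]
    vals = x[starts]
    mins, maxs = [], []
    for j in range(1, len(vals) - 1):
        if vals[j] < vals[j - 1] and vals[j] < vals[j + 1]:
            mins.append(starts[j])
        elif vals[j] > vals[j - 1] and vals[j] > vals[j + 1]:
            maxs.append(starts[j])
    return mins, maxs


def detect_gradual(x, min_depth=0.03, min_width=30):
    """Keep a local minimum when the preceding local maximum is at least min_depth
    higher and the two flanking local maxima are at least min_width frames apart."""
    x = np.asarray(x, dtype=float)
    mins, maxs = local_extrema(x)
    points = []
    for m in mins:
        left = [p for p in maxs if p < m]
        right = [p for p in maxs if p > m]
        if not left or not right:
            continue                                   # needs a maximum on both sides
        depth = abs(x[left[-1]] - x[m])                # D_variance
        width = right[0] - left[-1]                    # D_frame
        if depth >= min_depth and width >= min_width:
            points.append(m)
    return points
```

On a smooth valley of depth 0.4 whose flanking peaks are 60 frames apart, the minimum is reported; the same shape with depth 0.02, or with peaks only 20 frames apart, is rejected.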

5. Experimental results

Experiments were performed on an IBM Pentium PC using Microsoft Visual C++. The graphic user interface (GUI) is shown in figure 10.

The video data used in the experiments are one music video, two commercials, two movies, and one drama. We selected these videos because they contain many gradual transitions. To evaluate the proposed detection algorithm, precision and recall were computed using Eqs. (11) and (12).

$$\mathrm{Precision} = \frac{N_{\text{CORRECT}}}{N_{\text{CORRECT}} + N_{\text{FALSE}}} \times 100 \tag{11}$$

Fig. 10 GUI (Graphic User Interface)



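$$\mathrm{Recall} = \frac{N_{\text{CORRECT}}}{N_{\text{SCD}}} \times 100 = \frac{N_{\text{CORRECT}}}{N_{\text{CORRECT}} + N_{\text{MISSED}}} \times 100 \tag{12}$$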

where $N_{\text{CORRECT}}$ is the number of correctly detected frames, $N_{\text{FALSE}}$ is the number of falsely detected frames, $N_{\text{MISSED}}$ is the number of missed frames, and $N_{\text{SCD}}$ is the total number of frames at which transitions occur.
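Eqs. (11) and (12) in code form, as a small hypothetical helper that returns percentages. Feeding it the “Missing you” row of Table 2 (16 correct, 0 false, 4 missed) gives 100% precision and 80% recall, matching the table.

```python
def precision_recall(n_correct, n_false, n_missed):
    """Eqs. (11)-(12) in percent, with N_SCD = N_CORRECT + N_MISSED."""
    detected = n_correct + n_false
    n_scd = n_correct + n_missed
    precision = 100.0 * n_correct / detected if detected else 0.0
    recall = 100.0 * n_correct / n_scd if n_scd else 0.0
    return precision, recall
```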

As described in Section 3, abrupt cuts are detected first, and the existence of gradual shot changes is then examined between neighboring cuts. As Eq. (1) shows, the correlation takes values between −1 and 1, where 1 means a perfect match. In our experiments, a threshold of 0.82 for abrupt cuts was chosen heuristically.

Fig. 11 Performance comparison with different numbers of sub-blocks in terms of (a) precision; (b) recall

Table 2 Experimental results with six video sources

| MPEG file | Type | $N_{\text{TOTAL}}$ | $N_{\text{SCD}}$ | $N_{\text{CORRECT}}$ | $N_{\text{MISSED}}$ | $N_{\text{FALSE}}$ | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|
| Missing you | Music video | 1,686 | 20 | 16 | 4 | 0 | 100 | 80 |
| Sin noodle | Commercial | 477 | 6 | 4 | 2 | 0 | 100 | 75 |
| White valentine | Movie | 1,200 | 5 | 4 | 1 | 1 | 80 | 80 |
| Illwolgie | Movie | 1,035 | 10 | 10 | 0 | 1 | 91 | 100 |
| Posco | Commercial | 441 | 4 | 4 | 0 | 1 | 80 | 100 |
| GaeulDongWha | Drama | 765 | 6 | 5 | 1 | 2 | 71 | 83 |
| Average | | | | | | | 87.0 | 86.3 |

$N_{\text{TOTAL}}$ is the total number of frames in the MPEG file.

In the change detection experiments, the average precision and recall are 87.0% and 86.3%, respectively. The results are given in Table 2.

We also performed experiments using different numbers of sub-blocks; figure 11 shows the performance comparison. As figure 11 shows, dividing the frame into 3×3 blocks yields the best overall performance, and dividing it into 1×1, i.e., using the whole-frame sequence, gives the worst result. This indicates that localized variance information is effective for detection.

Fig. 12 False detection due to the continuous distribution of local minima (left: falsely detected frame; right: actual change frame) in (a) White_Valentine.mpg; (b) GaeulDongWha.mpg

Fig. 13 False detection due to object and camera movement (left: falsely detected frame; right: actual change frame) in (a) Illwolgie.mpg; (b) Posco.mpg

It is not surprising that reducing the number of sub-blocks does not improve performance much, because each block then has a variance sequence similar to that of the whole frame. Conversely, increasing the number of sub-blocks also tends to introduce distortions into the variance sequences.

Most of the non-detected frames ($N_{\text{MISSED}}$) occur because the edge information is distributed almost identically across neighboring shots, so no distinct dissolve curve appears. Some of the falsely detected frames ($N_{\text{FALSE}}$) in “White Valentine” and “GaeulDongWha” arise from a run of consecutive local minima between the gradually changed frame and its neighboring frames; in these cases the change is declared 20–30 frames before the actual transition point. Figures 12 and 13 show the frames detected by the proposed algorithm (left) and the actual change frames (right). For “Illwolgie” (figure 13(a)), the falsely detected frames (left) are caused by object movement within the same scene. For “Posco” (figure 13(b)), the falsely detected frame (left) is due to camera movement that continues over many frames around the actual transition.

We compared the proposed algorithm with the well-known twin comparison method [24] and the DCD (Double Chromatic Difference) method [13], which are robust to object and camera movement. For a fair evaluation, edge histograms were used for twin comparison, and fully decoded edge images (not DC images) were used for the DCD method. Results on the six videos are depicted in figure 14. In precision, the proposed algorithm was superior to the others except for video 4, where DCD performed best. In recall, the proposed algorithm was superior for videos 2, 4, 5, and 6, whereas twin comparison performed best for videos 1 and 3. In average precision, the proposed algorithm performed best (87%), followed by twin comparison (77.2%) and DCD (74.3%). In average recall, the proposed algorithm also performed best (86.3%), followed by twin comparison (82%) and DCD (65.3%).

Fig. 14 Comparison with other algorithms (twin comparison and DCD) in terms of (a) precision; (b) recall

Through the experiments, we noticed that twin comparison is sensitive to its thresholds: the two thresholds must be assigned adequately to obtain good results. To apply identical settings to all six videos, we set both thresholds as linear combinations of the mean and standard deviation of the histogram differences over the whole video (that is, Th = mean + 3.0 × standard deviation and T1 = mean + 0.4 × standard deviation).
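To make the baseline concrete, here is a minimal sketch of twin comparison with thresholds set as the mean plus 3.0 and 0.4 standard deviations of the difference sequence. It assumes a precomputed list of frame-to-frame (e.g. edge histogram) differences, and it uses a common variant of the accumulation rule: summing stops as soon as a difference drops below T1, and a gradual transition is declared only if the accumulated sum reaches Th. It is not the implementation used in the experiments, and `twin_thresholds` and `twin_comparison` are hypothetical names.

```python
import numpy as np


def twin_thresholds(diffs, k_high=3.0, k_low=0.4):
    """Return (Th, T1) as mean + k * standard deviation of the difference sequence."""
    d = np.asarray(diffs, dtype=float)
    return d.mean() + k_high * d.std(), d.mean() + k_low * d.std()


def twin_comparison(diffs, th, tl):
    """Return (cuts, graduals): cut indices and (start, end) index pairs into diffs."""
    cuts, graduals = [], []
    i, n = 0, len(diffs)
    while i < n:
        if diffs[i] >= th:                     # large single jump: abrupt cut
            cuts.append(i)
            i += 1
        elif diffs[i] >= tl:                   # candidate start of a gradual transition
            start, total = i, 0.0
            while i < n and tl <= diffs[i] < th:
                total += diffs[i]              # accumulate consecutive moderate differences
                i += 1
            if total >= th:                    # accumulated change as large as a cut
                graduals.append((start, i - 1))
        else:
            i += 1
    return cuts, graduals
```

With Th = 2.0 and T1 = 0.5, a single difference of 5.0 is reported as a cut, ten consecutive differences of 0.6 (summing to 6.0) as one gradual transition, and a brief bump of two 0.6 values is ignored because its sum stays below Th. The sensitivity noted above is visible here: lowering T1 lets more noise start candidate runs, which is where false positives come from.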

6. Conclusions and further research

We proposed a new gradual shot boundary detection algorithm in this paper. The proposed algorithm approximates the variance sequence to the ideal parabolic curve exhibited by a typical gradual transition, using the most distinct parabolic sequence extracted from the nine local sub-block frame sequences. Experiments on six video sources showed that the proposed algorithm yields better detection performance than the well-known twin comparison and DCD methods. In average precision, the proposed algorithm achieved 87%, while twin comparison and DCD achieved 77.2% and 74.3%, respectively. In average recall, it also performed best (86.3%), followed by twin comparison (82%) and DCD (65.3%).

The experimental results are encouraging, but some of the problems encountered are worth stressing. Camera motion, object motion, and extensive content change within a shot should be considered to achieve high performance; methods that handle these problems, combined with the proposed algorithm, could yield better results. In addition, the algorithm operates on fully decoded frames, so it takes more time than compressed-domain methods. Fast detection that also handles the distortion of compressed data is necessary.

Future work includes more testing on different types of videos and handling shot transitions in the presence of camera motion. In another direction, we are researching the emotions evoked by various video effects in order to perform emotion-based video scene retrieval.

Acknowledgments This work was supported by Korea Research Foundation Grant (KRF-2002–005-H 20002). Comments and suggestions from the reviewers were greatly appreciated.

References

1. Ahanger G, Little TDC (1996) A survey of technologies for parsing and indexing digital video. J Vis Commun Image Represent 7(1):28–43

2. Brunelli R, Mich O, Modena CM (1999) A survey on the automatic indexing of video data. J Vis Commun Image Represent 10(1):78–112

3. Covell M, Ahmad S (2002) Analysis-by-synthesis dissolve detection. In: Proc ICIP, vol 1. pp 23–25



4. Fernando WAC, Canagarajah CN, Bull DR (1999) Fade and dissolve detection in uncompressed and compressed video sequences. In: Proc ICIP, vol 3. pp 299–303

5. Ford RM, Robinson C, Temple D, Gerlach M (2000) Metrics for shot boundary detection in digital video sequences. Multimedia Syst 8(1):37–46

6. Gargi U, Kasturi R, Strayer SH (2000) Performance characterization of video-shot-change detection methods. IEEE Trans Circuits Syst Video Technol 10(1):1–13

7. Hanjalic A (2002) Shot boundary detection: unraveled and resolved? IEEE Trans Circuits Syst Video Technol 12(2):90–105

8. Jain AK, Vailaya A, Xiong W (1999) Query by video clip. Multimedia Systems: Special Issue on Video Libraries 7(5):369–384

9. Jun SB, Yoon K, Lee HY (2000) Dissolve transition detection algorithm using spatio-temporal distribution of MPEG macro-block types. ACM International Conference on Multimedia 391–394

10. Lee SW, Kim YM, Choi SW (2000) Fast scene change detection using direct feature extraction from MPEG compressed videos. IEEE Trans Multimedia 2(4):240–254

11. Lienhart R (1999) Comparison of automatic shot boundary detection algorithms. Storage Retrieval for Media Database SPIE 3656:290–301

12. Lienhart R (2001) Reliable dissolve detection. Storage and Retrieval for Media Database SPIE 4315:219–230

13. Lu HB, Zhang YJ, Yao YR (1999) Robust gradual scene change detection. International Conference on Image Processing 3:304–308

14. Meng J, Juan Y, Chang SF (1994) Scene change detection in a MPEG compressed video sequence. In: Proc. SPIE/IS&T Symp. Electronic Imaging Science and Technology: Digital Video Compression: Algorithms and Technologies, vol 2419. pp 14–25

15. Nagasaka A, Tanaka Y (1991) Automatic video indexing and full-motion search for object appearances. In: Proc. IFIP TC2/WG2.6 Second Working Conf. on Visual Database System. pp 113–127

16. Otsuji K, Tonomura Y, Ohba Y (1991) Video browsing using brightness data. Vis Commun Image Process SPIE-1606:980–989

17. Shapiro L, Stockman GC (2001) Computer vision. Prentice Hall

18. Song S, Kwon T, Kim W (1998) Detection of gradual scene changes for parsing of video data. In: Proc IS&T/SPIE, vol 3312. pp 404–413

19. Truong BT (1999) Video genre classification based on shot segmentation. Honours Thesis, Curtin University of Technology, Western Australia, November 1999

20. Truong BT, Dorai C, Venkatesh S (2000) New enhancements to cut, fade, and dissolve detection processes in video segmentation. ACM International Conference on Multimedia pp 219–227

21. Xing W, Lee JC (1998) Efficient scene change detection and camera motion annotation for video classification. Comput Vis Image Underst 71(2):166–181

22. Yeo BL, Liu B (1995) Rapid scene analysis on compressed video. IEEE Trans Circuits Syst Video Technol 5(6):533–544

23. Zabih R, Miller J, Mai K (1999) A feature-based algorithm for detecting and classifying production effects. Multimedia Syst 7(2):119–128

24. Zhang HJ, Kankanhalli A, Smoliar SW, Tan SY (1993) Automatic partitioning of full motion video. ACM Multimedia Systems 1(1):10–28

25. Zhang HJ, Wu J, Zhang D, Smoliar SW (1997) An integrated system for content-based video retrieval and browsing. Pattern Recogn 30(4):643–658



Hun-Woo Yoo is a research professor at the Center for Cognitive Science at Yonsei University. He received his B.S. and M.S. degrees in Electrical Engineering from Inha University, Korea, and a Ph.D. degree in Industrial Systems and Information Engineering from Korea University, Korea. From 1994 to 1997, he worked as a research engineer at the Manufacturing Technology Center of LG Electronics. His current research interests include multimedia information retrieval, computer vision, and image processing.

Han-Jin Ryoo received a B.S. degree from Korea Military Academy, Korea, and an M.S. degree in Industrial Systems and Information Engineering from Korea University, Korea. He is currently a Ph.D. candidate in Electronics and Computer Engineering at Korea University. His research interests are face detection/recognition, multimedia communication, and content-based image search.

Dong-Sik Jang is a professor in the Department of Industrial Systems and Information Engineering at Korea University. He received a B.S. degree in Industrial Engineering from Korea University, Korea, an M.S. degree from the University of Texas, and a Ph.D. degree in Industrial Engineering from Texas A&M University. Dr. Jang's research interests are computer vision, multimedia communication, and artificial intelligence.

