
Canadian Journal on Image Processing & Computer Vision Vol. 1, No. 1, February 2010


Video Analytic Algorithm for Handout Extraction from Video Lectures

Ali Javed, Hafiz Adnan Habib

Abstract

Whiteboards are very popular in the corporate sector and in educational institutions because of their usefulness in day-to-day work. Besides traditional whiteboards, digital whiteboards have been developed in the recent past, which act as a bridge for interaction between human and computer. Despite their usefulness, digital whiteboards have drawbacks, such as the requirement for specialized pens and erasers. Systems have therefore been developed to capture contents from traditional whiteboards, thereby supporting their continued use. We have targeted the educational sector in our research work. We have designed a system that captures the content on a whiteboard and produces handouts using a digital camera. Our video analytic algorithm captures video lectures, detects and tracks the instructor's movement to overcome the problem of instructor occlusion, extracts the whiteboard region, scans the whiteboard for text extraction and finally generates the lecture handouts when it decides that the whiteboard has been erased.

Keywords

Average pixel intensity, Block tagging, Erase event,

Lateral Histogram

1. Introduction

Whiteboards hold an important place in the corporate sector and in educational institutions for presentations and group discussions because they are economical and easy to use. Besides traditional whiteboards, digital whiteboards are also used to capture contents. Digital whiteboards have intelligent surfaces that remotely sense marker interaction with normal whiteboard surfaces, mostly using ultrasonic or infrared sensors. Mimio [16] is one such system; it uses ultrasonic sensors for whiteboard content capturing and is less expensive than other digital whiteboards, but the area it captures is limited. Digital whiteboards have several limitations: they are very expensive and require specialized input devices, which reduces the interaction between the user and the whiteboard. Traditional whiteboards are therefore a better alternative for capturing contents.

The system proposed in this work uses a traditional whiteboard and a static video camera to record the contents. The proposed video analytic algorithm starts by detecting the whiteboard. The whiteboard region is then divided into blocks; the algorithm scans each block and updates the stored contents as soon as text appears on it. The contents of the whiteboard are captured as a series of key frames that include the newly written text. Instructor occlusion in front of the whiteboard is also handled.

Some capturing systems for traditional whiteboards exist, but they lack decision making for handout generation; they just save images and display the captured contents. The video analytic algorithm proposed in this work addresses this limitation by processing the video sequence and analyzing the erase event to generate handouts.

The paper is organized as follows. Section 2 discusses previously developed systems related to this work. Section 3 presents our system architecture. Section 4 describes the proposed methodology. Section 5 shows the experimental setup and results. Section 6 presents the conclusion and future work.

2. Literature Review

Whiteboard content can be captured dynamically by recording video or statically by taking snapshots. The system presented here captures whiteboard contents dynamically. Dickson, Adrion and Hanson [1] presented a system that captures static images and extracts key frames from all the captured images; instructor occlusion in front of the whiteboard is handled through image differencing, and image refinement is achieved through image averaging. This removes the problem of saving duplicate images, but the approach has drawbacks: the images are not clean, many unnecessary images are saved, which consumes a lot of memory, and no handouts are generated. Zhang and He [2] proposed a methodology that takes snapshots in chunks from left to right and top to bottom, rectifies each image, makes the whiteboard uniformly white, extracts points of interest, matches feature points with the previous image and, after reaching the last image, stitches all the images together. Instructor occlusion is not considered in their work.

Some previously developed whiteboard capturing systems use one capturing device and others use several. Xu [4] presented a whiteboard content capturing system that uses one static camera and one pan-tilt-zoom camera. The user has to draw grid lines on the whiteboard; the system scans the grid cells to find the regions not occluded by the instructor using background subtraction. Normalized cross-correlation is used to find the updated image.

Whiteboard content capturing systems have been designed not just for in-class interaction or post-class study but also for video conferencing, where the whiteboard contents are enhanced and then sent to remote locations. He and Zhang [5] presented one such system, which captures pen strokes on physical whiteboards in real time using a video camera. It segments images into foreground objects, pen strokes and whiteboard, and extracts newly written pen strokes by calculating the age of each cell. The enhanced whiteboard contents are communicated to remote participants in real time.

In [7], Dickson, Adrion and Hanson propose PAOL, a system that uses high-resolution cameras and a wireless microphone to automatically create multimedia Flash presentations, which include edited video of the instructor and enhanced images of all the material on the whiteboard. Their capture algorithm analyzes the video stream, identifies significant events, enhances the contents and either stores the images or transmits them to remote locations.

Prabhu, Pradeep, Punitha and Raman [6] propose a system for whiteboard content capturing through foreground object detection. They use a connected component labeling algorithm to detect the foreground objects in the scene and cluster motion blocks to detect the moving object, and then replace the detected object with whiteboard content. Stroke classification is achieved through thresholding. The output frame containing the whiteboard contents is post-processed to remove blocking artifacts and to enhance the stroke quality.

The captured contents can be difficult to recognize. To overcome this problem, Wienecke, Fink and Sagerer [3] proposed a system for handwritten text recognition. Their method divides the whiteboard region into blocks of 40 x 40 and corrects the image intensity and slant in a preprocessing stage. A Hidden Markov Model is used for text recognition.

Digital whiteboards have also been used for content capturing, but they have drawbacks: they are very costly, have limited size and aspect ratio, and require specialized input devices and erasers that can be susceptible to interference. Li, Liu, Xu, Ren and Ma [8] presented an intelligent pen-based whiteboard system that allows instructors to write ink annotations continuously, draw graphics freely and demonstrate dynamic geometric graphics with pen ink, pen strokes and pen gestures. The system includes a “stroke collector”, which captures signals of pen movement, a “stroke recognizer”, which recognizes pen strokes, and a “geometry compute agent”, which processes sketches. Its limitation is that it needs a specialized digital pen.

3. System Architecture

The proposed system records video of lectures in the image acquisition phase. In the preprocessing phase it extracts the whiteboard from the frame and enhances the whiteboard region. The human detection phase includes instructor detection and estimation of the instructor's motion. The whiteboard is then segmented into blocks and scanned. Whiteboard scanning includes block tagging, in which each block is tagged either as a “text block”, if text exists on the block, or as a “whiteboard block”, if it does not. In the decision making phase the system generates handouts based on the decision that the whiteboard is being erased and on a text deletion check, as shown in the system flow chart in Figure 1.





Fig 1 System Flow Chart

4. Proposed Methodology

The proposed system captures the whiteboard contents using a static video camera and takes the lecture video as its input. It detects the whiteboard in the scene and enhances the whiteboard region. Background subtraction against a reference image of the whiteboard without the instructor is used to detect the instructor in front of the whiteboard. Background subtraction is a popular and effective method for human detection in which frames are subtracted and the moving object is detected as foreground. A silhouette-based approach is used to segment the instructor from the scene. Instructor tracking is required to capture contents from regions that were occluded by the instructor in previous frames; it is achieved through a bounding box created from the lateral histogram projection.

Whiteboard scanning is achieved by dividing the whiteboard region into blocks. We propose a technique called “Block Tagging” for text extraction from the whiteboard: the average pixel intensity of each block is calculated and thresholded to decide whether the block contains text, and the block is labeled as a tagged text block if it does or as a tagged whiteboard block if it does not. The proposed system also handles the case in which the text on a block is updated. In the decision making phase of our video analytic algorithm, handouts are generated after analyzing the erase event performed by the instructor. The hand motion of the instructor is continuously analyzed to identify the erase event. When an erase operation is detected, the whiteboard region is examined for text deletion. If all the text has been deleted, the system generates the handout from all the tagged text blocks collected so far.

4.1. Whiteboard Capturing

Whiteboard capturing is performed with a static Sony camera that records video at a resolution of 640 x 480 pixels. The camera is fixed and captures video at a frame rate of 25 frames/sec. Whiteboard content capturing is not an easy task, because factors such as noise, reflection, human shadows and instructor occlusion affect the capturing process.

4.2. Whiteboard Extraction

The whiteboard is the region of interest for text extraction, so it has to be extracted from the scene. This is achieved by detecting the four prominent edges in the image. We studied various edge detection approaches, including the Sobel and Prewitt operators [10] and the Canny edge detector [17]. We also studied the Hough Transform [10], [13] for line detection, but it needs many amendments to detect the boundary of the whiteboard successfully. The Canny edge detector is applied to the first frame of the video, captured without instructor occlusion, to detect the whiteboard. The result is converted into a binary edge image by thresholding so that the boundary of the whiteboard can be detected; the resulting edge image is shown in Figure 2(b). The image is scanned from the top, left to right, until the bottom of the image is reached, in order to detect the four corners of the whiteboard. The corners define the boundary of the whiteboard, and the region inside the boundary is then filled. The filled image is used as a reference image for whiteboard extraction: incoming frames are compared with it to extract the whiteboard region, as shown in Figure 2(d).

Fig 2 (a) Original Image, (b) Edge Image, (c) Image after region filling, (d) Final whiteboard extracted image.
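The corner scan, region filling and frame masking described in this section can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that the binary edge image has already been produced (for example by a Canny detector followed by thresholding), that the whiteboard is roughly axis-aligned in the frame, and that images are plain lists of rows. The function names find_corners, fill_region and extract_whiteboard are hypothetical.

```python
def find_corners(edge):
    """Return (top, left, bottom, right) of the edge pixels, or None.

    `edge` is a binary edge image given as a list of rows of 0/1 values.
    Assumption: the board is axis-aligned, so its four corners are the
    extremes of the rows and columns that contain edge pixels.
    """
    rows = [r for r, row in enumerate(edge) if any(row)]
    cols = [c for c, col in enumerate(zip(*edge)) if any(col)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def fill_region(height, width, corners):
    """Build the filled reference mask: 1 inside the boundary, 0 outside."""
    top, left, bottom, right = corners
    return [[1 if top <= r <= bottom and left <= c <= right else 0
             for c in range(width)]
            for r in range(height)]


def extract_whiteboard(frame, mask, outside=0):
    """Keep frame pixels inside the mask and blank everything else."""
    return [[p if m else outside for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]
```

The mask is computed once from the first, unoccluded frame and then reused for every incoming frame, which matches the role of the reference image in the text.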

4.3. Whiteboard Enhancement

Whiteboard enhancement is performed in the preprocessing phase by applying low-pass filtering to reduce the noise in the video. An average filter of size 5 x 5 is used for image smoothing and noise removal.
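A 5 x 5 average filter of this kind can be sketched in a few lines. The sketch below is illustrative only: the image is a list of rows of grey values, and the border handling (averaging over the part of the window that lies inside the image) is an assumption, since the paper does not state how borders are treated. The name average_filter is hypothetical.

```python
def average_filter(img, k=5):
    """Smooth a greyscale image with a k x k mean (box) filter.

    Border pixels are averaged over the neighbours that exist, which is
    an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            count = 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```

An isolated noisy pixel is spread over its 25-pixel neighbourhood, so its influence on any single output pixel drops to one twenty-fifth of its original value.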

4.4. Instructor Detection and Tracking

Systems that observe humans and try to understand their activities require methods for detecting people in a given input image or video. Instructor detection and tracking in front of the whiteboard is one of the main parts of our system. In this work, various approaches to human detection and tracking have been studied. Dalal and Triggs [14] presented a human detection algorithm with good detection results. Their method uses a dense grid of Histograms of Oriented Gradients (HoG), computed over blocks of size 16 × 16 pixels, to represent a detection window. Jiang, Li and Gao [15] present a method for human detection in wide-angle camera images. Their method excludes other moving objects in order to detect humans, and covers cases of partial occlusion, rotation of the head and variation of skin color. Human characteristics are not used for detection in this approach; instead, other possible moving objects are excluded.

Background subtraction is used to detect the instructor in the scene. It is a popular method for isolating the moving parts of a scene by segmenting it into background and foreground. The basic idea is the subtraction of two images, one of which is the reference image against which the incoming image is compared to detect differences. The part common to both images is the background, and the difference is the foreground, or region of interest. In this work the reference image is the one without the instructor in front of the whiteboard. The incoming frame is subtracted from the reference image to detect the instructor.
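As a minimal sketch of this step, the function below compares an incoming greyscale frame with the reference image pixel by pixel and marks a pixel as foreground when the absolute difference exceeds a threshold. The threshold value of 30 and the name subtract_background are assumptions made for illustration; the paper does not give a threshold.

```python
def subtract_background(frame, reference, threshold=30):
    """Return a binary foreground mask (1 = differs from the reference).

    `frame` and `reference` are greyscale images of equal size, given as
    lists of rows. The threshold is an assumed value that absorbs small
    lighting and sensor noise.
    """
    return [[1 if abs(f - r) > threshold else 0
             for f, r in zip(frame_row, ref_row)]
            for frame_row, ref_row in zip(frame, reference)]
```

Pixels marked 1 form the silhouette that the segmentation step operates on.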

A silhouette length-width ratio based approach is applied to segment the instructor from the scene. The length-width ratio is derived from the lateral histogram projection, which consists of the vertical and horizontal projection histograms of the segmented person [9]. The procedure starts with the computation of the horizontal histogram projection. It searches for the row with the maximum number of pixels; once that row has been found, it scans from it towards the top of the image. The first row whose count is less than a threshold is considered the top of the object and labeled HT. It then scans from the maximum row towards the bottom of the image; the first row whose count is less than the threshold is considered the bottom of the object and labeled HB. Next, the vertical histogram projection is computed, and the left (HL) and right (HR) object boundaries are found in the same way. The width of the object is the difference between the left boundary (HL) and the right boundary (HR), and the height of the object is the difference between the top boundary (HT) and the bottom boundary (HB) of the segmented object. A bounding box is created from the width and height of the moving object and is then used to track the instructor's motion, as shown in Figure 3.





Fig 3 (a) Original Image, (b) Background Subtracted Image

showing instructor detection, (c) Histogram Segmented Image

showing instructor tracking.
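The lateral histogram procedure of Section 4.4 can be sketched as follows, starting from a binary foreground mask. This is an illustrative reading and not the authors' code: the threshold default is an assumed value, the names bounding_box and _extent are hypothetical, and the boundary is taken as the last row or column whose count is still at or above the threshold, which is one possible interpretation of the description.

```python
def _extent(hist, threshold):
    """Grow outwards from the histogram peak while counts stay >= threshold."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    lo = peak
    while lo > 0 and hist[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < len(hist) - 1 and hist[hi + 1] >= threshold:
        hi += 1
    return lo, hi


def bounding_box(mask, threshold=1):
    """Return (HT, HB, HL, HR) for the main object in a binary mask.

    Row sums give the horizontal projection, column sums the vertical
    projection. Returns None when no row reaches the threshold.
    """
    row_hist = [sum(row) for row in mask]
    col_hist = [sum(col) for col in zip(*mask)]
    if max(row_hist) < threshold:
        return None
    top, bottom = _extent(row_hist, threshold)
    left, right = _extent(col_hist, threshold)
    return top, bottom, left, right
```

Recomputing the box on every frame gives the instructor's position over time, and the blocks outside the box are the ones that are currently safe to scan.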

4.5. Whiteboard Scanning

Whiteboard scanning is performed on a series of key frames that include newly written text. Instructor detection and tracking in front of the whiteboard is required to extract text from regions that were occluded by the instructor in previous frames. Once instructor detection and tracking succeed, a whiteboard image without instructor occlusion is obtained. This updated image contains the most recent whiteboard pixels without the instructor in front of the whiteboard. The whiteboard region is divided into blocks of 20 x 20 pixels, and scanning is performed by processing each block.

The motion of the instructor is tracked so that whiteboard blocks can be scanned to check whether they contain text. The average pixel intensity is used to detect differences between the current frame and the reference frame, which in this case contains only the whiteboard, without any text written on it and without the instructor in front of it; thresholding then decides whether the block contains text. Once the instructor moves away from a region after writing text, that region is scanned to separate the blocks containing text from the blocks without text. The proposed Block Tagging technique labels a block as a text block if it contains text or as a whiteboard block if it does not. The block diagram for whiteboard scanning is shown in Figure 4.

Fig 4 Block diagram for Whiteboard Scanning
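Block Tagging as described above can be sketched like this. The 20 x 20 block size comes from the text; the intensity threshold of 10 grey levels and the names block_mean and tag_blocks are assumptions made for illustration.

```python
def block_mean(img, top, left, size):
    """Average pixel intensity of one block (clipped at the image border)."""
    pixels = [p for row in img[top:top + size] for p in row[left:left + size]]
    return sum(pixels) / len(pixels)


def tag_blocks(frame, reference, size=20, threshold=10.0):
    """Tag every block as "text" or "whiteboard".

    A block is a text block when its average intensity differs from the
    same block of the empty-board reference by more than `threshold`
    (an assumed value). Keys are (block_row, block_col).
    """
    tags = {}
    for top in range(0, len(frame), size):
        for left in range(0, len(frame[0]), size):
            diff = abs(block_mean(frame, top, left, size)
                       - block_mean(reference, top, left, size))
            tags[(top // size, left // size)] = (
                "text" if diff > threshold else "whiteboard")
    return tags
```

For a 640 x 480 frame this yields 32 x 24 blocks, so the per-frame decision reduces to 768 comparisons of block averages.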

The proposed algorithm also handles the case of text being updated on the whiteboard. To save the updated tagged text block, frame differencing is used to calculate the difference between the current frame and the previous frame. Once a difference is detected, the block concerned is binarized and the number of black pixels is counted. If the number of black pixels increases or decreases by 5%, the block is considered updated and the tagged text block is replaced. The pseudo code for updating the text block is presented in Figure 5.

Fig 5 Pseudo code for text block update
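Since the pseudo code of Figure 5 is not reproduced in this text, the following is a hedged sketch of the update rule as the paragraph describes it, not a copy of the figure. It assumes that the 5% change is measured relative to the black pixel count of the stored block, that binarization uses a fixed grey level of 128, and that a stored block without black pixels counts as updated as soon as any black pixel appears. The names count_black and is_updated are hypothetical.

```python
def count_black(block, binarize_at=128):
    """Binarize a greyscale block and count its black (stroke) pixels."""
    return sum(1 for row in block for p in row if p < binarize_at)


def is_updated(stored_block, current_block, change=0.05, binarize_at=128):
    """Decide whether the stored tagged text block should be replaced.

    The block counts as updated when the black pixel count rises or falls
    by at least `change` (5%) relative to the stored block.
    """
    old = count_black(stored_block, binarize_at)
    new = count_black(current_block, binarize_at)
    if old == 0:
        return new > 0
    return abs(new - old) / old >= change
```

Using a relative change rather than any difference keeps small fluctuations caused by noise from triggering a block update.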

4.6. Handout Generation

The proposed algorithm generates a handout when it decides that the whiteboard has been erased. The hand motion of the instructor is continuously checked to detect the erase event. The CAMSHIFT algorithm [11] was studied and applied to track the motion of the instructor's hand, but it has one major problem when applied to hand tracking: when the face and the hand appear in a video frame at the same time, CAMSHIFT tracks the face rather than the hand. For this reason CAMSHIFT is not used in this work.

Optical flow is used in our work to detect the flow of hand motion. Optical flow is the apparent motion of the brightness pattern in an image. It generally corresponds to the motion field of the captured scene, so it can be used to distinguish moving objects [12]. The optical flow technique is implemented through motion gradient image calculation. Timed motion history images are used to detect motion in an image by checking a defined number of previous frames. To detect the erase event, only small motions need to be analyzed, because small motions occur when text is written or erased, whereas large motions occur when the instructor moves quickly in front of the whiteboard. Small motions are identified using motion history images to detect the differences between frames. The motion gradient image is used to check the flow of hand movement; it is calculated by applying the Sobel operator to the motion history image to obtain the angle of movement. While text is being written the angle of movement does not change frequently, but during an erase operation it does. In our case, an angle between 90 and 270 degrees means that the hand movement for the erase operation is on the left side, and an angle between -270 and 90 degrees means that it is on the right side. This calculation gives the optical flow of the hand movement in a particular direction.
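The two ideas in this paragraph, a timed motion history image and an erase decision based on how often the movement direction flips, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the motion history update follows the usual timestamp rule (moving pixels take the current timestamp, pixels older than the history duration are cleared); angles are reduced modulo 360 and treated as leftward when strictly between 90 and 270 degrees and rightward otherwise, which is one reading of the ranges given above; and the minimum number of direction changes (3) is an assumed value. The names update_mhi, side and looks_like_erase are hypothetical, and the angles are assumed to come from a separate gradient step that is not shown.

```python
def update_mhi(mhi, motion_mask, timestamp, duration):
    """Update a timed motion history image in place and return it.

    Moving pixels are stamped with `timestamp`; pixels whose last motion
    is older than `duration` are reset to 0; others keep their value.
    """
    for y, row in enumerate(motion_mask):
        for x, moving in enumerate(row):
            if moving:
                mhi[y][x] = timestamp
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0
    return mhi


def side(angle_deg):
    """Classify a movement angle as "left" or "right" (assumed ranges)."""
    a = angle_deg % 360
    return "left" if 90 < a < 270 else "right"


def looks_like_erase(angles, min_changes=3):
    """Flag an erase when the direction flips at least `min_changes` times."""
    sides = [side(a) for a in angles]
    changes = sum(1 for prev, cur in zip(sides, sides[1:]) if prev != cur)
    return changes >= min_changes
```

Writing produces a mostly steady direction, so the flip count stays low, while the back-and-forth wiping of an eraser drives it up quickly.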

Once the hand movement of an erase operation has been detected, the whiteboard is analyzed for text deletion. This analysis also checks whether all the text on the whiteboard has been deleted, because the proposed system generates a handout only once the instructor has erased the whiteboard completely.
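The decision logic of this section can be summarized in a short sketch. It assumes that the current state of the board is available as a dictionary mapping block positions to the tags "text" or "whiteboard", and that the text blocks saved during the lecture are kept in a second dictionary; both data structures and the name generate_handout_if_erased are hypothetical.

```python
def generate_handout_if_erased(erase_detected, current_tags, stored_text_blocks):
    """Return the collected text blocks as a handout, or None.

    A handout is produced only when an erase movement was detected AND no
    block of the current frame is still tagged as text. The store is then
    emptied so that the next handout starts from a clean board.
    """
    if not erase_detected:
        return None
    if any(tag == "text" for tag in current_tags.values()):
        return None  # board only partially erased, keep collecting
    handout = dict(stored_text_blocks)
    stored_text_blocks.clear()
    return handout
```

Requiring both conditions means that wiping a single word does not end the current handout.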

5. Experimental Setup and Results

The application interface and the results of the proposed system are presented in this section.

5.1 Application Interface

The proposed system is designed to operate with a single static camera fixed at a particular location. A Sony 8.1 MP static camera is used for capturing the whiteboard contents; it records video at a resolution of 640 x 480 pixels and a frame rate of 25 frames per second. A low-cost video camera is used because one of the objectives of this research was to design an economical system. The distance between the camera and the whiteboard is roughly 5 to 6 meters.

5.2 Results

The performance of the proposed video analytic algorithm was tested on different videos recorded in lecture rooms.

5.2.1 Whiteboard Extraction

The whiteboard edge image is acquired by applying the Canny edge detector, and the boundary of the whiteboard is then calculated after finding the corners of the whiteboard, as shown in Fig 6.

Fig 6 (a) Original Image, (b) Edge Image, (c) Image after region

filling, (d) Final whiteboard extracted image.

5.2.2 Instructor Detection and Tracking

The results of instructor detection and tracking are shown in Fig 7. They are obtained by using the silhouette length-width ratio based approach to segment the instructor from the scene and by tracking the instructor's motion through the bounding box created from the lateral histogram projection.





Fig 7 (a) Original Image, (b) Background Subtracted Image

showing instructor detection, (c) Histogram Segmented Image

showing instructor tracking.

5.2.3 Handout Generation

The proposed algorithm generates a handout when it decides that the whiteboard has been erased. The result of hand movement detection for the erase event is shown in Figure 8. The final handout image, obtained in binary form after processing by the proposed video analytic algorithm, is shown in Figure 9.

Fig 8 (a) Erase Event Detected, (b) Message Display for Erasing

Fig 9 Final Handout Image

6. Conclusion & Future work

To make efficient use of traditional whiteboards, we have designed a system that captures contents from the whiteboard and generates handouts when the whiteboard is erased. Earlier systems were developed to capture whiteboard contents, but they lack decision making for handout generation. Our proposed system extends these solutions by generating handouts on the decision that the whiteboard contents are being erased.

In the future we plan to make our system more efficient by improving the procedure so that it produces better results in the presence of noise and reflections, and to implement the system for chalkboards. Moreover, we also intend to generate handouts in an arranged form.

Acknowledgments

I would like to thank my supervisor, Dr. Hafiz Adnan Habib, for his excellent guidance, untiring help and endless support throughout this result-oriented and well-directed research work.

References

[1] P. E. Dickson, W. R. Adrion, and A. R. Hanson, “Whiteboard Content Extraction and Analysis for the Classroom Environment,” Tenth IEEE International Symposium on Multimedia (ISM 2008), 15-17 Dec. 2008.

[2] Z. Zhang and L.-W. He, “Note-taking with a camera: whiteboard scanning and image enhancement,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), 2005.

[3] M. Wienecke, G. A. Fink, and G. Sagerer, “Towards Automatic Video-based Whiteboard Reading,” Seventh International Conference on Document Analysis and Recognition (ICDAR'03), 2005.

[4] Richard Yi Da Xu, “A Computer Vision based Whiteboard Capture System,” IEEE Workshop on Applications of Computer Vision (WACV 2008), 7-9 Jan. 2008.

[5] L. He and Z. Zhang, “Real-Time Whiteboard Capture and Processing Using a Video Camera for Remote Collaboration,” IEEE Transactions on Multimedia, vol. 9, pp. 198-206, 2007.

[6] Prabhu N, Pradeep Kumar R, Punitha T, and Raman Srinivasan, “Whiteboard Documentation through Foreground Object Detection and Stroke Classification,” IEEE International Conference on Systems, Man and Cybernetics (SMC 2008).

[7] P. E. Dickson, W. R. Adrion, and A. R. Hanson, “Automatic capture and presentation creation from multimedia lectures,” 38th Annual Frontiers in Education Conference (FIE 2008), Oct. 2008.

[8] Qian Li, Yuanyuan Liu, Hong Xu, Lei Ren, and Cuixia Ma, “An Intelligent Interactive Pen-based Whiteboard for Dynamic Geometry Teaching,” First IEEE International Symposium on Information Technologies and Applications in Education (ISITAE '07), 23-25 Nov. 2007.

[9] E. R. Davies, “Hole Detection,” in Machine Vision, 3rd Edition.

[10] Rafael Gonzalez and Richard Woods, “Image Enhancement” and “Image Segmentation,” in Digital Image Processing, 2nd Edition, Prentice Hall, New York, 2002.

[11] Haiting Zhai, Xiaojuan Wu, and Hui Han, “Research of a real time hand tracking algorithm,” International Conference on Neural Networks and Brain (ICNN&B '05), vol. 2, 13-15 Oct. 2005.

[12] Polley R. Liu, Max Q.-H. Meng, Peter X. Liu, Fanny F. L. Tong, and Xiaona Wang, “Optical Flow and Active Contour for Moving Object Segmentation and Detection in Monocular Robot,” Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), 15-19 May 2006.

[13] M. Nakanishi and T. Ogura, “Real-time line extraction using a highly parallel Hough transform board,” Proceedings of the International Conference on Image Processing, vol. 2, 26-29 Oct. 1997.

[14] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

[15] Zhuo-Lin Jiang, Shao-Fa Li, and Dong-Fa Gao, “A time saving method for human detection in wide angle camera images,” Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August 2006.

[16] S. LP. Interactive whiteboard, virtual whiteboard, whiteboards, sanford brands - mimio. http://www.mimio.com/, Nov 2007.

[17] Canny edge detection, http://en.wikipedia.org/wiki/Canny_edge_detector

Author’s Biography

Engr. Ali Javed has been serving as a Lecturer in the Software Engineering Department at the University of Engineering & Technology Taxila, Pakistan, since September 2007. He received his MS degree in Computer Engineering from the University of Engineering & Technology Taxila, Pakistan, in January 2010, and his B.Sc. degree in Software Engineering from the same university in September 2007. His areas of interest are Digital Image Processing, Computer Vision, Video Summarization, Machine Learning, Software Design and Software Testing.

Dr. Hafiz Adnan Habib is currently serving as an Associate Professor in the Computer Engineering Department at the University of Engineering & Technology Taxila, Pakistan. He received the Ph.D. degree in Electrical Engineering from the University of Engineering & Technology Taxila, Pakistan, in 2007. His areas of interest are Gesture Recognition solutions for HCI in smart environments, Gesture Recognition for Safety, Monitoring, Support of elderly persons and patients, and Video Summarization.
