Multi Speaker Detection using audio and video sensors



Page 1: Multi Speaker Detection using audio and video sensors

Multi Speaker Detection and Tracking Using Audio and Video Sensors Using Gesture Analysis

By: Abhishek M K

Under the guidance of: Manjunath Raikar, Asst. Prof., Dept. of CSE

Page 2: Multi Speaker Detection using audio and video sensors

CONTENTS

• Introduction
• What is an E-learning class?
• Working
• Block diagram
• Types of virtualization
• Conclusion
• References

Page 3: Multi Speaker Detection using audio and video sensors

INTRODUCTION

• E-learning uses video conferencing for interaction between students and tutors in different locations.

• The tutor is physically present in a real classroom, and students in a virtual classroom view the tutor through video.

• Audio and video sensors are used to make the E-learning classroom more efficient.

Page 4: Multi Speaker Detection using audio and video sensors

• Audio sensors such as microphones capture audio input, and video sensors such as cameras capture video signals.

• Gestures are used as a form of non-verbal communication.

• When multiple students ask questions at the same time, gesture analysis is used to decide who is answered.

Page 5: Multi Speaker Detection using audio and video sensors

What is an E-learning class?

• The main objective of our work is to make E-learning classrooms as similar as possible to normal classrooms.

• Multi-speaker detection is enabled in the system, and the tutor’s gestures are used to make decisions.

• Both the real and the virtual classroom have cameras as well as audio sensors.

Page 6: Multi Speaker Detection using audio and video sensors

CONTINUED…

• Students who have questions will either raise their hand or talk.

• The audio and video sensors work together to detect the first event in either the virtual or the real classroom.

• The PTZ (pan-tilt-zoom) camera will zoom in on a particular location and focus on a specific student.

Page 7: Multi Speaker Detection using audio and video sensors

Working

• The speaker is identified using a microphone array and a PTZ camera.

• The first speaker to talk, in either the virtual or the real classroom, is identified using audio/video signals.

• The PTZ camera and the audio sensors are used to track the students who want to speak.

• Students who gesture or speak are put in a queue, with priority given to whoever gestured or spoke first.
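
The queueing rule above can be sketched in Python as a priority queue ordered by detection time. This is only an illustrative sketch: the `Request` fields, the `RequestQueue` class, and the student identifiers are hypothetical names chosen for the example, not part of the presented system.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Only the timestamp takes part in ordering; the other fields are payload.
    timestamp: float                      # time (seconds) at which the event was detected
    student: str = field(compare=False)   # hypothetical student identifier
    kind: str = field(compare=False)      # "gesture" or "speech"
    room: str = field(compare=False)      # "real" or "virtual"


class RequestQueue:
    """Keeps pending requests so that the earliest detected event is served first."""

    def __init__(self):
        self._heap = []

    def add(self, request):
        heapq.heappush(self._heap, request)

    def next_student(self):
        """Return the earliest pending request, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


# Assumed example events from both classrooms.
queue = RequestQueue()
queue.add(Request(12.4, "student_b", "speech", "real"))
queue.add(Request(11.9, "student_a", "gesture", "virtual"))
queue.add(Request(13.0, "student_c", "gesture", "real"))

first = queue.next_student()
print(first.student, first.kind, first.room)
```

A heap keeps insertion and removal cheap while always exposing the earliest event, which matches the "first to gesture or speak" rule regardless of which classroom the event came from.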

Page 8: Multi Speaker Detection using audio and video sensors

CONTINUED…

• The student who first gestures or speaks becomes the focus of the camera.

• In the virtual classroom, the students need a screen to view the professor.

• We need three cameras for taking pictures.

• The students are localized using audio and video sensors.

Page 9: Multi Speaker Detection using audio and video sensors

Fig 1: The tutor is taking a class. His video is displayed in the remote classroom, and the remote students' video is displayed in the real classroom.

Fig 2: A student in the remote classroom raises his hand to ask a question. His face is brought into focus in the real classroom because he produced the first interrupt.

Page 10: Multi Speaker Detection using audio and video sensors

Block diagram

[Block diagram: the real classroom and the virtual classroom each have an audio sensor feeding a human voice detector and a video sensor feeding hand gesture detection. The remaining blocks are the priority detection system, localization, the tutor’s gesture analysis, and video sensor focus.]

Page 11: Multi Speaker Detection using audio and video sensors

• The audio sensors pick up the students who are asking questions, and the video sensors capture images of the students.

• The audio sensor output is fed to a human voice detection system, and the video sensor output is used to detect the students' raised hands.

• A priority detection system then determines which event happened first.

Page 12: Multi Speaker Detection using audio and video sensors

CONTINUED…

• After prioritization, the camera focuses on the student who asked a question first.

• The real and remote classrooms are connected via the internet.

Page 13: Multi Speaker Detection using audio and video sensors

TYPES OF VIRTUALIZATION

• Audio Virtualization

• Video Virtualization

Page 14: Multi Speaker Detection using audio and video sensors

Audio virtualization

• Audio localization is based on estimating the time delay between a pair of microphones.

• Cross-correlation between the audio signals is used to obtain the time delay.

• Steps for audio localization:
  1. Obtain the audio signals.
  2. Convert them to frames.
  3. Calculate the average energy of each frame.
  4. If the energy is above a threshold, the frame is speech.
  5. Cross-correlate to find the time delay.
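
The audio localization steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function names, the frame length, the energy threshold, the sample rate, and the synthetic noise signals are all illustrative choices, not values from the presented system.

```python
import numpy as np


def frame_signal(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping any remainder."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def speech_frames(signal, frame_len, threshold):
    """Flag frames whose average energy exceeds the threshold as speech."""
    frames = frame_signal(signal, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold


def time_delay(mic_a, mic_b, sample_rate):
    """Estimate the delay of mic_b relative to mic_a (seconds) by cross-correlation."""
    corr = np.correlate(mic_b, mic_a, mode="full")
    # Index 0 of the full correlation corresponds to lag -(len(mic_a) - 1).
    lag = np.argmax(corr) - (len(mic_a) - 1)
    return lag / sample_rate


# Synthetic stand-in for real recordings (assumed values throughout).
rng = np.random.default_rng(0)
sample_rate = 16000
source = rng.standard_normal(1024)

# Energy check: two quiet frames followed by two loud frames.
quiet = 0.01 * rng.standard_normal(320)
loud = rng.standard_normal(320)
flags = speech_frames(np.concatenate([quiet, loud]), frame_len=160, threshold=0.1)

# The same source reaches microphone B five samples later than microphone A.
true_lag = 5
mic_a = np.concatenate([source, np.zeros(true_lag)])
mic_b = np.concatenate([np.zeros(true_lag), source])
estimated = time_delay(mic_a, mic_b, sample_rate)
print(flags, estimated)
```

A positive delay means the sound reached microphone A first. Turning the delay into a direction additionally requires the microphone spacing and the speed of sound, which the slides do not specify.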

Page 15: Multi Speaker Detection using audio and video sensors

Video virtualization

• The students' hand-raise gestures as well as the professor's gestures need to be detected to make decisions in the E-class.

• The gesture analysis algorithm compares the reference frames with the frame to be checked.

• To create the reference images, gestures of different categories are trained and saved in a database.

• The captured image is compared with each of the reference frames.

• The reference frame with the maximum correlation is taken as the match.
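
The matching step can be sketched in Python with NumPy. This sketch assumes zero-mean normalized correlation as the similarity measure, since the slides do not say which correlation is used; the function names, gesture labels, and random arrays standing in for grayscale images are hypothetical.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two equally sized grayscale images."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_gesture(frame, references):
    """Return (label, score) of the reference frame that correlates best with frame."""
    scores = {label: normalized_correlation(frame, ref)
              for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Random arrays stand in for the stored reference images (assumed data).
rng = np.random.default_rng(1)
references = {
    "hand_raised": rng.random((32, 32)),
    "no_gesture": rng.random((32, 32)),
}

# A captured frame: the "hand_raised" reference plus a little sensor noise.
frame = references["hand_raised"] + 0.05 * rng.standard_normal((32, 32))
label, score = match_gesture(frame, references)
print(label, round(score, 3))
```

In practice a minimum score would likely be needed so that a frame resembling none of the stored gestures is not forced onto the nearest reference; that cutoff is not described in the slides.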

Page 16: Multi Speaker Detection using audio and video sensors

Conclusion

• The main purpose of the project is to make the E-Learning classroom more natural by effectively using gesture analysis of the tutor.

• Building an E-learning classroom is a challenge, but this approach makes it more similar to a real classroom.

Page 17: Multi Speaker Detection using audio and video sensors

References

• [1] Remote Student Localization using Audio and Video Processing for Synchronous Interactive E-Learning. Balaji Hariharan, Aparna Vadakkepatt, and Sangeeth Kumar. Amrita Centre for Wireless Networks and Applications, Amrita Vishwa Vidyapeetham, Kerala, India.

• [2] Sensors for Gesture Recognition Systems, IEEE. Sigal Berman, Member, IEEE, and Helman Stern, Member, IEEE.

• [3] Robust Joint Audio-Video Localization in Video Conferencing Using Reliability Information. David Lo, Rafik A. Goubran, Member, IEEE, Richard M. Dansereau, Member, IEEE, Graham Thompson, and Dieter Schulz.

Page 18: Multi Speaker Detection using audio and video sensors

THANK YOU…..