Automatic subtitle generation


Transcript of Automatic subtitle generation

Page 1: Automatic subtitle generation


Supervisor: K. Rajalakshmi

Submitted by: Tanya Saxena (10503894), Abhinav Mathur (10503858)

MAJOR PROJECT

Automatic Subtitle Generation from Videos

Page 2: Automatic subtitle generation

Video has become one of the most popular multimedia artefacts used on PCs and the Internet. In the majority of videos, the sound track holds an important place. It therefore seems essential to make videos with sound accessible to people with auditory problems, as well as to people with gaps in the spoken language. The most natural way to do so is through subtitles.

However, manual subtitle creation is a long and tedious activity that requires the constant presence of the user. Consequently, the study of automatic subtitle generation is a valid subject of research.

PROBLEM STATEMENT...

Page 3: Automatic subtitle generation

The system should take a video file as input and generate a subtitle file (.srt/.txt) as output. The three modules are:

Audio Extraction:

The audio extraction routine is expected to return an audio format that the speech recognition module can use as input material. It must handle a defined list of video and audio formats, verify the input file to evaluate the feasibility of extraction, and return the audio track in the most reliable format.
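As a sketch of the feasibility check described above (the whitelist and file names are illustrative assumptions, not the project's actual configuration), the routine might verify the input extension against a list of handled formats before attempting extraction:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: verify that an input file's extension is on the
// list of formats the extractor claims to handle before extracting audio.
public class ExtractionFeasibility {
    // Illustrative whitelist of MPEG-family and common audio/video formats.
    private static final List<String> SUPPORTED = Arrays.asList(
            "mp2", "mp3", "mp4", "avi", "wav", "au", "aac", "flac");

    public static boolean isExtractable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, so feasibility cannot be judged
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isExtractable("File.mp4")); // supported container
        System.out.println(isExtractable("File.wma")); // not on the list
    }
}
```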

INTRODUCTION...

Page 4: Automatic subtitle generation

Speech Recognition:

The speech recognition routine is the key part of the system, as it directly affects performance and the evaluation of results. First, it determines the type of the input file; if the type is provided, an appropriate processing method is chosen, otherwise the routine uses a default configuration. It must also be able to recognize silences so that text delimitations can be established.
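One common way to establish such delimitations (a minimal sketch, not the project's actual recognizer; the frame contents and energy threshold are illustrative assumptions) is to mark a frame as silence when its average energy falls below a threshold:

```java
// Minimal energy-based silence detector over PCM samples (illustrative).
public class SilenceDetector {
    // A frame is "silent" when its mean squared amplitude is below threshold.
    public static boolean isSilent(double[] frame, double threshold) {
        double energy = 0.0;
        for (double s : frame) {
            energy += s * s;
        }
        return (energy / frame.length) < threshold;
    }

    public static void main(String[] args) {
        double[] quiet = {0.001, -0.002, 0.001, 0.0};
        double[] loud = {0.5, -0.6, 0.4, -0.5};
        System.out.println(isSilent(quiet, 1e-4)); // true
        System.out.println(isSilent(loud, 1e-4));  // false
    }
}
```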

Subtitle Generation:

The subtitle generation routine creates and writes a file containing multiple chunks of text, each corresponding to an utterance delimited by silences, together with its start and end times. Time synchronization is of main importance.
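Concretely, an SRT cue pairs each chunk of text with start and end times in HH:MM:SS,mmm notation. A minimal sketch of the timestamp and cue formatting (helper names are ours, not the project's):

```java
// Format an SRT cue: index, "start --> end" timecode line, then the text.
public class SrtCue {
    // Convert a position in milliseconds to SRT's HH:MM:SS,mmm notation.
    public static String timestamp(long millis) {
        long h = millis / 3_600_000;
        long m = (millis / 60_000) % 60;
        long s = (millis / 1_000) % 60;
        long ms = millis % 1_000;
        return String.format("%02d:%02d:%02d,%03d", h, m, s, ms);
    }

    public static String cue(int index, long startMs, long endMs, String text) {
        return index + "\n" + timestamp(startMs) + " --> " + timestamp(endMs)
                + "\n" + text + "\n";
    }

    public static void main(String[] args) {
        System.out.print(cue(1, 1500, 4000, "Hello world"));
    }
}
```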

Page 5: Automatic subtitle generation

BENEFITS OF USING SUBTITLES....

The major benefit is that the viewer does not need to download a subtitle file from the Internet in order to watch the video with subtitles.

Captions help children with word identification, meaning acquisition, and retention.

Captions can help children establish a systematic link between the written word and the spoken word.

Captioning has been linked to higher comprehension skills compared with watching the same media without captions.

Page 6: Automatic subtitle generation

Captions provide missing information for individuals who have difficulty processing the speech and auditory components of visual media (whether or not this difficulty is due to a hearing loss).

Captioning is essential for children who are deaf and hard of hearing, can be very beneficial to those learning English as a second language, can help those with reading and literacy problems, and can help those who are learning to read.

CONTINUED....

Page 7: Automatic subtitle generation


OVERALL ARCHITECTURE OF THE PROJECT

Page 8: Automatic subtitle generation

#1

FLOW DIAGRAM

Page 9: Automatic subtitle generation


Page 10: Automatic subtitle generation

#2

USE CASE DIAGRAM

Page 11: Automatic subtitle generation
Page 12: Automatic subtitle generation

#3

ACTIVITY DIAGRAM

Page 13: Automatic subtitle generation


AUDIO EXTRACTION…

Page 14: Automatic subtitle generation


SPEECH RECOGNITION…

Page 15: Automatic subtitle generation


SUBTITLE GENERATION…

Page 16: Automatic subtitle generation


TECHNOLOGY & TOOLS USED

Page 17: Automatic subtitle generation

FFMPEG…


The FFMPEG libraries handle most of our multimedia tasks quickly and easily: audio compression, audio/video format conversion, extracting audio or images from a video, and more. Developers can use it for transcoding, streaming, and playback.

It is a very stable framework for transcoding video and audio.
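In practice, extracting the audio track is a single ffmpeg invocation. A hedged sketch of how such a call might be assembled from Java (the file names are placeholders, actually running it assumes an ffmpeg binary on the PATH, and the project's real invocation may differ):

```java
import java.util.Arrays;
import java.util.List;

// Build (but do not run) an ffmpeg command line that drops the video
// stream (-vn) and re-encodes the audio track to MP3.
public class FfmpegCommand {
    public static List<String> extractAudio(String input, String output) {
        return Arrays.asList(
                "ffmpeg",
                "-i", input,             // input video file
                "-vn",                   // drop the video stream
                "-acodec", "libmp3lame", // encode audio as MP3
                output);
    }

    public static void main(String[] args) {
        List<String> cmd = extractAudio("File.mp4", "File.mp3");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires ffmpeg installed):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```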

Page 18: Automatic subtitle generation

JAVA SPEECH API…

It allows developers to incorporate speech technology into user interfaces of their Java applets and applications. The API specifies a cross-platform interface that supports command-and-control recognizers, dictation systems, and speech synthesizers. Sun has also developed the JSGF (Java Speech Grammar Format) to provide a cross-platform grammar format for speech recognizers.
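For illustration, a minimal JSGF grammar looks like the following (the rule names and vocabulary are our own example, not taken from the project):

```jsgf
#JSGF V1.0;
grammar commands;

// A command is an action followed by an object, e.g. "play video".
public <command> = <action> <object>;
<action> = play | pause | stop;
<object> = video | audio;
```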

Page 19: Automatic subtitle generation

CURRENT PROBLEMS…


Robustness.

Automatic generation of word lexicons.

Finding the theoretical limit for FSM implementations of ASR systems.

Optimal utterance verification-rejection algorithms.

Accuracy and Word Error Rate.

Filling up missing offset samples with silence.

Synchronization between tracks.

Page 20: Automatic subtitle generation


FUNCTIONAL REQUIREMENTS

Page 21: Automatic subtitle generation

All MPEG standard formats, such as MP2, MP3, etc., are supported for audio/video.

Audio of any format can be extracted, but speech recognition is done only in English.

The text extracted from the audio/video is in the .srt format and is displayed in a readable form.

Captions appear on screen long enough to be read; it is preferable to limit on-screen captions to no more than two lines. Captions are synchronized with the spoken words.

The user can convert the extracted audio into any suitable format supported under the MPEG standards.

Page 22: Automatic subtitle generation

NON-FUNCTIONAL REQUIREMENTS

Page 23: Automatic subtitle generation

System Requirements – The software is compatible with all operating systems. The user needs to install the .exe file of the software on their PC.

Security – The system has no security constraints.

Performance – The text is synchronized with the song.

Maintainability – The software is easy to maintain.

Reliability – The software will provide a good level of precision.

Modifiability – The software cannot be modified by an external user.

Scalability – The software is scalable, as a number of users can utilize it simultaneously.

Page 24: Automatic subtitle generation

PROPOSED ALGORITHMS

Page 25: Automatic subtitle generation

MP3 ALGORITHM…

1. Initialize i = 0, j = 1.

2. tincr = 1.0 / sample_rate

3. dstp = dst; c = 2 * M_PI * 440.0

4. Generate a sine tone with 440 Hz frequency and duplicated channels.

5. Check if i < nb_samples. If true, generate the sine sample and store it: *dstp = sin(c * *t).

6. Check if j < nb_channels.

7. Store the packets in the destination buffer.

8. Increment dstp += nb_channels and t += tincr.

9. Repeat until the dst buffer is filled with nb_samples generated starting from t.
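The steps above (which follow FFmpeg's C audio-encoding example) can be sketched in Java as follows; the sample rate and channel count passed in main are illustrative values:

```java
// Fill an interleaved buffer with a 440 Hz sine tone, duplicating each
// sample across channels, mirroring steps 1-9 above (illustrative sketch).
public class SineTone {
    public static double[] generate(int nbSamples, int nbChannels, int sampleRate) {
        double[] dst = new double[nbSamples * nbChannels];
        double tincr = 1.0 / sampleRate;           // step 2
        double c = 2 * Math.PI * 440.0;            // step 3
        double t = 0.0;
        int dstp = 0;
        for (int i = 0; i < nbSamples; i++) {      // step 5
            double sample = Math.sin(c * t);
            for (int j = 0; j < nbChannels; j++) { // step 6: duplicate channels
                dst[dstp + j] = sample;
            }
            dstp += nbChannels;                    // step 8
            t += tincr;
        }
        return dst;
    }

    public static void main(String[] args) {
        double[] buf = generate(1024, 2, 44100);
        System.out.println(buf[0]);           // first sample is sin(0) = 0
        System.out.println(buf[2] == buf[3]); // channels are duplicated
    }
}
```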

Page 26: Automatic subtitle generation

MFCC (MEL FREQUENCY CEPSTRAL COEFFICIENT)

Check the delta frequency, which is the ratio between the sample rate and the number of FFT points:

    if (deltaFreq == 0) {
        Print "deltaFreq has zero value";
    }

Check if the left and right boundaries of the filter are too close:

    if ((Math.round(rightEdge - leftEdge) == 0)
            || (Math.round(centerFreq - leftEdge) == 0)
            || (Math.round(rightEdge - centerFreq) == 0)) {
        throw new IllegalArgumentException("Filter boundaries too close");
    }

Find how many frequency bins we can fit in the current frequency range:

    numberElementsWeightField = (int) Math.round((rightEdge - leftEdge) / deltaFreq + 1);

Initialize the weight field:

    if (numberElementsWeightField == 0) {
        throw new IllegalArgumentException("Number of elements in mel is zero.");
    }
    weight = new double[numberElementsWeightField];

Page 27: Automatic subtitle generation

CONTINUED…

    filterHeight = 2.0f / (rightEdge - leftEdge);

Now compute the slopes based on the height:

    leftSlope = filterHeight / (centerFreq - leftEdge);
    rightSlope = filterHeight / (centerFreq - rightEdge);

Now compute the weight for each frequency bin:

    for (currentFreq = initialFreq, indexFilterWeight = 0;
         currentFreq <= rightEdge;
         currentFreq += deltaFreq, indexFilterWeight++) {
        if (currentFreq < centerFreq) {
            weight[indexFilterWeight] = leftSlope * (currentFreq - leftEdge);
        } else {
            weight[indexFilterWeight] = filterHeight + rightSlope * (currentFreq - centerFreq);
        }
    }

Convert linear frequency to mel frequency:

    private double linToMelFreq(double inputFreq) {
        return (2595.0 * (Math.log(1.0 + inputFreq / 700.0) / Math.log(10.0)));
    }
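The linToMelFreq formula above can be sanity-checked against a standard landmark of the mel scale: 1000 Hz maps to approximately 1000 mel. A self-contained restatement of the same formula:

```java
// Standard linear-to-mel mapping: mel = 2595 * log10(1 + f / 700).
public class MelScale {
    public static double linToMelFreq(double inputFreq) {
        return 2595.0 * (Math.log(1.0 + inputFreq / 700.0) / Math.log(10.0));
    }

    public static void main(String[] args) {
        System.out.println(linToMelFreq(0.0));    // 0 Hz maps to 0 mel
        System.out.println(linToMelFreq(1000.0)); // close to 1000 mel
    }
}
```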

Page 28: Automatic subtitle generation

IMPLEMENTATION

Page 29: Automatic subtitle generation

#1

AUDIO EXTRACTION

Page 30: Automatic subtitle generation
Page 31: Automatic subtitle generation

#2

SPEECH RECOGNITION

Page 32: Automatic subtitle generation
Page 33: Automatic subtitle generation

#3

SUBTITLE GENERATION

Page 34: Automatic subtitle generation


Page 35: Automatic subtitle generation

RISK AND ITS IMPACT

Page 36: Automatic subtitle generation

Risk ID | Classification | Description of Risk | Risk Area | Probability | Impact | RE (P*I)
1 | Product Engineering | Word Error Rate | Performance | L | H | M
2 | Product Engineering | Aliasing | Performance | M | M | M
3 | Development Environment | Bitrate of extracted audio more than that of input audio | Testing Environment | L | L | L
4 | Product Engineering | Accuracy and Speed | Performance | L | H | M
5 | Program Constraint | Format not recognized | External Input | L | H | M

Page 37: Automatic subtitle generation

RISK AND MITIGATION PLANS

Page 38: Automatic subtitle generation

Risk ID | Description of Risk | Risk Area | Mitigation
1 | Word Error Rate | Performance | Having an efficient database (training set).
2 | Aliasing | Performance | Resampling the samples at a fixed frequency.
3 | Bitrate of extracted audio more than that of input audio | Testing Environment | Encode and decode audio at the bitrate of the input audio.
4 | Accuracy and Speed | Performance | Synchronization.
5 | Format not recognized | External Input | Input audio/video supported by MPEG standard formats.

Page 39: Automatic subtitle generation


BLACK BOX TESTING

Page 40: Automatic subtitle generation

Test Case ID | Input | Expected Output | Status
1.1 | File.mp3 | File.mp3 | Pass
1.2 | File.mp4 | File.mp3 | Pass
1.3 | File.mp2 | File.mp3 | Pass
1.4 | File.au | File.au | Pass
1.5 | File.aac | File.aac | Pass
1.6 | File.wav | File.wav | Pass
1.7 | File.flac | File.flac | Pass
1.8 | File.wma (format not supported by MPEG standards) | File.wma | Fail
1.9 | File.als (format not supported by MPEG standards) | File.als | Fail

Page 41: Automatic subtitle generation

Test Case ID | Input | Expected Output | Status
2.1 | File.wav (words present in the dictionary) | Speech recognized. Text printed. | Pass
2.2 | File.mp3 (not a .wav file) | Speech recognized. Text printed. | Fail
2.3 | File.au (not a .wav file) | Speech recognized. Text printed. | Fail
2.4 | File.flac (not a .wav file) | Speech recognized. Text printed. | Fail
2.5 | File.wav (words not found in the dictionary) | Speech recognized. Text printed. | Fail
3.1 | File.srt (incorrect timecode) | Subtitles generated but not synchronized with the video | Fail
3.2 | File.srt (correct timecode), File.avi | Subtitles generated and synchronized with the video file File.avi | Pass
3.3 | File.txt (not containing the timecode) | Subtitles generated and synchronized with the video | Fail
3.4 | File.srt (correct timecode), File.mp4 | Subtitles generated and synchronized with the video file File.mp4 | Pass
3.5 | File.srt (correct timecode), File.wma | Subtitles generated and synchronized with the video file File.wma | Pass

Page 42: Automatic subtitle generation

WHITE BOX TESTING

Page 43: Automatic subtitle generation


AUDIO EXTRACTION…

Page 44: Automatic subtitle generation

CC = E - N + 2, where
E = No. of Edges (80)
N = No. of Nodes (72)
CC = 80 - 72 + 2 = 10

CYCLOMATIC COMPLEXITY…
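The computation is mechanical and can be restated as a one-liner (edge and node counts taken from the slides):

```java
// Cyclomatic complexity from a control-flow graph: CC = E - N + 2.
public class CyclomaticComplexity {
    public static int cc(int edges, int nodes) {
        return edges - nodes + 2;
    }

    public static void main(String[] args) {
        System.out.println(cc(80, 72)); // audio extraction module: 10
        System.out.println(cc(98, 91)); // speech recognition module: 9
    }
}
```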

Page 45: Automatic subtitle generation

SPEECH RECOGNITION…

Page 46: Automatic subtitle generation

CC = E - N + 2, where
E = No. of Edges (98)
N = No. of Nodes (91)
CC = 98 - 91 + 2 = 9

CYCLOMATIC COMPLEXITY…

Page 47: Automatic subtitle generation

ERROR & EXCEPTION HANDLING

Page 48: Automatic subtitle generation

Test Case ID | Component | Debugging Technique
1.8 | Audio Extraction | Backtracking debugging
1.9 | Audio Extraction | Backtracking debugging
2.2 | Speech Recognition | Backtracking debugging
2.3 | Speech Recognition | Backtracking debugging
2.4 | Speech Recognition | Backtracking debugging
2.5 | Speech Recognition | Print debugging
3.1 | Subtitles Generation | Print debugging
3.3 | Subtitles Generation | Backtracking debugging

Page 49: Automatic subtitle generation

Test Case ID | Input | Expected Output | Status
1.8 | File.au (format supported by MPEG standards) | File.au | Pass
1.9 | File.mp4 (format supported by MPEG standards) | File.mp3 | Pass
2.2 | File.wav | Speech recognized. Text printed. | Pass
2.3 | File.wav | Speech recognized. Text printed. | Pass
2.4 | File.wav | Speech recognized. Text printed. | Pass
2.5 | File.wav (words found in the dictionary) | Speech recognized. Text printed. | Pass
3.1 | File.srt (correct timecode) | Subtitles generated and synchronized with the video | Pass
3.3 | File.srt | Subtitles generated and synchronized with the video | Pass

Page 50: Automatic subtitle generation

RESEARCH WORK

Page 51: Automatic subtitle generation

DETAILED STUDY OF INPUT AND EXTRACTED FILES…

S. No. | Input File | Before Extraction: Size (MB) / Bitrate (kbps) | After Extraction: Size (MB) / Bitrate (kbps) | Length of input/output file (min:sec) | Time Taken for Extraction (in ms) | Reduction Rate
1 | Despicable.avi | 10.8 / 1628 | 8.24 / 1411 | 00:49 | 0.6 | 24%
2 | Time.mp4 | 48.1 / 1663 | 44.4 / 1536 | 04:02 | 3.12 | 8%
3 | Florida.mp4 | 76 / 2723 | 39.3 / 1411 | 03:54 | 1.08 | 48%
4 | International.mp4 | 79.1 / 2673 | 41.7 / 1411 | 04:08 | 1.3 | 47%
5 | Justin.mp4 | 43.2 / 1615 | 41 / 1536 | 03:44 | 1.54 | 5%
6 | Love.mp4 | 67.1 / 2112 | 44.8 / 1411 | 04:26 | 1.98 | 33%
7 | Jojo.avi | 61.8 / 2183 | 39.9 / 1411 | 03:57 | 1.86 | 35%
8 | Baby.mp4 | 43.2 / 1615 | 41 / 1536 | 03:44 | 3.34 | 5%
9 | Never.mp4 | 52.5 / 1657 | 48.5 / 1536 | 04:25 | 2.15 | 8%
10 | Beep.avi | 51.4 / 1628 | 38.4 / 1411 | 03:48 | 01:58 | 25%
Average | | 53.3 / 1950 | 38.7 / 1461 | 03:41 | 1.71 | 24%
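The Reduction Rate column follows directly from the sizes: (before − after) / before. A quick check against two rows of the table:

```java
// Size reduction rate after audio extraction: (before - after) / before.
public class ReductionRate {
    public static double rate(double beforeMb, double afterMb) {
        return (beforeMb - afterMb) / beforeMb;
    }

    public static void main(String[] args) {
        // Despicable.avi: 10.8 MB -> 8.24 MB, roughly a 24% reduction.
        System.out.printf("%.0f%%%n", rate(10.8, 8.24) * 100);
        // Florida.mp4: 76 MB -> 39.3 MB, roughly a 48% reduction.
        System.out.printf("%.0f%%%n", rate(76, 39.3) * 100);
    }
}
```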

Page 52: Automatic subtitle generation

COMPARISON BETWEEN THE SIZE OF THE INPUT FILE AND THE EXTRACTED FILE

[Bar chart: size of file (in MB) before and after extraction, for each input file (.mp4/.avi)]

From the above graph we can observe that the size of each input file is reduced once the audio has been extracted from the input video. The maximum reduction rate of the file size is 0.48 and the minimum is 0.05, giving an average reduction rate of 24%.

Page 53: Automatic subtitle generation

COMPARISON BETWEEN THE BITRATE OF THE INPUT FILE AND THE EXTRACTED FILE

[Bar chart: bitrate (in kbps) before and after extraction, for each input file (.mp4/.avi)]

The bitrates of the input files range from 1615 kbps to 2723 kbps, while the bitrates of the extracted files range from a minimum of 1411 kbps to a maximum of 1536 kbps, giving an average bitrate of 1461 kbps.

Page 54: Automatic subtitle generation

TIME TAKEN FOR EXTRACTION OF INPUT FILE

[Bar chart: time taken for extraction (in ms), for each input file (.mp4/.avi)]

The time taken to extract each file varies from 0.6 ms to 3.34 ms, with an average extraction time of 1.71 ms.

Page 55: Automatic subtitle generation


CONCLUSION

Page 56: Automatic subtitle generation

The ASG system aims at automatically generating the subtitle text for the input audio/video.

It supports all the MPEG standards. The video and subtitles are synchronized. The user can extract audio in any MPEG standard format. Audio of any format can be extracted, but speech recognition is done only in English.

Page 57: Automatic subtitle generation

[1] B. H. Juang and L. R. Rabiner, "Hidden Markov Models for Speech Recognition", Technometrics, Vol. 33, No. 3, Aug. 1991.

[2] Hong Zhou and Changhui Yu, "Research and design of the audio coding scheme", International Conference on Multimedia Technology (ICMT), 2011.

[3] Seymour Shlien, "Guide to MPEG-1 Audio Standard", IEEE Transactions on Broadcasting, December 1994.

[4] Justin Burdick, "Building a Regionally Inclusive Dictionary for Speech Recognition", Computer Science and Linguistics, Spring 2004.

[5] Anand Vardhan Bhalla and Shailesh Khaparkar, "Performance Improvement of Speaker Recognition System", International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, Issue 3, March 2012.

[6] Petr Pollak and Martin Behunek, "Accuracy of MP3 Speech Recognition Under Real-World Conditions", Electrical Engineering, Czech Technical University in Prague.

REFERENCES…

Page 58: Automatic subtitle generation

[7] Yu Li and LingHua Zhang, "Implementation and Research of Streaming Media System and AV Codec Based on Handheld Devices", 12th IEEE International Conference on Communication Technology (ICCT), 2010.

[8] Ibrahim Patel and Y. Srinivas Rao, "Speech Recognition Using HMM with MFCC – An Analysis Using Frequency Spectral Decomposition Technique", Signal & Image Processing: An International Journal (SIPIJ), Vol. 1, No. 2, December 2010.

[9] Jorge Martinez, Hector Perez, Enrique Escamilla, and Masahisa Mabo Suzuki, "Speaker Recognition Using Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) Techniques", 22nd International Conference on Electrical Communications and Computers (CONIELECOMP), 2012.

[10] Sadaoki Furui, Li Deng, Mark Gales, Hermann Ney, and Keiichi Tokuda, "Fundamental Technologies in Modern Speech Recognition", IEEE Signal Processing Society, November 2012.

[11] Youhao Yu, "Research on Speech Recognition Technology and Its Application", International Conference on Computer Science and Electronics Engineering, 2012.

CONTINUED…

Page 59: Automatic subtitle generation

Abhinav Mathur and Tanya Saxena, "Generating Subtitles Automatically Using Audio Extraction and Speech Recognition", 7th International Conference on Contemporary Computing (IC3), 2014 (under review).

PUBLICATION…

Page 60: Automatic subtitle generation

THANK YOU