AI-Based Character Recognition and Speech Synthesis
Uploaded by: ankita-jadhao
Category: Engineering
![Page 1: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/1.jpg)
Seminar on
“AI Based Character Recognition and Speech Synthesis”
Developed By:
Kalyani Hadke, Rani Kubetkar, Shreya Surjuse, Ankita Jadhao, Kruttika Sorte
Guided By
Prof. H. N. Datir
![Page 2: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/2.jpg)
Artificial Intelligence based
Character Recognition and Speech Synthesis
![Page 3: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/3.jpg)
Need
• When we capture an image of text, the image is sometimes unclear and the words in it cannot be recognized.
• Many people face the problem of illiteracy, so we want the text in an image to be converted to machine-readable text for various purposes.
• While studying, we do not read text as a regular practice, so we want the text to be converted into audio.
• Apart from this, text captured in an image should also be converted into audio, as we generally prefer listening, as we do with songs.
![Page 4: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/4.jpg)
Introduction to CR and SS
• Optical Character Recognition (OCR) is the electronic or mechanical conversion of images of text.
• OCR converts scanned images of text into machine-encoded text.
• Speech synthesis is the artificial production of human speech.
• Speech synthesizer: a computer system used for this purpose.
• A text-to-speech (TTS) engine converts:
  • normal language text into speech
  • symbolic linguistic representations into speech
![Page 5: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/5.jpg)
Overview
Image → OCR → Recognized text → Speech engine → Speech
![Page 6: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/6.jpg)
DFD For Character Recognition System
• Pre-Processing
• Segmentation
• Image Digitization
• Network Implementation
• Training of Learning Network
• Recognition
• Network Testing
Pre-processing explanation
![Page 7: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/7.jpg)
Pre-processing
• De-noising
• De-skew
• Binarization
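Two of the pre-processing steps can be sketched in a few lines. This is only an illustration, not the presenters' implementation: it assumes the scanned page is a list of rows of grayscale values (0 = black, 255 = white), uses a 3x3 median filter for de-noising and a fixed threshold of 128 for binarization, and leaves de-skewing out. The function names and the threshold are assumptions.

```python
from statistics import median

def denoise(gray):
    """3x3 median filter on a grayscale image; border pixels are left unchanged."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [gray[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)  # an isolated speck is replaced by its neighbourhood
    return out

def binarize(gray, threshold=128):
    """Map grayscale to 0/1: pixels darker than the threshold become 1 (ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# A white 5x5 patch with a single dark noise pixel in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
clean = binarize(denoise(page))
```

After both steps the noise pixel is gone and `clean` contains only background (0) cells.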
![Page 9: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/9.jpg)
SEGMENTATION
• Image segmentation decomposes a sequence of characters into individual symbols.
• It directly affects the recognition rate of the script.
• It locates and identifies the boundaries in the image.
1. External segmentation
2. Internal segmentation
![Page 10: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/10.jpg)
• Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of the image into something that is more meaningful and easier to analyze.
• External segmentation: determine the character lines in the text.
[Figure: sample text with its character lines numbered 1 to 4]
![Page 11: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/11.jpg)
• Internal segmentation: decompose an image of a sequence of characters into images of individual symbols.
[Figure: the word "Image" split into the symbols I, m, a, g, e]
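One common way to realize both kinds of segmentation is with projection profiles: blank rows separate text lines, and blank columns separate symbols. The sketch below assumes a binarized image (1 = ink) and symbols that do not touch each other. The slides do not say which method was used, so this is an illustrative choice, and all function names are hypothetical.

```python
def runs(flags):
    """Return (start, end) index pairs for each run of consecutive True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans

def external_segmentation(image):
    """Split a 0/1 page image into text lines (runs of rows that contain ink)."""
    return [image[top:bottom] for top, bottom in runs([any(row) for row in image])]

def internal_segmentation(line):
    """Split one text line into symbols (runs of columns that contain ink)."""
    cols = [any(row[x] for row in line) for x in range(len(line[0]))]
    return [[row[left:right] for row in line] for left, right in runs(cols)]

page = [
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
lines = external_segmentation(page)        # two text lines
symbols = internal_segmentation(lines[0])  # two symbols in the first line
```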
![Page 13: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/13.jpg)
Image Digitization - Matrix matching
• Mapping of the symbol image into a corresponding two-dimensional binary matrix
• Issue: deciding the size of the matrix
• Sampling strategy for mapping the symbol image
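The mapping can be sketched as follows. The sampling strategy here is an assumption: the symbol image is divided into a fixed grid of blocks, and a cell is set to 1 if any pixel in its block contains ink. The 5x5 default size is likewise only an example, since the slides leave the matrix size as an open issue.

```python
def block(i, n_out, n_in):
    """Source indices covered by output cell i when n_in pixels map onto n_out cells."""
    start = i * n_in // n_out
    end = max((i + 1) * n_in // n_out, start + 1)
    return range(start, end)

def digitize(symbol, rows=5, cols=5):
    """Sample a 0/1 symbol image onto a rows x cols binary matrix."""
    h, w = len(symbol), len(symbol[0])
    return [
        [int(any(symbol[y][x] for y in block(r, rows, h) for x in block(c, cols, w)))
         for c in range(cols)]
        for r in range(rows)
    ]

symbol = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
small = digitize(symbol, rows=2, cols=2)  # [[1, 0], [0, 1]]
```

A larger matrix keeps more detail of the symbol but gives the network more inputs to learn from, which is the trade-off behind the "size of the matrix" issue.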
![Page 14: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/14.jpg)
Digitization
[Figure: the input alphabet ‘a’ drawn on a segmented grid, with each grid cell digitized to 0 or 1]
![Page 15: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/15.jpg)
• To feed the matrix data to the network, it must be linearized into a single dimension.
[Figure: the binary matrix of 0s and 1s flattened into a single row ending … 0 1 1]
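Linearization is a row-by-row flattening of the matrix. A minimal sketch, using a made-up 3x3 matrix rather than the one on the slide:

```python
def linearize(matrix):
    """Flatten a 2-D binary matrix into one list, row by row."""
    return [cell for row in matrix for cell in row]

matrix = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
vector = linearize(matrix)  # one input value per network input node
```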
![Page 16: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/16.jpg)
1. Image of scanned document: NAME
2. Sub-images of the individual letters from the document: N, A, M, E
3. Binary representation of the sub-images (e.g. 0 is white and 1 is black): 001110100…, 111010011…, 11001100…, 000111101…
4. Neural network: a supervised neural network that has been trained to recognize images of characters.
5. The neural network outputs numeric values corresponding to the recognized characters: 14, 1, 13, 5
6. File containing the text of the scanned document: NAME
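The numeric outputs in this example match the positions of the letters in the alphabet (A = 1, …, Z = 26). Assuming that encoding, the last step of turning network outputs into text can be sketched as:

```python
def decode(indices):
    """Map 1-based alphabet positions to upper-case letters."""
    return "".join(chr(ord("A") + i - 1) for i in indices)

text = decode([14, 1, 13, 5])
```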
![Page 18: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/18.jpg)
• An artificial neural network consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain.
• Neurons communicate through weighted links.
[Figure: two neurons joined by a weighted link; inputs X1 … Xn are scaled by weights Wk1 … Wkp, summed, and passed through a sigmoid function to give the output]
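The summation and sigmoid stages of a single neuron can be written out directly. This is a minimal sketch; the bias term and the function names are additions for illustration and are not shown on the slide.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by the sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

balanced = neuron([1, 1], [2.0, -2.0])  # weighted sum is 0, so the output is 0.5
excited = neuron([1, 0], [3.0, 1.0])    # positive weighted sum, output above 0.5
```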
![Page 19: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/19.jpg)
• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN
![Page 20: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/20.jpg)
Neural Network
Input signal → Input layer → First hidden layer → Second hidden layer → Output layer → Output signal
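A forward pass through such a network is a repeated application of the neuron computation, one layer at a time. The sketch below assumes sigmoid activations everywhere; the layer sizes (4 inputs, 3 and 2 hidden units, 26 outputs, one per letter) and the all-zero weights are placeholders, not values from the presentation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, network):
    """Pass the signal through every (weights, biases) pair in order."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal

# Placeholder network: 4 inputs -> 3 hidden -> 2 hidden -> 26 outputs.
network = [
    ([[0.0] * 4 for _ in range(3)], [0.0] * 3),
    ([[0.0] * 3 for _ in range(2)], [0.0] * 2),
    ([[0.0] * 2 for _ in range(26)], [0.0] * 26),
]
outputs = forward([0, 1, 1, 0], network)
```

With untrained (zero) weights every output is 0.5; training is what makes one output stand out for each character.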
![Page 22: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/22.jpg)
Neural Network
• Input signal: binary converted image
• Output signal: obtained text of scanned image
• Back-propagation for error calculation (the ERROR is fed back into the network)
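How the error is fed back can be illustrated on a very small network. This is a sketch under stated assumptions, not the presenters' training code: one hidden layer of two sigmoid neurons, one sigmoid output, no biases, a squared-error loss, and a learning rate of 0.5. All names and starting weights are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One back-propagation update (in place); returns the loss before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    error = output - target
    # Output delta: derivative of 0.5 * error**2 through the sigmoid output.
    delta_out = error * output * (1.0 - output)
    # Hidden deltas: the output delta propagated back through the old output weights.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    for j in range(len(hidden)):
        w_out[j] -= lr * delta_out * hidden[j]
        for i in range(len(x)):
            w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
    return 0.5 * error ** 2

w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]
losses = [train_step([1.0, 0.0], 1.0, w_hidden, w_out) for _ in range(50)]
```

Repeating the step on the same example drives the loss down, which is the behaviour training relies on.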
![Page 24: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/24.jpg)
Speech Synthesis
Input Image → Pre-Processing → Segmentation → Image Digitization → Network Implementation → Training of Learning Network → File containing text of scanned document → TEXT → TTS Engine (NLP → DSP) → SPEECH
![Page 25: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/25.jpg)
Speech Synthesis
• TTS: Text-to-Speech engine
• A computer-based system that reads any text aloud.
• A TTS engine consists of:
  • Front-end: NLP
  • Back-end: DSP
![Page 26: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/26.jpg)
Modules of Text-to-Speech
• Input: TEXT
• Natural language processing: text preprocessing, text analysis, linguistic analysis
• Output of the NLP stage: phonemes and prosody
• Digital signal processing: speech synthesizer
• Output: SPEECH
Figure 1. A simple but general functional diagram of a TTS system
![Page 27: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/27.jpg)
Speech Synthesis
TEXT → TTS Engine: NLP (Text Analysis → Automatic Phonetization → Prosody Generation) → DSP → SPEECH
![Page 28: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/28.jpg)
NLP Module
• This step is called the high-level, front-end, or text-to-phoneme step.
• It consists of the following parts:
  • Text analysis
  • Automatic phonetization
  • Prosody generation
![Page 30: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/30.jpg)
NLP Module: Text Analysis
• A pre-processing module
• A morphological analysis module
• A contextual analysis module
• A syntactic-prosodic parser
![Page 32: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/32.jpg)
NLP Module: Automatic Phonetization
• Rule-based
• Dictionary-based
• Hybrid approach
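The three approaches can be contrasted in one small sketch: a dictionary lookup, a set of letter-to-sound rules, and a hybrid that tries the dictionary first and falls back to the rules. The lexicon entries, the rule tables, and the ARPAbet-style phoneme symbols are all illustrative assumptions, far smaller and cruder than a real system would need.

```python
# Hypothetical miniature lexicon (dictionary-based approach).
LEXICON = {"name": ["N", "EY", "M"], "the": ["DH", "AH"]}

# Naive letter-to-sound tables (rule-based approach).
DIGRAPH_RULES = {"sh": "SH", "ch": "CH", "th": "TH"}
LETTER_RULES = dict(
    a="AE", b="B", c="K", d="D", e="EH", f="F", g="G", h="HH", i="IH",
    j="JH", k="K", l="L", m="M", n="N", o="AA", p="P", q="K", r="R",
    s="S", t="T", u="AH", v="V", w="W", x="K S", y="Y", z="Z",
)

def rule_based(word):
    """Scan left to right, preferring two-letter rules over single letters."""
    phonemes, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPH_RULES:
            phonemes.append(DIGRAPH_RULES[pair])
            i += 2
        else:
            phonemes.extend(LETTER_RULES.get(word[i], "").split())
            i += 1
    return phonemes

def phonetize(word):
    """Hybrid approach: dictionary first, rules for out-of-vocabulary words."""
    word = word.lower()
    return LEXICON.get(word) or rule_based(word)

known = phonetize("name")     # found in the lexicon
unknown = phonetize("ship")   # not in the lexicon, so the rules are used
naive = rule_based("name")    # rules alone mispronounce the silent "e"
```

The difference between `known` and `naive` shows why pure rules struggle with irregular spellings, and why a hybrid keeps a dictionary for such exceptions.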
![Page 33: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/33.jpg)
Speech Synthesis
TEXT → TTS Engine: NLP (Text Analysis → Automatic Phonetization → Prosody Generation) → DSP (Formant synthesis, Concatenative synthesis) → SPEECH
![Page 34: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/34.jpg)
NLP Module: Prosody Generation
• Pitch
• Intonation
• Rhythm
![Page 36: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/36.jpg)
DSP component
• Low-level: phoneme to speech
• There are two main technologies used for generating synthetic speech waveforms:
  • Concatenative synthesis
  • Formant synthesis
![Page 38: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/38.jpg)
Formant Synthesis
• Formant synthesis is rule-based synthesis.
• It does not use any human speech samples at runtime.
• The waveform is created using an acoustic model of the human vocal tract.
• It generates artificial, somewhat robotic speech.
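A very reduced version of this idea fits in a few lines: an impulse train stands in for the vocal cords, and a cascade of second-order resonators stands in for the vocal tract, one resonator per formant. The sketch is only illustrative. The formant frequencies are typical textbook values for an "ah"-like vowel, while the bandwidths, pitch, and sample rate are assumptions.

```python
import math

def resonator(signal, freq, bandwidth, sample_rate):
    """Second-order digital resonator that boosts energy around one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize_vowel(formants, pitch=120.0, duration=0.3, sample_rate=16000):
    """Return normalized samples of a steady vowel; no recorded speech is used."""
    n = round(duration * sample_rate)
    period = int(sample_rate / pitch)
    # Source: one impulse per pitch period.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Filter: pass the source through one resonator per formant.
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]

# (frequency in Hz, assumed bandwidth in Hz) for an "ah"-like vowel.
samples = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
```

The samples could be written to a WAV file with the standard `wave` module. The perfectly regular source and fixed formants are one reason speech produced this way tends to sound robotic.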
![Page 40: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/40.jpg)
Concatenative synthesis
• Based on the concatenation of segments of recorded speech.
• Gives the most natural sounding synthesized speech.
![Page 41: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/41.jpg)
Subtypes of Concatenative Synthesis
• Diphone concatenation synthesis: somewhat robotic speech, sonic glitches
• Unit concatenation synthesis: natural speech
![Page 42: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/42.jpg)
Approaches To Wave-form Generation: Concatenative
• Unit Concatenation Synthesis: Algorithm
  • Break the language down into small units (phonemes, syllables, etc.)
  • Create a large database of recorded speech
  • Each unit is labeled: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent
  • The target utterance is selected at runtime by determining the best chain of units (HMM, Decision Tree)
  • Use DSP to smooth transitions between units
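The "best chain of units" step can be sketched as a shortest-path search. The slide names HMMs and decision trees; the version below swaps in a plain Viterbi-style dynamic program over two hand-written costs: a target cost (how well a recorded unit matches the wanted pitch) and a join cost (how smoothly two neighbouring units connect). The database, the labels, and both cost functions are hypothetical, and the final DSP smoothing step is not shown.

```python
def select_units(targets, database, target_cost, join_cost):
    """Pick one recorded unit per target so that total target + join cost is minimal."""
    candidates = [database[t["phoneme"]] for t in targets]
    # best[i][k] = (cheapest cost of a chain ending in candidate k of target i, back-pointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, unit), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], unit), prev))
        best.append(row)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][k])
        k = best[i][k][1]
    return chain[::-1]

# Hypothetical labeled database: two recordings per phoneme, labeled with pitch in Hz.
database = {
    "N": [{"id": "n1", "pitch": 100}, {"id": "n2", "pitch": 125}],
    "EY": [{"id": "ey1", "pitch": 122}, {"id": "ey2", "pitch": 160}],
    "M": [{"id": "m1", "pitch": 90}, {"id": "m2", "pitch": 118}],
}
targets = [{"phoneme": p, "pitch": 120} for p in ("N", "EY", "M")]

chain = select_units(
    targets,
    database,
    target_cost=lambda t, u: abs(t["pitch"] - u["pitch"]),
    join_cost=lambda a, b: abs(a["pitch"] - b["pitch"]),
)
chosen = [u["id"] for u in chain]
```

For this data the search picks the recordings whose pitch stays close to 120 Hz and to each other, so the joins need less smoothing afterwards.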
![Page 44: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/44.jpg)
Advantages
• Machine language translation
• Information retrieval
• Visual issues (difficulty seeing text)
• Motor issues (difficulty handling a book or paper)
![Page 45: Ai based character recognition and speech synthesis](https://reader036.fdocuments.net/reader036/viewer/2022062302/58892bcc1a28ab77528b70ed/html5/thumbnails/45.jpg)
QUESTIONS????