![Page 1: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/1.jpg)
LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM
Muhammad Usman Ghani
Research Officer-III
Center for Language Engineering
![Page 2: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/2.jpg)
INTRODUCTION
Latin script is also used for terminology illustration or other purposes in Urdu books and magazines.
The script detection system isolates Nastalique and Latin script.
The Nastalique script is recognized through Urdu OCR, and Latin script is recognized by the Tesseract OCR.
A font-size-independent approach is used.
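The routing described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the `Region` stand-in, the label strings `"nastalique"` and `"latin"`, and the two recognizer callables are all assumptions introduced here. In a real system the callables would wrap the Urdu OCR engine and Tesseract.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an image crop; a real system would pass pixel data.
Region = str


def recognize_line(
    runs: List[Tuple[str, Region]],
    urdu_ocr: Callable[[Region], str],
    latin_ocr: Callable[[Region], str],
) -> List[str]:
    """Send each isolated script run to the matching recognizer, in order."""
    results = []
    for script, region in runs:
        if script == "nastalique":
            results.append(urdu_ocr(region))
        elif script == "latin":
            results.append(latin_ocr(region))
        else:
            raise ValueError(f"unknown script label: {script!r}")
    return results
```

Passing the recognizers in as parameters keeps the sketch independent of any particular OCR library.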
![Page 3: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/3.jpg)
SYSTEM OVERVIEW
![Page 4: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/4.jpg)
SCRIPT CLASSIFICATION
Features Extraction
• Dimensional Features
• Morphological Features
Classification: C4.5 Decision Tree algorithm
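C4.5 picks splits by gain ratio (information gain divided by split information), and handles a numeric feature by testing candidate thresholds. The sketch below computes that criterion for a single binary threshold split. The function names and the sample feature values in the usage note are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of splitting on `value <= threshold` versus `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info
```

For example, with made-up height-to-width ratios `[0.5, 0.6, 2.0, 2.2]` labeled `["latin", "latin", "nastalique", "nastalique"]`, a threshold of `1.0` separates the classes perfectly and scores higher than a threshold of `0.55`.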
![Page 5: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/5.jpg)
FEATURES EXTRACTION (1)
Dimensional Features
• Height
• Width
• Area
• Height-to-Width Ratio
• Centroid Composite Value
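A rough sketch of how the first four dimensional features could be computed for one connected component is given below. It assumes the component arrives as a binary grid (1 = ink) and that "area" means bounding-box area; the slides do not say whether area is measured on the bounding box or by pixel count. The centroid composite value is omitted because the slides do not define it.

```python
from typing import Dict, List


def dimensional_features(mask: List[List[int]]) -> Dict[str, float]:
    """Bounding-box features of one connected component (1 = ink, 0 = background)."""
    rows = [r for r, row in enumerate(mask) for v in row if v]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("component has no ink pixels")
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "height": height,
        "width": width,
        "area": height * width,  # assumption: bounding-box area, not pixel count
        "height_to_width": height / width,
    }
```

Of these, only the height-to-width ratio is unchanged when a component is scaled uniformly; the other three grow with font size.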
![Page 6: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/6.jpg)
FEATURES EXTRACTION (2)
Morphological Features
![Page 7: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/7.jpg)
NEIGHBORING RULES
The script type of the first ligature in a line is changed to the script type of the next two CCs, if these two CCs have the same script type.
The script type of the last ligature in a line is changed to the script type of the previous two CCs, if these two CCs have the same script type.
If a ligature with script type Latin has Nastalique script CCs on its right and left, its script type is changed to Nastalique.
If a ligature with script type Nastalique has Latin script CCs on its right and left, its script type is changed to Latin.
If a Latin script ligature has a diacritic associated with it and it is placed below the MB or inside the MB, the script type of such ligature is converted to Latin.
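The first four rules on this slide can be sketched as a pass over the per-line sequence of script labels. This is a sketch under several assumptions: ligatures and CCs are treated as items of one list in reading order, every rule is evaluated against the original labels rather than labels already corrected by another rule, and the label strings are illustrative. The diacritic rule is left out because it needs position information that a label list does not carry.

```python
from typing import List


def apply_neighbor_rules(labels: List[str]) -> List[str]:
    """Return a corrected copy of one line's script labels (reading order)."""
    out = list(labels)
    n = len(labels)
    if n >= 3:
        # First item: adopt the label of the next two items if they agree.
        if labels[1] == labels[2]:
            out[0] = labels[1]
        # Last item: adopt the label of the previous two items if they agree.
        if labels[-2] == labels[-3]:
            out[-1] = labels[-2]
    # Interior items: flip a label whose two neighbors agree with each other
    # but differ from it.
    for i in range(1, n - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            out[i] = labels[i - 1]
    return out
```

For instance, `apply_neighbor_rules(["nastalique", "latin", "nastalique", "nastalique"])` relabels the isolated `"latin"` item, so every item in the line ends up labeled `"nastalique"`.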
![Page 8: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/8.jpg)
RUN MARKING
![Page 9: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/9.jpg)
RECOGNITION
99 Identity Crisis
(Collective Will Nationality)
55 (Gallstones (blle salts cholesterol calcium
![Page 10: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/10.jpg)
POST-PROCESSING
Identity Crisis
(Collective Will
Nationality)
(Gallstones)
blle salts
Cholesterol
Calcium
![Page 11: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/11.jpg)
QUESTIONS ?
![Page 12: LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM Ghani.pdf · script is recognized by the Tesseract OCR. Font size independent approach is used. INTRODUCTION Latin script. Urdu OCR and](https://reader035.fdocuments.net/reader035/viewer/2022062414/5f74bd0c922cbe12f925417b/html5/thumbnails/12.jpg)
THANK YOU ☺