Arabic Language Computing applied to
the Quran
- a PhD research project by Kais Dukes
I-AIBS Institute for Artificial Intelligence and Biological Systems
School of Computing
University of Leeds
The Challenge: An interdisciplinary approach to understanding the Quran
(1) What is the Quran?
| Holy Book | Prophet | Text Dated |
|---|---|---|
| Suhuf Ibrahim (Scrolls) | Abraham | ? |
| The Tawrat (Torah) | Moses | 1500 BCE? |
| The Zabur (Psalms) | David | 1000 BCE? |
| The Injil (Gospel) | Jesus | 1 CE |
| The Quran | Muhammad (PBUH) | 610-632 CE |
The last in a series of 5 religious texts
(1) What is the Quran?
- Classical Arabic, 1300+ years ago
- All believers should learn the text; translations are “interpretations”
- Islamic Law (legal logic)
- Divine guidance & direction
- Science and philosophy
- Has inspired Algebra, Linguistics
The central religious text of Islam
(2) Traditional Arabic Linguistics
Originated with Arabs studying the language of the Quran (scientific analysis for at least 1000 years, a lot older than the English language!):
- Orthography (diacritics and vowelization)
- Etymology (Semitic roots)
- Morphology (derivation and inflection)
- Syntax (origins of dependency grammar)
- Discourse Analysis & Rhetoric
- Semantics & Pragmatics
(3) Computational Linguistics
- The Quran is online, for keyword search
- BUT verse-by-verse translations are interpretations
- Muslims should access the “true” Classical Arabic source
(3) Computational Linguistics
Example question-answering dialog system:
Question: How long should I breastfeed my child for?
Answer: Mothers should suckle their offspring for two years, if the father wishes to complete the term (The Holy Quran, Verse 2:233).
- How far can we go?
- Is an Artificial Intelligence system realistic?
An AI approach to understanding the Quran
Central Hypothesis: Augmenting the text of the Quran with rich annotation will lead to a more accurate AI system.
- Prepare the data by annotating the Quran.
- Use the data to build an AI system for concept search and question-answering.
Annotating the Quran: Challenges
Orthography - Complex non-standard script
Morphology (word structure) - Arabic is highly inflected, challenging to analyze
Grammar - Phrase structure, dependency
Semantics – Ontology of Entities and Concepts referred to by pronouns and nouns
Annotating the Quran: Solutions
- Computing advances have made annotation possible, to high accuracy
- Leverage existing resources from Traditional Arabic Grammar
- Machine-Learning annotation followed by manual verification
- Community effort using online volunteers
Recent Advances: Orthography
Google Search for verse (68:38) on Jan 21, 2008 shows many typos
An accurate digital copy of the Quran?
Encoding Issues
- Missing diacritics
- Simplified script (not Uthmani)
- Windows code page 1256, not Unicode
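The encoding issues listed above can be checked mechanically. The following is a minimal sketch, not part of the original project: the helper names are hypothetical, and the sample word is an ordinary vowelized Arabic word rather than Quranic text. It decodes legacy code page 1256 bytes, tests for missing diacritics, and shows that a Quran-specific annotation mark cannot be represented in code page 1256 at all.

```python
# Arabic tanwin, short vowels, shadda and sukun occupy U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def decode_legacy(raw: bytes) -> str:
    """Decode bytes stored in Windows code page 1256 into a Unicode string."""
    return raw.decode("cp1256")


def has_diacritics(text: str) -> bool:
    """Return True if the text carries at least one vowel/diacritic mark."""
    return any(ch in DIACRITICS for ch in text)


def strip_diacritics(text: str) -> str:
    """Remove diacritics, giving the 'missing diacritics' form of the text."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def encodable_in_cp1256(text: str) -> bool:
    """Return True if every character exists in code page 1256."""
    try:
        text.encode("cp1256")
        return True
    except UnicodeEncodeError:
        return False


# Sample word (kaf + kasra, ta + fatha, alif, ba), written with escapes.
vowelized = "\u0643\u0650\u062A\u064E\u0627\u0628"
legacy_bytes = vowelized.encode("cp1256")

print(decode_legacy(legacy_bytes) == vowelized)       # round trip to Unicode
print(has_diacritics(strip_diacritics(vowelized)))    # diacritics are gone
print(encodable_in_cp1256("\u06D6"))                  # Quranic annotation sign
```

The last check illustrates why a Unicode encoding is needed for the full script: the Quranic annotation signs in the U+06D6 range have no slot in the legacy code page.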
Recent Advances: Orthography
Tanzil Project (http://tanzil.info)
- Stable version released May 2008
- Uses Unicode XML encoding, including the special characters designed for the complex Arabic script of the Quran
- Manually verified to 100% accuracy by a group of experts who have memorized the entire text of the Quran
Recent Advances: Orthography
Java Quran API (http://jqurantree.org) (Dukes 2009)
- Java classes for querying the Tanzil XML of the Quran
- gives authentic script on web-pages
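To give a feel for what querying an XML copy of the text involves, here is a small Python sketch. It is not the JQuranTree API (which is Java), and the element and attribute names (`sura`, `aya`, `index`, `text`) are an assumption about the XML layout; the verse texts are placeholders rather than real content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an assumed layout; verse texts are placeholders.
SAMPLE_XML = """<quran>
  <sura index="1" name="Al-Fatiha">
    <aya index="1" text="[text of verse 1:1]"/>
    <aya index="2" text="[text of verse 1:2]"/>
  </sura>
  <sura index="2" name="Al-Baqara">
    <aya index="1" text="[text of verse 2:1]"/>
  </sura>
</quran>"""


def load_verses(xml_text: str) -> dict:
    """Build a {(chapter, verse): text} lookup from the XML document."""
    root = ET.fromstring(xml_text)
    verses = {}
    for sura in root.iter("sura"):
        chapter = int(sura.get("index"))
        for aya in sura.iter("aya"):
            verses[(chapter, int(aya.get("index")))] = aya.get("text")
    return verses


verses = load_verses(SAMPLE_XML)
print(verses[(1, 2)])
```

Keying the lookup by (chapter, verse) mirrors the common Chapter:Verse citation style, so a reference such as 2:233 maps directly to a dictionary key.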
Recent Advances: Morphology
- Buckwalter Arabic Morphological Analyzer (Tim Buckwalter, 2002)
- Morphological Analysis of the Quran at the University of Haifa (Shuly Wintner, 2004)
- Lexeme & feature based morphological representation of Arabic (Nizar Habash, 2006)
The Haifa Corpus (2004)
- Multiple analyses for each word (up to 5), e.g.:
  rbb+fa&l+Noun+Triptotic+Masc+Sg+Pron+Dependent+1P+Sg
  rbb+fa&l+Noun+Triptotic+Masc+Sg+Gen
- Not manually verified; the authors report an F-measure of 86%
- Non-standard annotation scheme, not familiar to traditional Arabic linguists, e.g. extracting a list of all verbs is non-trivial
- Arabic text is only encoded phonetically instead of using the original Arabic script, e.g. searching for a specific root is not easy
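The `+`-separated analyses shown above can be split into fields with a few lines of Python. This is only a sketch under an assumption: it treats the first field as the root, the second as the pattern and the third as the part of speech, which is a reading of the two examples on the slide and not a documented specification of the Haifa format.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    root: str        # assumed: first field, e.g. "rbb"
    pattern: str     # assumed: second field, e.g. "fa&l"
    pos: str         # assumed: third field, e.g. "Noun"
    features: list   # all remaining fields, in order


def parse_analysis(line: str) -> Analysis:
    """Split one '+'-separated analysis string into named fields."""
    parts = line.strip().split("+")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    return Analysis(parts[0], parts[1], parts[2], parts[3:])


# The two competing analyses quoted on the slide for a single word.
ANALYSES = [
    "rbb+fa&l+Noun+Triptotic+Masc+Sg+Pron+Dependent+1P+Sg",
    "rbb+fa&l+Noun+Triptotic+Masc+Sg+Gen",
]

parsed = [parse_analysis(a) for a in ANALYSES]
nouns = [a for a in parsed if a.pos == "Noun"]
print(parsed[0].root, parsed[0].pos, len(nouns))
```

Even with the fields separated, each word still carries several unranked analyses, so a filter like `nouns` returns candidates rather than verified tags.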
The Quranic Arabic Corpus http://corpus.quran.com/
Kais Dukes Arabic Language Computing Applied to the Quran – PhD (part-time)
- Word structure: colour-coded morphological analysis
- Translation: word-for-word English translations
- Grammar: dependency parse following Arabic tradition
- Semantics: ontology of entities and concepts
- Machine Learning: annotations used for A.I. training
- Impact: dozens of researchers have collaborated/cited, and a million visitors have used the website this year
The Quranic Arabic Corpus: Verified Uthmani Script
- Unicode Uthmani Script
- Sourced from the verified Tanzil project
The Quranic Arabic Corpus: Phonetics (faja'alnāhumu)
- Phonetic transcription generated algorithmically
- Guided by Arabic vowelized diacritics
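A toy version of diacritic-guided transcription might look like the sketch below. It is not the corpus's actual algorithm: the letter table covers only the letters of the slide's example word, it assumes simplified (non-Uthmani) spelling, and marks such as shadda are deliberately left unsupported.

```python
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
ALIF = "\u0627"

# Partial consonant table (an illustrative subset, not a full scheme).
CONSONANTS = {
    "\u0641": "f",   # fa
    "\u062C": "j",   # jim
    "\u0639": "'",   # ayn, written with an apostrophe as on the slide
    "\u0644": "l",   # lam
    "\u0646": "n",   # nun
    "\u0647": "h",   # ha
    "\u0645": "m",   # mim
}
SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i"}


def transcribe(word: str) -> str:
    """Map a fully vowelized word to a Latin phonetic transcription."""
    out = []
    for ch in word:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch == ALIF and out and out[-1] == "a":
            out[-1] = "ā"          # fatha followed by alif gives a long vowel
        elif ch == SUKUN:
            continue               # sukun marks the absence of a vowel
        else:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
    return "".join(out)


# The example word, spelled with escapes to keep the code points explicit.
WORD = ("\u0641\u064E\u062C\u064E\u0639\u064E\u0644\u0652"
        "\u0646\u064E\u0627\u0647\u064F\u0645\u064F")
print(transcribe(WORD))
```

The diacritics do the real work here: without the fatha, damma and sukun marks the consonant skeleton alone would not determine the vowels.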
The Quranic Arabic Corpus: Interlinear translation
- Word-for-word translation from accepted sources
- Interlinear translation scheme
The Quranic Arabic Corpus: Location Reference (21:70:4)
- Common standard for verses (Chapter:Verse)
- Extended in the QAC corpus to include word numbers and segment numbers, e.g. (21:70:4:2)
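The location scheme above is easy to parse. The sketch below uses a hypothetical `Location` type and `parse_location` helper (neither is taken from the corpus software) and accepts two, three or four colon-separated numbers, with or without the surrounding parentheses.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    chapter: int
    verse: int
    word: Optional[int] = None      # present in word-level references
    segment: Optional[int] = None   # present in segment-level references


_LOCATION = re.compile(r"^\(?(\d+):(\d+)(?::(\d+))?(?::(\d+))?\)?$")


def parse_location(ref: str) -> Location:
    """Parse 'C:V', 'C:V:W' or 'C:V:W:S', optionally wrapped in parentheses."""
    match = _LOCATION.match(ref.strip())
    if match is None:
        raise ValueError(f"not a location reference: {ref!r}")
    numbers = [int(g) if g is not None else None for g in match.groups()]
    return Location(*numbers)


print(parse_location("(21:70:4:2)"))
print(parse_location("2:233"))
```

Because `Location` is a tuple, references compare and sort in reading order, and a verse-level reference simply leaves the word and segment fields as `None`.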
The Quranic Arabic Corpus: Morphological Segmentation
- Division of a single word into multiple segments
- Part-of-speech tag assigned to each segment
- Traditional Arabic Grammar rules used for division
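One way to picture a segmented word is as a list of tagged segments. The sketch below is illustrative only: the `Segment` and `Word` classes are hypothetical, and the three-way split and the tags chosen for the example word are assumptions for demonstration, not the corpus's verified analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    form: str   # phonetic form of the segment
    tag: str    # part-of-speech tag, e.g. "V", "PRON", "CONJ"


@dataclass(frozen=True)
class Word:
    location: tuple   # (chapter, verse, word)
    segments: tuple   # Segment objects in order

    @property
    def surface(self) -> str:
        """Re-join the segments into the full word form."""
        return "".join(s.form for s in self.segments)

    def with_tag(self, tag: str) -> list:
        """Return (segment_number, segment) pairs carrying the given tag."""
        return [(i, s) for i, s in enumerate(self.segments, start=1)
                if s.tag == tag]


# Assumed segmentation of the example word: prefix + verb + pronoun suffix.
word = Word(
    location=(21, 70, 4),
    segments=(
        Segment("fa", "CONJ"),
        Segment("ja'alnā", "V"),
        Segment("humu", "PRON"),
    ),
)

print(word.surface)
for number, segment in word.with_tag("V"):
    print((*word.location, number), segment.form)
```

Numbering segments from 1 lines up with four-part location references, so the verb in this sketch would be addressed as segment 2 of the word.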
The Quranic Arabic Corpus: Morphological segment features
The Quranic Arabic Corpus: Arabic Grammar Summary
The Quranic Arabic Treebank: Syntactic Annotation
- Dependency Grammar based on إعراب (i'rāb)
- Syntactico-semantic roles for each word
The Quranic Arabic Treebank: Ontology of entities and concepts
- linked to/from nouns and pronouns in the text
The Quranic Arabic Treebank: Framework for collaboration
Message Board: “If you come across a word and you feel that a better analysis could be provided, you can suggest a correction online by clicking on an Arabic word” (currently 5228 resolved messages; 1048 under review)
Resources: Publications, Citations, Reviews, FAQs, Feedback, Data download, Software download, Mailing list
The Quranic Arabic Treebank: Users (researchers, public)
- Artificial Intelligence and Computational Linguistics
- Arabic linguistics
- Quranic and Islamic Studies
- Classical literature analysis
- Anyone who wants to appreciate the Quran
The Quranic Arabic Treebank: new Computational Linguistics?
- First Treebank of Classical Arabic
- Free Treebank of the Quran
- First formal representation of Traditional Arabic Grammar using constituency/dependency graphs
- Machine-Learning parser
The Quranic Arabic Corpus: Part-of-speech Tagging

| Part-of-speech Tag | Name | Arabic Name |
|---|---|---|
| N | Noun | اسم |
| PN | Proper noun | اسم علم |
| PRON | Personal pronoun | ضمير |
| DEM | Demonstrative pronoun | اسم اشارة |
| REL | Relative pronoun | اسم موصول |
| ADJ | Adjective | صفة |
| V | Verb | فعل |
| P | Preposition | حرف جر |
| PART | Particle | حرف |
| INTG | Interrogative particle | حرف استفهام |
| VOC | Vocative particle | حرف نداء |
| NEG | Negative particle | حرف نفي |
| FUT | Future particle | حرف استقبال |
| CONJ | Conjunction | حرف عطف |
| NUM | Number | رقم |
| T | Time adverb | ظرف زمان |
| LOC | Location adverb | ظرف مكان |
| EMPH | Emphatic lām prefix | لام التوكيد |
| PRP | Purpose lām prefix | لام التعليل |
| IMPV | Imperative lām prefix | لام الامر |
| INL | Quranic initials | حروف مقطعة |

- Part-of-speech tags adapted from Traditional Arabic Grammar, and mapped to English equivalents (not the other way around)
- These tags apply to words in the Quran, as well as to individual morphological segments in the text
Automatic Annotation: Classical Arabic Dependency Parser
- Joakim Nivre (2009): dependency parsing using a shift/reduce queue/stack architecture with machine learning
- Following a similar architecture, but with hand-written rules, the custom parser has an F-measure of 77.2%
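The shift/reduce idea with hand-written rules can be sketched in a few lines. This is a toy illustration, not the project's parser: the rule table, the dependency labels and the four-word tag sequence are all invented for the example, and a learned classifier (as in the machine-learning variant) would replace the rule lookup.

```python
# Hypothetical rule table: (head tag, dependent tag) -> dependency label.
RULES = {
    ("V", "N"): "subj",   # a verb governs a following or preceding noun
    ("P", "N"): "gen",    # a preposition governs its noun
    ("V", "P"): "link",   # a prepositional phrase attaches to the verb
}


def parse(tags, rules):
    """Shift/reduce parse over POS tags; returns (head, dependent, label) arcs."""
    stack, queue, arcs = [], list(range(len(tags))), []
    while queue or len(stack) > 1:
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            # Left arc: the top of the stack governs the item beneath it.
            label = rules.get((tags[top], tags[below]))
            if label:
                arcs.append((top, below, label))
                del stack[-2]
                continue
            # Right arc: only once the top item cannot take the next word
            # in the queue as its own dependent.
            label = rules.get((tags[below], tags[top]))
            top_needs_more = bool(queue) and (tags[top], tags[queue[0]]) in rules
            if label and not top_needs_more:
                arcs.append((below, top, label))
                stack.pop()
                continue
        if not queue:
            break  # no rule applies; remaining items stay unattached
        stack.append(queue.pop(0))  # shift the next word onto the stack
    return arcs


TAGS = ["V", "N", "P", "N"]   # e.g. verb, noun, preposition, noun
ARCS = parse(TAGS, RULES)
for head, dependent, label in ARCS:
    print(f"{TAGS[head]}({head}) -{label}-> {TAGS[dependent]}({dependent})")
```

The one-word lookahead before a right arc is what lets the preposition collect its noun before the whole phrase is attached to the verb.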
University of Leeds Postgraduate Researcher Conference 2011
Criteria for “PGR Researcher of the Year 2011”:
- Ability to communicate research to the lay and non-specialist research audience
- Impact/potential impact of the research in terms of e.g. application of findings for economic or social benefit; the significance of the contribution/potential contribution of the research to the academic subject area
- Evidence of local or national publicity or public engagement.
Ability to communicate research to the lay and non-specialist audience
Example Feedback (319 comments):
- “I would like to applaud you for your effort” - Prof Behnam Sadeghi, Stanford University
- “We are big admirers of the work” - Prof Gregory Crane, Classics Dept, Tufts University
- “I regularly use your work on the Qur'an and read it whenever I can.” - Prof Yousuf Islam, Director, Daffodil International University
- “Congratulations to all concerned on this project” - Prof Michael Arthur, VC, Leeds Uni
Impact: application of findings for economic or social benefit
Over a million users already, and growing; many unforeseen social benefits, e.g.:
“I work as a chaplain in correctional centers in the State of Missouri, U.S.A. Thanks for your permission to use the Quranic Arabic Corpus in these correctional centers” Tadar Wazir.
Impact: significance of the research to the academic subject area
- 10 papers in research conferences & journals
- 25 citations (from Google Scholar) - so far...
- Positive feedback from top researchers
- Only free-to-download Arabic treebank
- A de-facto standard data-set for AI research
Evidence of local or national publicity or public engagement
Newspapers, e.g. Muslim Post; better still:
Website - world-wide public engagement!
Conclusion
This is not the end
To come: 2nd half of PhD project; and more?
Kais Dukes
I-AIBS Institute for Artificial Intelligence and Biological Systems
School of Computing
University of Leeds