DATA MINING FOR HADITH CLASSIFICATION
BY
KAWTHER A.ALDHLAN
A thesis submitted in fulfilment of the requirement
for the degree of Doctor of Philosophy in Information and
Communication Technology
Kulliyyah of Information and
Communication Technology
International Islamic University
Malaysia
FEBRUARY 2013
ABSTRACT
The Holy Qur'an and Hadith are the two fundamental sources of legislation and
law in the Muslim community. Together with the Islamic books, these resources can be
used as the sole authoritative source of knowledge and wisdom. They also stand out as
the source of a large collection of analysis and interpretation texts, which could
provide a gold standard for artificial intelligence (AI) knowledge extraction and
knowledge representation experiments. Recently, growing interest in automating the
Islamic resources (the Qur'an, Sunnah and tradition books) has motivated
researchers to look for mechanisms that can represent and discover the knowledge in
these resources. In the present study, extracting Islamic knowledge is the focal point
of the research, and three famous books in Hadith science form the corpus. The study
explores a new approach to classifying Hadith according to its degree of validity
(Sahih, Hasan, Da'eef and Maudoo') using data mining techniques. The proposed Hadith
classifier (HC) model was built through a learning process and is represented as a
tree structure. The attributes of the instances were originally obtained directly from
the source books, while attributes not mentioned in these books were recorded as
null, or missing, values. A novel mechanism, based on the methods used to investigate
the Isnad in Hadith science, was employed to handle these missing data. Representing
or extracting Islamic knowledge is a very critical step because it may affect the
lives of Muslims; therefore, the results of the research were compared with the source
books and with the point of view of an expert in Hadith science. The extracted
knowledge sheds light on the differences between the methods of Al-Imam Al-Bukhari,
Al-Termithi and Al-Albani in takhareej AL-Hadith. Furthermore, the findings showed
that the proposed missing data detector (MDD) method had a significant effect on the
performance of the proposed HC: the correct classification rate (CCR) increased
sharply from 50.1502% before using MDD to 97.597% after applying it. Finally, the
favorable results of comparing the performance of HC against a naïve Bayes classifier
indicated that decision tree (DT) modeling is a viable approach to classifying Hadith
owing to its excellent performance, ease of implementation, and ease of rule induction
and result interpretation.
APPROVAL PAGE
The thesis of Kawther Binti Ali ALdhlan has been approved by the following:
___________________________ Akram M. Zeki
Supervisor
___________________________ Ahmed M. Zeki
Co-Supervisor
___________________________ Tengku Mohd Bin Tengku Sembok
Internal Examiner
___________________________ Imad Fakhri Taha Alshaikhli
Internal Examiner
___________________________ Hassanin M. Al-Barhamtoshy
External Examiner
___________________________ Abdul Kabir Hussain Solihu
Chairman
DECLARATION
I hereby declare that this thesis is the result of my own investigations, except where
otherwise stated. I also declare that it has not been previously or concurrently
submitted as a whole for any other degrees at IIUM or other institutions.
Kawther Binti Ali Dhlan Al-Dhlan
Signature …………………………………… Date ……………………..
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
DECLARATION OF COPYRIGHT AND AFFIRMATION
OF FAIR USE OF UNPUBLISHED RESEARCH
Copyright © 2013 by Kawther Binti Ali Al-Dhlan. All rights reserved.
DATA MINING FOR HADITH CLASSIFICATION
No part of this unpublished research may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise without prior written permission of the copyright holder except
as provided below.
1. Any material contained in or derived from this unpublished research may
only be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print
or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system
and supply copies of this unpublished research if requested by other
universities and research libraries.
Affirmed by Kawther Binti Ali Dhlan Aldhlan.
…………………………. ………………..
Signature Date
To My Beloved Mother (May Allah Bless Her)
To My Father
To My Husband and My Family
ACKNOWLEDGEMENTS
In the name of Allah, the Most Beneficent, the Most Merciful.
All praise and gratitude to Almighty Allah S.W.T. for giving me an opportunity to
undertake and complete this study.
I feel very grateful for having an exceptional doctoral committee. Without their
guidance and support, I would not have been able to reach this far in my academic
endeavor. First and foremost, my deepest gratitude goes to Dr. Akram M. Zeki, my major
supervisor and my mentor. His dedication and patience in guiding me have helped me
get through this study and made me what I am today. My deepest gratitude also goes
to Dr. Ahmed M. Zeki, my co-supervisor, who has greatly contributed in assisting
me to complete my work. I would like to extend my thanks to the examiners' committee
for their valued notes.
My personal thanks also go to all lecturers and staff of the Kulliyyah of
Information and Communication Technology, whose wisdom has shaped my way of
looking at things one way or another.
I am also grateful to the University of Hail (UOH) for giving me this priceless
opportunity to further my study in this doctoral program. Special thanks go to
Associate Prof. Dr. Naser Ismail from the Islamic department in the College of
Education at UOH for his efforts and guidance.
Finally, I would like to extend my gratitude to my beloved husband, Hamad
Alreshidi, for his understanding, collaboration and undivided support. To my children,
Solaf, Turki and Wed: may this success become an inspiration for their future. And to
my mother (God bless her), father, sisters and brothers: thank you for your unfailing
faith, encouragement and prayers.
TABLE OF CONTENTS
Abstract ................................................................................................................... ii
Abstract in Arabic .................................................................................................... iii
Approval Page .......................................................................................................... v
Declaration Page ...................................................................................................... vi
Copyright Page ......................................................................................................... vii
Dedication ................................................................................................................ viii
Acknowledgement ................................................................................................... ix
List of Tables .......................................................................................................... xiii
List of Figures ......................................................................................................... xv
List of Abbreviations .............................................................................................. xviii
List of Symbols ........................................................................................................ xxi
CHAPTER ONE: INTRODUCTION ................................................................. 1
1.1 The Importance of Hadith ................................................................... 1
1.2 Background of the Study ........................................................................ 3
1.3 Statement of the Problem ....................................................................... 8
1.4 Research Objectives .............................................................................. 10
1.5 Research Questions ................................................................................ 10
1.6 Research Hypotheses ............................................................................. 11
1.7 Significance of the Study ....................................................................... 11
1.8 Research Model ...................................................................................... 12
1.8.1 The Research Framework ........................................................... 12
1.8.2 Data Representation Models ........................................................ 14
1.8.2.1 Hadith Structure ............................................................... 14
1.8.2.2 The Representation of Hadith Structure ............................ 15
1.9 Research Approach ............................................................................... 16
1.10 Organization of the Thesis ................................................................... 17
CHAPTER TWO: OVERVIEW OF HADITH SCIENCE ............................... 19
2.1 Introduction ............................................................................................ 19
2.2 Definition of Hadith ............................................................................... 20
2.3 Component of Hadith ........................................................................... 20
2.4 Hadith Verification ............................................................................... 22
2.5 Classification of Hadith ........................................................................ 24
2.6 Summary ............................................................................................... 33
CHAPTER THREE: LITERATURE REVIEW ................................................... 35
3.1 Introduction ............................................................................................ 35
3.2 Overview of Data Mining ...................................................................... 35
3.2.1 What Is a Data Mining? ............................................................... 35
3.2.2 Data Mining Approaches ............................................................. 37
3.2.2.1 Supervised learning ........................................................... 37
3.2.2.2 Unsupervised Clustering .................................................... 45
3.2.2.3 Semi-supervised learning ................................................... 46
3.2.3 How to Mine Data ........................................................................ 46
3.2.3.1 The Knowledge Discovery Process .................................. 47
3.2.4 Applications of Data Mining ........................................................ 51
3.2.5 Why Data Mining ............................................................................ 57
3.3 The role of ICT in Islam ................................................................................... 58
3.4 Applications of DM in Hadith Science .................................................. 60
3.5 Summary ................................................................................................ 67
CHAPTER FOUR: RESEARCH METHODOLOGY ....................................... 68
4.1 Introduction ............................................................................................ 68
4.2 Research Framework .............................................................................. 68
4.3 Research Procedures .............................................................................. 72
4.4 The Sample of the Study ........................................................................ 81
4.5 Evaluation Strategy ................................................................................ 84
4.5.1 Confusion Matrix ......................................................................... 84
4.5.2 Correct Classification Rate .......................................................... 85
4.5.3 Error Rate ..................................................................................... 86
4.5.4 Sensitivity ..................................................................................... 86
4.5.5 Specificity ................................................................................... 87
4.5.6 Precision ....................................................................................... 87
4.5.7 F-Measure .................................................................................. 87
4.5.8 Receiver Operating Characteristic .............................................. 88
4.5.9 Area under the Curve ................................................................... 90
4.6 Summary ................................................................................................ 90
CHAPTER FIVE: DATA PRE-PROCESSING AND ALGORITHMS ........... 92
5.1 Introduction ........................................................................................... 92
5.2 Corpus of the Study ............................................................................... 92
5.3 Data Pre-Processing ............................................................................... 94
5.4 Attributes Selection ............................................................................... 95
5.5 The Missing Data Detector Method (MDD) .......................................... 98
5.6 Decision Tree Construction ................................................................... 113
5.6.1 Select the Attributes .................................................................... 114
5.6.1.1 Entropy (Information Impurity) ......................................... 114
5.6.1.2 Information Gain Criterion ................................................ 115
5.6.1.3 Gain ratio ........................................................................... 116
5.6.2 Pruning Option ............................................................................. 116
5.6.2.1 Pre-Pruning ........................................................................ 117
5.6.2.2 Post-Pruning ....................................................................... 117
5.6.2.3 Comparisons of pre-pruning and post-pruning .................. 117
5.6.3 Rule Induction Based on the Pruned Tree.................................... 119
5.7 C4.5 Algorithm ..................................................................................... 119
5.8 Summary ............................................................................................... 120
CHAPTER SIX: MODEL EVALUATION AND DISCUSSION ..................... 121
6.1 Introduction ............................................................................................ 121
6.2 Implementation ...................................................................................... 122
6.3 Experiment Procedures .......................................................................... 122
6.3.1 Training Procedures ..................................................................... 122
6.3.2 Testing Procedures ....................................................................... 129
6.3.2.1 Trial#1: Test Phase with (33.3%) of the Sample size ........ 129
6.3.2.2 Trial#2: Test Phase with (33.3%) of the Sample size ........ 132
6.3.2.3 Trial#3: Test Phase with (11.1%) of the Sample size ........ 138
6.3.2.4 Trial#4: Test Phase with (11.1%) of the Sample size ........ 140
6.3.3 Comparing the Results of HC with the Expert Point View ......... 144
6.4 The Comparison between the HC and Naïve Bayes Classifier .............. 146
6.5 Summary ............................................................................................... 149
CHAPTER SEVEN: CONCLUSION AND FUTURE WORKS ....................... 151
7.1 Overview ............................................................................................... 151
7.2 Summary of the proposed method ........................................................ 151
7.3 Summary of Findings ............................................................................. 153
7.4 Comparative study with literature ......................................................... 154
7.5 Conclusion .............................................................................................. 156
7.6 Contributions .......................................................................................... 157
7.7 Future Directions .................................................................................... 158
BIBLIOGRAPHY ................................................................................................. 160
APPENDIX A ......................................................................................................... 169
GLOSSARY ............................................................................................................ 175
LIST OF TABLES
Table No. Page No.
3.1 Summarization of the previous works in Hadith 65
4.1 The attributes of the sample set 76
4.2 No. of Hadith in Al-Bukhari collection 81
4.3 Al-Bukhari narrators' grade 82
4.4 Confusion matrix 85
4.5 AUC index and its effectiveness for discrimination 90
5.1 Results of pre-processing phase for Al-Hadith in Figure 5.4 95
5.2 Hadith attributes according to (Al-Suyutih, 1965; Tahan, 1996) 97
5.3 The Tracing Table of the Example (1) 103
5.4 Hadith terms used to indicate the narrator's reliability 104
5.5 The Tracing Table of the Example (2) 108
5.6 Hadith terms used to indicate the narrator's retention (preservation) 108
5.7 The Tracing Table of the Example (3) 111
5.8 The Tracing Table of the Example (4) 113
5.9 The rank of the attributes according to the entropy value 115
5.10 The rank of the attributes according to the information gained Method 115
5.11 The rank of the attributes according to gain ratio method 116
6.1 The information gained of the Hadith features 123
6.2 The evaluation results of Hadith Classifier (training phase) 127
6.3 The evaluation results of HC of trial#1 131
6.4 The evaluation results of HC of trial#2 136
6.5 The evaluation results of HC of trial#3 138
6.6 The evaluation results of HC of trial#4 142
6.7 The RS Between the Results of HC and the Expert Point View 145
6.8 The comparison between Naïve Bayes classifier and HC classifier 146
7.1 Comparative Study with the literature 155
LIST OF FIGURES
Figure No. Page No.
1.1 The research framework 12
1.2 The proposed Hadith Classifier HC 13
1.3 Hadith components 14
1.4 Bottom- Up approach 15
1.5 Top- Down approach 15
2.1 Components of Hadith 21
2.2 Hadith classification 24
2.3 Mursal مُرْسَل (hurried) case 26
2.4 Munqati’ مُنْقَطِع (broken) case 26
2.5 Mu’allaq مُعَلَق (hanging) case 27
3.1 The boundary hyperplane in support vector machine 39
3.2 Illustration of decision tree with replication 43
3.3 An overview of the knowledge discovery process (KDP) 50
4.1 Hadith classification without handling missing data 69
4.2 The process of missing data detector MDD 70
4.3 The proposed Hadith Classifier 71
4.4 The research procedures 73
4.5 The first stage of the experiment (Training phase) 78
4.6 The second stage of the experiment (Testing phase I) 79
4.7 The second stage of the experiment (Testing phase II) 80
4.8 AL-Ahadith in Sahih Al-Bukhari 81
4.9 Summary of Al-Bukhari narrators' grade 82
4.10 AL-Ahadith in Jami'u AL-Termithi 83
4.11 AL-Ahadith in Silsilat Al-AHadith Al-Dae'ifah w' Al-Mawdu'ah. 83
4.12 The confusion matrix of the training model 85
4.13 An ROC curve and different points of significance 89
5.1 Example of Sahih Hadith 93
5.2 Example of Hasan Hadith 93
5.3 Example of Da'eef Hadith 93
5.4 Example of Maudoo' Hadith 93
5.5 The metadata of the Hadith narrators in narrator table 99
5.6 The relationships between the narrators in the teacher table 99
5.7 The flowchart of the first method 100
5.8 The flowchart of the second method 102
5.9 The flowchart of the narrators' reliability method (Part I) 106
5.10 The flowchart of the narrators' reliability method (Part II) 107
5.11 The flowchart of the method of determining the narrator's
preservation 110
5.12 The flowchart of the method that determines the value of the Isnad
defective 112
5.13 Part of DT tree for HC 118
6.1 The results of identification narrators of the Isnad chain 124
6.2 The induced rules after parsing the path from the root to the leaf node 124
6.3 The decision tree of the target Hadith Classifier 125
6.4 The confusion matrix of the training model 126
6.5 ROC curves of the classes in Hadith classifier (Training Phase) 128
6.6 Test procedures 129
6.7 The confusion matrix of Hadith Classifier of trial#1 130
6.8 ROC curves of the classes in HC for trial#1 132
6.9 Isnad identification Process for 33.3% of test dataset 134
6.10 Classification process of 33.3% of test dataset 135
6.11 The confusion matrix of Hadith Classifier of trial#2 136
6.12 ROC curves of the classes in HC for trial#2 137
6.13 The confusion matrix of Hadith Classifier of trial#3 138
6.14 ROC curves of the classes in HC for trial#3 139
6.15 Isnad identification process for trial#4 140
6.16 Classification process of trial#4 141
6.17 The confusion matrix of Hadith Classifier of trial#4 141
6.18 ROC curves of the classes in Hadith Classifier for trial#4 143
6.19 ROC curves of the naïve Bayes classifier 148
6.20 CCR of naïve Bayes classifier and DT classifier 149
7.1 Summary of the research approach 152
LIST OF ABBREVIATIONS
Acc Accurate rate
ANN Artificial Neural Network
AUC Area Under the Curve
CART Classification And Regression Trees
CBR Case-Based Reasoning
CCR Correct Classification Rate
CCRE Correct Classification Rate according to the Expert
CCRHC Correct Classification Rate according to the HC.
DFT Document Frequency Thresholding
DM Data Mining
DOB Date Of Birth
DOD Date Of Death
DTs Decision Trees
ER Error Rate
FN False Negative
FP False Positive
GA Genetic Algorithms
HC Hadith Classifier
ICT Information Communication Technology
ID Identification
ID3 Iterative Dichotomiser 3
KDP Knowledge Discovery Process
K-NN K- Nearest Neighbor
MAX Maximum
MDD Missing Data Detector
MIN Minimum
NB Naïve Bayes
NCP Number of Correct Prediction
NOC Total Number of Cases
NOCR Number of Corrected Rules
NOP Total Number of Predictions
NOR Number of Rules
NWP Number of Wrong Predictions
RI Rule Induction
RS Rate of similarity.
ROC Receiver Operating Characteristic
SN Sensitivity
SNoW Sparse Network of Winnows
SP Specificity
SVD Singular Value Decomposition
SVM Support Vector Machines
TF/IDF Term Frequency-Inverse Document Frequency
TN True Negative
TNR True Negative Rate
TP True Positive
TPR True Positive Rate
UML Unified Modeling Language.
VC Vapnik Chervonenkis
VSM Vector Space Model
WWW World Wide Web
LIST OF SYMBOLS
C Corpus
CCR Correct Classification Rate
ER Error Rate
FN False Negative
FP False Positive
Hi Tested Hadith
I entropy Information entropy
IG Information Gain
J The total number of classes
K Attributes
L Dataset
M Feature
N Training instance/ sample size
NCP Number of Correct Prediction
NOC Total Number of cases
NOP Total Number of Predictions
NWP Number of wrong Predictions
P(i) Probability
Rcv The V-fold cross validation accuracy
Rte Validation accuracy
SP Specificity
TN True Negative
TNR True Negative Rate
TP True Positive
TPR True Positive Rate
V Subsets
v_best The best attribute
xi Instance
Yi Class
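As a reading aid, the evaluation symbols listed above (TP, TN, FP, FN, NCP, NWP, NOP) can be related to CCR, ER, SN and SP as in the minimal sketch below. It assumes the standard textbook definitions; the thesis states its own formulas in Chapter Four, which is not reproduced here, so the function names and the sample counts are hypothetical illustrations only.

```python
def correct_classification_rate(ncp: int, nop: int) -> float:
    """CCR: share of correct predictions (NCP) among all predictions (NOP)."""
    if nop <= 0:
        raise ValueError("nop must be positive")
    return ncp / nop


def error_rate(nwp: int, nop: int) -> float:
    """ER: share of wrong predictions (NWP) among all predictions (NOP)."""
    if nop <= 0:
        raise ValueError("nop must be positive")
    return nwp / nop


def sensitivity(tp: int, fn: int) -> float:
    """SN (true positive rate): TP / (TP + FN) for one class."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """SP (true negative rate): TN / (TN + FP) for one class."""
    return tn / (tn + fp)


# Hypothetical counts, chosen only to illustrate the arithmetic.
ccr = correct_classification_rate(ncp=90, nop=100)
er = error_rate(nwp=10, nop=100)
sn = sensitivity(tp=8, fn=2)
sp = specificity(tn=9, fp=1)
print(f"CCR={ccr:.3f} ER={er:.3f} SN={sn:.3f} SP={sp:.3f}")
```

Under these definitions CCR and ER sum to one whenever NCP + NWP = NOP, which gives a quick consistency check on any reported pair of values.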
CHAPTER ONE
INTRODUCTION
1.1 THE IMPORTANCE OF HADITH
The Qur'an is the last divine book, which was revealed from Allah as a declaration
and guidance to mankind. It is an explanation of all things and means for men to be
rightly guided. In many verses of the Qur'an, it is commanded to obey the prophet of
Allah.
(قُلْ أَطِيعُوا اللَّهَ وَأَطِيعُوا الرَّسُولَ فَإِنْ تَوَلَّوْا فَإِنَّمَا عَلَيْهِ مَا حُمِّلَ وَعَلَيْكُمْ مَا حُمِّلْتُمْ وَإِنْ تُطِيعُوهُ تَهْتَدُوا وَمَا عَلَى الرَّسُولِ إِلَّا الْبَلَاغُ الْمُبِينُ) (سورة النور 54)
Say: "Obey Allah and obey the Messenger, but if you turn away, he (Messenger Muhammad) is only responsible for the duty placed on him (i.e. to convey Allah's Message) and you for that placed on you. If you obey him, you shall be on the right guidance. The Messenger's duty is only to convey (the message) in a clear way (i.e. to preach in a plain way)." (The Holy Qur'an 24: 54)1
1 Translation of the meaning of the Noble Qur'an was taken from http://www.dar ussalam.com/TheNobleQuran/ , The Noble Qur'an.
This is a significant point because the Qur'an can be fully understood only by
following the Sunnah of the Prophet. Sunnah means the actions, sayings and silent
permissions (or disapprovals) of the Prophet. The word Sunnah is also used to refer
to religious duties that are optional. Furthermore, Sunnah means the recorded sayings
(Hadith) of Prophet Muhammad. In this sense, Muslims believe that the Sunnah of the
Prophet Muhammad is the second of the two revealed fundamental sources of Islam,
after the Holy Qur'an (Hasan, 2004). It is impossible to understand the Qur'an
without reference to the Hadith, and it is impossible to explain Hadith without
relating it to the Qur'an. Where the Qur'an gives Muslims a broad framework for how
they should live, the Hadith gives them specific information. For instance:
The Qur'an commands Muslims to pray; Prophet Muhammad explained when and
how to pray:
(صَلُّوا كَمَا رَأَيْتُمُونِي أُصَلِّي) أخرجه الشيخان
Perform your prayer in the same manner you had seen me doing [Reported by Al-Bukhari & Muslim].2
The Qur'an commands Muslims to make Hajj; Prophet Muhammad explained
how to perform Hajj:
(خُذُوا عَنِّي مَنَاسِكَكُمْ) أخرجه مسلم
Take from me your rituals (of Hajj) [Reported by Muslim].3
Moreover, Allah has stated in surah Al-e Imran that the Prophet had
the characteristic of teaching the Qur'an and purifying mankind:
(لَقَدْ مَنَّ اللَّهُ عَلَى الْمُؤْمِنِينَ إِذْ بَعَثَ فِيهِمْ رَسُولًا مِنْ أَنْفُسِهِمْ يَتْلُو عَلَيْهِمْ آيَاتِهِ وَيُزَكِّيهِمْ وَيُعَلِّمُهُمُ الْكِتَابَ وَالْحِكْمَةَ وَإِنْ كَانُوا مِنْ قَبْلُ لَفِي ضَلَالٍ مُبِينٍ) (سورة آل عمران 164)
Indeed Allah conferred a great favor on the believers when He sent among them a Messenger (Muhammad) from among themselves, reciting unto them His Verses (the Qur'an), and purifying them (from sins by their following him), and instructing them (in) the Book (the Qur'an) and Al-Hikmah [the wisdom and the Sunnah of the Prophet (i.e. his legal ways, statements, acts of worship, etc.)], while before that they had been in manifest error (The Holy Qur'an 3: 164)
2 Translation of the Hadith was taken from http://3refe.com/vb/showthread.php?t=160957 The official website of Dr. Al-Ereefi
3 Translation of the Hadith was taken from http://en.wathakker.net/articles/print/584 wathakker.com
It is useful to draw attention to the phrase "teaching the book and the
wisdom", which emphasizes the relationship between the Qur'an and Hadith and
indicates that Hadith is the second source of knowledge and wisdom. Hadith is
therefore an important source of reference for the development of Islamic laws. The
Muslim community recorded Prophet Muhammad's words and actions for posterity, and as
the number of these reported conversations grew exponentially in the century after
his death, the community developed sophisticated methods to evaluate their veracity
and to determine which traditions were reliable and which were clearly fraudulent.
While the early collections of Hadith often contained Hadith of questionable origin,
collections of authenticated Hadith called Sahih (sound, true, correct) were
gradually compiled by many Hadith scientists such as Al-Bukhari, Muslim, Al-Tirmidhi,
Al-Nasa'i, Ibnu Majah, Abu Daud, Al-Darimi, Malek and Ibnu Hanbal (Hadith-traditional
books, 2009).
1.2 BACKGROUND OF THE STUDY
Nowadays, many studies are published and much software is developed to serve the
prophetic tradition through several channels that can help students and scientists to
find a tremendous amount of information in the simplest and easiest way.
Furthermore, such software offers quick electronic search instead of manual
search (Aldhlan et al., 2010).
Most initial studies concentrated on transferring the Hadith resource books,
such as the Al-Bukhari collection, the Muslim collection and other books, into
databases, either on websites or as software. However, it has become necessary to find smart
approaches that can adopt Hadith literature rather than simply storing Hadith
resources on a compact disk or websites with all probabilities of having negative