Text Summarization using Deep Learning and
Fuzzy Classifier
J. Anitha1 and PVGD Prasad Reddy2
1 Associate Professor, IT, Vignan Institute of Information Technology, Duvvada
2 Professor, Department of Computer Science & System Engineering, Andhra University
Abstract. Text summarization is the process of automatically creating a shorter
version of one or more text documents. Nowadays, document summarization
plays an important role in information retrieval, where it aims at extracting a
condensed version of the original document. Text summarization enables users
to reduce the amount of text that must be read while still assimilating the core
information. The proposed method uses two main algorithms: a deep learning
(DL) algorithm and a fuzzy classifier. E-GSO is incorporated to optimize the
weights, and the combination of the deep learning and fuzzy scores produces a
score for every sentence, called the hybrid score. Finally, the summarized
document is obtained from the hybrid score. In the experimental analysis,
Fuzzy+DL achieved an average precision of 0.7712, an average recall of
0.55765, and an average F-measure of 0.64124 at alpha and beta of 0.5,
averaged over compression ratios of 30%, 40%, 50%, and 60%. The comparative
analysis also produced reasonable results demonstrating the efficiency of the
Fuzzy+DL method.
Keywords: Deep learning (DL), Fuzzy classifier, Error-tolerant group search optimization (E-GSO).
1 Deep Learning Score for Sentence Score Generation
This approach consists of three phases: feature extraction, feature enhancement, and
summary generation, which work together to assimilate the core information and generate
a coherent, understandable summary. Various features are explored to improve the set of
sentences selected for the summary, and a Restricted Boltzmann Machine is used to
enhance and abstract those features, improving the resulting accuracy without losing any
important information. The deep learning algorithm comprises multiple non-linear
operations and represents high-level abstractions. It consists of three processes (a sketch
of the feature-extraction phase follows the list):
- Solution encoding
- Fitness computation
- Error-tolerant GSO
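The features themselves are not enumerated at this point in the paper; as a minimal sketch of the feature-extraction phase, assuming three common surface features (sentence position, relative length, and average term frequency) that are illustrative stand-ins rather than the authors' exact feature set, one might write:

```python
import re
from collections import Counter

def sentence_features(document):
    """Split a document into sentences and compute a small feature vector
    per sentence. The three features here (position, relative length,
    normalized average term frequency) are assumptions for illustration."""
    sentences = re.split(r'(?<=[.!?])\s+', document.strip())
    tf = Counter(re.findall(r'\w+', document.lower()))
    max_tf = max(tf.values())
    max_len = max(len(re.findall(r'\w+', s)) for s in sentences)
    features = []
    for i, s in enumerate(sentences):
        tokens = re.findall(r'\w+', s.lower())
        avg_tf = sum(tf[w] for w in tokens) / max(len(tokens), 1)
        features.append([
            1.0 - i / len(sentences),   # earlier sentences score higher
            len(tokens) / max_len,      # relative sentence length
            avg_tf / max_tf,            # normalized average term frequency
        ])
    return sentences, features
```

These per-sentence vectors would then be fed to the RBM-based enhancement described in the next section.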
Fig. 1. Deep learning for sentence score generation
2 Fitness Computation
The members from the member encoding process are initially given as input to the
fitness computation. The feature vectors are taken as the training data and modelled as a
two-layer network called a "Restricted Boltzmann Machine", in which stochastic feature
vectors are connected to stochastic feature detectors through symmetrically weighted
connections. The feature vectors correspond to the visible units and the feature detectors
to the hidden units. The energy of a joint configuration (x, y) of the visible and hidden
units is given by
E(x, y) = -\sum_{i \in \mathrm{visible}} s_i x_i - \sum_{j \in \mathrm{hidden}} t_j y_j - \sum_{i,j} x_i y_j z_{ij}

where
n is the number of visible units,
m is the number of hidden units,
s_i and t_j are the biases,
x_i is the binary state of visible unit i,
y_j is the binary state of hidden unit j, and
z_ij is the weight between x_i and y_j.
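For concreteness, the energy can be evaluated directly from these definitions. The following NumPy sketch (variable names follow the symbol list above; the data are made up) is a minimal illustration, not the authors' code:

```python
import numpy as np

def rbm_energy(x, y, s, t, z):
    """E(x, y) = -sum_i s_i x_i - sum_j t_j y_j - sum_{i,j} x_i y_j z_ij
    x: binary states of the n visible units
    y: binary states of the m hidden units
    s, t: visible and hidden biases
    z: n x m matrix of symmetric connection weights"""
    return -s @ x - t @ y - x @ z @ y

rng = np.random.default_rng(0)
n, m = 4, 3
x = rng.integers(0, 2, n)            # visible configuration
y = rng.integers(0, 2, m)            # hidden configuration
s, t = rng.normal(size=n), rng.normal(size=m)
z = rng.normal(size=(n, m))          # weights z_ij
print(rbm_energy(x, y, s, t, z))     # scalar energy of (x, y)
```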
3 Error-Tolerant GSO
After the fitness computation, three quantities are computed in this process:
(a) Error change
(b) Member change
(c) New member
(a) Error change:
The error change can be calculated by

\mathrm{error\ change} = \frac{e_t - e_{t-1}}{e_t\ \mathrm{or}\ e_{t-1}} \times 100\%

where e_t and e_{t-1} are the errors at iterations t and t-1.
(b) Member change:
The member change can be calculated by

\mathrm{member\ change} = \frac{M_t - M_{t-1}}{\max(M_t\ \mathrm{or}\ M_{t-1})} \times 100\%

where M_t and M_{t-1} are the members at iterations t and t-1.
(c) New member:
The new member can be calculated by

\mathrm{New\ member} = M_{t-1} + \%\,\mathrm{error\ change}
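The extracted formulas are partly garbled in the source, so the exact denominators and the combination rule in (c) involve some guesswork; the sketch below implements one consistent reading of the three quantities for a scalar member M (a weight being optimized):

```python
def error_change(e_t, e_prev):
    """(a) Percentage change in fitness error between iterations t-1 and t.
    The 'e_t or e_{t-1}' denominator is read as: use e_t unless it is zero."""
    denom = e_t if e_t != 0 else e_prev
    return (e_t - e_prev) / denom * 100.0

def member_change(m_t, m_prev):
    """(b) Percentage change in a member, normalized by the larger magnitude."""
    return (m_t - m_prev) / max(abs(m_t), abs(m_prev)) * 100.0

def new_member(m_prev, err_change_pct):
    """(c) Next member: the previous member perturbed by the error change.
    The operator joining the two terms is not recoverable from the source;
    '+' is an assumption."""
    return m_prev + err_change_pct / 100.0
```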
4 Performance Analysis
Table 1 reports the precision, recall, and F-measure of the proposed Fuzzy+DL method
at different (α, β) settings and compression ratios (CR).
Table 1. Precision, recall, and F-measure values

Metric       CR (%)   α=0.25, β=0.75   α=0.50, β=0.50   α=0.75, β=0.25
Precision    30       0.6666           0.75             0.8333
             40       0.7              0.7              0.6
             50       0.4444           0.7777           0.555
             60       0.7142           0.8571           0.5714
Recall       30       0.6513           0.6923           0.7692
             40       0.5384           0.5384           0.6923
             50       0.3625           0.5384           0.6153
             60       0.3840           0.4615           0.4615
F-measure    30       0.64             0.72             0.8
             40       0.6086           0.6086           0.7826
             50       0.5882           0.63636          0.72727
             60       0.5              0.6              0.6
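As a quick sanity check, the averages quoted in the abstract for α = β = 0.5 follow directly from the middle column of Table 1:

```python
# Middle column (alpha = 0.5, beta = 0.5) of Table 1, over CR = 30..60%.
precision = [0.75, 0.7, 0.7777, 0.8571]
recall = [0.6923, 0.5384, 0.5384, 0.4615]
f_measure = [0.72, 0.6086, 0.63636, 0.6]

for name, vals in [("precision", precision), ("recall", recall),
                   ("F-measure", f_measure)]:
    print(name, sum(vals) / len(vals))
# precision 0.7712, recall 0.55765, F-measure 0.64124 -- matching the abstract.
```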
4.1 Analysis on Precision
Figure 2 shows the precision rate at various alpha and beta values (α = 0.25, 0.5, 0.75
paired with β = 0.75, 0.5, 0.25) as well as different compression ratios (30%, 40%, 50%,
and 60%). The graph shows that the precision rate is highest at a compression ratio of
60% and lowest at a compression ratio of 50%.
Fig. 2. Analysis on precision
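The paper combines the deep-learning and fuzzy scores under weights α and β into the hybrid score; assuming a simple weighted sum (suggested by the paired α/β settings in Table 1, though the exact combination rule is not spelled out in this excerpt), summary extraction at a given compression ratio can be sketched as:

```python
def summarize(sentences, dl_scores, fuzzy_scores, alpha=0.5, beta=0.5, cr=0.3):
    """Rank sentences by an assumed hybrid score alpha*DL + beta*fuzzy and
    keep the top fraction `cr` of them, preserving document order."""
    hybrid = [alpha * d + beta * f for d, f in zip(dl_scores, fuzzy_scores)]
    k = max(1, round(cr * len(sentences)))
    top = sorted(range(len(sentences)), key=hybrid.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```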
5 Conclusion
Text summarization is the process of automatically creating a shorter version of one
or more text documents. Nowadays, document summarization plays an important role
in information retrieval, where it aims at extracting a condensed version of the original
document. Text summarization enables users to reduce the amount of text that must be
read while still assimilating the core information. The proposed method uses two main
algorithms: a deep learning algorithm and a fuzzy classifier. E-GSO is incorporated to
optimize the weights, and the combination of the deep learning and fuzzy scores
produces a score for every sentence, called the hybrid score. Finally, the summarized
document is obtained from the hybrid score. In the experimental analysis, Fuzzy+DL
achieved an average precision of 0.7712, an average recall of 0.55765, and an average
F-measure of 0.64124 at alpha and beta of 0.5, averaged over compression ratios of
30%, 40%, 50%, and 60%. The comparative analysis also produced reasonable results
demonstrating the efficiency of the Fuzzy+DL method.