
Text Summarization using Deep Learning and

Fuzzy Classifier

J. Anitha1 and PVGD Prasad Reddy2

1 Associate Professor, IT, Vignan Institute of Information Technology, Duvvada
2 Professor, Department of Computer Science & System Engineering, Andhra University

Abstract. Text summarization is the process of automatically creating a shorter version of one or more text documents. Nowadays, document summarization plays an important role in information retrieval, as it aims at extracting a condensed version of the original document. Text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The proposed approach uses two main algorithms: a deep learning algorithm and a fuzzy classifier. The E-GSO is incorporated to optimize the weights, and the combination of the deep learning and fuzzy scores yields a score for every sentence, called the hybrid score. Finally, the summarized document is obtained from the hybrid score. In the experimental analysis, the Fuzzy + DL method achieved an average precision of 0.7712, an average recall of 0.55765 and an average F-measure of 0.64124 at α = β = 0.5 over compression ratios of 30%, 40%, 50% and 60%. The comparative analysis also provided reasonable results demonstrating the efficiency of the Fuzzy + DL method.

Keywords: Deep learning (DL), Fuzzy classifier, Error tolerant Group Search Optimization (E-GSO).

1 Deep Learning Score for Sentence Score Generation

This approach consists of three phases: feature extraction, feature enhancement, and summary generation, which work together to assimilate core information and generate a coherent, understandable summary. We explore various features to improve the set of sentences selected for the summary, and use a Restricted Boltzmann Machine to enhance and abstract those features, improving the resulting accuracy without losing important information. The deep learning algorithm involves multiple non-linear operations and represents high-level abstractions. It consists of three processes:

Solution encoding
Fitness computation
Error tolerant GSO

Advanced Science and Technology Letters Vol.147 (SMART DSC-2017), pp.93-97

http://dx.doi.org/10.14257/astl.2017.147.14

ISSN: 2287-1233 ASTL Copyright © 2017 SERSC

Fig. 1. Deep learning for sentence score generation
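The feature-extraction phase is not detailed in this excerpt. As a purely hypothetical illustration, per-sentence features in extractive summarizers often include sentence position, length, and term frequency; the specific feature choices below are assumptions, not the authors' feature set:

```python
def sentence_features(sentence, position, total_sentences, vocab_freq):
    # Hypothetical per-sentence feature vector: normalized position in the
    # document, relative length, and average term frequency of its words.
    words = sentence.lower().split()
    avg_tf = sum(vocab_freq.get(w, 0) for w in words) / max(1, len(words))
    return [position / total_sentences, len(words) / 20.0, avg_tf]

# Illustrative corpus term frequencies.
freq = {"summarization": 3, "text": 5}
print(sentence_features("Text summarization is useful", 1, 10, freq))
# -> [0.1, 0.2, 2.0]
```

Vectors of this kind would form the visible layer of the Restricted Boltzmann Machine described in the next section.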

2 Fitness Computation

Initially, the members from the member encoding process are given as input to the fitness computation. The feature vectors are considered as the training set. The training data are modeled as a two-layer network called a "Restricted Boltzmann Machine", in which stochastic feature vectors are connected to stochastic feature detectors using symmetrically weighted connections. The features correspond to visible units and the feature vectors correspond to hidden units. The energy of the joint configuration (x, y) of the visible and hidden units is given by:

E(x, y) = - Σ_{i=1..n} s_i x_i - Σ_{j=1..m} t_j y_j - Σ_{i,j} x_i y_j z_ij

where n is the number of visible variables, m the number of hidden variables, s_i and t_j the biases, x_i the binary state of visible unit i, y_j the binary state of hidden unit j, and z_ij the weight between x_i and y_j.
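With the definitions above, the joint energy takes the standard Restricted Boltzmann Machine form E(x, y) = -Σ s_i x_i - Σ t_j y_j - Σ x_i y_j z_ij. A minimal pure-Python sketch (the numeric values are illustrative only):

```python
def rbm_energy(x, y, s, t, z):
    # E(x, y) = -sum_i s_i x_i - sum_j t_j y_j - sum_{i,j} x_i y_j z_ij
    visible = sum(si * xi for si, xi in zip(s, x))   # visible bias term
    hidden = sum(tj * yj for tj, yj in zip(t, y))    # hidden bias term
    pairwise = sum(x[i] * y[j] * z[i][j]
                   for i in range(len(x)) for j in range(len(y)))
    return -(visible + hidden + pairwise)

# Illustrative configuration: n = 3 visible units, m = 2 hidden units.
x = [1, 0, 1]                    # binary states x_i of the visible units
y = [1, 1]                       # binary states y_j of the hidden units
s = [0.1, 0.2, 0.3]              # biases s_i
t = [0.4, 0.5]                   # biases t_j
z = [[0.25, 0.25],
     [0.25, 0.25],
     [0.25, 0.25]]               # weights z_ij between x_i and y_j

print(rbm_energy(x, y, s, t, z))  # approximately -2.3
```

Lower energy corresponds to more probable joint configurations, which is what makes the energy usable as a fitness signal.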

3 Error Tolerant GSO

After the fitness computation, three parameters are computed in this process:

(a) Error change
(b) Member change
(c) New member

(a) Error change:
The error change is calculated as:

error change = (e_t - e_{t+1}) / (e_t or e_{t+1}) <= 1%



(b) Member change:
The member change is calculated as:

member change = (M_t - M_{t+1}) / max[(M_t or M_{t+1})] <= 1%

(c) New member:
The new member is calculated as:

new member = M_{t+1} ± error change %
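The typeset formulas above are hard to recover exactly from this excerpt, so the following is only one plausible reading of the three quantities, with e_t and M_t denoting the error and member at iteration t; the sign convention and the 1% tolerance threshold are assumptions:

```python
def error_change(e_t, e_next):
    # Relative change in the error between consecutive iterations, in percent.
    return abs(e_t - e_next) / max(e_t, e_next) * 100

def member_change(m_t, m_next):
    # Relative change in a member between consecutive iterations, in percent.
    return abs(m_t - m_next) / max(m_t, m_next) * 100

def new_member(m_next, err_change_pct):
    # New member taken as M_(t+1) adjusted by the error change
    # (the sign in the original formula is ambiguous; '+' is assumed here).
    return m_next * (1 + err_change_pct / 100)

print(error_change(0.5, 0.25))   # -> 50.0
print(member_change(2.0, 1.0))   # -> 50.0
print(new_member(1.0, 50.0))     # -> 1.5
```

Under this reading, a member update is tolerated while both relative changes stay within the 1% threshold, which is what "error tolerant" suggests.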

4 Performance Analysis

The performance of the Fuzzy + DL method is evaluated in terms of precision, recall and F-measure at different compression ratios (CR) and different (α, β) settings; the results are listed in Table 1.

Table 1. Precision, Recall and F-measure values

CR          α=0.25, β=0.75   α=0.5, β=0.5   α=0.75, β=0.25

Precision
30          0.6666           0.75           0.8333
40          0.7              0.7            0.6
50          0.4444           0.7777         0.555
60          0.7142           0.8571         0.5714

Recall
30          0.6513           0.6923         0.7692
40          0.5384           0.5384         0.6923
50          0.3625           0.5384         0.6153
60          0.3840           0.4615         0.4615

F-measure
30          0.64             0.72           0.8
40          0.6086           0.6086         0.7826
50          0.5882           0.63636        0.72727
60          0.5              0.6            0.6
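As a cross-check, the averages quoted in the abstract follow directly from the α = 0.5, β = 0.5 column of Table 1, with the F-measure computed as the harmonic mean of precision and recall, 2PR / (P + R). A small sketch:

```python
def f_measure(p, r):
    # F-measure as the harmonic mean of precision and recall: 2PR / (P + R).
    return 2 * p * r / (p + r)

# Precision and recall at alpha = 0.5, beta = 0.5 (middle column of Table 1),
# for compression ratios 30, 40, 50 and 60.
precision = [0.75, 0.7, 0.7777, 0.8571]
recall = [0.6923, 0.5384, 0.5384, 0.4615]

print(sum(precision) / 4)                 # approximately 0.7712
print(sum(recall) / 4)                    # approximately 0.55765
print(round(f_measure(0.75, 0.6923), 2))  # -> 0.72, matching Table 1 at CR = 30
```

The averages reproduce the 0.7712 precision and 0.55765 recall reported in the abstract, and the harmonic mean reproduces the tabulated F-measure values.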


4.1 Analysis on Precision

The figure 4.5 shows the precision rate at various alpha and beeta values (α=0.5, 0.25,

0.75 and β=0.5,0.25,0.75)as well as different compression ratios such as 30, 40, 50

and 60. The graph depicts that the precision rate is high at the compression ratio 60%

and low at the compression ratio 50%.

Fig. 2. Analysis on precision
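The role of the compression ratio can be made concrete with a sketch. Assuming the hybrid score is the weighted sum α · (DL score) + β · (fuzzy score), which is an assumption consistent with the (α, β) pairs above rather than a rule stated in this excerpt, a CR% summary keeps the top-scoring CR% of sentences:

```python
def summarize(sentences, dl_scores, fuzzy_scores, alpha=0.5, beta=0.5, cr=30):
    # Hybrid score per sentence (assumed weighted sum of the two scores),
    # then keep the top cr% highest-scoring sentences in document order.
    hybrid = [alpha * d + beta * f for d, f in zip(dl_scores, fuzzy_scores)]
    k = max(1, round(len(sentences) * cr / 100))
    top = sorted(range(len(sentences)), key=lambda i: hybrid[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

doc = ["S1", "S2", "S3", "S4", "S5"]
print(summarize(doc, [0.9, 0.2, 0.6, 0.1, 0.8],
                     [0.7, 0.3, 0.9, 0.2, 0.4], cr=40))  # -> ['S1', 'S3']
```

A higher CR keeps more sentences, which explains why precision varies with the compression ratio in Fig. 2.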

5 Conclusion

Text summarization is the process of automatically creating a shorter version of one or more text documents, and it plays an important role in information retrieval by extracting a condensed version of the original document while preserving its core information. The proposed method uses two main algorithms: a deep learning algorithm and a fuzzy classifier. The E-GSO is incorporated to optimize the weights, and the combination of the deep learning and fuzzy scores yields a score for every sentence, called the hybrid score, from which the summarized document is obtained. In the experimental analysis, the Fuzzy + DL method achieved an average precision of 0.7712, an average recall of 0.55765 and an average F-measure of 0.64124 at α = β = 0.5 over compression ratios of 30%, 40%, 50% and 60%. The comparative analysis also provided reasonable results demonstrating the efficiency of the Fuzzy + DL method.


