Survey on Summarization Techniques and Existing Work


Prof. Satish Kamble 1.1, Shivlila Mandage 1, Shubhangi Topale 1, Dipali Vagare 1, Prerana Babbar 1

1.1 Professor, PVG’s College of Engineering and Technology, Pune, Maharashtra, India

[email protected]

1 Student, PVG’s College of Engineering and Technology, Pune, Maharashtra, India

{shivlilamandge32,shubhangitopale25,dipali.vagare1995,[email protected]

Abstract

Text document summarization condenses a whole document into a shorter version while retaining its original meaning. Extractive summarization selects the most important sentences from the documents to form the summary. With the growing number of news articles on the web, it is very difficult for a user to read all the newspapers and to sort and extract the different perspectives on the news. In this survey paper we explore the most prominent and significant work done on single- and multi-document summarization for English as well as regional languages (Marathi, Bangla). Whereas many techniques have been proposed for English, little work has been done on Marathi text summarization. Summarization techniques usually fall into two classes: abstractive and extractive.

Keywords: Text document summarization, Abstractive summaries, Extractive summaries

International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 1 (2017) © Research India Publications. http://www.ripublication.com


1 Introduction

In this growing digital age, text summarization has become a crucial and timely tool for supporting and interpreting text information. Automatic text summarization is the process of reducing an original document to a summary. It is an important application given the exponential rise in data available on the Internet. Reducing text is a complex problem which poses many challenges to the scientific community.

The main intention of text summarization is to convey the important ideas in the documents while eliminating less important and redundant information. About 71 million people speak Marathi as their native language. Marathi is one of the 22 official languages of India. In recent years many efforts have been made to bring Marathi data sources into the mainstream.

In summarization, two fundamental questions arise:

How to select the essential content?

How to express the selected content in a concise manner?

Many systems are available for summarizing English documents, and they provide adequate accuracy. Yet there is no such system for summarizing Marathi documents. Such a system would help individuals who need to read and access large amounts of Marathi information in a short time. The main aims [1] of summarization are:

To extract important facts from the document.

To cover all its salient points and aspects.

To narrow down on a particular statement.

To include the most important statements without any error or repetition.

The summarization system has three stages:

Pre-processing: This step includes finding the boundaries of Marathi sentences, breaking the Marathi document into a collection of sentences, tokenizing the sentences, removing Marathi stop words, stemming, and eliminating duplicate sentences.

Sentence ranking: In this step, the importance of each sentence is calculated by considering different features. The features used for selecting important sentences are frequency of words, positional value, cue words, and skeleton of the document (words in the title and headers) [38].

Summary: After the sentences are ranked by score, the top-ranked sentences are selected to form the summary. To increase readability, the sentences in the summary are reordered by their order of appearance in the original text; for example, the sentence which occurs first in the original text appears first in the summary.


2 Related Work

Automatic text summarization systems are available for most of the commonly used natural languages. For English text, the following work has been done.

Luhn's "The automatic creation of literature abstracts" (1958) [2] performs single-document summarization and proposes the frequency of a word in a document as a useful measure of its significance. Although Luhn's technique was a preliminary step towards summarization, many of his ideas are still found to be very effective for text summarization. Jing presented "Sentence Reduction for Automatic Text Summarization" in 2000 [3], a system which reduces a sentence by removing extraneous phrases such as prepositional phrases, clauses, to-infinitives, or gerunds. Edmundson presented "New methods in automatic extracting" in 1969 [4] and proposed a general structure for text summarization methodology. From previous work he took two ideas, positional value and frequency of words. Two further features were used: the first was the skeleton of the document (whether the sentence contains words from a title or heading) and the second was the presence of cue words (occurrence of words like "significant" or "hardly").

In recent years, an immense number of text documents in Indian languages have become available on the internet. Automatic classification can help in managing and retrieving such documents. Until 2013, classification of Marathi text documents was unexplored, so the work in [11] compares various classification methods for Marathi text. After testing Naive Bayes, Centroid Classifier, KNN, and modified KNN, the results showed that Naive Bayes is the most efficient in terms of time and accuracy. The Marathi text documents were pre-processed using a rule-based stemmer and a Marathi word dictionary, without removing stop words [11].

Examining Marathi, Hindi, and Bengali from the perspective of Information Retrieval (IR), [12] proposed light and aggressive stemming approaches. Applying an aggressive stemmer can improve retrieval effectiveness. Significant performance differences were found between indexing schemes with and without stemming. With an aggressive stemmer, the relative improvements over a no-stemming approach were approximately 18% for Bengali, 28% for Hindi, and 42% for Marathi. These stemming techniques were evaluated using the FIRE 2008 test collection together with two language-independent indexing methods, n-gram and trunc-n. To improve IR performance, two algorithmic stemmers were proposed: the first removes inflectional suffixes and the second removes frequently occurring derivational suffixes [12].


Table 2.1. Various text summarizers for Indian languages

(Each entry gives Sr. No., Method, and Approach.)

Bengali Language

1. Method: Text summarization using extractive method.
   Approach:
   1) It gives the highest scores to the content files in which the query terms occur most frequently.
   2) The summary of the text documents is produced on the basis of the query terms by applying vector-space term weighting [5].

2. Method: Text summarization using extractive method.
   Approach:
   1) It identifies the sentiment information in the input text.
   2) K-means is used for theme clustering.
   3) It applies a graph representation at the relational level.
   4) Aggregation is accomplished by theme clustering and by applying the graph representation.
   5) This approach uses page ranking to select appropriate sentences [6].

3. Method: Text summarization using extractive method.
   Approach:
   1) Two important features are used to rank sentences: thematic term and position.
   2) It uses TF-IDF techniques [7].

Kannada Language

1. Method: Text summarization using extractive method.
   Approach:
   1) It is based on Information Retrieval (IR).
   2) This approach processes the input text and determines which lines are appropriate and which are not.
   3) This system uses command-based interaction [8].

2. Method: Text summarization using extractive method.
   Approach:
   1) It takes classified Kannada documents from online web sources.
   2) Thematic words are identified.
   3) This approach uses the Inverse Document Frequency (IDF) method with Term Frequency (TF) [9].

3. Method: Text summarization using extractive method.
   Approach:
   1) This approach is statistical.
   2) This method makes it easy to identify the exact topic [10].

4. Method: Text summarization using extractive method.
   Approach:
   1) It uses the GSS coefficient and TF-IDF.

Punjabi Language

1. Method: Text summarization using extractive method.
   Approach:
   1) Each line of the input text is treated as a vector.
   2) It uses the GSS coefficient and TF-IDF [13].
   3) The weightage of a sentence is determined by applying regression (a weight learning method) [14] [15] [16].

Hindi Language

1. Method: Text summarization based on statistical method.
   Approach:
   1) This approach is statistical.
   2) It uses a language-based approach along with a heuristic approach.
   3) It assigns weightage and ranking to lines [17].

2. Method: Text summarization using extractive method.
   Approach:
   1) This approach uses a genetic algorithm and a neural network for extracting the summary.
   2) Selection of sentences depends on the score.
   3) Each sentence score is calculated on the basis of feature extraction.

3. Method: Text summarization using fuzzy logic.
   Approach:
   1) Fuzzy logic is used to extract important sentences.
   2) The summary is generated by extracting the highest-scoring sentences.
   3) A 20% compression ratio is used to generate the document summary.
   4) After extraction, the summary sentences are arranged in their original order.

4. Method: Text summarization based on rule-based approach.
   Approach:
   1) Handcrafted rules are developed to generate the summary.
   2) The system generates an extractive summary.
   3) It does not include semantic analysis of the Hindi text.
   4) The summary of the Hindi text is obtained from a single document only.

5. Method: Single-document summarization using extraction method.
   Approach:
   1) This approach uses statistical and linguistic features to extract the sentences.
   2) A genetic algorithm is also used [22].

6. Method: Text summarization using abstractive method.
   Approach:
   1) The Rich Semantic Graph technique is used to generate the summary [23].

Tamil Language

1. Method: Text summarization using extractive method.
   Approach:
   1) The semantic graph technique is used.
   2) It recognizes the subject, object, and predicate of individual lines to build the semantic graph of the document [13].

2. Method: Text summarization using extractive method.
   Approach:
   1) A language-neutral syntax is applied to condense the text documents [13].

3. Method: Text summarization using extractive method.
   Approach:
   1) A graph-theoretic scoring technique is used for summarization.
   2) It uses word-frequency statistics.
   3) To score the sentences, the weightage of positional terms is calculated by string pattern [18].

Malayalam Language

1. Method: Text summarization using extractive method.
   Approach:
   1) It uses machine learning approaches to generate the summary, such as Naive Bayes, Neural Network, and Hidden Markov Model (HMM) [19].

2. Method: Text summarization using extractive method.
   Approach:
   1) Maximum Marginal Relevance (MMR) techniques are used with a successive threshold [20].

3. Method: Text summarization using extractive method.
   Approach:
   1) It uses TF-IDF techniques [21].

Table 2.1 shows the methods and approaches used by existing summarizers for Indian languages.

3 Method-Wise Techniques Used In Automatic Text Summarization

Each approach is briefly explained in the following sections. Fig. 3.1 shows the various approaches used for summarization.

1 Abstractive text summarization:

Abstractive text summarization creates a summary by understanding the main concepts of the source document and then expressing them in clear natural language. Abstractive techniques are grouped into two classes [24].

Structured Based Approach:

This approach captures the most relevant information from the document using cognitive schemes such as patterns and extraction rules. It also uses other structures such as ontology, tree, and lead and body phrase structure to extract relevant information [25].


Semantic Based Approach:

In this approach [24], documents are represented semantically and are then passed through a Natural Language Generation (NLG) system. This method identifies verb phrases and noun phrases by examining linguistic data.

2 Extractive text summarization:

Extractive summarization extracts significant sentences from the document while keeping redundancy in the summary to a minimum. In these methods, the most important regions of a document, such as sentences, paragraphs, and headlines, are given more weightage.

Fig. 3.1 Classification of various approaches used for summarization

3.1 Abstractive Summarization Methods:

3.1.1 Structured Based Methods

3.1.1.1 Tree Based:

This method uses language generation to produce a brief summary. Similar sentences are first pre-processed using a shallow parser [40] and then mapped to predicate-argument structures. With the help of the content planner, a theme intersection algorithm determines common phrases by comparing the predicate-argument structures. It works on units of the given document that are easy to read and summarize [28].

3.1.1.2 Template Based:

In this approach, a topic is represented as a set of related concepts and implemented as a template containing slots and fillers. The templates are filled with important text snippets extracted by Information Extraction (IE) systems. These text snippets are used to generate meaningful, informative multi-document summaries using an IE-based multi-document summarization algorithm. This approach works only if the summary sentences are already present in the source documents. It cannot handle tasks in which multi-document summarization requires information about similarities and differences across multiple documents [26].

3.1.1.3 Ontology Based Method:

The ontology method is an abstractive summarization approach. In summarization, an ontology enriches the interpreted concepts and the relations between them. Identifying related concepts and finding undetermined relations are important tasks in ontology-based summarization. The ontology presents and defines the vocabulary used, drawing on a dictionary of the language. The result of an ontology is a terminology agreement within the user community. This agreement specifies which term is used to represent a concept in order to avoid ambiguity; this is vocabulary normalization. When a concept has two synonymous terms, the normalization process selects one of them as the label for the concept. Finally, the summarization process generates a summary which preserves the important information and is shorter than the original document [27].

3.1.1.4 Lead and Body Phrase Method:

This is an abstractive summarization approach. The method identifies the lead sentences of the document. Chunks that are the same in the lead and body phrases are searched for, and repeated phrases are identified using a similarity metric. Substitution of the body phrases then takes place where they appear in the body [40].

3.1.1.5 Rule Based Method:

The rule-based methodology is based on an abstraction scheme. It generates summaries from clusters of news articles on the same story. This method represents the summary in terms of categories and a list of forms. A content selection module selects the best candidate from those produced by the information extraction rules to answer one or more forms of a category. Finally, the generated patterns are used to produce the summary sentences [40].

3.1.2 Semantic Based Methods:

3.1.2.1 Multimodal Semantic Model:

A multimodal document contains both text and images. This model, which is based on objects, is constructed initially using knowledge representation concepts. The contents are rated with the help of an information density metric [40]. A parser stores the expressions which express the concepts and relationships. Concepts and relationships are searched using a metric based on the applicability of the concepts, the number of relationships with other concepts, and the number of expressions showing the existence of the concept in the present document.

3.1.2.2 Information Item Based Method:

This approach generates the summary from an abstract representation of the source documents instead of from their sentences. The framework is based on the following modules: Information Item retrieval, sentence generation, sentence selection, and summary generation. First, the Information Item retrieval module uses a parser to perform syntactic analysis of the text and to extract triples of a verb with its subject and object. In the sentence generation module, a sentence is generated directly from the output of the first module using a natural language generator (NLG) [40]. The sentence selection module ranks the sentences based on their average Document Frequency (DF) score. Finally, a summary generation step takes place, which includes dates and locations for the highly ranked generated sentences.

3.1.2.3 Semantic Graph Based Method:

This abstractive approach has three phases. The first phase represents the given document semantically using a Rich Semantic Graph (RSG). In an RSG, the graph nodes are the verbs and nouns of the input document [39], and the edges correspond to the semantic and topological relations between them. The second phase reduces the graph of the source document to a smaller graph using a set of analytical rules. The third phase generates the abstractive summary from the reduced rich semantic graph: it accepts a semantic representation in the form of an RSG and creates the summarized text [23].

3.2 Extractive Based Methods:

3.2.1 Term Frequency-Inverse Document Frequency Method:

This is a statistical method used to judge the importance of a term in a document. Term frequency is the number of times a term occurs in a document. The term frequency of words like “the” can be very high. Inverse document frequency is calculated as the logarithm of the total number of documents divided by the number of documents in which the term occurs [29]. The inverse document frequency of a term can therefore be low even if its term frequency is very high.
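
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming raw counts as term frequency, the natural logarithm for the inverse document frequency, and documents that are already tokenized; the function name `tf_idf` and the toy documents are hypothetical and are not taken from [29].

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (an assumption made for illustration only).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
weights = tf_idf(docs)
```

In this toy corpus "the" occurs in every document, so its inverse document frequency is log(3/3) = 0 and its weight vanishes despite its high term frequency, while a term such as "mat", which occurs in a single document, keeps a positive weight.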

3.2.2 Cluster Based Method:

A document can address many topics. Documents are normally broken up, explicitly or implicitly, into sections. Like the document itself, a summary should address the different “themes” appearing in the document. If the document collection being summarized covers entirely different topics, document clustering becomes very important for generating a meaningful summary.

Already clustered documents are the input to the summarizer. Each cluster is treated as a theme. After clustering, sentences are selected from each cluster to form the summary, so that all the themes of a document are covered in the summary. The first factor in sentence selection is how similar the sentence is to the theme of the cluster. The second factor is the location of the sentence in the document. The last factor is how similar the sentence is to the first sentence of the document [30]. The overall score of a sentence is given by the equation:

$S_i = W_1 \times C_i + W_2 \times F_i + W_3 \times L_i$ (1)

where

$S_i$ is the score of sentence $i$;

$C_i$ and $F_i$ are the scores of sentence $i$ based on how similar the sentence is to the theme of the cluster and to the first sentence of the document, respectively;

$L_i$ is the score of sentence $i$ based on its location in the document;

$W_1$, $W_2$, and $W_3$ are weights.
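
Equation (1) is a plain weighted sum, which the following Python sketch evaluates for a few sentences. The weight values and the feature values are hypothetical, chosen only for illustration; [30] is not the source of these numbers.

```python
def sentence_score(c_i, f_i, l_i, w1=0.5, w2=0.3, w3=0.2):
    """Equation (1): S_i = W1*C_i + W2*F_i + W3*L_i (weights are assumed values)."""
    return w1 * c_i + w2 * f_i + w3 * l_i

# Hypothetical (C_i, F_i, L_i) values for three sentences of one cluster.
features = [(0.9, 0.2, 1.0), (0.4, 0.8, 0.5), (0.1, 0.1, 0.0)]
scores = [sentence_score(c, f, l) for c, f, l in features]

# Index of the highest-scoring sentence, i.e. the candidate for this cluster.
best = max(range(len(scores)), key=scores.__getitem__)
```

With these assumed weights, similarity to the cluster theme dominates, so the first sentence is selected for the cluster.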

3.2.3 Graph Theoretic Approach:

In this technique, a graph is constructed in which the nodes represent the sentences. An edge connects two sentences which share the same words. Nodes with more edges correspond to more important sentences and are given higher priority for summarization [31].
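
A minimal Python sketch of this idea follows, assuming whitespace tokenization and that any shared word creates an edge; a practical system would remove stop words first. The function name `rank_by_degree` and the example sentences are hypothetical.

```python
from itertools import combinations

def rank_by_degree(sentences):
    """Build a sentence graph and rank the nodes by their number of edges."""
    words = [set(s.lower().split()) for s in sentences]
    degree = [0] * len(sentences)
    # Add an edge between every pair of sentences sharing at least one word.
    for i, j in combinations(range(len(sentences)), 2):
        if words[i] & words[j]:
            degree[i] += 1
            degree[j] += 1
    # Sentence indices ordered from most to least connected.
    order = sorted(range(len(sentences)), key=lambda i: -degree[i])
    return order, degree

sentences = [
    "summarization reduces text",
    "graphs model text",
    "graphs rank sentences",
    "weather is sunny",
]
order, degree = rank_by_degree(sentences)
```

Here the second sentence shares a word with two others and is ranked first, while the last sentence is isolated and ranked last.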

3.2.4 Machine Learning Approach:

In this method, the summarization task is treated as a two-class classification problem: sentences are classified as summary sentences or non-summary sentences. The summarizer is trainable; a training set of documents and their extractive summaries is used as reference [32][33].


3.2.5 Latent Semantic Analysis (LSA) Method:

Singular Value Decomposition (SVD) is one of the most powerful tools for finding the principal orthogonal dimensions of multidimensional data [34]. The technique is called LSA because SVD, applied to document-word matrices, groups documents that are semantically related to each other even when they do not share common words. The cosine similarity between rows (vectors) is calculated. Vectors with cosine similarity close to 1 have the most similar words, and vectors with cosine similarity close to 0 have the most dissimilar words. This technique extracts topic-related sentences and words with essential content from documents.
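
The cosine comparison used in this method can be sketched as below. The sketch covers only the similarity step and assumes the row vectors have already been obtained (in LSA they would come from the SVD, which is not shown); the vectors `a`, `b`, and `c` are made-up values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical row vectors.
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]   # same direction as a
c = [0.0, 0.0, 3.0]   # orthogonal to a

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
```

Vectors pointing in the same direction, such as `a` and `b`, have a similarity of about 1, and orthogonal vectors, such as `a` and `c`, have a similarity of 0.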

3.2.6 Text Summarization with Neural Networks:

In this method, each document is converted into a list of sentences. Each sentence is represented as a vector [f1, f2, ..., f7] composed of 7 features. The first phase of the process involves training the neural network to learn the types of sentences that should be included in the summary. Once the network has learned the features that must exist in summary sentences, the trends and relationships among the features that are inherent in the majority of sentences need to be determined. This is accomplished by the feature fusion phase, which consists of two steps: 1) eliminating uncommon features; and 2) collapsing the effects of common features [35].

3.2.7 Automatic Text Summarization Based On Fuzzy Logic:

This method takes characteristics of a text, such as sentence length, similarity to the title, and similarity to keywords, as the input of a fuzzy system. All the rules required for summarization are entered into the knowledge base. Each sentence receives a score between zero and one as output. This value determines the importance of the sentence for summary generation. The input membership function for each feature is divided into three groups of membership functions: insignificant values (low, L, and very low, VL), medium (M), and significant values (high, H, and very high, VH). The important sentences are extracted using if-then rules according to the feature criteria [36].
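
A much simplified Python sketch of such a fuzzy scorer is given below. It assumes only three fuzzy sets (low, medium, high) per feature, two hand-written if-then rules, minimum for AND, and maximum for combining rules, with no separate defuzzification step. The set shapes, the rules, and all names are assumptions made for illustration and are not the system of [36].

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(x):
    """Map a feature value in [0, 1] to assumed low/medium/high memberships."""
    return {
        "low": max(0.0, 1.0 - x / 0.5),
        "medium": triangular(x, 0.0, 0.5, 1.0),
        "high": max(0.0, (x - 0.5) / 0.5),
    }

def importance(title_similarity, length):
    """Fire two illustrative if-then rules and return a score in [0, 1]."""
    t, l = fuzzify(title_similarity), fuzzify(length)
    # Rule 1: IF title similarity is high AND length is medium THEN important.
    rule1 = min(t["high"], l["medium"])
    # Rule 2: IF title similarity is medium AND length is high THEN important.
    rule2 = min(t["medium"], l["high"])
    # Combine the rules by taking the strongest firing strength.
    return max(rule1, rule2)
```

For example, `importance(1.0, 0.5)` fires the first rule fully and yields 1.0, whereas `importance(0.0, 0.5)` fires neither rule and yields 0.0.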


3.2.8 Query Based Extractive Text Summarization:

In this approach, the summary is produced based on the query. Sentence scores are calculated based on the frequency of the query terms. Sentences which contain query phrases are scored higher than sentences which contain only single words from the query, and these in turn are scored higher than sentences which do not contain any query term. Sentences with high scores become part of the output summary [37].
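
The three scoring tiers can be sketched in Python as follows, assuming whitespace tokenization and exact word matching. Returning a (tier, hits) pair, so that a phrase match always outranks any number of single-word matches, is a design choice of this sketch and is not prescribed by [37]; the function name and example sentences are hypothetical.

```python
def query_score(sentence, query):
    """Return (tier, hits): tier 2 = query phrase, 1 = single query words, 0 = none."""
    words = sentence.lower().split()
    terms = query.lower().split()
    if not terms:
        return (0, 0)
    # Frequency of the individual query terms in the sentence.
    hits = sum(words.count(t) for t in terms)
    # Does the whole query occur as a contiguous phrase?
    has_phrase = any(
        words[i:i + len(terms)] == terms
        for i in range(len(words) - len(terms) + 1)
    )
    tier = 2 if has_phrase else (1 if hits else 0)
    return (tier, hits)

query = "text summarization"
sentences = [
    "This text is long and the summarization is short",
    "The weather is sunny",
    "Automatic text summarization saves time",
]
ranked = sorted(sentences, key=lambda s: query_score(s, query), reverse=True)
```

The sentence containing the full phrase is ranked first, the one containing the two query words separately comes second, and the one without any query term comes last.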

4 Proposed Work

The proposed summarization system is divided into three stages: pre-processing, sentence scoring, and summarization.

The input to a summarization process can be one or more text documents. In single-document summarization only one document is given as input, whereas in multi-document summarization the input is a group of related text documents. Fig. 4.1 shows the flow of the proposed text summarization system.

Fig. 4.1 Flow of the proposed text summarization system


1 Pre-processing:

The pre-processing step consists of tokenization, stop-word removal, and stemming.

Tokenization: Every word is considered a token. Tokenization is the act of breaking up a string into pieces such as words, keywords, phrases, symbols, and other elements called tokens. Each sentence is broken down into tokens, and words of a different language or punctuation may be removed.

Stop-word removal: Words which occur frequently but do not add any semantic value to the document are termed stop words. In Marathi, words such as आणि (and), किंवा (or), परंतु (but), and म्हणून (as) are used frequently in sentences but have little significance for the meaning of a document. These words can simply be removed for the classification process.

Stemming: A single word can have many different forms. These words have to be converted to their root form; for example, “stemmed”, “stemmer”, and “stemming” all derive from the root word “stem”. Stemming is helpful for calculating the frequency of a root word.
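
The three pre-processing steps can be sketched in Python as below. The sketch uses English input so that it mirrors the “stem” example above; the stop-word list and the suffix rules are deliberately tiny, hypothetical stand-ins, and a Marathi system would need its own stop-word list and stemming rules.

```python
import string

# Illustrative English stop-word list (an assumption, not a standard list).
STOP_WORDS = {"and", "or", "but", "as", "the", "a", "is", "of", "to"}

def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(string.punctuation) for w in sentence.lower().split())
    return [t for t in tokens if t]

def stem(word):
    """Naive suffix stripper (illustrative only, not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "stemm" -> "stem".
            if len(word) > 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word

def preprocess(sentence):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]

tokens = preprocess("The stemmer stemmed the words, and stemming is useful.")
```

After pre-processing, the three inflected forms collapse to the same root, so the frequency of “stem” can be counted as 3 in this sentence.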

2 Sentence Ranking and Summarization

After pre-processing, the sentences are ranked based on four important features: frequency, positional value, cue words, and skeleton of the document.

Frequency: Frequency is the number of times a word occurs in a document. If a word’s frequency in a document is high, the word is important for the document.

Positional Value: The positional value of a sentence is calculated by assigning the highest value to the first sentence and the lowest value to the last sentence of the document. A sentence with a higher value is more important.

Cue Words: Words such as therefore, hence, lastly, finally, meanwhile, or on the other hand are cue words. These words are connective expressions. Sentences containing cue words are usually very significant.

Skeleton of the Document: The words in the title and headers of the document are considered the skeleton of the document. These words are given additional weight in sentence scoring for summarization.

Sentence Scoring: The final score of a sentence is calculated as a linear combination of all the above features (i.e. frequency, positional value, weights of cue words, and skeleton of the document).
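
The linear combination can be sketched in Python as follows. The normalization of each feature to the range [0, 1], the equal default weights, and the function name `score_sentences` are assumptions made for illustration; the text does not fix these details.

```python
from collections import Counter

# Single-word cue words taken from the list above.
CUE_WORDS = {"therefore", "hence", "lastly", "finally", "meanwhile"}

def score_sentences(sentences, title, weights=(1.0, 1.0, 1.0, 1.0)):
    """sentences: list of token lists; title: token list. One score per sentence."""
    w_freq, w_pos, w_cue, w_title = weights
    freq = Counter(tok for sent in sentences for tok in sent)
    max_freq = max(freq.values(), default=1)
    title_words = set(title)
    n = len(sentences)
    scores = []
    for i, sent in enumerate(sentences):
        # Frequency: average word frequency, normalized by the maximum frequency.
        f = sum(freq[t] for t in sent) / (len(sent) * max_freq) if sent else 0.0
        # Positional value: highest for the first sentence, lowest for the last.
        p = (n - i) / n
        # Cue words: 1 if the sentence contains any cue word.
        c = 1.0 if CUE_WORDS & set(sent) else 0.0
        # Skeleton: fraction of title words that occur in the sentence.
        s = len(title_words & set(sent)) / len(title_words) if title_words else 0.0
        scores.append(w_freq * f + w_pos * p + w_cue * c + w_title * s)
    return scores

title = ["marathi", "summarization"]
sentences = [
    ["marathi", "summarization", "is", "hard"],
    ["tools", "are", "rare"],
    ["therefore", "marathi", "tools", "matter"],
]
scores = score_sentences(sentences, title)
```

In this toy input the first sentence scores highest because it is first and contains both title words, and the third sentence comes second because of its cue word.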

3 Summary Making

In our proposed work, sentences are ranked on the basis of their scores and the top K ranked sentences are selected, where the value of K is set by the user. The summary is produced from these sentences. The sentences in the summary are reordered according to their position in the original text to increase the readability of the summary; for example, the sentence which occurs first in the original text appears first in the summary.
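
The selection and reordering step can be sketched in Python as below. The function name `make_summary` and the example scores are hypothetical; the scores are assumed to have been computed beforehand.

```python
def make_summary(sentences, scores, k):
    """Pick the k highest-scoring sentences and return them in document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

sentences = ["First.", "Second.", "Third.", "Fourth."]
scores = [0.7, 0.1, 0.9, 0.2]
summary = make_summary(sentences, scores, k=2)
```

Although the third sentence has the highest score, the summary lists "First." before "Third." because the selected sentences are restored to their original order.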

5 Conclusion

Automatic text summarization is a well-known task in Natural Language Processing (NLP). It can broadly be divided into two classes: abstractive and extractive summarization. Abstractive summarization deals with understanding the semantic relationships within the text, whereas extractive summarization deals with directly extracting important sentences from the document to create the summary. Abstractive summarization is therefore somewhat more difficult to carry out than extractive summarization. In the proposed approach, feature extraction is the primary phase, after which the sentences are scored. In the last phase, the highest-ranked sentences are selected as the summary. In this paper we have briefly surveyed automatic text summarization techniques for English and several Indian regional languages (Hindi, Tamil, Bangla, Kannada, Punjabi, etc.). Although various techniques have been proposed for several languages, very little effort has been made for Marathi text.


Reference

1. Mathews, L.M., Dr Sathiyamoorthy, E.: Intricacies of an automatic text

summarizer, International Journal of Engineering and Technology (IJET).Vol 5,

No 3 (2013)

2. Luhn, H.P. : The automatic creation of literature abstracts, in IBM Journal of

Research Development, Vol 2, No 2, 159-165 (1958)

3. Jing Hongyan : Sentence Reduction for Automatic Text Summarization, Proc.

sixth conference on Applied natural language processing, 310-315 (2000)

4. Edmundson, H.P.: New methods in automatic extracting, in Journal of the ACM,

16(2): 264 (1969)

5. Islam, T., Masum, S.M.A.: Bhasa: A Corpus Based Information Retrieval and

Summarizer for Bengali Text, Macquarie University, Sydney, Australia, (2004)

6. Das, A., Bandyopadhyay, S.: Topic-based Bengali opinion summarization, Proc.

23rd International Conference on Computational Linguistics: Posters, 232–240

(2010)

7. Sarkar, K.: An approach to summarizing Bengali news documents, Proc.

International Conference on Advances in Computing, Communications and

Informatics, 857-862 (2012)

8. Kallimani, J.S., Srinivas, K.G., Eswara, B.R.,: Information Retrieval by Text

Summarization for an Indian Regional Language, Proc. International Conference

on Natural Language Processing and Knowledge Engineering, 1-4 (2010)

9. R. Jayashree, Srikanta K.M., K. Sunny: Document Summarization in Kannada

using Keyword Extraction, Proc. AIAA 2011, CS & IT 03, 121–127 (2011)

10. Kallimani, J.S., Srinivasa, K.G., Eswara, B.R.: Summarizing News Paper

Articles: Experiments with Ontology Based, Customized, Extractive Text

Summary and Word Scoring, Journal of Cybernetics and Information

Technologies, Bulgarian Academy of Sciences, Vol. 12, No. 2, 34-50 (2012)

11. Pati, M., Game, P.: Comparison of Marathi Text Classifiers, International Journal

on Information Technology 4.1 (2014)

12. Dolamic, L., Savoy, J.: Comparative study of indexing and search

strategies for the Hindi, Marathi, and Bengali languages, ACM Transactions on

Asian Language Information Processing (TALIP) 9.3 (2010)

13. Banu, M., Karthika, C., Sudarmani, P., Geetha, T.V.: Tamil Document

Summarization Using Semantic Graph Method, Proc. International Conference

on Computational Intelligence and Multimedia Applications, Vol. 2, 128-134,

(2007)

14. Gupta, V., Lehal, G.S.: Automatic Text Summarization for Punjabi Language,

International Journal of Emerging Technologies in Web Intelligence, Vol. 5, No.

3, 257-271, (2013)

15. Gupta, V., Lehal, G.S.: Complete Preprocessing Phase of Punjabi Language Text

Summarization, International Conference on Computational Linguistics

COLING'12, IIT Bombay, India, 199-205, (2012)

16. Keyan, M.K., Srinivasagan, K.G.: Multi-Document and Multi-Lingual

Summarization using Neural Networks, Proc. International Conference on Recent

Trends in Computational Methods, Communication and Controls, 11-14, Vol.

5, (2012).

17. Garain, U., Datta, A.K., Bhattacharya, U., Parui, S.K.: Summarization of JBIG2

Compressed Indian Textual Images, Proc. 18th International Conference

on Pattern Recognition (ICPR'06), IEEE, Kolkata, India, Vol. 3, 344-347, (2006)

18. Kumar, S., Ram, V.S., Devi, S.L.: Text Extraction for an Agglutinative Language,

Proc. Journal: Language in India, 56-59, (2011)

19. Kabeer, R., Idicule, S.M.: Text Summarization For Malayalam Documents An

Experience, International Conference of Data Science and Engineering (ICDSE),

(2014)

20. Ajmal, E.B., Haroom, R.P.: Summarization of Malayalam Document Using

Relevance of Sentence, International Journal of Latest Research in Engineering

and Technology (IJLRET), Vol. 1, Issue 6, (2015)

21. Renjith, S.R., Sony, P.: An Automatic Text Summarization for Malayalam Using

Sentence Extraction, Proc. 27th IRF International Conference (2015)

22. Dalal, V., Shelar, Y.: A Survey of Various Methods for Text Summarization,

International Journal of Engineering Research and Development, Vol. 11, 57-59

(2015)

23. Subramaniam, M., Dalal, V.: Test Model for Rich Semantic Graph Representation

for Hindi Text using Abstractive Method, IRJET (2015)

24. Khan, A., Naomie, S.: A review on abstractive summarization methods, Journal

of Theoretical and Applied Information Technology, Vol. 59 (2014)

25. Saranyamol, C.S., Sindhu, L.: A Survey on Automatic Text Summarization,

International Journal of Computer Science and Information Technologies, Vol. 5

(2014).

26. Oya, T., Mehdad, Y., Carenini, G., Ng, R.: A Template-based Abstractive Meeting

Summarization: Leveraging Summary and Source Text Relationships, Proc. 8th

International Natural Language Generation Conference.

27. Ramezani, M., Feizi-Derakhshi, M.-R.: Ontology-Based Automatic

Text Summarization Using FarsNet, ACSIJ Advances in Computer Science: an

International Journal, Vol. 4, No.14 (2015)

28. Gaikwad, D.K., Mahender, N.: A Review Paper on Text Summarization,

International Journal of Advanced Research in Computer and Communication

Engineering, Department of C.S. & I.T., Dr. B. A. M. U., Aurangabad,

Maharashtra, India.

29. Kaur, J., Gupta, V.: Effective approaches for extraction of keywords,

International Journal of Computer Science Issues (IJCSI), Vol. 7 (2010)

30. Deshpande, A.R., Lobo, L.M.R.J.: Text Summarization using Clustering

Technique, International Journal of Engineering Trends and Technology (IJETT),

Volume 4 (2013)

31. Kumar, N., Kannan, S., Varma, V.: A Knowledge Induced Graph-Theoretical

Model for Extract and Abstract Single Document Summarization, Computational

Linguistics and Intelligent Text Processing - International Conference (2013)

32. Sarkar, K., Nasipuri, M., Ghose, S.: Using Machine Learning for Medical

Document Summarization, International Journal of Database Theory and

Application (2011)

33. Bazrfkan, M., Radmanesh, M.: Using Machine Learning Methods To Summarize

Persian Texts, Indian J. Sci. Res., Vol. 7, Issue 1, 1325-1333 (2014)

34. Gupta, V., Singh, G.: A survey of text summarization extractive techniques,

Journal of Emerging Technologies in Web Intelligence, Vol. 2, No. 3 (2010)

35. Khosrow, K.: Text Summarization Using Neural Networks, Proc. Second IEEE

International Conference on Intelligent Systems, (2004)

36. Patil, P.D., Kulkarni, N.J.: Text Summarization Using Fuzzy Logic, IJIRAE

(2014)

37. Ibrahim, I., Nounou, N., Alaa, H., Hebat, A.: Query Based Arabic Text

Summarization, IJCST (2013)

38. Efat, M.d., Iftekharul, A., Mohammad, I., Humayun, K.: Automated Bangla text

summarization by sentence scoring and ranking, International Conference

on Informatics Electronics and Vision (ICIEV), (2013)

39. Moawad, I.F., Mostafa, A.: Semantic graph reduction approach for abstractive

Text Summarization, Seventh International Conference on Computer

Engineering & Systems (ICCES), (2012)

40. Khan, A., Salim.: A Review On Abstractive Summarization Methods, Faculty of

Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
