Presentation of Domain Specific Question Answering System Using N-gram Approach.


Transcript of Presentation of Domain Specific Question Answering System Using N-gram Approach.

Page 1: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Presented by:

Tasnim Ara Islam

Roll: 1007010

Farh Naz Chowdhuy

Roll: 1007038

Supervisor:

Dr. K.M. Azharul Hasan, Professor

Dept of CSE, KUET.

Domain Specific Question Answering System Using N-gram Approach

Project/Thesis CSE 4000

Page 2: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Outline

Introduction

Objective

Problem Statement

Scope of thesis

Theoretical Consideration

POS Tagger

N-Gram

Q/A System Using N-gram Approach

Experimental Analysis


Page 3: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Introduction


Page 4: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Objective

Users want specific answers rather than full-text documents or best-matching passages.

To find answers to factoid questions (about people, places, or amounts) using domain-specific documents.


Page 5: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Problem Statement

Our system is a Q/A system, which is a specific type of information retrieval.

Given a text document, the system attempts to find the sentence that best answers the question.

The output is a full sentence, not a snippet or a short answer.


Page 6: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Scope of the Thesis

WH- words: Who, What, When, Where, Which, Whom.

Domain specific document.

N-Gram mining approach.

Environment:

Eclipse Java EE IDE (Version: Luna Service Release 2 (4.4.2))

JRE 1.8.0_45

Stanford POS Tagger (Version 3.0.1)

Lists: regular and irregular verb list, synonym list


Page 7: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Theoretical Consideration


Page 8: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Q/A systems

Pattern based question answering system.

Ex. <NAME> was born on <ANSWER>

Key reference: AskMSR, a web-based Q/A system that uses N-gram mining, filtering and tiling to obtain the answer.

N-grams are applied to both the question and the text sentences.
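The pattern-based approach named on this slide can be sketched as follows. This is a minimal illustration in Python using the standard `re` module, not the presented system (which was built in Java). The function name `extract_birth_date` and the sample sentence are assumptions made up for the example.

```python
import re

def extract_birth_date(name, text):
    """Fill the <ANSWER> slot of the pattern '<NAME> was born on <ANSWER>'."""
    # Escape the name so characters such as '.' are matched literally.
    # The answer slot is taken to be everything up to the next full stop.
    pattern = re.compile(re.escape(name) + r" was born on ([^.]+)", re.IGNORECASE)
    match = pattern.search(text)
    return match.group(1).strip() if match else None

# Placeholder text, invented for illustration only.
sample = "Some Person was born on 1 January 1900. Some Person grew up nearby."
print(extract_birth_date("Some Person", sample))
```

A fixed pattern like this only fires when the text uses exactly that wording, which is one reason to match on N-grams instead.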


Page 9: Presentation of Domain Specific Question Answering System Using N-gram Approach.

N-Gram

N-grams are sequences of characters or words extracted from a text.

Types:

1. Character based

2. Word based

An n-gram of size 1 is referred to as a unigram; size 2 is a bigram; size 3 is a trigram, and so on.

Example sentence: Taj mahal is a world heritage site.

Bigrams are:

Taj mahal, mahal is, is a, a world, world heritage, heritage site

Trigrams are:

Taj mahal is, mahal is a, is a world, a world heritage, world heritage site
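The bigrams and trigrams listed above can be reproduced with a short sketch. It is written in Python for illustration; the function names `word_ngrams` and `char_ngrams` are assumptions, and stripping the final full stop before splitting is a simplification chosen to match the slide's output.

```python
def word_ngrams(text, n):
    """Return the word-based n-grams of `text` as a list of strings."""
    # Drop the trailing full stop, then split on whitespace.
    words = text.rstrip(".").split()
    # Slide a window of n words over the sentence.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def char_ngrams(text, n):
    """Return the character-based n-grams of `text` as a list of strings."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

sentence = "Taj mahal is a world heritage site."
print(word_ngrams(sentence, 2))  # word bigrams
print(word_ngrams(sentence, 3))  # word trigrams
print(char_ngrams("Taj", 2))     # character bigrams
```

A sentence of k words yields k - n + 1 word n-grams, so the seven-word example gives six bigrams and five trigrams.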


Page 10: Presentation of Domain Specific Question Answering System Using N-gram Approach.

POS Tagger

A POS tagger is software that reads text in some language and assigns a part of speech (noun, verb, adjective, etc.) to each word.

The Stanford POS Tagger is an NLP library; here it is used to detect parts of speech in English text.

Input: I like watching movies.

Output:

I_PRP like_VBP watching_VBG movies_NNS
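Tagged output in this `word_TAG` form can be split back into words and tags so that verbs and nouns are easy to pick out. The sketch below is in Python and does not call the Stanford tagger; it only parses the output string shown above. The helper names are assumptions, and the selection relies on the Penn Treebank convention that verb tags start with `VB` and noun tags with `NN`.

```python
def parse_tagged(tagged):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in tagged.split():
        # rpartition splits on the last underscore, so words that
        # themselves contain '_' keep their full text.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

def words_with_tag_prefix(pairs, prefix):
    """Return the words whose tag starts with `prefix` (e.g. 'VB' or 'NN')."""
    return [word for word, tag in pairs if tag.startswith(prefix)]

tagged = "I_PRP like_VBP watching_VBG movies_NNS"
pairs = parse_tagged(tagged)
print(words_with_tag_prefix(pairs, "VB"))  # verbs
print(words_with_tag_prefix(pairs, "NN"))  # nouns
```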


Page 11: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Q/A System Using N-gram


Page 12: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Steps of implementation

1. The user enters a domain-specific question in the GUI.

2. The text files are split into sentences.

3. Query reformulation.

I. Change the corresponding verb.

a. Do, Does, Did.

b. Regular or irregular.

c. Synonym word.

II. Find the parts of speech of the words in the question using the POS Tagger.

III. Select Verb, Main Verb and Noun.

4. Verb, Main Verb and Noun are matched against the passage sentences by N-gram mining.

5. The sentence with the maximum match on verb and main verb is the answer.
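Step 3.I (changing the verb according to Do, Does or Did) can be sketched as below. This is a Python illustration under stated assumptions, not the authors' Java code: the tiny `IRREGULAR_PAST` table stands in for the irregular verb list, the regular rules are deliberately naive (no consonant doubling, no "-ies" or "-es" forms), and all function names are hypothetical. Synonym handling is left out.

```python
# Hypothetical stand-in for the irregular verb list used by the system.
IRREGULAR_PAST = {"begin": "began", "build": "built", "become": "became"}

def past_tense(verb):
    """Past form: look up irregular verbs, else apply a naive '-ed' rule."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

def third_person(verb):
    """Naive third-person singular form: append 's' only."""
    return verb + "s"

def reformulate_verb(auxiliary, main_verb):
    """Rewrite the main verb into the form expected in a declarative answer."""
    aux = auxiliary.lower()
    verb = main_verb.lower()
    if aux == "did":
        return past_tense(verb)
    if aux == "does":
        return third_person(verb)
    return verb  # "do": the base form already matches

# "When did the construction begin?" -> search for "began"
print(reformulate_verb("did", "begin"))
```

Rewriting the verb this way lets the question term match the form that appears in the passage, for example "began" in "The construction began around 1632."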

Page 13: Presentation of Domain Specific Question Answering System Using N-gram Approach.

System in Brief…

web-query-solution(filename, passageName, question)

begin

    sSentence{} := sentences from file

    qVerb{} := verbs from question

    qMainVerb{} := main verbs from question

    qNoun{} := nouns from question

    for each sentence s in sSentence do

    begin

        if NGram(s) matches NGram(qVerb OR qMainVerb OR qNoun) then

            count the matched verbs, main verbs and nouns for s

    end

    max := largest count of matched verbs and main verbs over all sentences

    return the sentence whose count equals max as the answer string

end

Fig: System Algorithm.
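One way to read the algorithm above is as a per-sentence scoring loop. The sketch below is a Python approximation, not the original Java implementation. It assumes unigram matching on lowercased words, a naive sentence splitter, and a tie-break on noun matches that the slides do not specify. All function names are hypothetical.

```python
import re

def split_sentences(passage):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", passage) if s.strip()]

def unigrams(sentence):
    """Lowercased word unigrams of a sentence, as a set."""
    return set(re.findall(r"[a-z0-9'-]+", sentence.lower()))

def best_sentence(passage, q_verbs, q_main_verbs, q_nouns):
    """Return the sentence with the most verb/main-verb matches, or None."""
    best, best_key = None, (0, 0)
    for sentence in split_sentences(passage):
        grams = unigrams(sentence)
        verb_hits = sum(1 for w in q_verbs + q_main_verbs if w.lower() in grams)
        noun_hits = sum(1 for w in q_nouns if w.lower() in grams)
        # Primary score: verb and main-verb matches.
        # Assumed tie-break: noun matches.
        key = (verb_hits, noun_hits)
        if key > best_key:
            best, best_key = sentence, key
    return best

passage = ("The construction began around 1632. "
           "The construction was completed around 1653.")
# Terms for "When did the construction begin?" after query reformulation.
print(best_sentence(passage, [], ["began"], ["construction"]))
```

Both sentences contain the noun "construction", but only the first contains the reformulated main verb "began", so it is selected.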

Page 14: Presentation of Domain Specific Question Answering System Using N-gram Approach.

User Input


Page 15: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Experimental Analysis


Page 16: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Case Study 1

The Taj Mahal is a white marble mausoleum. It is located in Agra, Uttar Pradesh, India. Mughal emperor Shah Jahan built Taj mahal in memory of his third wife, Mumtaz Mahal. The Taj Mahal is widely recognized as "the jewel of Muslim art in India". In 1983, the Taj Mahal became a UNESCO World Heritage Site. The construction began around 1632. The construction was completed around 1653. The architects of Taj mahal are Abd ul-Karim Ma'mur Khan, Makramat Khan, and Ustad Ahmad Lahauri. Lahauri is generally considered to be the principal designer. In 1631, Shah Jahan was grief-stricken for the death of his wife. Mumtaz Mahal was Shah Jahan's third wife and a Persian princess. Mumtaz died during the birth of their 14th child, Gauhara Begum.


Page 17: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Case Study 2 and 3

Child Labour

Bangladesh Cricket team


Page 18: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Output Ranking

Excellent

Satisfactory

Bad


Page 19: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Experimental Analysis

Columns: No., Q/A, Rank

1. Q. Where is Taj mahal located in?

Ans: taj mahal is located in agra uttar pradesh india

Rank: Excellent

2. Q. What is Taj mahal?

Ans: the taj mahal is widely recognized as the jewel of muslim art in india.

Rank: Satisfactory

3. Q. Who was mumtaj mahal?

Ans: in 1631 shah jahan was grief-stricken for the death of his

Rank: Bad

4. Q. When did the construction begin?

Ans: the construction began around 1632.

Rank: Excellent

5. Q. Who is the principal designer?

Ans: the taj mahal is widely recognized as the jewel of muslim art in india.

Rank: Bad


Page 20: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Experimental Results

Case study 1: Taj Mahal: 32 questions. Of those:

Excellent: 15 (46.88%)

Satisfactory: 14 (43.75%)

Bad: 3 (9.38%)

Case study 2: Child Labour: 14 questions. Of those:

Excellent: 6 (42.86%)

Satisfactory: 1 (7.14%)

Bad: 7 (50%)

Case study 3: Bangladesh Cricket Team: 24 questions. Of those:

Excellent: 14 (58.33%)

Satisfactory: 0 (0%)

Bad: 10 (41.67%)


Page 21: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Accuracy Measure

Total questions asked = 32 + 14 + 24 = 70 questions

Among those,

Excellent answers = 15 + 6 + 14 = 35

Satisfactory answers = 14 + 1 + 0 = 15

Bad Answers = 3 + 7 + 10 = 20

Percentage of Excellent answers = 50%

Percentage of Satisfactory answers = 21.43%

Percentage of Bad answers = 28.57%
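The overall percentages follow directly from the per-case counts reported on the previous slide. A short Python check, where the dictionary layout and the function name `overall_percentages` are illustrative choices:

```python
# Counts per case study, as reported in the experimental results.
counts = {
    "Taj Mahal": {"Excellent": 15, "Satisfactory": 14, "Bad": 3},
    "Child Labour": {"Excellent": 6, "Satisfactory": 1, "Bad": 7},
    "Bangladesh Cricket Team": {"Excellent": 14, "Satisfactory": 0, "Bad": 10},
}

def overall_percentages(counts):
    """Sum each rank over all case studies and express it as a share of all questions."""
    totals = {}
    for ranks in counts.values():
        for rank, n in ranks.items():
            totals[rank] = totals.get(rank, 0) + n
    asked = sum(totals.values())  # 70 questions in total
    return {rank: f"{100 * n / asked:.2f}%" for rank, n in totals.items()}

print(overall_percentages(counts))
```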


Page 22: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Limitations

Deals with simple sentences only.

Does not handle antonyms or spelling errors.

Not domain independent.

Complex questions cannot be handled.


Page 23: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Conclusion

While implementing the system we faced difficulties. A lot can still be done to make the system domain independent. We can also implement more linguistic features, which will make the system more robust.


Page 24: Presentation of Domain Specific Question Answering System Using N-gram Approach.

Thank You.
