My experiment

WELCOME: PhD Journey in India

By: Boshra F. Zopon Al_Bayaty

Prof. Dr. Shashank D. Joshi

(Guide)

Knowledge Discovery from Web Search

OUTLINE

PhD Course Work

Knowledge Discovery from Web Search

National and International Conferences

The Research Contribution

Conclusion and Suggestion for Future Work

Knowledge Discovery From Web Search, PhD Journey

PhD Course Work • Students play an important part in college development

INTRODUCTION

Knowledge discovery is the process of extracting useful information from a source of information or data by using a combination of machine learning, statistical analysis, search engines, modeling techniques, and natural language processing.

Knowledge discovery is an extension of information retrieval (IR), and information retrieval is an extension of data mining. Therefore, the processes of IR and data mining support knowledge discovery directly or indirectly.

Because of the popularity of computers and networks, the Internet has become the most important information source.

Traditionally, people use keywords and simple Boolean algebra to search for related articles.

The best example of a knowledge discovery tool is a search engine, which helps to extract information. Evaluating a web search engine is key to ensuring the effectiveness, efficiency, scalability, and usability of these browsing methods.

Because keyword search on the Internet returns imprecise results, studies of web mining methods try to improve the accuracy or value of the information obtained from web pages.

Although keyword search is the most efficient and popular method for finding related information on the Internet, it has two problems:

1. Some search results do not match the user’s requirements.

2. There are too many similar articles in the search results.

Because of these two problems, users spend a lot of time organizing the search results and finding what they really want.

Discovering the sense of a word with the help of its context can be done by Word Sense Disambiguation (WSD), which is an open problem in Natural Language Processing.

Word Sense Disambiguation is the ability to computationally determine which sense of a word has been used in a given context.

The main methods for combining WSD classifiers are stacking and voting; voting can be weighted or non-weighted.
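The two voting schemes can be illustrated with a minimal Python sketch that combines the sense labels predicted by several classifiers for one occurrence of a word. The classifier names, sense labels, and weights below are hypothetical; in the weighted scheme each classifier's vote is assumed to count in proportion to a per-classifier weight, such as its accuracy.

```python
from collections import defaultdict


def unweighted_vote(predictions):
    """Non-weighted voting: every classifier's predicted sense counts once.

    predictions maps classifier name -> predicted sense. On a tie, the sense
    that was predicted first (in dict order) wins.
    """
    counts = defaultdict(int)
    for sense in predictions.values():
        counts[sense] += 1
    return max(counts, key=counts.get)


def weighted_vote(predictions, weights):
    """Weighted voting: each predicted sense receives its classifier's weight."""
    scores = defaultdict(float)
    for clf, sense in predictions.items():
        scores[sense] += weights[clf]
    return max(scores, key=scores.get)


# Hypothetical predictions for one occurrence of "straight"
predictions = {"DL": "direct", "NB": "honest", "SVM": "honest"}
weights = {"DL": 3.0, "NB": 1.0, "SVM": 1.0}  # hypothetical weights

print(unweighted_vote(predictions))         # honest (2 votes to 1)
print(weighted_vote(predictions, weights))  # direct (3.0 to 2.0)
```

The same predictions can yield different senses under the two schemes: weighting lets one strong classifier outvote several weaker ones that agree with each other.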


Problem Definition

Fig. 2. Screenshot from WordNet showing the multiple meanings of the word “straight”


The goals and objectives set for the research work are as follows:

1. To analyze the influence of context on determining the sense of a given word, using a technique that creates a separate context for every sense of every word.

2. To study different types of techniques used for knowledge discovery, apply them to the disambiguation process, and improve its accuracy.

3. To design and implement a new model called the “Master-Slave” model.

4. To evaluate the performance of the proposed model using parameters such as precision, recall, and F-measure.
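Objective 4 relies on precision, recall, and F-measure. A minimal sketch of these metrics is shown below, using the definitions common in WSD evaluation: precision over attempted instances, recall over all instances, and the balanced F-measure as their harmonic mean. The slides do not state which variant the research used, so treat these definitions as an assumption; note that this F-measure, like precision and recall, never exceeds 100%.

```python
def wsd_scores(correct, attempted, total):
    """Return (precision, recall, F-measure) in percent.

    correct:   instances whose predicted sense matches the gold sense
    attempted: instances for which the system output a sense
    total:     all instances in the test set
    """
    precision = 100.0 * correct / attempted if attempted else 0.0
    recall = 100.0 * correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 40 of 60 instances attempted, 30 of them correct
print(wsd_scores(correct=30, attempted=40, total=60))  # (75.0, 50.0, 60.0)
```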


Goals and Objectives



Supervised Algorithms Suggested in the Research Work

Naive Bayes (NB)

Decision Tree (DT)

Decision List (DL)

AdaBoost (AB)

Support Vector Machine (SVM)

System Requirements and Analysis

Fig. 5. The five selected supervised algorithms
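To show what one of the supervised approaches listed above does, here is a textbook multinomial Naive Bayes sense classifier over bag-of-words contexts with add-one smoothing. It is a generic illustration, not the configuration used in the research: the features are context words only (the slides also mention POS features), and the class name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-words contexts (add-one smoothing)."""

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)      # counts for priors P(sense)
        self.word_counts = defaultdict(Counter)  # counts for P(word | sense)
        self.vocab = set()
        for words, sense in zip(contexts, senses):
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            # log P(sense) + sum of log P(word | sense), Laplace-smoothed
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy training contexts for the word "straight"
contexts = [["guide", "us", "path"], ["path", "right", "way"],
            ["honest", "answer", "truth"], ["truth", "speak"]]
senses = ["direct", "direct", "honest", "honest"]
model = NaiveBayesWSD().fit(contexts, senses)
print(model.predict(["guide", "path"]))  # direct
print(model.predict(["truth"]))          # honest
```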


MASTER – SLAVE MODEL

[Figure: Master–Slave model, showing the Input Data Set, Slave Classifiers C1 … Cn, the Master Classifier, and the Output (O/P)]

The Reputation


THE REFERENCE OF THE CONTEXT


http://www.e-quran.com/language

Fig. 9. The data set resource





The Source of Context: Word sense disambiguation is executed for each input word. The input words are selected from one paragraph of the holy book “Al_Quran” [E-QURAN.COM], as shown in Fig. 8.


System Requirements and Analysis

Fig. 8. The data set resource



• At this stage, the accuracy of each individual algorithm is still not up to the mark.

• Decision List was selected as the Master approach for two reasons:

1. It achieved the highest accuracy.

2. Its reputation: Decision List is one of the robust approaches to sense disambiguation in the WSD field, with a long history (e.g., Kelly and Stone, 1975; Block, 1988). Historical performance is a very important parameter in deciding whether an algorithm acts as Master or Slave in our suggested model, and the results reported in previous work give Decision List a good reputation in the WSD field.
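To make the Master approach concrete, here is a minimal decision-list sketch in the spirit of classic WSD decision lists: rules are ranked by a smoothed log-likelihood ratio and the first rule that matches decides the sense. The single-word features, the smoothing constant `alpha`, the best-sense-versus-rest score, and the toy data are assumptions for illustration; the slides do not describe the features or scoring used in the research.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(contexts, senses, alpha=0.1):
    """Return rules (score, feature_word, sense), strongest first.

    Each context word becomes one rule predicting the sense it co-occurs with
    most often; the score is a smoothed log ratio of that sense's count to
    the count of all other senses for the same word.
    """
    feature_sense = defaultdict(Counter)
    for words, sense in zip(contexts, senses):
        for w in set(words):
            feature_sense[w][sense] += 1
    rules = []
    for w, counts in feature_sense.items():
        sense, n = counts.most_common(1)[0]
        others = sum(counts.values()) - n
        rules.append((math.log((n + alpha) / (others + alpha)), w, sense))
    rules.sort(reverse=True)  # highest score first; ties ordered by word
    return rules


def classify(rules, words, default):
    """Apply the first (strongest) rule whose feature word is in the context."""
    present = set(words)
    for _score, w, sense in rules:
        if w in present:
            return sense
    return default  # e.g. the most frequent sense when no rule fires


contexts = [["guide", "us", "path"], ["path", "right", "way"],
            ["honest", "answer", "truth"], ["truth", "speak"]]
senses = ["direct", "direct", "honest", "honest"]
rules = train_decision_list(contexts, senses)
print(classify(rules, ["guide", "path"], default="direct"))  # direct
print(classify(rules, ["speak"], default="direct"))          # honest
```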

No.  Approach        Accuracy (%)
1.   Decision List   69.12
2.   Adaboost        65.27
3.   Naïve Bayes     62.86
4.   SVM             56.11
5.   Decision Tree   45.14

TABLE 3. The final results of the five supervised approaches

System Design: Selecting the Master Approach (The First Part of the System)

Fig. 22: Final accuracy of the algorithms (bar chart; y-axis: Accuracy (%), 0-100): Decision List 69.12, Adaboost 65.27, Naïve Bayes 62.86, SVM 56.11, Decision Tree 45.14
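The accuracy half of the Master selection can be expressed as a simple ranking over the Table 3 values. The sketch below captures only that criterion; reputation is a qualitative judgment and is not modeled, and the helper name and its `n_slaves` parameter are hypothetical. With two Slave candidates it yields Adaboost and Naïve Bayes, the two algorithms combined with Decision List in the experiments.

```python
# Accuracy (%) of each supervised approach, as listed in Table 3
accuracies = {
    "Decision List": 69.12,
    "Adaboost": 65.27,
    "Naïve Bayes": 62.86,
    "SVM": 56.11,
    "Decision Tree": 45.14,
}


def split_master_slaves(accuracies, n_slaves):
    """Rank approaches by accuracy: the best one is the Master candidate and
    the next n_slaves are Slave candidates."""
    ranked = sorted(accuracies, key=accuracies.get, reverse=True)
    return ranked[0], ranked[1:1 + n_slaves]


master, slaves = split_master_slaves(accuracies, n_slaves=2)
print(master, slaves)  # Decision List ['Adaboost', 'Naïve Bayes']
```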



System Development and Implementation Algorithm

Input: Data set, context, choice of algorithm

Output: Correct sense according to the context

Process: Word Sense Disambiguation

Step 1. Select the data set, data source, context, and algorithm.

Step 2. For all words (W) in the data set, for all senses (S) of each word (two nested loops):

Step 3. (Features) Find the POS from the data source (d).

Step 4. Apply the Master-Slave algorithms.

Step 5. Calculate sense-wise precision (P), recall (R), and F-measure (F).

Step 6. Select the sense with the highest value.

Step 7. Sum all accuracies to calculate the overall accuracy.

Step 8. Add the boosting factor.

Step 9. Display the sense accuracy.

End loop (senses), end loop (words).
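The two nested loops of the algorithm can be sketched as follows. This is a minimal illustration under assumptions: `score_sense` is a hypothetical hook standing in for the Master-Slave classifiers (Steps 3 to 5), the toy cue-word scorer is invented for the example, and overall accuracy is taken as the percentage of words whose chosen sense is correct, since the slides only say that accuracies are summed.

```python
from typing import Callable, Dict, List


def disambiguate_all(
    words: List[str],
    senses_of: Dict[str, List[str]],
    contexts: Dict[str, List[str]],
    score_sense: Callable[[str, str, List[str]], float],
) -> Dict[str, str]:
    """Outer loop over words (W), inner loop over their senses (S);
    keep the highest-scoring sense for each word (Step 6)."""
    chosen = {}
    for w in words:
        chosen[w] = max(senses_of[w], key=lambda s: score_sense(w, s, contexts[w]))
    return chosen


def overall_accuracy(chosen: Dict[str, str], gold: Dict[str, str]) -> float:
    """Step 7: share of words whose chosen sense matches the gold sense, in %."""
    if not chosen:
        return 0.0
    correct = sum(1 for w, s in chosen.items() if gold.get(w) == s)
    return 100.0 * correct / len(chosen)


# Hypothetical stand-in scorer: overlap between the context and a few cue
# words per sense (in the model this role is played by the classifiers).
cues = {"direct": {"path", "way", "guide"}, "honest": {"truth", "speak"}}


def overlap_score(word: str, sense: str, context: List[str]) -> float:
    return float(len(cues[sense] & set(context)))


chosen = disambiguate_all(
    ["straight"],
    {"straight": ["direct", "honest"]},
    {"straight": ["guide", "us", "to", "the", "path"]},
    overlap_score,
)
print(chosen)                                            # {'straight': 'direct'}
print(overall_accuracy(chosen, {"straight": "direct"}))  # 100.0
```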

Step 1. Collect the accuracy of the Master, X%.

Step 2. Collect the accuracy of the Slave, y%.

Step 3. Collect the votes to improve X using the factor F = (X - f)/100.

Step 4. Accuracy of word = old accuracy + F.

Step 5. Apply this factor to all words X1, X2, X3…, and X15.

Step 6. Calculate precision, recall, and F-measure.
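Steps 3 to 5 can be transcribed literally as the sketch below. The function names are hypothetical, `f` is left as an input because the slide does not say what it denotes, and the Slave votes of Step 3 are not modeled; the numbers in the example are invented only to show the arithmetic.

```python
def boosting_factor(master_acc, f):
    """Step 3: F = (X - f) / 100, where X is the Master accuracy in percent.

    The slide does not define f, so it is taken as an input here.
    """
    return (master_acc - f) / 100.0


def boost_word_accuracies(word_accs, master_acc, f):
    """Steps 4-5: accuracy of each word = old accuracy + F, for X1 ... Xn."""
    factor = boosting_factor(master_acc, f)
    return [acc + factor for acc in word_accs]


# Hypothetical numbers, chosen only to show the arithmetic
print(boost_word_accuracies([60.0, 70.0], master_acc=70.0, f=20.0))  # [60.5, 70.5]
```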

System Design: The Second Part of the System



Before Combination
No.  Approach   Recall   Precision  F-measure
1    N. Bayes   30.573   62.86      188.58
2    D. List    44.033   69.126     207.38
3    Adaboost   45.92    65.273     195.82

Discussion on Results (Before Combination)

[Chart: COMPARATIVE ANALYSIS OF PRECISION for the words Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help; series: 1st Experiment Precision, 2nd Experiment Precision; y-axis 0-1000]

The Master–Slave model is evaluated through three experiments. In the first experiment, Decision List acts as the Master and Naïve Bayes acts as the Slave. Individually, each algorithm gives good values of precision and F-measure.

Fig. 27: Comparative analysis graph



After Combination
Approach                                Recall     Precision   F-measure
1st Experiment (N. Bayes + D.L)         68.46667   51.06       1531.8
2nd Experiment (D.L + Ada)              52.61333   69.23333    2077
3rd Experiment (N. Bayes + Ada + D.L)   47.37333   70.14667    2104.4

[Chart: COMPARATIVE ANALYSIS OF RECALL for the words Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help; series: 1st Experiment Recall, 2nd Experiment Recall; y-axis 0-1000]

Second experiment: In this combination, Decision List acts as the Master and Adaboost acts as the Slave. The accuracy details are as follows:

Overall precision is 69.23% and recall is 52.61%, so the results of the experiment are satisfactory; the overall rise in recall and precision is 85.80 and 1.0733, respectively.

Third experiment: The accuracy details are as follows:

Overall precision is 70.14% and recall is 47.37%, which gives a rise of 48.73 and 14.53, respectively.

First experiment: The accuracy details are as follows:

Overall precision is 51.06% and recall is 68.46%, which raises recall at the expense of precision.


Discussion on Results (After Combination)



Enhancement
Approach                                Recall     Precision   F-measure
1st Experiment (N. Bayes + D.L)         378.9367   -118        -354
2nd Experiment (D.L + Ada)              85.8033    1.0733      3.2
3rd Experiment (N. Bayes + Ada + D.L)   14.5333    48.7367     146.2

[Chart: COMPARATIVE ANALYSIS OF F-MEASURE for the words Praise, Name, Worship, Worlds, Lord, Owner, Recompense, Trust, Guide, Straight, Path, Anger, Day, Favored, Help; series: 1st Experiment F-Measure, 2nd Experiment F-Measure; y-axis 0-5000]

Third experiment: Precision and F-measure increase by 48.7367 and 146.2, respectively; this combination gives the best all-round performance in terms of precision.

Second experiment: Precision increases by 1.0733 and F-measure by 3.2, while recall is lower than in the first experiment. This combination enhances precision for resolving the word sense disambiguation problem.

First experiment: When the two algorithms are combined, recall is enhanced. This may be useful in applications such as search engines, which require more coverage of the sample space, but it is less useful for word sense disambiguation.


Discussion on Results (Enhancement)


Empower WSD with social networks.

There are a number of applications where Master-Slave modeling is needed. For example, when a user enters a query, the query could be refined with the help of information or tags received from a social networking site, such as the individual’s profile or the things the individual likes. This process would not only help ensure the correct sense of a word but also increase the accuracy of the results displayed.

Empower online translation

Run WSD online in a web browser, providing an online interface between the user and the system to support applications such as Google or Bing translation, which would enable the user to easily comprehend the output.

M-S model for other languages

Extend the Master-Slave model to support more languages, such as Arabic, Hindi, German, and so on.


Conclusion and Suggestion for Future Work


The advantages of this work are improved accuracy, word disambiguation, and an analysis of the relationship among the data set, the algorithm, and the context.

Our proposed solution to this problem provides a good level of accuracy. The results of the experiments in this research are as anticipated, delivering an accuracy of more than 70.14%.

WSD is still one of the central challenges in NLP, and researchers continue to address it.


The Research Contribution

• Model

A proposed model that combines supervised algorithms in a Master-Slave combination.

• Algorithm

The experiments use a novel Master-Slave algorithm with a boosting factor. This unique Master-Slave algorithm is formed by selecting the best set of algorithms to improve the accuracy of disambiguation.

• Design

The Master-Slave algorithm performs efficiently with the help of the boosting factor, which depends on the error rate and adjusts the accuracy.

• Performance Optimization

The experimental results presented in the graphs show that the selected algorithms and design improve the accuracy to 70.14%, which helps to disambiguate senses efficiently.

• A comparison of the novel approach with all the other approaches has been made to demonstrate its advantage.


National Conferences

Attended and published a paper, National Conference in Computer Science and Information Technology, organized by Y. M. College, Pune, 27-28 Sept. 2013.

Attended and published a paper, National Conference on Modeling, Optimization and Control (NCMOC), 4-6 March 2015.

Attended the National Conference on Advance Technologies for Secured Communication Using 4G & LTE (ATSC-2014), B. V. U. College of Engineering, Pune, 5-6 February 2014.

Attended the National Conference FOSSsumMIT’14, in association with Pune Linux Group, Department of Computer Engineering, MITCOE, Pune, 1-2 August 2014.

International Conferences

International Conference, IEEE Canada, IHTC, Ottawa, http://www.ihtc2015.ieee.ca/, 31 May-4 June 2015.

International Conference on Knowledge and Software Engineering, Paris, France, December 6-7, 2014.


International Conference on Emerging Trends in Science and Cutting Edge Technology (ICETSCET),

YMCA, New Delhi, 28 September, 2014. www.icetscet.com.

International Conference on Current Advances in Engineering and Technology (ICET-14), Knowledge and Software Engineering, Trivandrum, Kerala, IFERP, Connecting Engineers... Developing Research (Unit of VVERT), 14 December 2014. www.icet.com.

National and International Conferences



•International Conferences

Canada – Ottawa, Paris – France


•International Conferences

Trivandrum and New Delhi


SOME SUGGESTIONS

Advantages of Workshops.

The Progress Reports and Scientific Research.

The Three Main Stages of the PhD Degree.

Very Positive Result.


•ADVANTAGES OF WORKSHOPS


SIX-MONTHLY PROGRESS REPORTS


REVIEW AND COMMENTS FROM FIRST PRESENTATION

Introduction

Literature Review

Problem Definition (Word Sense Disambiguation)

Objective of Study

Methodology

Research Plan

Select Research Approaches (Five Supervised Approaches)

System Modeling (Master – Slave Techniques)

System Requirements

Publication (2 papers)

Conclusion

Source of Bibliography

References


Sr. No.  Comment                                                     Status
1.       Data normalization is required.                             Done
2.       Refer to more papers based on supervised neural networks.   Done

Table 1. The status of first presentation comments


The Three Stages of the PhD Degree

Review for Second Presentation

Introduction

Literature Review (Revised)

Problem Definition

Objective of Study

Motivation

Methodology

The Work Done So Far

Jump to Master – Slave Technique

The Reference of Context and Data Set Selected (Sys. Requirements and Data Normalization)

Modeling, Designing, Compilation

Supervised Approaches under Study Implemented

The Comparative Analysis of the Results

Limitations and Suggestions for Future Work

Conclusion

System Development Life Cycle (SDLC) Phases

The Research Contribution in Knowledge and Scientific Research.

Bibliography

Activities and Publications

REVIEW AND COMMENTS FROM SECOND PRESENTATION


Sr. No. / Comment / Status

1. Comment: The candidate presented the program of work, which was in line with the approved objectives. The use of decision trees and supervised learning is suggested.

Status: Done, by clarifying the decision tree with an example related to the implementation.

2. Comment: The thesis hypothesis could be revisited.

Status: The hypotheses or assumptions made are listed below:

1. To perform the combination, the algorithms selected should be based on their individual performance and reputation.

2. To disambiguate the sense, the context has to be selected.

3. To know the POS and senses, the word source referred to must be trusted.

4. Improvement in the accuracy of the disambiguation.

5. Increase in the performance of the algorithm using the Master-Slave system.

6. Improvement in word sense disambiguation irrespective of the amount of data, the data source, and the context.

7. To improve the algorithm with all combinations.

Table 2. The status of second presentation comments

The Three Stages of the PhD Degree


VERY POSITIVE RESULT.





Google Scholar search



Thank You
