Mining Preorder Relation between Knowledge Units from Text

Jun Liu (liukeen@mail.xjtu), Lu Jiang, Zhaohui Wu, Qinghua Zheng, Yanan Qian

SAC 2010, Sierre, Switzerland


Page 2

Outline

Motivation and Challenges

Two features of the Preorder Relation

Process of Mining Preorder Relations

Experimental Evaluation

Conclusions

Page 3

Learning is an incremental process. Understanding a new knowledge unit often relies on understanding certain existing knowledge units.

Preorder relations among the knowledge units help learners avoid the disorientation problem in learning.

Manually annotating the potential preorder relations is very time-consuming and requires the annotators to be domain experts.

[Figure: example knowledge units from geometry linked by the preorder relation: Definition of Triangle, Definition of Interior Angle, Definition of Exterior Angle, Triangle Interior Angles Sum Theorem, Triangle Exterior Angle Theorem. Legend: Preorder Relation.]

Page 4

Given a text document set $T = \{t_k\}\ (1 \le k \le m)$ and a knowledge unit set $U = \{u_i\}\ (1 \le i \le n)$ extracted from $T$ as the input, the preorder relation mining process outputs a set $A \subseteq U \times U$.

Each $u_i$ can be further represented as a triplet (name, type, content).

Name: such as "definition of subnet mask"

Type: such as definition, property or method

Content: the text content of the knowledge unit
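The input and output described on this slide can be pictured with a small data-structure sketch. This is only an illustration: the class name `KnowledgeUnit`, the helper `is_valid_relation_set`, and the sample units are assumptions made here, not code or data from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """One knowledge unit as a (name, type, content) triplet."""
    name: str     # e.g. "definition of subnet mask"
    type: str     # e.g. "definition", "property" or "method"
    content: str  # text content of the knowledge unit


def is_valid_relation_set(relations, units):
    """Check that every (precursor, successor) pair is drawn from U x U."""
    unit_set = set(units)
    return all(a in unit_set and b in unit_set for a, b in relations)


# Hypothetical sample units; the content strings are placeholders.
triangle = KnowledgeUnit("definition of triangle", "definition", "placeholder text")
angle_sum = KnowledgeUnit("triangle interior angles sum theorem", "property", "placeholder text")

# The mining output is a set of ordered pairs over the knowledge units.
relations = {(triangle, angle_sum)}
```

Freezing the dataclass makes each unit hashable, so ordered pairs of units can be stored directly in a set.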

Page 5

There has been no previous work on mining the relation among the knowledge units.

Ontology learning, KAT and RDC can hardly be applied to mine the preorder relations.

Challenges:

Knowledge units expressed in natural language are ambiguous or ill-formed

Knowledge units have far more complex structures than the concepts and named entities

Preorder relations have the characteristic of long distance dependency


Page 7

We generated KUs in the given document set by using our extraction method and manually refined the results. Then we manually annotated the preorder relation among the extracted KUs.

The annotating work was conducted as follows:

a) Developed a web-based annotating system

b) Hired 24 undergraduates from the CS department

c) Created a set of rules to guide the work

d) Created the experimental data set that covers the five courses: Computer Network, Advanced Mathematics, Computer Organization and Architecture, Database System and Application, and Geometry (KUs: 5000+; Relations: 7000+)

Page 8

The likelihood of a preorder relation is inversely proportional to an exponential function of the distance d [formula missing in source].

Preorder relations can be mined within the same document or among documents on similar topics.
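The inverse-exponential dependence on d stated on this slide can be written out under assumed notation. The symbols $P(d)$ and $\lambda$ below are introduced here for illustration only; the slide's own formula did not survive extraction, so this is one form consistent with the wording rather than the authors' equation.

```latex
P(d) \;\propto\; \frac{1}{e^{\lambda d}} \;=\; e^{-\lambda d}, \qquad \lambda > 0
\qquad\Longrightarrow\qquad
\frac{P(d+1)}{P(d)} \;=\; e^{-\lambda}
```

Under this assumed form, each additional unit of distance multiplies the likelihood by the same constant factor; with a hypothetical $\lambda = \ln 2$, for example, the likelihood halves at every step.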

Page 9

If knowledge units in one document are precursors of knowledge units in another document, then [relation missing in source].


Page 11

[Figure: the mining process. Text Set → Text Association Mining → Text Associations → Candidate KU-Pairs Generation → Candidate KU-Pairs → Preorder Relation Identification → Preorder Relations. The figure also labels the two features: Distribution Asymmetry of Domain Term and Locality of Preorder Relation. Legend: Knowledge Unit (KU); Text.]

Page 12

Text Association Mining aims at finding documents on similar topics and then ranking them in pairs.

The clustering process deals with three cases:

1. Two documents $t_i$ and $t_j$ are put into one cluster;

2. A document $t_i$ is put into the cluster $S$ (assume $t_j$ in $S$ is closest to $t_i$);

3. Cluster $S$ and cluster $S'$ merge into a new cluster (assume $t_i$ in $S$ and $t_j$ in $S'$ are the closest document pair).

For each pair $(t_i, t_j)$, set a proper threshold $F_0$ ($F_0 < 1$):

If [condition missing in source], [result missing in source];

If [condition missing in source], [result missing in source];

Once the clustering is finished, the directed graph is also generated.
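The three clustering cases above resemble a greedy closest-pair (single-link style) procedure, which can be sketched as follows. Several things here are assumptions rather than details from the slides: the function name `cluster_documents`, the `similarity` callback, and the `threshold` stopping rule are all hypothetical. The rule that uses F0 to orient each edge is not recoverable from the transcript, so the sketch records only which document pair triggered each join and leaves edge direction out.

```python
from itertools import combinations


def cluster_documents(docs, similarity, threshold):
    """Greedy closest-pair clustering; returns (clusters, linking pairs)."""
    cluster_of = {}  # document -> cluster id
    clusters = {}    # cluster id -> set of documents
    links = []       # closest pairs that caused a join
    next_id = 0

    # Visit document pairs from most to least similar.
    pairs = sorted(combinations(docs, 2),
                   key=lambda p: similarity(*p), reverse=True)
    for ti, tj in pairs:
        if similarity(ti, tj) < threshold:  # assumed stopping rule
            break
        ci, cj = cluster_of.get(ti), cluster_of.get(tj)
        if ci is None and cj is None:
            # Case 1: two unclustered documents form a new cluster.
            clusters[next_id] = {ti, tj}
            cluster_of[ti] = cluster_of[tj] = next_id
            next_id += 1
        elif ci is None or cj is None:
            # Case 2: one document joins the cluster of its closest neighbour.
            new_doc, target = (ti, cj) if ci is None else (tj, ci)
            clusters[target].add(new_doc)
            cluster_of[new_doc] = target
        elif ci != cj:
            # Case 3: two clusters merge through their closest document pair.
            for doc in clusters[cj]:
                cluster_of[doc] = ci
            clusters[ci] |= clusters.pop(cj)
        else:
            continue  # both documents already share a cluster
        links.append((ti, tj))
    return list(clusters.values()), links
```

The returned linking pairs are the natural place to attach a direction once a rule for orienting each pair is available.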

Page 13

For each node in the directed graph, [step missing in source].

For each [symbol missing in source], [step missing in source].

Page 14

Three useful features for the classification-based recognition algorithm:

1. Term frequency:

The greater the term frequency, the more likely it is that the KU pair has a preorder relation.

2. Distance:

The likelihood of a preorder relation decays exponentially as the distance grows.

3. Semantic type:
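The three features can be turned into a feature vector for a candidate KU pair, as the sketch below shows. The concrete definitions are assumptions, since the slide's formulas were lost: term frequency is taken here as the number of times the name terms of one unit occur in the content of the other, distance as the gap between hypothetical reading-order positions, and semantic type as a one-hot encoding of the three types named earlier. The names `KU`, `term_frequency`, `one_hot`, and `features` are all hypothetical.

```python
from dataclasses import dataclass

TYPES = ("definition", "property", "method")  # types named on the slides


@dataclass(frozen=True)
class KU:
    name: str
    type: str
    content: str
    position: int  # assumed: index of the unit in reading order


def term_frequency(a, b):
    """Assumed definition: occurrences of a's name terms in b's content."""
    words = b.content.lower().split()
    return sum(words.count(term) for term in set(a.name.lower().split()))


def one_hot(ku_type):
    """Encode a semantic type as a 0/1 vector over TYPES."""
    return [1 if ku_type == t else 0 for t in TYPES]


def features(a, b):
    """Feature vector for the candidate pair (a, b)."""
    distance = abs(a.position - b.position)
    return [term_frequency(a, b), distance] + one_hot(a.type) + one_hot(b.type)
```

A vector of this shape could then be passed to any of the classifiers compared in the evaluation.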


Page 16

| ID | Course Name | #KUs | #Preorder relations |
| --- | --- | --- | --- |
| 1 | Computer Network | 889 | 758 |
| 2 | Computer Organization and Architecture | 743 | 839 |
| 3 | Database System and Application | 1,398 | 1,176 |
| 4 | Geometry | 427 | 1,325 |

| ID | #possible pairs | #candidate pairs | retention ratio (%) | #training samples (-) | #training samples (+) |
| --- | --- | --- | --- | --- | --- |
| 1 | 49,506 | 1,858 | 91.9 | 1,828 | 620 |
| 2 | 28,392 | 4,678 | 94.3 | 1,477 | 680 |
| 3 | 195,806 | 3,219 | 96.8 | 2,524 | 890 |
| 4 | 12,882 | 2,313 | 95.2 | 1,454 | 704 |
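To see how strongly candidate generation narrows the search, the share of possible pairs that survive as candidates can be computed from the pair counts above. This is a small arithmetic sketch; the name `candidate_share` is hypothetical, and this share is a different quantity from the retention ratio reported on the slide, whose definition is not given in the transcript.

```python
# (possible pairs, candidate pairs) per course ID, copied from the slide
pairs = {
    1: (49506, 1858),
    2: (28392, 4678),
    3: (195806, 3219),
    4: (12882, 2313),
}


def candidate_share(possible, candidate):
    """Percentage of all possible pairs kept as candidate pairs."""
    return 100.0 * candidate / possible


shares = {cid: round(candidate_share(*counts), 2) for cid, counts in pairs.items()}
# shares == {1: 3.75, 2: 16.48, 3: 1.64, 4: 17.96}
```

In every course, fewer than a fifth of the possible pairs reach the classifier.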

Page 17

| Classifier | Criteria | ID = 1 (-) | ID = 1 (+) | ID = 2 (-) | ID = 2 (+) | ID = 3 (-) | ID = 3 (+) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SVM | precision | 99.3 | 93.3 | 99.4 | 61.6 | 99.1 | 73.3 |
| SVM | recall | 99.5 | 89.6 | 97.2 | 88.3 | 95.6 | 93.5 |
| SVM | F1-score | 99.4 | 91.4 | 98.3 | 72.6 | 97.3 | 82.2 |
| DT (C4.5) | precision | 97.6 | 70.3 | 99.8 | 52.4 | 96.2 | 81.8 |
| DT (C4.5) | recall | 98.0 | 66.4 | 95.5 | 95.7 | 98.0 | 69.8 |
| DT (C4.5) | F1-score | 97.8 | 68.3 | 97.6 | 67.7 | 97.1 | 75.3 |
| NB | precision | 99.4 | 60.8 | 99.7 | 48.8 | 97.2 | 75.5 |
| NB | recall | 95.7 | 92.0 | 94.8 | 94.8 | 96.7 | 77.9 |
| NB | F1-score | 97.5 | 73.2 | 97.2 | 64.4 | 96.9 | 76.7 |
| MLP | precision | 99.5 | 56.3 | 99.7 | 53.3 | 99.3 | 70.8 |
| MLP | recall | 94.7 | 93.6 | 95.7 | 95.2 | 95.0 | 94.6 |
| MLP | F1-score | 97.1 | 70.3 | 97.7 | 68.3 | 97.1 | 81.0 |
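The F1-scores in these results can be cross-checked against the precision and recall rows, assuming the standard definition of F1 as the harmonic mean of the two (the slides do not spell the formula out). The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# SVM, positive class, courses 1 to 3: (precision, recall) from the results
svm_positive = [(93.3, 89.6), (61.6, 88.3), (73.3, 93.5)]
svm_f1 = [round(f1_score(p, r), 1) for p, r in svm_positive]
# svm_f1 == [91.4, 72.6, 82.2], matching the reported F1-score row
```

The recomputed values agree with the reported SVM figures for the positive class to one decimal place.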

Page 18

The classification results are insensitive to changes in β within a certain range. We set β to 0.4.

Page 19

Yotta ($10^{24}$): a topic-map-based knowledge management system (under construction)


Page 21

Two features of the preorder relation were discovered: the locality of the preorder relation and the distribution asymmetry of the domain terms.

A classification-based method of mining the preorder relations was proposed.

Future work: extend the method to mine the preorder relations residing in the online knowledge repository Wikipedia.


Page 23

Thank You!

Questions?