Triplet Extraction from Sentences Technical University of Cluj-Napoca Conf. Dr. Ing. Tudor Mureşan...
Triplet Extraction from Sentences
Lorand Dali
Technical University of Cluj-Napoca: Conf. Dr. Ing. Tudor Mureşan
“Jožef Stefan” Institute, Ljubljana, Slovenia: Assist. Prof. Dr. Dunja Mladenić, Blaž Fortuna, Marko Grobelnik
June 2008
Location of the project in the field of Computer Science
Artificial Intelligence
Natural Language Processing
Machine Learning
My father carries around the picture of the kid who came with his wallet.
Motivation of Triplet Extraction
Advantages
◦ compact and simple representation of the information contained in a sentence
◦ avoids the complexity of a full parse
◦ contains semantic information
Applications
◦ building the semantic graph of a document
◦ summarization
◦ question answering
Triplet Extraction – 2 Approaches
Extraction from the parse tree of the sentence using heuristic rules
◦ OpenNLP – Treebank parse tree
◦ Link Parser – Link Grammar (a type of dependency grammar)
Extraction using Machine Learning
◦ Support Vector Machines (SVM) are used
◦ The SVM model is trained on human-annotated data
Short review of SVM
Features of the triplet candidates
Over 300 features depending on:
Sentence
◦ length of sentence, number of words, etc.
Candidate
◦ context of Subj, Verb and Obj
◦ distance between Subj, Verb, Obj
Linkage
◦ number of links, number of link types, number of links from S, V, O
Minipar
◦ depth, diameter, siblings, uncles, cousins, categories, relations
Treebank
◦ depth, diameter, siblings, uncles, cousins, path to root, POS
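A minimal sketch of how a few of the sentence and candidate features listed above could be computed. The `Candidate` class, the function name, the feature names and the context window size are all hypothetical, chosen only for illustration. The linkage, Minipar and Treebank features would need the output of the corresponding parsers and are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A (subject, verb, object) candidate given as token positions (hypothetical layout)."""
    subj: int
    verb: int
    obj: int


def candidate_features(tokens, cand, window=1):
    """Return a small dictionary of sentence-level and candidate-level features."""

    def context(i):
        # Words within `window` positions on either side of token i.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        return left + right

    return {
        # Sentence features
        "sentence_chars": sum(len(t) for t in tokens),
        "sentence_words": len(tokens),
        # Candidate features: distances between Subj, Verb, Obj
        "dist_subj_verb": abs(cand.verb - cand.subj),
        "dist_verb_obj": abs(cand.obj - cand.verb),
        "dist_subj_obj": abs(cand.obj - cand.subj),
        # Candidate features: context of Subj, Verb, Obj
        "subj_context": context(cand.subj),
        "verb_context": context(cand.verb),
        "obj_context": context(cand.obj),
    }


tokens = "My father carries around the picture of the kid".split()
features = candidate_features(tokens, Candidate(subj=1, verb=2, obj=5))
print(features)
```

In a real system the context words would still have to be turned into numeric values before being passed to the SVM; that encoding step is left out of this sketch.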
Evaluation and Testing
Training set = 700 annotated sentences
Test set = 100 annotated sentences
Compare the triplets extracted from a sentence to the annotated triplets of that same sentence
Comparison is done according to a similarity measure in [0, 1] between two triplets
extracted to annotated => precision
annotated to extracted => recall
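One possible reading of the two comparison directions, as a sketch. The slides do not say how the per-triplet similarities are aggregated, so the best-match averaging used here is an assumption, as are the function names and the example triplets. The similarity function is passed in as a parameter; the exact-match stand-in below is only a placeholder for the real measure.

```python
def precision_recall(extracted, annotated, sim):
    """Score extracted triplets against annotated ones.

    Assumption: each triplet is matched to its most similar counterpart
    on the other side, and the best scores are averaged.
    """

    def average_best_match(source, target):
        if not source or not target:
            return 0.0
        return sum(max(sim(s, t) for t in target) for s in source) / len(source)

    precision = average_best_match(extracted, annotated)  # extracted to annotated
    recall = average_best_match(annotated, extracted)     # annotated to extracted
    return precision, recall


def exact_match(a, b):
    # Placeholder similarity in [0, 1]: 1.0 only for identical triplets.
    return 1.0 if a == b else 0.0


# Illustrative triplets, not taken from the annotated data set.
extracted = [
    ("father", "carries", "picture"),
    ("kid", "came", "wallet"),
    ("father", "came", "kid"),
]
annotated = [
    ("father", "carries", "picture"),
    ("kid", "came", "wallet"),
]

precision, recall = precision_recall(extracted, annotated, exact_match)
print(precision, recall)
```

With these inputs two of the three extracted triplets have a counterpart in the annotation, while both annotated triplets are found, so precision is lower than recall.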
Conclusions
Triplet extraction using hand rules
Triplet extraction using machine learning (SVM)
Question answering system based on triplets
Questions
Triplet Similarity Measure
Two triplets: (S, V, O) and (S’, V’, O’)
SubjSim compares S with S’, VerbSim compares V with V’, ObjSim compares O with O’
TrSim = (SubjSim + VerbSim + ObjSim) / 3
TrSim, SubjSim, VerbSim, ObjSim ∈ [0, 1]
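The TrSim formula written out as a short sketch. The function names and the example triplets are illustrative. The element-level similarity is passed in as a parameter, and the case-insensitive equality check used in the example is only a stand-in for it.

```python
def triplet_similarity(t1, t2, sim):
    """TrSim = (SubjSim + VerbSim + ObjSim) / 3 for two (S, V, O) triplets."""
    s1, v1, o1 = t1
    s2, v2, o2 = t2
    subj_sim = sim(s1, s2)
    verb_sim = sim(v1, v2)
    obj_sim = sim(o1, o2)
    return (subj_sim + verb_sim + obj_sim) / 3


def same_word(a, b):
    # Stand-in element similarity in [0, 1]: 1.0 if equal ignoring case.
    return 1.0 if a.lower() == b.lower() else 0.0


tr_sim = triplet_similarity(
    ("father", "carries", "picture"),
    ("father", "carries", "wallet"),
    same_word,
)
print(tr_sim)
```

Because each component lies in [0, 1], their mean also lies in [0, 1]; here subject and verb agree and the object differs, giving 2/3.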
String Similarity Measure
The way to success is under heavy construction
The road to success is always under construction
road success under construction
way success under heavy construction
Sim = nMatch / maxLen = 3 / 5 = 0.6
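A sketch that reproduces the example above. The slides do not state which words are dropped before matching, so the `IGNORED` set is an assumption, chosen only so that the two sentences reduce to the word lists shown. Counting matches as the number of shared words is also an assumption.

```python
from collections import Counter

# Assumption: the words removed before matching in the example above.
IGNORED = {"the", "to", "is", "always"}


def content_words(text):
    """Lowercase the text, split on whitespace, and drop the ignored words."""
    return [w for w in text.lower().split() if w not in IGNORED]


def string_similarity(a, b):
    """Sim = nMatch / maxLen over the remaining words of both strings."""
    words_a, words_b = content_words(a), content_words(b)
    if not words_a or not words_b:
        return 0.0
    # Number of words shared by both lists (multiset intersection).
    n_match = sum((Counter(words_a) & Counter(words_b)).values())
    max_len = max(len(words_a), len(words_b))
    return n_match / max_len


sim = string_similarity(
    "The way to success is under heavy construction",
    "The road to success is always under construction",
)
print(sim)  # 3 shared words out of a maximum length of 5
```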
Evaluating the extracted triplets
[Diagram: the triplets extracted from a sentence (Extracted) are compared with the annotated triplets of the same sentence (Golden Standard); labels shown: Tr1, Tr2, Tr3, Precision, Recall]
My father carries around the picture of the kid who came with his wallet.
Question Types
Yes/No Questions
List Questions
Reason Questions
Quantity Questions
Location Questions
Time Questions
Block Diagram of QA System
Question -> Parse and determine question type -> Build Query -> Search Triplets -> Answer
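A rough sketch of two stages of the pipeline above: determining the question type and searching the triplets. The slides do not describe how either step is implemented, so the leading-word rules, the function names, the wildcard convention (`None` matches anything) and the example triplets are all assumptions. The query-building step is written by hand here rather than derived from a parse.

```python
def question_type(question):
    """Guess the question type from the leading words (hypothetical rules)."""
    q = question.lower().strip()
    if q.startswith("why"):
        return "reason"
    if q.startswith(("how many", "how much")):
        return "quantity"
    if q.startswith("where"):
        return "location"
    if q.startswith("when"):
        return "time"
    if q.startswith(("who", "what", "which")):
        return "list"
    return "yes/no"


def search_triplets(triplets, query):
    """Return the triplets matching a (subj, verb, obj) query; None is a wildcard."""
    return [
        t for t in triplets
        if all(q is None or q == x for q, x in zip(query, t))
    ]


# Illustrative triplet store.
triplets = [
    ("father", "carries", "picture"),
    ("kid", "came", "wallet"),
]

qtype = question_type("Who carries the picture?")
# Hand-built query: the unknown subject is left as a wildcard.
matches = search_triplets(triplets, (None, "carries", "picture"))
answers = [subj for subj, _, _ in matches]
print(qtype, answers)
```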
If a listener nods his head while you're explaining your program; wake him up.