Identify Experts from a Domain of Interest
“Al. I. Cuza” University of Iași, Romania
Faculty of Computer Science
Context
Statistics
CriES 2010
Input data
System components
◦ Questions and answers pre-processing
◦ Pre-processing of interest areas
◦ Getting the list of experts
Results
Conclusions
Yahoo! Answers: a multilingual, collaborative community service through which members can ask questions and receive answers
Google Ad Planner traffic statistics for Y!A, December 2009:
◦ 26,000,000 unique visitors (US)
◦ 110,000,000 total visits (US)
Y!A accounts for between 1.03% and 1.7% of Yahoo! traffic
At present, the identification of experts is done semi-automatically
Automatic search for human experts in the multilingual context offered by the Yahoo! Answers network
Participants start from a collection of questions and answers and must identify the expert able to answer a new question
[Figure: system components. Inputs: initial digraph; initial Yahoo! Answers collections (en, fr, ge, sp); initial user questions. Processing: eliminate stop words, giving domain keywords and question keywords; relevant words for domains and relevant words for questions; similarity score between questions and domains. Outputs: Run 0, Run 1, Run 2.]
Initially we split the original XML file (over 800 MB) into 204 smaller files (the largest was “Other – Internet”, ~80 MB, and the smallest was “MSN”, ~670 bytes)
Examples of the resulting categories:
◦ Alergia, Alergias, Allergies
◦ Astronomy
◦ Biology
◦ Mathematics
◦ Monitors
◦ Paranormal
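The splitting step could be sketched roughly as follows. This is only an illustration, assuming that the collection is a sequence of <topic> elements each carrying a <category> child (as in the topic examples on the later slides); the root element name `topics`, the function name `split_by_category`, and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_category(xml_source):
    """Group serialized <topic> elements by the text of their <category> child."""
    groups = defaultdict(list)
    # iterparse avoids building the whole tree up front. The serialized topics
    # are still collected in memory here; a real splitter would append each
    # one to its category file instead.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "topic":
            category = (elem.findtext("category") or "Unknown").strip()
            groups[category].append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # drop the element's children once it is serialized
    return groups


# Hypothetical miniature collection, for illustration only.
SAMPLE = b"""<topics>
<topic lang="en"><title>a</title><category>Zoology</category></topic>
<topic lang="es"><title>b</title><category>Alergias</category></topic>
<topic lang="en"><title>c</title><category>Zoology</category></topic>
</topics>"""

groups = split_by_category(io.BytesIO(SAMPLE))
```

Each key of `groups` would then correspond to one of the smaller per-category files.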
For every question in a category, we process the information in the <title> and <description> tags
First we removed the stop words and punctuation marks:
<topic lang="en">
  <title>Do animals have feelings?</title>
  <description>can an animal feel regrets , compassion, sad, fear etc?</description>
  <category>Zoology</category>
  <tokens>animals, feelings, animal, feel, regrets, compassion, sad, fear</tokens>
</topic>
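A minimal sketch of this tokenization step, under stated assumptions: the stop-word list below is a tiny hypothetical one chosen to cover the example, and the function name `tokenize` is illustrative; the slides do not say which stop-word list or tokenizer was actually used.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"do", "have", "can", "an", "a", "the", "etc", "is", "of", "to"}


def tokenize(title, description):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    text = f"{title} {description}".lower()
    # [^\W\d_]+ matches runs of letters only, so punctuation marks are discarded.
    words = re.findall(r"[^\W\d_]+", text)
    return [w for w in words if w not in STOP_WORDS]


tokens = tokenize(
    "Do animals have feelings?",
    "can an animal feel regrets , compassion, sad, fear etc?",
)
```

With this stop-word list, `tokens` matches the content of the <tokens> tag in the example above.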
For English topics we used WordNet:
<topic lang="en">
  <title>What is the origin of "foobar"?</title>
  <description>I want to know the meaning of the word and how to explain to my friends.</description>
  <category>Programming&Design</category>
  (1) <tokens>origin,foobar,meaning,word,explain,friends</tokens>
  (2) <synonyms>descent,extraction,origination,inception,significance,signification,import,substance</synonyms>
</topic>
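The synonym expansion might look like the sketch below. To keep it self-contained, a small in-memory dictionary (`TOY_THESAURUS`, filled with the synonyms from the slide) stands in for WordNet; the function name `expand_with_synonyms` and the `lookup` parameter are hypothetical, and in practice the lookup could be backed by a WordNet interface.

```python
# Toy stand-in for WordNet, populated from the example on the slide.
TOY_THESAURUS = {
    "origin": ["descent", "extraction", "origination", "inception"],
    "meaning": ["significance", "signification", "import", "substance"],
}


def expand_with_synonyms(tokens, lookup):
    """Collect synonyms of every token, skipping the tokens themselves and duplicates."""
    seen = set(tokens)
    synonyms = []
    for token in tokens:
        for candidate in lookup(token):
            if candidate not in seen:
                seen.add(candidate)
                synonyms.append(candidate)
    return synonyms


synonyms = expand_with_synonyms(
    ["origin", "foobar", "meaning", "word", "explain", "friends"],
    lambda word: TOY_THESAURUS.get(word, []),
)
```

Passing the lookup as a function keeps the expansion logic independent of whichever lexical resource supplies the synonyms.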
For other languages we first used the Google Translate service and then English WordNet:
<topic lang="fr">
  <title>ki connaitre l'histoire de l'aspirine?</title>
  <description/>
  <category>Biologie</category>
  <questioner>u8620</questioner>
  <answerer>u313460</answerer>
  (1) <tokens>connaitre, histoire, aspirine</tokens>
  (2.1) <tokens_en>know,history,aspirin</tokens_en>
  (2.2) <synonyms_en>account,chronicle,story,acetylsalicylic acid,Bayer,Empirin,St. Joseph</synonyms_en>
  (2.3) <synonyms>compte, chronique, l'histoire, l'acide acétylsalicylique, Bayer, Empirin, Saint-Joseph</synonyms>
</topic>
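Steps (2.1) to (2.3) can be read as a three-stage pipeline: translate the tokens to English, look up English synonyms, and translate the synonyms back. The sketch below assumes exactly that reading; the toy dictionaries stand in for Google Translate and WordNet and contain only the values from the slide, and all names (`cross_lingual_synonyms`, `FR_TO_EN`, `EN_SYNONYMS`, `EN_TO_FR`) are hypothetical.

```python
# Toy stand-ins for the translation service and for WordNet (slide data only).
FR_TO_EN = {"connaitre": "know", "histoire": "history", "aspirine": "aspirin"}
EN_SYNONYMS = {
    "history": ["account", "chronicle", "story"],
    "aspirin": ["acetylsalicylic acid", "Bayer", "Empirin", "St. Joseph"],
}
EN_TO_FR = {
    "account": "compte",
    "chronicle": "chronique",
    "story": "l'histoire",
    "acetylsalicylic acid": "l'acide acétylsalicylique",
    "St. Joseph": "Saint-Joseph",
}


def cross_lingual_synonyms(tokens, to_english, english_synonyms, from_english):
    """Return (tokens_en, synonyms_en, synonyms) for tokens in another language."""
    tokens_en = [to_english(t) for t in tokens]                         # step 2.1
    synonyms_en = [s for t in tokens_en for s in english_synonyms(t)]   # step 2.2
    synonyms = [from_english(s) for s in synonyms_en]                   # step 2.3
    return tokens_en, synonyms_en, synonyms


tokens_en, synonyms_en, synonyms_fr = cross_lingual_synonyms(
    ["connaitre", "histoire", "aspirine"],
    lambda w: FR_TO_EN.get(w, w),       # unknown words pass through unchanged
    lambda w: EN_SYNONYMS.get(w, []),
    lambda w: EN_TO_FR.get(w, w),
)
```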
For each new question we calculate a similarity score between it and the existing answered questions from the same topic
The similarity score depends on the words they share in the <tokens> and <synonyms> tags
The solution: the top 10 experts, selected in descending order of similarity score
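The slides do not give the exact scoring formula, so the following is only one possible reading: the score is taken to be the number of shared words, and each answerer is credited with the best score among the questions they answered. The `Topic` class, the `use_synonyms` flag (which restricts the comparison to <tokens> only), and the tie-breaking by user id are all assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    tokens: set
    synonyms: set = field(default_factory=set)
    answerer: str = ""  # empty for a new, not yet answered question


def similarity(new, old, use_synonyms=True):
    """Assumed score: number of words the two questions have in common."""
    a, b = set(new.tokens), set(old.tokens)
    if use_synonyms:
        a |= new.synonyms
        b |= old.synonyms
    return len(a & b)


def top_experts(new, answered, k=10, use_synonyms=True):
    """Rank answerers by their best-matching answered question; keep the first k."""
    best = {}
    for old in answered:
        score = similarity(new, old, use_synonyms)
        if score > best.get(old.answerer, 0):  # answerers with score 0 are skipped
            best[old.answerer] = score
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, _score in ranked[:k]]


# Hypothetical data, for illustration only.
question = Topic({"animals", "feelings"}, {"emotion"})
answered = [
    Topic({"animals", "fear"}, set(), "u1"),
    Topic({"pets"}, {"emotion", "feelings"}, "u2"),
    Topic({"laptop"}, set(), "u3"),
]
with_synonyms = top_experts(question, answered)
tokens_only = top_experts(question, answered, use_synonyms=False)
```

In this toy data, synonyms change the ranking: "u2" only matches the new question through the synonym sets.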
Similar to the previous run:
◦ For each new question we calculate a similarity score between it and the existing answered questions from the same topic
◦ The solution: the top 10 experts, selected in descending order of similarity score
Difference: the similarity score depends only on the words shared in the <tokens> tag
In this case we used only the input digraph:
<edge source="u765155" target="u52050">
  <desc>1592994;Laptops & Notebooks</desc>
</edge>
For every topic and every person we count the number of questions answered by that person in that topic (using the “target” attribute)
[Figure: initial digraph]
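The counting step on the digraph could be sketched as below. It assumes that <desc> holds a question id and a topic name separated by a semicolon, and that "target" identifies the answerer, as the edge example suggests; the root element name `graph`, the function names, and every edge except the first are made up for illustration. The ampersand is escaped as `&amp;` so that the sample parses as well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Only the first edge comes from the slide; the others are invented examples.
SAMPLE_GRAPH = """<graph>
<edge source="u765155" target="u52050"><desc>1592994;Laptops &amp; Notebooks</desc></edge>
<edge source="u1" target="u52050"><desc>2;Laptops &amp; Notebooks</desc></edge>
<edge source="u2" target="u9"><desc>3;Laptops &amp; Notebooks</desc></edge>
<edge source="u3" target="u9"><desc>4;Zoology</desc></edge>
</graph>"""


def count_answers(graph_xml):
    """Map each topic to a Counter of answers given per person in that topic."""
    counts = defaultdict(Counter)
    for edge in ET.fromstring(graph_xml).iter("edge"):
        desc = edge.findtext("desc", default="")
        # Assumed <desc> layout: "<question id>;<topic name>"
        _question_id, _sep, topic = desc.partition(";")
        counts[topic.strip()][edge.get("target")] += 1
    return counts


def top_experts_for_topic(counts, topic, k=10):
    """Return up to k people with the most answers in the given topic."""
    return [user for user, _n in counts.get(topic, Counter()).most_common(k)]


counts = count_answers(SAMPLE_GRAPH)
laptop_experts = top_experts_for_topic(counts, "Laptops & Notebooks")
```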
| Run Id | Characteristics | Strict P@10 | Strict MRR | Lenient P@10 | Lenient MRR |
|---|---|---|---|---|---|
| 0 | We eliminate stop words and we consider relevant keywords and their synonyms (using Google Translate and English WordNet) | 0.52 | 0.80 | 0.82 | 0.94 |
| 1 | We eliminate stop words and we consider only relevant keywords | 0.47 | 0.77 | 0.77 | 0.93 |
| 2 | We consider only the digraph provided by Yahoo | 0.62 | 0.84 | 0.83 | 0.94 |
Runs 2 and 0 obtained good results (expected for run 0 and unexpected for run 2)
Execution time was a problem for our runs (a few hours)
Future work is related to multilinguality:
◦ In our approach Allergies, Allergien, Alergias, Alergia represent different topics with different experts
◦ We are still looking for an algorithm to identify the best multilingual expert