SICS @ CLEF 2004 – Interactive Xling Bookmarking, thesaurus, and cooperation in bilingual Q & A

Jussi Karlgren – jussi@sics.se
Preben Hansen – preben@sics.se
Magnus Sahlgren – [email protected]

Swedish-French Bilingual Experiments

Swedish-French Bilingual Experiments : Results

• Data-driven query translation approach: find correspondences between terms in different languages based on their mutual occurrence in aligned text regions.

• Term weights (N is the total number of documents, n(t) the number of documents containing term t; w_{t,q} = 1 in the experiments)

• All runs morphologically preprocessed at Connexor using the Functional Dependency Grammar parser.
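The weighting formula itself is not reproduced in this transcript. As a rough illustration of how N and n(t) typically enter a term weight, here is a minimal sketch that assumes a plain inverse document frequency, log(N / n(t)), multiplied by the query-side weight (fixed to 1 in the experiments). Both the functional form and the names `idf` and `term_weight` are assumptions made for illustration; the weighting actually used may differ.

```python
import math


def idf(total_docs: int, doc_freq: int) -> float:
    """Assumed inverse document frequency: log(N / n(t))."""
    if doc_freq <= 0 or doc_freq > total_docs:
        raise ValueError("need 0 < n(t) <= N")
    return math.log(total_docs / doc_freq)


def term_weight(total_docs: int, doc_freq: int, query_weight: float = 1.0) -> float:
    """Query-side weight times the assumed idf; query_weight plays the role of w_{t,q}."""
    return query_weight * idf(total_docs, doc_freq)


# A rare term receives a larger weight than a frequent one.
rare = term_weight(total_docs=1000, doc_freq=10)
common = term_weight(total_docs=1000, doc_freq=500)
print(rare > common)
```

Under this assumption a term that occurs in every document gets weight 0, so it contributes nothing to the ranking.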

Experiment:

• Parallel corpora ("Europarl"), aligned at sentence level.

• Lemmatization using tools from Connexor.

• Build bilingual vector space using Random Indexing.

• Translate Swedish queries by extracting the most correlated terms in French.

• Use search system "Searcher" developed at SICS.

• The thesaurus component was also used for the interactive QA experiment.
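The steps above can be sketched in a few lines. This is a toy illustration of the general Random Indexing idea, not the SICS implementation: each aligned sentence pair receives a sparse random index vector, every term accumulates the index vectors of the aligned regions it occurs in, and a Swedish term is translated by picking the French terms with the most similar accumulated vectors. The dimensionality, the number of non-zero entries, the function names, and the four lemmatized sentence pairs are all hypothetical.

```python
import numpy as np
from collections import defaultdict


def index_vector(rng, dim, nonzeros):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return vec


def build_space(aligned_pairs, dim=512, nonzeros=8, seed=0):
    """Give each term the sum of the index vectors of the aligned regions it occurs in."""
    rng = np.random.default_rng(seed)
    source_vectors = defaultdict(lambda: np.zeros(dim))
    target_vectors = defaultdict(lambda: np.zeros(dim))
    for source_lemmas, target_lemmas in aligned_pairs:
        region = index_vector(rng, dim, nonzeros)  # one random label per aligned pair
        for term in set(source_lemmas):
            source_vectors[term] += region
        for term in set(target_lemmas):
            target_vectors[term] += region
    return dict(source_vectors), dict(target_vectors)


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def translate(term, source_vectors, target_vectors, top_k=3):
    """Return the top_k target-language terms most correlated with a source term."""
    if term not in source_vectors:
        return []  # out-of-vocabulary term: nothing to correlate with
    query = source_vectors[term]
    scored = [(cosine(query, vec), word) for word, vec in target_vectors.items()]
    return [word for _, word in sorted(scored, reverse=True)[:top_k]]


# Hypothetical lemmatized, sentence-aligned Swedish-French toy data.
pairs = [
    (["parlament", "rösta"], ["parlement", "voter"]),
    (["parlament", "budget"], ["parlement", "budget"]),
    (["damm", "flod"], ["barrage", "fleuve"]),
    (["damm", "luft"], ["poussière", "air"]),
]
swedish, french = build_space(pairs)
print(translate("parlament", swedish, french, top_k=1))
print(translate("damm", swedish, french, top_k=4))
```

In this toy data "damm" occurs in both a "dam" context and a "dust" context, so its nearest French terms mix the two senses, and a term that never occurs in the parallel data yields no translation at all.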

Tasks

• Interactive Cross-Language Question & Answering task

• Bookmarking – introduce a "save" list for a second inspection

• Collaboration – Allow users to help and support each other to accomplish their task.

• With or without thesaurus (term expansion)

• People do collaborate when given the opportunity

• People do need help with related or expanded terms

Conclusions

• Subjects actually did collaborate during the search task

• Collaborative IR activities were observed in 5 categories: Topic; Search strategies; Vocabularies; Translation; and System functionalities

• Collaboration seems to correlate with performance: the two users in a pair tended to have similar results on the task evaluation metric. This is probably not because of explicit aid given by one user to the other, but due to meta-information such as "can the system cope with this type of question?"

Conclusions

Xling Reading Search and Inspection Interface

Experiment

• 8 participants were grouped into 4 pairs

• Each participant ran searches according to the i-CLEF setup matrix - 16 i-CLEF queries

• Tasks were given in Swedish; users had a monolingual French retrieval system to work with.

• Subjects were allowed to communicate during the task

• 2 systems: with and without term expansion

• Documents: Le Monde and SDA French from 1994-1995.

• The approach is very simple, but clearly viable.

• It is efficient, fast, and scalable.

• The quality of the parallel data is decisive for its performance.

Error Analysis:

• Most errors are out-of-vocabulary errors.

• Proper names are problematic (e.g. "Tour de France")

• Polysemy is problematic. Topic C229: Swedish word "damm" means both "dust" (not relevant to the query) and "dam" (relevant to the query).

While the term expansion technology worked well for the ad-hoc retrieval translation task, it did not work as well in the interactive monolingual case. There, the tool failed to meet users' expectations of wide coverage, which reduced their trust in it and hence its usefulness to nothing.

Collaboration during Search Task

[Figure: Number of Turns (Utterances) by Users and Query #]