Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable...

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context PairsS H O N O S U K E I S H I WATA R IN O B U H I R O K A J IN AO K I YO S H I N AG AM A S A S H I T OYO DAM A S A R U K I T S U R E G AWA

Past Work : Mikolov et al. (2013b)

1. Suggested an alternative to dictionary and phrase tables based machine translation system.

2. Based on the observation that various concepts have similar geometric arrangements in vector space irrespective of their language.

3. First, monolingual models are built of the languages and then a small bilingual dictionary is used to learn a linear transformation from one source to target language.

4. The word closest to the computed representation is then served as the translation.

Proposed Idea1. The objective is to improve the transformation when using count based word vectors by

using the correspondence between dimensions of word vectors.

2. Exploit the fact that dimensions of word vectors are related to context words which could be used to be gain insight into the cross-lingual correspondence between dimensions of word vectors.

3. Also try to gain some insight by using surface forms of the word. This is useful when dealing with languages that share vocabulary. (e.g., “cocktail” in English and “c´octel” in Spanish)

Proposed Method1. Start with the already discussed optimization

problem

2. Add a regularizer to this problem to prevent over-fitting.

3. Add the terms that exploit the discussed theory

Proposed Method(continued)1. Dtrain = The existing training data of word pairs is reused for finding correspondence between

the dimensions.

2. Evaluate word pairs based on the following distance function and add those which have distance below a certain threshold to Dsim .

3. The last two terms in the objective function force the learning process to strengthen the term wjk when k-th dimension in the source language corresponds to j-th dimension in the target language.

4. βtrain and βsim are parameters representing the strength of the new terms.

Comparison Methods1. Baseline : This method learns the transformation matrix on count based word vectors using

the base equation.

2. CBOW: This method learns the transformation matrix on word vectors obtained by a neural network and uses the base equation.

3. Direct Mapping: Training data was used to “map each dimension in a word vector in the source language to the corresponding dimension in a word vector in the target language”

Evaluation Procedure1. Take a word vector from source language.

2. Run it through the method in question and obtain a word vector in target language.

3. Find the top-n (n=1,5) similar (based on cosine similarity) word vectors in the target language.

4. Check if the real vector appears among the chosen top-n.

Results

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable...

Documents

Transcript of Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable...