Similarity Metrics for Japanese Kanji
Lars Yencken / 99designs
Maths and Science Meetup, 30th Nov 2012
Linguistics
Computer Science
Computational Linguistics
Relative difficulty of languages
DIFFICULTY OF LEARNING LANGUAGES
FOREIGN SERVICE INSTITUTE, US DEPARTMENT OF STATE
Closely related to English
█▌ 575-600 class hours
Afrikaans, Danish, Dutch, French, Italian, Norwegian, Portuguese, Romanian, Spanish, Swedish
Significant linguistic and/or cultural differences
███ 1100 class hours
Albanian, Amharic, Armenian, Azerbaijani, Bengali, Bosnian, Bulgarian, Burmese, Croatian, Czech, Estonian, Finnish, Georgian, Greek, Hebrew, Hindi, Hungarian, Icelandic, Khmer, Lao, Latvian, Lithuanian, Macedonian, Mongolian, Nepali, Pashto, Persian (Dari, Farsi, Tajik), Polish, Russian, Serbian, Sinhalese, Slovak, Slovenian, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Xhosa, Zulu
Exceptionally difficult for native English speakers
██████ 2200 class hours
Arabic, Cantonese, Mandarin, Japanese, Korean
持 /mo(tsu)/ "to carry"
持 挂 拝
distance(持, 挂) = ???
The space of kanji
dog
dough
log
持挂
拝土
Approaches
Compare images
持挂
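The slide does not say which image metric is meant. As a minimal sketch of the idea, assuming each kanji has already been rendered to a small binary bitmap of the same size, one simple option is the fraction of pixels that differ. The toy 3x3 glyphs and the names `parse` and `pixel_distance` are hypothetical and only illustrate the calculation; real glyphs would have to be rendered from a font first.

```python
def parse(art):
    """Turn ASCII art into a bitmap: '#' is ink, anything else is blank."""
    return [[ch == "#" for ch in line] for line in art.strip().splitlines()]


def pixel_distance(a, b):
    """Fraction of pixels that differ between two equal-sized binary bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


# Toy stand-ins for rendered glyphs (not real kanji).
GLYPH_A = parse("""
#.#
###
#.#
""")
GLYPH_B = parse("""
#.#
###
#..
""")

print(pixel_distance(GLYPH_A, GLYPH_A))  # 0.0
print(pixel_distance(GLYPH_A, GLYPH_B))  # one pixel out of nine differs
```

A metric like this is sensitive to font, size and alignment, which is one reason to also look at structural comparisons.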
Compare components
持: 扌, 土, 寸
待: 彳, 土, 寸
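One way to turn the two component lists above into a number is set overlap. The sketch below assumes a Jaccard distance (one minus shared components over all components); the slide does not name the measure, so this choice and the function name `component_distance` are illustrative assumptions.

```python
def component_distance(a, b):
    """Jaccard distance between two collections of components.

    0.0 means identical component sets, 1.0 means nothing in common.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# The two component lists shown on the slide.
first = ["扌", "土", "寸"]
second = ["彳", "土", "寸"]

print(component_distance(first, second))  # 0.5
```

Two of the four distinct components are shared, so the distance is 0.5. A set-based measure ignores where each component sits inside the character.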
Compare strokes
P R O S P E R I T Y
P R O P E R T I E S
distance: 6
3, 11a, 2a, 2a
3, 11a, 2a, 2a, 2a
distance: 1
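Both examples above are edit distances over sequences: letters for the words, stroke-type codes for the kanji. The sketch below is a standard dynamic-programming edit distance with a configurable substitution cost. The slide does not state its cost model, so `sub_cost` is an assumption: with unit costs the word pair comes out at 4, while charging 2 for a substitution (a delete plus an insert) gives the 6 shown on the slide. The stroke sequences give 1 either way.

```python
def edit_distance(a, b, sub_cost=1):
    """Edit distance between two sequences.

    Insertions and deletions cost 1; substitutions cost `sub_cost`.
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                               # delete x
                curr[j - 1] + 1,                           # insert y
                prev[j - 1] + (0 if x == y else sub_cost)  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("PROSPERITY", "PROPERTIES"))              # 4
print(edit_distance("PROSPERITY", "PROPERTIES", sub_cost=2))  # 6

# Stroke-type codes as listed on the slide.
strokes_a = ["3", "11a", "2a", "2a"]
strokes_b = ["3", "11a", "2a", "2a", "2a"]
print(edit_distance(strokes_a, strokes_b))  # 1
```

Because the function works on any sequence, the same code handles strings and lists of stroke codes.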
Compare trees
tree edit distance
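Tree edit distance counts the node insertions, deletions and relabelings needed to turn one ordered tree into another. The sketch below uses the simple memoized forest recurrence rather than an optimized algorithm such as Zhang-Shasha, with unit costs throughout. The decomposition trees for 持 and 待, and the choice to label each node with a character, are illustrative assumptions, not the trees from the slides.

```python
from functools import lru_cache


def tree(label, *children):
    """An ordered tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert everything in g
    if not g:
        return size(f)  # delete everything in f
    v_label, v_children = f[-1]  # rightmost root of f
    w_label, w_children = g[-1]  # rightmost root of g
    return min(
        # delete v: its children are promoted into the forest
        forest_distance(f[:-1] + v_children, g) + 1,
        # insert w: its children are promoted into the forest
        forest_distance(f, g[:-1] + w_children) + 1,
        # map v onto w, relabeling if the labels differ
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))


# Hypothetical decompositions: each kanji splits into a left radical and 寺.
temple = tree("寺", tree("土"), tree("寸"))
motsu = tree("持", tree("扌"), temple)
matsu = tree("待", tree("彳"), temple)

print(tree_edit_distance(motsu, matsu))  # 2
```

The two trees differ only in the root label and the left radical, so two relabelings suffice. Unlike a flat component set, this representation keeps track of how the components nest.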
So what works?
Thanks!