Application of a Topic Model Visualisation Tool to a Second … · 2019-10-07 · Application of a...



Application of a Topic Model Visualisation Tool to a Second Language

Maria Skeppstedt, Magnus Ahltorp, Andreas Kerren, Rafal Rzepka, Kenji Araki

We explored the adaptations required to apply the topic modelling tool Topics2Themes to a language that is very different from the one for which the tool was originally developed.

Topics2Themes, which enables text analysis on the output of topic modelling, was developed for English, and here we applied it to Japanese texts. As white space is not used to indicate word boundaries in Japanese, the texts had to be pre-tokenised, with white space inserted to mark the token segmentation, before they could be imported into the tool.
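The pre-tokenisation step can be illustrated with a minimal sketch. It assumes a greedy longest-match segmenter over a toy vocabulary as a stand-in for a real Japanese morphological analyser; the tokeniser actually used for the corpus is not specified here, and the names `segment`, `pre_tokenise` and `TOY_VOCABULARY` are hypothetical.

```python
def segment(text, vocabulary):
    """Greedy longest-match segmentation (illustrative stand-in for a real tokeniser).

    Characters not covered by the vocabulary become single-character tokens.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and fall back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def pre_tokenise(text, vocabulary):
    """Insert white space between tokens so that a white-space-based tool can split the text."""
    return " ".join(segment(text, vocabulary))


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCABULARY = {"寿司", "を", "食べ", "まし", "た"}

print(pre_tokenise("寿司を食べました", TOY_VOCABULARY))  # 寿司 を 食べ まし た
```

In practice, the segmentation would come from a dedicated Japanese tokeniser; only the final step of joining tokens with white space is needed to make the text readable by a tool that splits on white space.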

The tool was also extended by the addition of word translations and phonetic readings to support users who are second-language speakers of Japanese.
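As a rough illustration of this kind of support, the sketch below attaches a phonetic reading and an English translation to a term. The glossary contents, the `Gloss` type, the `annotate` function and the display format are all assumptions made for this example and do not reflect the tool's actual data structures or layout.

```python
from typing import NamedTuple


class Gloss(NamedTuple):
    reading: str      # phonetic reading in hiragana
    translation: str  # English translation


# Hypothetical toy glossary, for illustration only.
TOY_GLOSSARY = {
    "寿司": Gloss("すし", "sushi"),
    "食べる": Gloss("たべる", "to eat"),
}


def annotate(term, glossary):
    """Return the term followed by its reading and translation, if known."""
    gloss = glossary.get(term)
    if gloss is None:
        # Terms missing from the glossary are shown unchanged.
        return term
    return f"{term} [{gloss.reading}] ({gloss.translation})"


print(annotate("寿司", TOY_GLOSSARY))  # 寿司 [すし] (sushi)
```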

Applying Topics2Themes to a Japanese corpus

We applied the adapted version of Topics2Themes to a corpus of 1,000 microblogs, collected with the criterion that they should contain the same content written in Japanese and in English.

The Topics panel shows the topics that have been automatically extracted from the corpus. The Terms and Texts panels show the terms and texts associated with the extracted topics. Finally, the Themes panel shows the themes that have resulted from a manual analysis of extracted texts.

The fifth element in the Topics panel, i.e., the element representing the food topic, has been selected. As a result, the elements associated with the food topic in the other panels have been sorted to the top of their lists.
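The re-ranking behaviour can be sketched as a stable sort that moves the elements associated with the selected topic to the top while keeping the previous order otherwise. This is an assumption about the general idea only; the function `rank_for_topic`, the data layout and the example labels are hypothetical.

```python
def rank_for_topic(elements, selected_topic):
    """Place elements associated with the selected topic first.

    `elements` is a list of (label, topic_ids) pairs. Python's sort is stable,
    so the previous order is preserved within each of the two groups.
    """
    return sorted(elements, key=lambda element: selected_topic not in element[1])


# Hypothetical panel content: each term is linked to a set of topic ids.
terms = [
    ("weather", {1}),
    ("sushi", {5}),
    ("train", {2}),
    ("ramen", {5, 2}),
]

print([label for label, _ in rank_for_topic(terms, selected_topic=5)])
# ['sushi', 'ramen', 'weather', 'train']
```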

The user hovers the mouse over the fourth element in the Themes panel, which highlights the associated elements in the other three panels.
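One way to picture this linked highlighting is as a lookup that follows associations from a theme to its texts, and from those texts to their topics and terms. The sketch below assumes such association tables exist; the function `highlighted_elements`, the table names and the identifiers are all hypothetical and not taken from the tool.

```python
def highlighted_elements(theme, theme_texts, text_topics, text_terms):
    """Collect the texts, topics and terms associated with a hovered theme."""
    texts = set(theme_texts.get(theme, set()))
    topics = set()
    terms = set()
    for text_id in texts:
        # Follow each text to the topics and terms it is associated with.
        topics |= text_topics.get(text_id, set())
        terms |= text_terms.get(text_id, set())
    return {"texts": texts, "topics": topics, "terms": terms}


# Hypothetical association tables, for illustration only.
theme_texts = {"theme-4": {"text-a", "text-b"}}
text_topics = {"text-a": {"food"}, "text-b": {"food", "travel"}}
text_terms = {"text-a": {"sushi"}, "text-b": {"ramen", "train"}}

result = highlighted_elements("theme-4", theme_texts, text_topics, text_terms)
print(sorted(result["topics"]))  # ['food', 'travel']
```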

Correspondence to: [email protected]
https://github.com/mariask2/topics2themes