
Shyamala Doraisamy, Faculty of Computer Science and Information Technology, University Putra Malaysia, Malaysia

Stefan Rüger, Knowledge Media Institute, The Open University, United Kingdom

Towards Automatic Topic Detection of Folksongs

THEMATIC CATEGORIES

AUTOMATIC TOPIC MODELLING

PRELIMINARY RESULTS

ID Topic

T1 Child Ballads

T2 Other Ballads and Narrative Songs

T3 Thwarted love

T4 Love and Courtship

T5 Lover’s Farewell

T6 Lover’s Return

T7 False Hearted Lovers and Seducers

T8 Cuckolds

T9 Burdens of Single and Married Life

T10 Adventurous and Crafty Maidens

T11 Rakes, Robbers and Highwaymen

T12 Country Life

T13 Sports and Pastimes

T14 Sailors and Sea Songs

T15 Soldiers and War Songs

T16 Humorous Songs

T17 Nonsense and Nursery Songs

T18 Cumulative and Enumerative Songs

T19 Carols, Religious and Wassail Songs

T20 Various

T21 Fragments

REFERENCES

•Experimentation

•940 folksongs were obtained from www.folkinfo.org in abc music notation format

•The files were pre-processed to remove notation tags, hyphens and punctuation marks

•Topic analysis was performed using the GibbsLDA++ package [4]

•The number of topics was set to 5, 10, 15, 20 and 25
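
The experimentation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline. It assumes that lyrics sit in the abc `w:` and `W:` fields, that lowercasing is acceptable, and that the input file uses the GibbsLDA++ layout (document count on the first line, then one document per line). The function names, the file name `songs.dat`, the `-twords 20` value and the sample tune are all hypothetical, and the command-line flags should be checked against the GibbsLDA++ documentation [4].

```python
import re


def extract_lyrics(abc_text):
    """Return the lyric words of one abc tune as a lowercase string."""
    parts = []
    for line in abc_text.splitlines():
        line = line.strip()
        if line[:2] not in ("w:", "W:"):
            continue  # skip header fields (X:, T:, K:, ...) and music lines
        text = line[2:]
        text = re.sub(r"[-'’]", "", text)        # re-join split syllables, drop apostrophes
        text = re.sub(r"[^A-Za-z]+", " ", text)  # punctuation and abc symbols become spaces
        parts.append(text.lower())
    return " ".join(" ".join(parts).split())


def to_gibbslda_input(documents):
    """Format documents as: number of documents, then one document per line."""
    docs = [d for d in documents if d.strip()]  # drop songs without lyrics
    return "\n".join([str(len(docs))] + docs) + "\n"


def estimation_commands(data_path, topic_counts=(5, 10, 15, 20, 25)):
    """Build (but do not run) one estimation command per topic count."""
    return [
        ["lda", "-est", "-ntopics", str(k), "-twords", "20", "-dfile", data_path]
        for k in topic_counts
    ]


# Made-up sample tune, for illustration only.
sample = "X:1\nT:Sample\nK:G\nGABc d2|\nw:The sai-lor sang, a-way!\nW:Fa, la la; fa la."
print(extract_lyrics(sample))
print(to_gibbslda_input([extract_lyrics(sample)]))
print(estimation_commands("songs.dat")[1])
```

Each command could then be passed to `subprocess.run` once the GibbsLDA++ binary is installed. Removing hyphens outright merges genuinely hyphenated words, which may or may not match the pre-processing used for the poster.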

•Results

•The output topics were analysed and mapped to the topics identified in Cecil Sharp’s Collection of English Folk Songs [2]

•With 10 topics, an approximate mapping could be performed, as shown in the Results Table

•With more than 10 topics, too many junk and insignificant topics were identified

Results Table: manual labels for the 10-topic output and their mapping to the listed topics

Topic number | Manual label | Mapped to listed topics
0 | Carol | Carols, Religious and Wassail Songs (T19)
1 | Sailors | Sailors and Sea Songs, Soldiers and War Songs (T14, T15)
2 | Ballads | Rakes, Robbers and Highwaymen and Country Life (T3, T4)
3 | Hunting Songs | Sports and Pastimes (T13)
4 | Land/country life | Lover’s Farewell, Lover’s Return, Country Life (T5, T6, T12)
5 | Difficult Life | Burdens of Single and Married Life (T9)
6 | Scottish/old English | -
7 | Junk/Insignificant | -
8 | Happy Love | Love and Courtship (T4)
9 | Grand/Royal | Child Ballads (T1)

Topic list (T1 to T21): topics from Cecil Sharp’s Collection of English Folk Songs [2]

• Preliminary results show the feasibility of topic modelling of folksongs using LDA

•Further investigation would be needed to reduce the insignificant topics identified

•Future work

•Topic Significance Ranking techniques to be tested to eliminate insignificant topics

•Subject matter experts for performance validation

•Larger data collections comprising folksongs in English from America, Australia, etc.

•Formal Folk Song Collections and Bibliographies

•Examples of collections with thematic categorisations

•Cecil Sharp’s Collection of English Folk Songs [2]

•David Atkinson, English Folksong Bibliography: An Introductory Bibliography Based on the Holdings of the Vaughan Williams Memorial Library, 3rd (electronic) edition, 2006

•Informal collections from the Internet, e.g. http://www.folkinfo.org, with an alphabetically organised folksong collection

• Modelling text corpora and other discrete data collections aims to find short descriptions of the members of a collection that enable efficient processing of large collections

• Topic modelling has been applied to song lyrics text corpora

• Few, if any, related studies address English folksong lyrics from the English tradition

[1] Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993-1022 (2003).

[2] Cecil Sharp’s Collection of English Folk Songs, edited by Maud Karpeles, Vol. 1 & 2, Oxford University Press, 1974.

[3] AlSumait, L., Barbará, D., Gentle, J., Domeniconi, C.: Topic Significance Ranking of LDA Generative Models. In: W. Buntine et al. (Eds.): ECML PKDD 2009, Part I, LNAI 5781, pp. 67-82 (2009).

[4] GibbsLDA++, http://gibbslda.sourceforge.net

OBSERVATIONS

Formal databases

- e.g. an index search with the Roud Folksong Index from the Vaughan Williams Memorial Library (VWML) at www.library.efdss.org

Informal databases

- e.g. www.folkinfo.org, indexed alphabetically and providing notation, lyrics, notes and descriptions of songs, plus a song index number (e.g. Roud index) if available

•Folksong collections are generally indexed by the collectors’ recorded data, such as title, place collected, performer, etc.

• Folksong collecting is based on an oral tradition, so several lyric versions of the same song may be available

•Thematic categorisation of folksongs is commonly performed by collectors or bibliographers

• This task requires a subjective analysis of the lyrics

•Automated topic modelling would be useful to support folksong thematic categorisation tasks

[Figure: Roud Folksong Index search results, records 1 to 3 of 187800, with fields including song title, tune and first line]

• Topic Significance Ranking

•To evaluate topic significance using the approach proposed by AlSumait et al. [3]

•The distance between a topic distribution and three definitions of “junk distribution” is computed to determine topic significance
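
A simplified sketch of this idea is given below. It is not the full ranking procedure of [3]. It assumes the three junk definitions are (a) a uniform distribution over words, (b) a "vacuous" distribution equal to the topic-weighted average of all topics, and (c) a uniform distribution of a topic over documents. It uses only Kullback-Leibler divergence as the distance and does not combine the scores into a final rank. The function names and the toy matrices are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def junk_distances(topic_word, doc_topic):
    """Distance of every topic from three assumed junk distributions.

    topic_word[k][i] is p(word i | topic k); doc_topic[d][k] is p(topic k | doc d).
    """
    n_topics, vocab_size, n_docs = len(topic_word), len(topic_word[0]), len(doc_topic)
    # Overall weight of each topic: its average share across documents.
    weights = [sum(row[k] for row in doc_topic) / n_docs for k in range(n_topics)]
    uniform_words = [1.0 / vocab_size] * vocab_size
    vacuous = [
        sum(weights[k] * topic_word[k][i] for k in range(n_topics))
        for i in range(vocab_size)
    ]
    uniform_docs = [1.0 / n_docs] * n_docs

    results = []
    for k in range(n_topics):
        column = [row[k] for row in doc_topic]
        total = sum(column)
        # How topic k is spread over documents (uniform if the topic is unused).
        over_docs = [c / total for c in column] if total > 0 else uniform_docs
        results.append({
            "uniform_words": kl_divergence(topic_word[k], uniform_words),
            "vacuous_words": kl_divergence(topic_word[k], vacuous),
            "background_docs": kl_divergence(over_docs, uniform_docs),
        })
    return results


# Toy model: topic 0 is flat over the vocabulary, topic 1 is sharply peaked.
toy_topic_word = [[0.25, 0.25, 0.25, 0.25], [0.97, 0.01, 0.01, 0.01]]
toy_doc_topic = [[0.5, 0.5], [1.0, 0.0]]
for k, scores in enumerate(junk_distances(toy_topic_word, toy_doc_topic)):
    print(k, scores)
```

Topics whose distances are all close to zero resemble junk under these definitions and would be candidates for elimination. With GibbsLDA++, the two input matrices would come from the estimated word-topic and topic-document distributions.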

[Figure: example song entry with lyrics, notation and discussion notes]

There was a Lady …..
Lay the Bent to the……
And she had lovely …..
Fa, la la la, fa, la…..

There was a Knight of Noble…..
Which also lived in the ……

• Folksong Lyrics vs Contemporary music lyrics

•Classification: genres vs themes

•Vocabulary: modern vs Old English

• To utilise Latent Dirichlet Allocation (LDA), a generative probabilistic model proposed by Blei et al. [1], for topic modelling
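
To make "generative probabilistic model" concrete, the following minimal sketch simulates the LDA generative process: each topic is a distribution over words, each document is a mixture of topics, and each word is drawn by first picking a topic and then picking a word from that topic. It makes several assumptions that are not taken from the poster: symmetric Dirichlet priors with the illustrative values `alpha=0.5` and `beta=0.1`, a fixed document length instead of a sampled one, and a made-up six-word vocabulary. Topic estimation, as done by GibbsLDA++, inverts this process and is not shown.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.5, beta=0.1, seed=0):
    """Simulate documents from the LDA generative process."""
    rng = random.Random(seed)
    # One word distribution per topic.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # One topic mixture per document.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_index(theta, rng)   # choose a topic for this word position
            w = sample_index(phi[z], rng)  # choose a word from that topic
            words.append(vocab[w])
        docs.append(words)
    return phi, docs


toy_vocab = ["sea", "ship", "love", "heart", "king", "lady"]
phi, docs = generate_corpus(toy_vocab, n_topics=2, n_docs=3, doc_len=8)
for doc in docs:
    print(" ".join(doc))
```

Small values of `beta` tend to concentrate each topic on a few words, which is the property that makes the highest-probability words of an estimated topic usable for manual labelling.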

[Figure: workflow diagram with the stages Folksong Lyrics Collection, Latent Topic Analysis, Topic models, Labeled models]