ACL 2015: Automatic Identification of Age-Appropriate Ratings of Song Lyrics (Maulidyani & Manurung)


Transcript of ACL 2015: Automatic Identification of Age-Appropriate Ratings of Song Lyrics (Maulidyani & Manurung)

Automatic Identification of Age-Appropriate Ratings of Song Lyrics

1. The problem

Media age-appropriateness: suitability of consumption of a song, book, film, videogame, etc., by a child of a given age.

2. The corpus

3. The experiment

• Ordinal class labels: classification via regression (Frank et al., 1998) using M5P classifier (Wang and Witten, 1997)
• SMOTE oversampling (Chawla et al., 2002)
• 4-fold cross validation
• Features:
  • Vector space model (tf-idf weight)
  • MRC Psycholinguistic Database (Coltheart, 1981):
    • Age of acquisition
    • Familiarity
    • Imageability
    • Concreteness
  • GloVe (Pennington et al., 2014): 50 dim. pre-trained vectors (6B tokens: Wikipedia 2014 + Gigaword 5)
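The first bullet can be read as the usual classification-via-regression recipe: fit one regressor per class on a 0/1 membership target, then predict the class whose regressor scores highest. The following is a minimal Python sketch under stated assumptions, not the authors' setup. A tiny nearest-neighbour averaging regressor stands in for M5P (a model-tree learner), and the single numeric feature, the toy data, and every function name are hypothetical.

```python
def knn_regress(train, x, k=3):
    """Stand-in regressor: mean target of the k training points nearest to x.

    `train` is a list of (feature, target) pairs. M5P would go here in the
    poster's setup; this simple regressor is only an illustrative assumption.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def classify_via_regression(samples, x, k=3):
    """Predict a label for x from (feature, label) samples.

    One regression problem is built per class, with a 0/1 membership
    indicator as the target; the class with the highest score wins.
    """
    labels = sorted({label for _, label in samples})
    scores = {}
    for label in labels:
        indicator = [(f, 1.0 if l == label else 0.0) for f, l in samples]
        scores[label] = knn_regress(indicator, x, k)
    return max(labels, key=lambda label: scores[label])


# Hypothetical one-dimensional toy data: (feature value, age group).
TOY = [
    (1.0, "Toddler"), (1.1, "Toddler"), (1.2, "Toddler"),
    (4.9, "Young teen"), (5.0, "Young teen"), (5.2, "Young teen"),
    (8.8, "Adult"), (9.0, "Adult"), (9.3, "Adult"),
]

for value in (1.05, 5.1, 9.1):
    print(value, classify_via_regression(TOY, value))
```

With real lyrics the feature would be the full tf-idf, MRC, and GloVe vector rather than a single number, and the regressor would be trained inside each cross-validation fold.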

Example:

4. Results

Experiment 1: varying granularity of class labels & instances, VSM features only (see the "Sample granularity" table).

Experiment 2: focus on per-album granularity, varying feature combinations (see the "Features used" table).

[Figure: a 3-year old and three "?" marks beside the lyric excerpts below]

Oh, I love trash!
Anything dirty or dingy or dusty
Anything ragged or rotten or rusty
Yes, I love trash

Do you want to build a snowman?
Come on, let’s go and play
I never see you anymore
Come out the door
It’s like you’ve gone away

Don't you ever say I just walked away
I will always want you
I can't live a lie, running for my life
I will always want you

Age        # Tracks        # Albums     Group                    # Tracks        # Albums
2         696    5.7%     119    6.6%    Toddler                 826    6.7%     142    7.9%
3         130    1.1%      23    1.3%
4         251    2.1%      46    2.6%    Pre-schooler            455    3.7%      77    4.3%
5         204    1.7%      31    1.7%
6         281    2.3%      41    2.3%    Middle childhood 1    1,293   10.6%     230   12.8%
7         358    2.9%      71    3.9%
8         654    5.3%     118    6.6%
9         237    1.9%      50    2.8%    Middle childhood 2    2,407   19.7%     408   22.7%
10      1,590   13.0%     253   14.1%
11        580    4.7%     105    5.8%
12      1,849   15.1%     253   14.1%    Young teen            5,069   41.4%     672   37.4%
13      1,767   14.4%     242   13.5%
14      1,453   11.9%     177    9.8%
15        653    5.3%     116    6.5%    Teenager              1,354   11.1%     196   10.9%
16        521    4.3%      64    3.6%
17        180    1.5%      16    0.9%
>17       838    6.8%      73    4.1%    Adult                   838    6.8%      73    4.1%
Total  12,242  100.0%   1,798  100.0%                         12,242  100.0%   1,798  100.0%
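The group columns of the corpus table are consistent with summing neighbouring age rows. Below is a small Python sketch that recomputes the per-group track counts and shares from the per-age counts. The age-to-group mapping is an assumption inferred from the table layout (it is not stated in words on the poster), and the names `TRACKS_BY_AGE`, `GROUPS`, `group_totals`, and `shares` are made up for illustration.

```python
# Per-age track counts as printed in the corpus table.
TRACKS_BY_AGE = {
    "2": 696, "3": 130, "4": 251, "5": 204, "6": 281, "7": 358,
    "8": 654, "9": 237, "10": 1590, "11": 580, "12": 1849, "13": 1767,
    "14": 1453, "15": 653, "16": 521, "17": 180, ">17": 838,
}

# Assumed mapping from age labels to age groups (inferred from the layout).
GROUPS = {
    "Toddler": ["2", "3"],
    "Pre-schooler": ["4", "5"],
    "Middle childhood 1": ["6", "7", "8"],
    "Middle childhood 2": ["9", "10", "11"],
    "Young teen": ["12", "13", "14"],
    "Teenager": ["15", "16", "17"],
    "Adult": [">17"],
}


def group_totals(by_age, groups):
    """Sum the per-age counts that fall into each group."""
    return {name: sum(by_age[a] for a in ages) for name, ages in groups.items()}


def shares(totals):
    """Percentage of the grand total per group, rounded to one decimal."""
    grand = sum(totals.values())
    return {name: round(100 * n / grand, 1) for name, n in totals.items()}


totals = group_totals(TRACKS_BY_AGE, GROUPS)
pct = shares(totals)
for name in GROUPS:
    print(f"{name:<20}{totals[name]:>7,}{pct[name]:>7.1f}%")
```

The recomputed totals match the printed group rows, which also shows how uneven the classes are: "Young teen" alone holds about two fifths of the tracks.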

Beyond mere censorship: behavioral, sociological, psychological, cultural norms.

Columns: w1 w2 … wn, AOA FAM IMG CNC, GloVe01 … GloVe50

Words    AOA  FAM  IMG  CNC    GloVe01    …  GloVe50
oh         -  233  563  274    -0.070292  …  0.71087
i          -  483  628  465    0.11891    …  0.92121
love     303  311  619  569    -0.13886   …  0.2898
trash      -  588  541  599    -0.64487   …  -1.0992
(mean)   303  404  588  477    -0.183778  …  0.20567
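In the "oh i love trash" example, the fifth row of numbers appears to be the column-wise mean of the four word rows, with missing MRC values ("-") skipped. This is an inference from the printed numbers, not something the poster states. A minimal Python sketch of that aggregation follows; `MRC_ROWS`, `column_means`, and `glove01` are hypothetical names, and the values are copied from the example.

```python
# MRC values (AOA, FAM, IMG, CNC) per word; None marks a missing value ("-").
MRC_ROWS = {
    "oh":    (None, 233, 563, 274),
    "i":     (None, 483, 628, 465),
    "love":  (303, 311, 619, 569),
    "trash": (None, 588, 541, 599),
}


def column_means(rows):
    """Mean of each column over the words that have a value, rounded."""
    means = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        means.append(round(sum(present) / len(present)) if present else None)
    return means


song_features = column_means(MRC_ROWS.values())
print(song_features)  # [303, 404, 588, 477]

# The same averaging applied to the first GloVe dimension of the four words.
glove01 = [-0.070292, 0.11891, -0.13886, -0.64487]
glove01_mean = sum(glove01) / len(glove01)
print(round(glove01_mean, 6))
```

Both results agree with the last row of the example, which suggests that each song is represented by per-song averages of its word-level MRC and GloVe values.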

Features used        Target: Age Group   Target: Year
VSM                  70.60%              57.15%
VSM + MRC            71.02%              56.80%
VSM + GloVe          70.58%              57.68%
VSM + GloVe + MRC    70.47%              57.85%

Using human judgments, can we train a classifier to distinguish age-appropriate song lyrics?

Sample granularity   Target: Age Group   Target: Year
Per track            69.77%              58.58%
Per album            70.60%              57.15%

Anggi Maulidyani & Ruli Manurung
Faculty of Computer Science, Universitas Indonesia


Word         AOA  FAM  IMG  CNC
dog          169  610  598  636
sun          181  617  635  639
actuality    586  247  361  213
absolution   608  241  372  256
sex          450  512  617  584

Label: 2 (for the "oh i love trash" example)

Columns: w1 w2 … wn, AOA FAM IMG CNC, GloVe01 … GloVe50

Words    AOA  FAM  IMG  CNC    GloVe01    …  GloVe50
do         -  276  609  247    0.29605    …  0.96954
you        -  370  632  400    -0.001091  …  1.1316
want       -  302  606  361    0.13627    …  0.51921
to         -  199  613  220    0.68047    …  -0.26044
build      -  402  554  399    1.2426     …  -0.19918
a          -  201  632  217    0.21705    …  0.1796
(mean)        292  608  307    0.42855    …  0.390055

Label: 5

• Psycholinguistic features provide very slight accuracy increase (not statistically significant).
• Novel task, still MUCH to be explored (readability metrics, acoustic features?)
• What is human competence and agreement on this task?

(All works are copyrighted to their respective owners)