Distributional Models vs. Linked Data: leveraging crowdsourcing to personalize music playlists

Italian Information Retrieval 2013 - Workshop (http://iir2013.isti.cnr.it) - Distributional Models vs. Linked Data: leveraging crowdsourcing to personalize music playlists

IIR 2013 - 4th Italian Information Retrieval Workshop

Pisa (Italy), 17.01.2013

Distributional models vs. Linked Data: exploiting crowdsourcing to personalize music playlists

Cataldo Musto, Fedelucio Narducci, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis

exponential growth of the available music

C. Musto, F. Narducci, G. Semeraro, P. Lops, M. de Gemmis. Distributional models vs. Linked Data: exploiting crowdsourcing to personalize music playlists - IIR 2013 - 17.01.13

Some stats

28,000,000 songs available on iTunes Store (*)

around 31,000 hours of music

a typical user spends 1.5 hours per day listening to music

= 56 years to listen to the whole iTunes Library

(*) http://www.digitalmusicnews.com/permalink/2012/120425itunes

Information Overload

what music should I listen to?

personalization.

solution

personalized music playlists

solution

Is this something new? No.

Amazon.com

Recommendations

Genius @iTunes

Recommendations

Recommendations

All state-of-the-art platforms share an important drawback.

training is a bottleneck.

need for explicit information about user interests.

social media

provide information about user preferences

example

user preferences in music from Facebook

Our contribution: Play.me

Play.me

• Goal

• To provide users with personalized music playlists

• Insights

• Extraction of explicit user preferences from Facebook

• Playlist creation by enriching explicit user preferences.

• New artists are added to those explicitly extracted from Facebook

• Comparison of two enrichment techniques

personalized music playlists

Play.me architecture

• Crawling from Last.fm

• Public API

• Content-based features

• Name of the artist + Social tags

• Noise processing

• Information locally stored

pre-processing

pre-processing

Sigur Rós tag cloud from Last.fm

Play.me architecture

data extraction from Facebook

data extraction from Facebook

explicit preferences

data extraction from Facebook

implicit preferences

Play.me architecture

• Rationale

• Given a set of explicit preferences extracted from Facebook

• Play.me enriches this set

• Extraction of artists similar to those the user explicitly likes

enrichment

enrichment example

Coldplay extracted from Facebook

enrichment

radiohead, red hot chili peppers, kings of leon
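
The enrichment step can be sketched as a small function that adds up to n new artists for each artist the user explicitly likes. This is only an illustration: the names `enrich`, `lookup` and `SIMILAR` are hypothetical, and the lookup table simply replays the Coldplay example instead of calling a real similarity technique.

```python
def enrich(liked_artists, similar_artists, n=1):
    """Add up to n new artists for each artist the user explicitly likes."""
    enriched = []
    for artist in liked_artists:
        added = 0
        for candidate in similar_artists(artist):
            if added == n:
                break
            # Skip artists the user already likes or that were already added.
            if candidate in liked_artists or candidate in enriched:
                continue
            enriched.append(candidate)
            added += 1
    return enriched


# Hypothetical lookup standing in for either enrichment technique.
SIMILAR = {"coldplay": ["radiohead", "red hot chili peppers", "kings of leon"]}


def lookup(artist):
    return SIMILAR.get(artist, [])


print(enrich(["coldplay"], lookup, n=2))
```

Passing the similarity technique as a function keeps the sketch independent of how similar artists are actually computed.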

Play.me architecture

Play.me playlist

Most popular songs of the artists extracted from Last.fm (as well as those added through the enrichment) are proposed to the user.
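
A minimal sketch of the playlist step, assuming that a ranked list of popular songs is already available for each artist. The function name, the `per_artist` parameter and the placeholder track names are assumptions made for illustration, not details taken from the slides.

```python
def build_playlist(extracted, enriched, top_songs, per_artist=2):
    """Collect the most popular songs of extracted and enriched artists."""
    playlist = []
    # dict.fromkeys removes duplicate artists while keeping their order.
    for artist in dict.fromkeys(list(extracted) + list(enriched)):
        playlist.extend(top_songs.get(artist, [])[:per_artist])
    return playlist


# Placeholder data: songs are assumed to be sorted by popularity.
TOP_SONGS = {
    "coldplay": ["coldplay track 1", "coldplay track 2", "coldplay track 3"],
    "radiohead": ["radiohead track 1", "radiohead track 2"],
}

playlist = build_playlist(["coldplay"], ["radiohead"], TOP_SONGS)
print(playlist)
```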

let’s go deeper

• Comparison of two approaches

• Content-based strategy

• Distributional Models

• Linked Data

enrichment

• Content-based strategy

• Each artist is modeled through a set of tags

• Each artist is represented as a point in a semantic geometrical space

• Distributional Models

• Similarity calculations to extract the most similar artists.

enrichment based on Distributional Models

“meaning is its use”

L. Wittgenstein (Austrian philosopher)

distributional models

insight

by analyzing a large corpus of textual data it is possible to infer information about the usage (and hence the meaning) of the terms.

example

distributional models

distributional models: term/context matrix (WordSpace)

[Matrix: rows are the terms t1-t4, columns are the contexts c1-c9; a check mark means that the term occurs in that context.]

distributional models

beer vs. glass: good overlap

distributional models

beer vs. spoon: no overlap

distributional models

rock vs. post rock = good overlap

[Matrix: rows are the tags rock, post rock, jazz and classical; columns are the contexts c1-c6.]

distributional models

rock vs. classical = no overlap
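
The overlap idea behind the rock / post rock / classical example can be sketched in a few lines. The matrix below is hypothetical: each tag has as many contexts as on the slide (rock 3, post rock 2, jazz 1, classical 3), but the actual positions of the check marks are an assumption, chosen only to reproduce the two stated outcomes.

```python
# Hypothetical term/context matrix: each tag maps to the set of contexts
# (c1..c6) in which it occurs. The positions are illustrative assumptions.
word_space = {
    "rock": {"c1", "c2", "c3"},
    "post rock": {"c2", "c3"},
    "jazz": {"c5"},
    "classical": {"c4", "c5", "c6"},
}


def overlap(term_a, term_b, space=word_space):
    """Return the number of contexts shared by two terms."""
    return len(space[term_a] & space[term_b])


print(overlap("rock", "post rock"))   # 2 shared contexts: good overlap
print(overlap("rock", "classical"))   # 0 shared contexts: no overlap
```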

representation of documents (*) can be inferred by combining the representation of the terms (**) occurring in the document.

(*) documents = artists

(**) terms = tags

distributional models: term/context matrix (DocSpace)

[Matrix: rows are the terms t2 and t3 and the document d1, columns are the contexts c1-c9.]
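
One simple way to combine term vectors into a document vector is a component-wise sum, sketched below. The slides only say that the representations are "combined", so the sum is an assumption, and the vectors for t2 and t3 are hypothetical (they only match the number of check marks per row).

```python
CONTEXTS = 9

# Hypothetical binary context vectors (c1..c9) for two tags.
tag_vectors = {
    "t2": [0, 1, 1, 0, 1, 0, 0, 1, 0],  # occurs in 4 contexts
    "t3": [0, 0, 1, 0, 1, 1, 0, 0, 0],  # occurs in 3 contexts
}


def document_vector(tags, vectors=tag_vectors):
    """Build a document (artist) vector as the sum of its tag vectors."""
    doc = [0] * CONTEXTS
    for tag in tags:
        doc = [d + v for d, v in zip(doc, vectors[tag])]
    return doc


d1 = document_vector(["t2", "t3"])
print(d1)
```

With these illustrative vectors, d1 is non-zero in five contexts, in line with the five check marks shown for d1.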

Coldplay

Radiohead

Kings of Leon

Lady Gaga

enrichment based on Distributional Models

enrichment based on Distributional Models

radiohead, the killers, kings of leon

input: vector space representation

output: artists with the highest cosine similarity
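
The ranking by cosine similarity can be sketched as follows. The artist vectors are made-up numbers used only to make the example runnable, and `most_similar` is a hypothetical helper name rather than part of Play.me.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(seed, artist_vectors, n=3):
    """Return the n artists whose vectors are closest to the seed artist."""
    seed_vec = artist_vectors[seed]
    scored = [
        (cosine(seed_vec, vec), name)
        for name, vec in artist_vectors.items()
        if name != seed
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]


# Hypothetical artist vectors in a 4-dimensional space.
artist_vectors = {
    "coldplay": [3, 2, 0, 1],
    "radiohead": [3, 2, 1, 1],
    "kings of leon": [2, 2, 0, 0],
    "lady gaga": [0, 0, 3, 0],
}

print(most_similar("coldplay", artist_vectors, n=2))
```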

Linked Open Data Cloud

Linked Open Data Cloud

Structured (RDF) representation of the information stored in Wikipedia.

enrichment based on Linked Data

Coldplay play Alternative Rock

RDF triple

Relationships are explicitly encoded in RDF.
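
As a rough illustration, a triple can be modelled as a (subject, predicate, object) record. Only the Coldplay triple comes from the slide; the predicate name `genre`, the other two rows and the helper `subjects_with` are assumptions added to make the lookup meaningful.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("Coldplay", "genre", "Alternative Rock"),   # example from the slide
    Triple("Radiohead", "genre", "Alternative Rock"),  # hypothetical row
    Triple("Lady Gaga", "genre", "Pop"),               # hypothetical row
]


def subjects_with(predicate, obj, store=triples):
    """Return all subjects linked to obj through predicate."""
    return [t.subject for t in store
            if t.predicate == predicate and t.object == obj]


print(subjects_with("genre", "Alternative Rock"))
```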

• Linked Open Data Cloud

• Each artist is mapped to a DBpedia node.

• unique URI

• Relationships between artists (nodes) are explicitly encoded

• e.g. genre, artist category, etc.

• Use of SPARQL to extract artists (nodes) that share the same features

enrichment based on Linked Data

enrichment based on Linked Data

radiohead, the smiths, the verve

input: SPARQL query

output: artists sharing the same properties
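
The slides do not show the query itself, so the following is only a sketch of what a SPARQL query for artists sharing a property could look like. It assumes the DBpedia ontology property `dbo:genre` as the shared feature; the function name and the `limit` parameter are hypothetical, and the code only builds the query string without contacting any endpoint.

```python
def similar_artists_query(artist_uri, limit=3):
    """Build a SPARQL query for artists sharing a genre with artist_uri."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?artist WHERE {{
  <{artist_uri}> dbo:genre ?genre .
  ?artist dbo:genre ?genre .
  FILTER (?artist != <{artist_uri}>)
}}
LIMIT {limit}
""".strip()


query = similar_artists_query("http://dbpedia.org/resource/Coldplay", limit=3)
print(query)
```

Other shared properties (for example an artist category) could be added as further triple patterns in the same way.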

recap: enrichment process

input: artist (coldplay)

output: similar artists

Distributional Models: radiohead, kings of leon

Linked Data: radiohead, the smiths, the verve

experimental evaluation.

• Experiment

• Which enrichment technique provides users with the best playlists?

experimental design

• 30 users

• Heterogeneous musical knowledge

• Last.fm crawl: 228,878 artists

• Extraction & Recommendation step

• 325 artists extracted

• 11 per user, on average

experimental design: settings

experimental setup

Given a playlist, each user can freely express her own feedback (like/dislike) on the proposed tracks.
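
Assuming the like/dislike feedback is summarized as the percentage of proposed tracks that the user liked (a common reading of precision, not a formula stated on the slides), the computation can be sketched as:

```python
def precision(feedback):
    """Percentage of liked tracks; feedback is a list of booleans (True = like)."""
    if not feedback:
        return 0.0
    return 100.0 * sum(feedback) / len(feedback)


# Hypothetical feedback on a four-track playlist: three likes, one dislike.
print(precision([True, True, False, True]))  # 75.0
```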

experimental setup

Experiment repeated three times (one run with Linked Data enrichment, another one with Distributional Models, one with a simple baseline based on popularity).

experimental setup

Users were unaware of the adopted configuration.

experimental design: results

[Chart: precision (%) for n = 1, n = 2 and n = 3; y-axis from 55 to 80. Series: Linked Data, Distributional Models, Baseline (Popularity). Data labels: 58, 58, 58; 69.7, 75.2, 76.3; 63.2, 64.6, 65.9.]

n = number of artists added for each extracted artist

distributional models outperform linked data

precision in distributional models drops more rapidly

good results for the baseline as well (poor music knowledge?)

conclusions.

both enrichment techniques outperform the baseline

distributional models outperform linked data

future research.

merging different enrichment techniques

evaluation with user-based metrics (serendipity, novelty, unexpectedness)

modeling context.

questions?