Craigslist++
Sean Anastasi, Joseph Chen, Tatiana Gershanovich, Andreas Sekine
CSE454

Date posted: 12-Jan-2016


Our goal

  • To enhance Craigslist's interface
  • Show related items also being sold on Craigslist
  • Show related items from other third-party sites

How we do it

Main components:

  • Crawler (Heritrix)
  • Clusterer (Carrot2)
  • Relevance sorting
  • User interface (Greasemonkey)
  • Other stuff

Crawler

  • Specific crawling needs
      • Volatile data
      • Questionable legalities
  • Heritrix
      • Only crawling one domain
      • Problematic setup
  • Our setup
      • 2 crawlers for new posts, 1 cleaner

Clusterer

  • Carrot2
      • What to cluster (title, body, or title + body)?
      • Need for reclustering and combination
  • WordNet
      • Combination of synonym clusters

Relevance sorting

Relevance sorting (cont.)
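The clusterer slide mentions combining synonym clusters with WordNet. The following is only a minimal sketch of that idea, not the project's actual code: the small `SYNONYM_SETS` table is a hypothetical stand-in for WordNet lookups, and the function names, cluster labels, and post IDs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet synsets: each set lists labels
# that are treated as synonyms of one another.
SYNONYM_SETS = [
    {"couch", "sofa"},
    {"bike", "bicycle"},
]


def canonical_label(label):
    """Map a cluster label to a shared key when it belongs to a synonym set."""
    word = label.strip().lower()
    for synset in SYNONYM_SETS:
        if word in synset:
            # Pick a deterministic representative for the whole set.
            return min(synset)
    return word


def merge_synonym_clusters(clusters):
    """Combine clusters whose labels are synonyms.

    `clusters` maps a cluster label to a list of post IDs (assumed shape).
    Post IDs are kept in first-seen order, without duplicates.
    """
    merged = defaultdict(list)
    for label, post_ids in clusters.items():
        key = canonical_label(label)
        for post_id in post_ids:
            if post_id not in merged[key]:
                merged[key].append(post_id)
    return dict(merged)


clusters = {
    "Sofa": [101, 102],
    "couch": [102, 103],
    "bicycle": [201],
    "table": [301],
}
merged = merge_synonym_clusters(clusters)
```

In this sketch the "Sofa" and "couch" clusters collapse into one entry, while "bicycle" and "table" are left unchanged.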

User interface

  • Greasemonkey
      • Show related posts (grouped by clusters)
      • Show which items have data
  • jQuery
      • Folding item lists
      • Mouseover details/images
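The "related posts grouped by clusters" view on the user interface slide could be prepared roughly as follows. This is a sketch under assumptions: the record fields (`cluster`, `title`) and both helper functions are hypothetical, and the real interface is a Greasemonkey script in the browser, not Python.

```python
from collections import defaultdict


def group_by_cluster(related_posts):
    """Group post titles under their cluster label (assumed record shape)."""
    groups = defaultdict(list)
    for post in related_posts:
        groups[post["cluster"]].append(post["title"])
    return dict(groups)


def render(groups):
    """Produce a plain-text outline: one header per cluster, then its posts."""
    lines = []
    for cluster in sorted(groups):
        lines.append(f"{cluster} ({len(groups[cluster])})")
        lines.extend(f"  - {title}" for title in groups[cluster])
    return "\n".join(lines)


related_posts = [
    {"cluster": "couch", "title": "Leather sofa, like new"},
    {"cluster": "table", "title": "Oak dining table"},
    {"cluster": "couch", "title": "Sectional couch"},
]
groups = group_by_cluster(related_posts)
outline = render(groups)
```

Each cluster header carries its item count, which is the kind of summary a folded item list would show before it is expanded.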

Other

  • Amazon Product Advertising API
  • Yahoo Term Extraction
  • Botnet

Demo

  • Greasemonkey plugin: https://addons.mozilla.org/en-US/firefox/addon/748
  • Craigslist++ script: http://cubist.cs.washington.edu/~lidor7/craigslistpp.user.js
  • Craigslist: http://seattle.craigslist.org/