A Web of Concepts Dalvi, et al. Presented by Andrew Zitzelberger.
Vision
• Transform hyperlinked bags of words into a semantically rich, aggregate view of information on the web.
Instances
• Record of a concept
  – Restaurant
    • Gochi (19980 Homestead Rd Cupertino CA)
  – Academia?
    • Publications, research institutions
Instance Representation
• Loosely-structured record (lrec)
  – Attribute-key, value pairs
  – Unique id field
    • Entity matching problem
  – Metadata
    • Attribute list
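The lrec described above can be pictured as a small data structure. This is only a sketch: the class name `LRec`, its field names, and the id string are illustrative assumptions, not the representation used by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LRec:
    """Hypothetical loosely-structured record (names are assumptions)."""
    id: str        # unique id field
    concept: str   # the concept this record is an instance of
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, key: str, value: str) -> None:
        # No fixed schema: any key is accepted, and a key may hold several values.
        self.attributes.setdefault(key, []).append(value)

    def attribute_list(self) -> List[str]:
        # Metadata: the attribute keys currently present on this record.
        return sorted(self.attributes)


# The id below is made up for illustration; name and address come from the slide.
gochi = LRec(id="restaurant:gochi", concept="restaurant")
gochi.add("name", "Gochi")
gochi.add("address", "19980 Homestead Rd Cupertino CA")
print(gochi.attribute_list())
```

Deciding whether two such records with different ids describe the same restaurant is the entity matching problem noted on the slide; the sketch does not attempt it.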
Usage Study: Instance vs. Concept Search
• yelp.com
  – Month of queries resulting in a click (restaurants)
  – 59% specific business URL
  – 19% search URL (either specific business or group)
  – 11% specific group URL
Usage Study: Concept Attribute Search
• Remove restaurant name and location information from query
• Co-occurring words:
  – Menu (3%), coupons (1.8%), online, weekly specials, locations (1.5%)
  – Nutrition, to go, delivery, careers, cod
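One way to picture the analysis above is a word count over what remains of each query once name and location terms are dropped. The function name, the term set, and the three queries are made up for illustration; the slide does not say how the study was actually implemented.

```python
from collections import Counter


def attribute_terms(queries, known_terms):
    """Count the words left after dropping known name/location terms."""
    counts = Counter()
    for query in queries:
        words = [w for w in query.lower().split() if w not in known_terms]
        counts.update(set(words))  # count each word at most once per query
    return counts


# Made-up queries, not data from the study.
queries = ["gochi cupertino menu", "gochi menu", "gochi cupertino coupons"]
known = {"gochi", "cupertino"}  # assumed name and location terms

counts = attribute_terms(queries, known)
share = {word: n / len(queries) for word, n in counts.items()}
print(counts.most_common())
```

Dividing each count by the number of queries gives per-word shares comparable in form to the percentages on the slide.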
Usage Study: Aggregation Value
• 59% clicked on at least one other URL
• 35% clicked on at least two other URLs
• Small manual evaluation indicates pages are often about the same business.
Usage Study: Concepts vs. Browsing
• 42% of homepage visits are from a search engine
  – Immediately following URL
    • 11.5% location
    • 9% menu
    • 1% coupons
• 10.5% of user trails contain more than one distinct instance of the restaurant concept
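The trail statistic above is a simple ratio. A minimal sketch, assuming each trail has already been reduced to the list of restaurant-instance ids it touches (the ids and trails below are invented):

```python
def share_multi_instance(trails):
    """Fraction of trails that touch more than one distinct instance."""
    if not trails:
        return 0.0
    multi = sum(1 for trail in trails if len(set(trail)) > 1)
    return multi / len(trails)


# Invented trails: only the second one visits two distinct instances.
trails = [["r1"], ["r1", "r2"], ["r3", "r3"], ["r2"]]
result = share_multi_instance(trails)
print(result)
```

Repeat visits to the same instance (the third trail) do not count, which is why the set of ids is used rather than the trail length.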
Extraction
• Create new records from the web
  – Information extraction
  – Linking
  – Analysis
    • Meta-data tagging (cuisine type)
Domain-centric vs. Site-centric Extraction
• Site-centric extraction
  – Wrappers for page structure
  – Probabilistic models (CRF)
• Domain-centric extraction
  – Fields of interest
  – Statistical properties (single zip code, etc.)
  – Structure components (lists, link relationships)
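The "single zip code" property can be illustrated with a small check that needs no site-specific wrapper. This is a sketch under stated assumptions: the regular expression, the function name, and the sample strings are invented here, and the slide does not describe the actual features used.

```python
import re

# US-style ZIP code, with an optional +4 suffix (an assumption of this sketch).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def has_single_zip(text):
    """True when the text mentions exactly one distinct 5-digit ZIP code."""
    zips = {m.group(0)[:5] for m in ZIP_RE.finditer(text)}
    return len(zips) == 1


# Made-up page texts.
detail_page = "Example Diner, 12 Main St, Springfield 12345. Call us today."
listing_page = "Example Diner 12345 ... Sample Cafe 54321 ... Demo Grill 67890"

print(has_single_zip(detail_page))
print(has_single_zip(listing_page))
```

A page with one distinct ZIP code is more likely to describe a single business, while a listing page tends to carry several. The check is deliberately crude: any five-digit number, such as a five-digit street number, would be counted as a ZIP code.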
Domain-centric Extraction
• Aggregator mining
  – Learn from extracted knowledge (similar menus)
• Matching
  – Text is “about” a record (restaurant review)
Application: Session Optimization
• User understanding
  – Historical modeling
  – Session modeling
• Content understanding
• Example: Birks
  – Birks and Mayors (luxury jewelers) vs. Birk’s Steakhouse
Application: Browse Optimization
• Alternatives: (Restaurants)
  – Similar type of cuisine
  – Similar location
  – Similar quality
• Augmentations: (Camera)
  – Batteries
  – Memory cards
Concept Search
• Result Pages – show multiple records
• Concept Pages – information about an instance
• Article Pages – a piece of authored text
Challenges
• Transfer learning
  – Transfer extractor knowledge
• Tracking uncertainty
  – Accuracy issues
  – “Web of concepts is not a one time affair”
    • Wrapper problems
    • Concept updates
• Relevance Measures
  – User satisfaction