A Web of Concepts Dalvi, et al. Presented by Andrew Zitzelberger.




Vision

• Transform hyperlinked bags of words into a semantically rich, aggregate view of the information on the web.

Concept

• Things of interest
  – Searching for information
  – Accomplishing a task
    • Reservations, etc.

Instances

• Record of a concept
  – Restaurant
    • Gochi (19980 Homestead Rd Cupertino CA)
  – Academia?
    • Publications, research institutions

Instance Representation

• Loosely-structured record (lrec)
  – Attribute-key, value pairs
  – Unique id field
    • Entity matching problem
  – Metadata
    • Attribute list
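
A minimal sketch of what an lrec could look like in code. The class name `LRec`, its field names, and the use of a random id are assumptions made for illustration; the slides only say that a record holds attribute-key/value pairs, a unique id, and an attribute list as metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List
import uuid


@dataclass
class LRec:
    """Loosely-structured record: free-form attribute/value pairs plus an id."""

    concept: str
    # No fixed schema: each record carries whatever attributes were found.
    attributes: Dict[str, Any] = field(default_factory=dict)
    # Hypothetical id scheme; a real system must also decide when two
    # records describe the same entity (the entity matching problem).
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def attribute_list(self) -> List[str]:
        """Metadata: the attribute keys this particular record carries."""
        return sorted(self.attributes)


gochi = LRec(
    concept="restaurant",
    attributes={"name": "Gochi", "address": "19980 Homestead Rd Cupertino CA"},
)
print(gochi.attribute_list())
```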

Domain

• Set of related concepts
  – Academic community domain = {publications, people, conferences}

Usage Study: Instance vs. Concept Search

• yelp.com
  – Month of queries resulting in a click (restaurants)
  – 59% specific business URL
  – 19% search URL (either a specific business or a group)
  – 11% specific group URL

Usage Study: Concept Attribute Search

• Remove restaurant name and location information from query
• Co-occurring words:
  – Menu (3%), coupons (1.8%), online, weekly specials, locations (1.5%)
  – Nutrition, to go, delivery, careers, cod
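
The counting step above can be sketched roughly as follows. The function name, the whitespace tokenization, and the sample queries are assumptions for illustration, not the procedure or data used in the study.

```python
from collections import Counter


def cooccurring_words(queries, name_and_location_words):
    """Share of queries containing each word left after removing the
    restaurant name and location words."""
    removed = {w.lower() for w in name_and_location_words}
    counts = Counter()
    for query in queries:
        # A set, so a word repeated inside one query is counted once.
        remaining = {t for t in query.lower().split() if t not in removed}
        counts.update(remaining)
    total = len(queries)
    return {word: n / total for word, n in counts.items()} if total else {}


# Hypothetical sample queries, not data from the study.
sample = ["gochi cupertino menu", "gochi menu", "gochi cupertino coupons", "gochi"]
shares = cooccurring_words(sample, ["Gochi", "Cupertino"])
print(shares)
```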

Usage Study: Aggregation Value

• 59% clicked on at least one other URL
• 35% clicked on at least two other URLs
• Small manual evaluation indicates pages are often about the same business.

Usage Study: Concepts vs. Browsing

• 42% of homepage visits are from search engine
  – Immediately following URL
    • 11.5% location
    • 9% menu
    • 1% coupons
• 10.5% of user trails contain more than one distinct instance of the restaurant concept
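
A small sketch of how a figure like the last one could be computed, assuming each user trail has already been reduced to the list of restaurant instance ids it touches. That representation and the sample trails are hypothetical.

```python
def share_of_multi_instance_trails(trails):
    """Fraction of trails that visit more than one distinct instance."""
    if not trails:
        return 0.0
    multi = sum(1 for trail in trails if len(set(trail)) > 1)
    return multi / len(trails)


# Hypothetical trails; repeated visits to one instance do not count as "more than one".
trails = [["gochi"], ["gochi", "birks"], ["gochi", "gochi"], []]
print(share_of_multi_instance_trails(trails))
```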

Extraction

• Create new records from the web
  – Information extraction
  – Linking
  – Analysis
    • Meta-data tagging (cuisine type)

Domain-centric vs. Site-centric Extraction

• Site-centric extraction
  – Wrappers for page structure
  – Probabilistic models (CRF)
• Domain-centric extraction
  – Fields of interest
  – Statistical properties (single zip code, etc.)
  – Structure components (lists, link relationships)
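
The "single zip code" property can be illustrated with a rough heuristic: a page describing one restaurant tends to mention exactly one zip code, while a listing page mentions several. The regular expression (a two-letter state abbreviation followed by five digits, so that a street number such as 19980 is not mistaken for a zip code), the function names, and the sample page text are all assumptions for illustration.

```python
import re

# Two-letter state abbreviation, optional comma, then a 5-digit zip (ZIP+4 allowed).
ZIP_AFTER_STATE = re.compile(r"\b[A-Z]{2},?\s+(\d{5})(?:-\d{4})?\b")


def distinct_zip_codes(text):
    """Set of zip codes that appear right after a state abbreviation."""
    return set(ZIP_AFTER_STATE.findall(text))


def looks_like_single_instance_page(text):
    """Heuristic only: exactly one distinct zip code on the page."""
    return len(distinct_zip_codes(text)) == 1


# Illustrative page text; the zip codes are example values.
detail_page = "Gochi, 19980 Homestead Rd, Cupertino, CA 95014. Open for dinner."
listing_page = "Place A, San Jose, CA 95112. Place B, Cupertino, CA 95014."
print(looks_like_single_instance_page(detail_page))
print(looks_like_single_instance_page(listing_page))
```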

Domain-centric Extraction

• Aggregator mining
  – Learn from extracted knowledge (similar menus)
• Matching
  – Text is “about” a record (restaurant review)
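
One simple way to approximate "text is about a record" is token overlap between the text and the record's attribute values. This is a stand-in technique chosen for illustration, not the matching method from the paper; the function names and the 0.5 threshold are assumptions.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(text, record):
    """Fraction of the record's attribute-value tokens that appear in the text."""
    record_tokens = set()
    for value in record.values():
        record_tokens |= tokens(str(value))
    if not record_tokens:
        return 0.0
    return len(record_tokens & tokens(text)) / len(record_tokens)


def best_match(text, records, threshold=0.5):
    """Index of the best-scoring record, or None if nothing reaches the threshold."""
    scored = [(match_score(text, r), i) for i, r in enumerate(records)]
    score, index = max(scored, default=(0.0, None))
    return index if score >= threshold else None


records = [
    {"name": "Gochi", "address": "19980 Homestead Rd Cupertino CA"},
    {"name": "Birk's Steakhouse"},
]
# Hypothetical review text.
review = "Dinner at Gochi on Homestead Rd in Cupertino was great"
print(best_match(review, records))
```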

Application: Aggregation

Application: Session Optimization

• User understanding
  – Historical modeling
  – Session modeling
• Content understanding
• Example: Birks
  – Birks and Mayors (luxury jewelers) vs. Birk’s Steakhouse

Application: Browse Optimization

• Alternatives: (Restaurants)
  – Similar type of cuisine
  – Similar location
  – Similar quality
• Augmentations: (Camera)
  – Batteries
  – Memory cards
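
A rough sketch of suggesting alternatives for a restaurant record by counting shared cuisine, location, and quality attributes. The attribute keys (`cuisine`, `city`, `rating`), the exact-equality notion of "similar", and the sample records are all hypothetical.

```python
def alternatives(current, candidates, keys=("cuisine", "city", "rating")):
    """Other records ranked by how many of the given attributes they share."""

    def shared(other):
        # Count attributes present on the current record with an equal value.
        return sum(1 for k in keys if k in current and current[k] == other.get(k))

    others = [c for c in candidates if c is not current]
    ranked = sorted(others, key=shared, reverse=True)
    return [c for c in ranked if shared(c) > 0]


# Hypothetical records for illustration only.
a = {"name": "A", "cuisine": "japanese", "city": "Cupertino", "rating": 4}
b = {"name": "B", "cuisine": "japanese", "city": "Cupertino", "rating": 3}
c = {"name": "C", "cuisine": "italian", "city": "Cupertino", "rating": 5}
d = {"name": "D", "cuisine": "thai", "city": "San Jose", "rating": 2}
print([r["name"] for r in alternatives(a, [a, b, c, d])])
```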

Concept Search

• Result Pages – shows multiple records
• Concept Pages – information about an instance
• Article Pages – a piece of authored text

Advertising

• Increase in targeted advertisements
• Target concepts rather than keywords

Challenges

• Transfer learning
  – Transfer extractor knowledge
• Tracking uncertainty
  – Accuracy issues
  – “Web of concepts is not a one time affair”
    • Wrapper problems
    • Concept updates
• Relevance Measures
  – User satisfaction

Related Work

• Information Extraction/Integration Systems
• Dataspace Systems
• Semantic Web

Future Work

• Enrich representation model
  – Path storage to data
  – Provenance, versions, uncertainty
  – Hierarchical relationships (containment or inheritance)
• Ranking of disparate sources