Facets Berkeley

  • 8/8/2019 Facets Berkeley


    Semi-Automated Creation of

    Facet Hierarchies

    Marti Hearst, School of Information, UC Berkeley

    Joint work with Dr. Emilia Stoica


    Marti Hearst, Taxonomy Bootcamp 06

    Outline

    Faceted Metadata Definition

    Advantages

    Flamenco: Search Interface Design using Faceted Metadata

    Castanet: (Semi) Automated Tool for Creation of Category Systems

    Comparison to State-of-the-Art Alternatives

    Conclusions


    Focus: Search and Navigation of Large Collections

    Image Collections

    E-Government Sites

    Shopping Sites

    Digital Libraries


    Problems with Site Search

    Study by Vividence in 2001 on 69 sites:
      70% eCommerce
      31% Service
      21% Content
      2% Community

    Poorly organized search results:
      Frustration and wasted time

    Poor information architecture:
      Confusion
      Dead ends
      "Back and forthing"
      Forced to search


    What we want to Achieve

    Integrate browsing and searching seamlessly

    Support exploration and learning

    Avoid dead-ends, pogoing, and lostness


    Main Idea

    Use hierarchical faceted metadata

    Design the interface to:

    Allow flexible navigation

    Provide previews of next steps

    Organize results in a meaningful way

    Support both expanding and refining the search


    The Problem With Hierarchy

    Most things can be classified in more than one way.

    Most organizational systems do not handle this well.

    Example: Animal Classification

    [Figure: the animals otter, penguin, robin, salmon, wolf, cobra, bat, and seal, grouped three different ways, once each by Skin Covering, Locomotion, and Diet.]


    The Problem with Hierarchy

    Inflexible
      Forces the user to start with a particular category
      What if I don't know the animal's diet, but the interface makes me start with that category?

    Wasteful
      Have to repeat combinations of categories
      Makes for extra clicking and extra coding

    Difficult to modify
      To add a new category type, must duplicate it everywhere or change things everywhere


    The Problem With Hierarchy

    [Figure: a single tree rooted at "start" in which the labels fur, scales, feathers and fish, rodents, insects are repeated under each branch of swim, fly, run, slither, with the animals salmon, bat, robin, wolf at the leaves.]
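
The repetition in a single hierarchy can be counted directly. The sketch below uses the labels from this slide; the fixed level order (locomotion, then skin covering, then diet) is an assumption made only for illustration.

```python
from itertools import product

# Labels taken from the slide; the level order is an assumption.
locomotion = ["swim", "fly", "run", "slither"]
skin = ["fur", "scales", "feathers"]
diet = ["fish", "rodents", "insects"]

# A single hierarchy needs one leaf per combination of labels.
hierarchy_leaves = list(product(locomotion, skin, diet))

# It also needs interior nodes for every partial combination.
interior_nodes = len(locomotion) + len(locomotion) * len(skin)

# Independent facets only need each label once.
facet_labels = len(locomotion) + len(skin) + len(diet)

print(len(hierarchy_leaves), interior_nodes, facet_labels)
```

Adding a fourth category type would multiply the number of leaves again, while the faceted version only grows by the number of new labels.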


    The Idea of Facets

    Facets are a way of labeling data
      A kind of metadata (data about data)
      Can be thought of as properties of items

    Facets vs. Categories
      Items are placed INTO a category system
      Multiple facet labels are ASSIGNED TO items


    The Idea of Facets

    Create INDEPENDENT categories (facets)
      Each facet has labels (sometimes arranged in a hierarchy)

    Assign labels from the facets to every item
      Example: recipe collection

    Course: Main Course
    Cooking Method: Stir-fry
    Cuisine: Thai
    Ingredient: Bell Pepper, Curry, Chicken


    The Idea of Facets

    Break out all the important concepts into their own facets

    Sometimes the facets are hierarchical
      Assign labels to items from any level of the hierarchy

    Preparation Method
      Fry
      Saute
      Boil
      Bake
      Broil
      Freeze

    Desserts
      Cakes
      Cookies
      Dairy
        Ice Cream
        Sorbet
        Flan

    Fruits
      Cherries
      Berries
        Blueberries
        Strawberries
      Bananas
      Pineapple


    Using Facets

    Now there are multiple ways to get to each item

    Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
    Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
    Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

    One item: Fruit > Pineapple, Dessert > Cake, Preparation > Bake

    Another item: Dessert > Dairy > Sherbet, Fruit > Berries > Strawberries, Preparation > Freeze
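
One way to picture "multiple ways to get to each item" is to store, for each item, one label path per facet. This is a minimal sketch, not Flamenco's data model: the item names `item_a` and `item_b` and the helper `items_under` are hypothetical, while the label paths are the ones listed on this slide.

```python
# Each item carries one label path per facet (paths from the slide).
# The item names are hypothetical placeholders.
ITEM_PATHS = {
    "item_a": [("Fruit", "Pineapple"), ("Dessert", "Cake"), ("Preparation", "Bake")],
    "item_b": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}


def items_under(prefix):
    """Return the items that have some label path starting with `prefix`."""
    return sorted(
        name
        for name, paths in ITEM_PATHS.items()
        if any(path[: len(prefix)] == prefix for path in paths)
    )


print(items_under(("Fruit",)))            # reachable through the Fruit facet
print(items_under(("Dessert", "Dairy")))  # reachable through a deeper label
```

Because every facet path is an entry point, the same item can be reached by starting from Fruit, Dessert, or Preparation.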


    Example: Nobel Prize Winners Collection (Before and After Facets)


    Only One Way to View Laureates


    First, Choose Prize Type


    Next, view the list!

    The user must first choose an award type (literature), then browse through the laureates in chronological order.

    No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.


    Flamenco Interface: Using Hierarchical Faceted Metadata


    Opening View: Select literature from PRIZE facet


    Group results by YEAR facet


    Select 1920s from YEAR facet


    Current query is PRIZE > literature AND YEAR: 1920s. Now remove PRIZE > literature


    Now Group By YEAR > 1920s


    Hierarchy Traversal: Group By YEAR > 1920s, and drill down to 1921


    Select an individual item


    Use Endgame to expand out



    Or use "More like this" to find similar items


    Start a new search using keyword "California"


    Note that category structure remains after the keyword search


    The query is now a keyword ANDed with a facet subhierarchy


    Using Facets

    The system only shows the labels that correspond to the current set of items
      Start with all items and all facets
      The user then selects a label within a facet
      This reduces the set of items (only those that have been assigned to the subcategory label are displayed)
      This also eliminates some subcategories from the view.
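
The narrowing behavior described on this slide can be sketched in a few lines. This is an illustration under assumptions, not Flamenco's implementation: the toy collection `ITEMS` and the helpers `select` and `visible_labels` are hypothetical, and each item has just one flat label per facet.

```python
from collections import Counter

# Hypothetical toy collection: item id -> {facet: label}.
ITEMS = {
    "r1": {"Cuisine": "Thai", "Course": "Main Course", "Method": "Stir-fry"},
    "r2": {"Cuisine": "Thai", "Course": "Dessert", "Method": "Freeze"},
    "r3": {"Cuisine": "Italian", "Course": "Main Course", "Method": "Bake"},
}


def select(items, facet, label):
    """Keep only the items that were assigned `label` within `facet`."""
    return {key: labels for key, labels in items.items() if labels.get(facet) == label}


def visible_labels(items):
    """Count, per facet, the labels that still occur in the current item set."""
    counts = {}
    for labels in items.values():
        for facet, label in labels.items():
            counts.setdefault(facet, Counter())[label] += 1
    return counts


current = select(ITEMS, "Cuisine", "Thai")
print(sorted(current))
print(visible_labels(current))
```

Only labels with at least one remaining item are returned, so "Bake" drops out of view once "Thai" is selected, and following any label that is still shown cannot produce an empty result.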


    Advantages of Facets

    Can't end up with empty result sets (except with keyword search)

    Helps avoid feelings of being lost.
      Easier to explore the collection.

    Helps users infer what kinds of things are in the collection.
      Evokes a feeling of browsing the shelves

    Is preferred over standard search for collection browsing in usability studies.
      (Interface must be designed properly)


    Advantages of Facets

    Seamless to add new facets and subcategories

    Seamless to add new items.

    Helps with categorization wars
      Don't have to agree exactly where to place something

    Interaction can be implemented using a standard relational database.

    May be easier for automatic categorization
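
As a rough illustration of the relational-database point, the sketch below stores facet assignments in one table and computes label counts for the current selection with a self-join. The table name `item_label`, its columns, the sample rows, and the helper `preview_counts` are all assumptions for this example, not the schema Flamenco uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item_label (item TEXT, facet TEXT, label TEXT);
    INSERT INTO item_label VALUES
      ('r1', 'Cuisine', 'Thai'),    ('r1', 'Course', 'Main Course'),
      ('r2', 'Cuisine', 'Thai'),    ('r2', 'Course', 'Dessert'),
      ('r3', 'Cuisine', 'Italian'), ('r3', 'Course', 'Main Course');
    """
)


def preview_counts(conn, facet, label, group_facet):
    """Count items per label of `group_facet` among items carrying (facet, label)."""
    return conn.execute(
        """
        SELECT g.label, COUNT(DISTINCT g.item)
        FROM item_label AS g
        JOIN item_label AS s ON s.item = g.item
        WHERE s.facet = ? AND s.label = ? AND g.facet = ?
        GROUP BY g.label
        ORDER BY g.label
        """,
        (facet, label, group_facet),
    ).fetchall()


print(preview_counts(conn, "Cuisine", "Thai", "Course"))
```

Adding a new facet or item in this layout is just more rows in the same table, which matches the "seamless to add" claim above.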


    Information previews

    Use the metadata to show where to go next
      More flexible than canned hyperlinks
      Less complex than full search

    Help users see and return to previous steps

    Reduces mental work
      Recognition over recall
      Suggests alternatives

    More clicks are OK only if (J. Spool):
      The scent of the target does not weaken
      Users feel they are going towards, rather than away from, their target.


    Facets vs. Hierarchy

    Early Flamenco studies compared allowing multiple hierarchical facets vs. just one facet.

    Multiple facets were preferred and more successful.


    Limitation of Facets

    Do not naturally capture MAIN THEMES

    Facets do not show RELATIONS explicitly
      Labels on the example photo: Aquamarine, Red, Orange; Door, Doorway, Wall
      Which color is associated with which object?

    Photo by J. Hearst, jhearst.typepad.com


    Terminology Clarification

    Facets vs. Attributes
      Facets are shown independently in the interface
      Attributes are just associated with individual items
        E.g., ID number, Source, Affiliation
      However, can always convert an attribute to a facet

    Facets vs. Labels
      Labels are the names used within facets
      These are organized into subhierarchies

    Synonyms
      There should be alternate names for the category labels
      Currently (in Flamenco) this is done with subcategories
        E.g., Deer has subcategories stag, fawn, doe


    Usability Study Results


    Flamenco Usability Studies

    Usability studies done on 3 collections:
      Recipes (epicurious): 13,000 items
      Architecture Images: 40,000 items
      Fine Arts Images: 35,000 items

    Conclusions:
      Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
      Very positive results, in contrast with studies on earlier iterations.


    Most Recent Usability Study

    Participants & Collection
      32 Art History Students
      ~35,000 images from SF Fine Arts Museum

    Study Design
      Within-subjects
        Each participant sees both interfaces
        Balanced in terms of order and tasks
      Participants assess each interface after use
      Afterwards they compare them directly

    Data recorded in behavior logs, server logs, paper surveys; one or two experienced testers at each trial.

    Used 9-point Likert scales.

    Session took about 1.5 hours; pay was $15/hour


    Post-Interface Assessments

    All significant at p


    Post-Test Comparison

                                                   Baseline   Faceted
    Which Interface Preferable For:
      Find images of roses                            15         16
      Find all works from a given period               2         30
      Find pictures by 2 artists in same media         1         29

    Overall Assessment:
      More useful for your tasks                       4         28
      Easiest to use                                   8         23
      Most flexible                                    6         24
      More likely to result in dead ends              28          3
      Helped you learn more                            1         31
      Overall preference                               2         29


    How to Create Facet Hierarchies?

    Our Approach: Castanet


    Example: Recipes (3500 docs)


    Castanet Output (shown in Flamenco)


    Our Approach: Leverage the structure of WordNet

    [Diagram: Documents -> Select terms -> Get hypernym paths (from WordNet) -> Build tree -> Compress tree -> Divide into facets]


    1. Select Terms

    Select well distributed terms from collection (example terms: red, blue)


    2. Get Hypernym Path

    red: abstraction > property > visual property > color > chromatic color > red, redness

    blue: abstraction > property > visual property > color > chromatic color > blue, blueness


    3. Build Tree

    [Figure: the hypernym paths for red and blue are merged into one tree that shares their common prefix.]

    abstraction
      property
        visual property
          color
            chromatic color
              red, redness
              blue, blueness
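
The tree-building step amounts to merging paths that share a prefix. A minimal sketch, assuming the hypernym paths have already been looked up and are given as plain lists (the nested-dict representation and the function name `build_tree` are illustrative choices, not Castanet's code):

```python
def build_tree(paths):
    """Merge root-to-leaf label paths into one nested-dict tree."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse the child if an earlier path already created it.
            node = node.setdefault(label, {})
    return tree


# Paths as shown on the "Get Hypernym Path" slide.
RED = ["abstraction", "property", "visual property", "color",
       "chromatic color", "red, redness"]
BLUE = ["abstraction", "property", "visual property", "color",
        "chromatic color", "blue, blueness"]

tree = build_tree([RED, BLUE])
shared = tree["abstraction"]["property"]["visual property"]["color"]["chromatic color"]
print(sorted(shared))
```

The two paths share every node down to "chromatic color" and only branch at the last level.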


    4. Compress Tree

    [Figure: before compression, color > chromatic color has the children "red, redness" > red, "blue, blueness" > blue, and "green, greenness" > green. After compression, red, blue, and green sit directly under color > chromatic color.]


    4. Compress Tree (cont.)

    [Figure: color > chromatic color > (red, blue, green) is compressed further to color > (red, blue, green).]
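
The slides do not spell out the compression rules, so the sketch below uses an assumed rule that happens to reproduce both figures: when a parent and child label repeat each other (for example "red, redness" and "red", or "color" and "chromatic color"), the pair is collapsed and the shorter label is kept. The helpers `terms`, `overlaps`, and `compress` are hypothetical, and this performs both compression stages in one pass.

```python
def terms(label):
    """Split a synset-style label such as 'red, redness' into its terms."""
    return {term.strip() for term in label.split(",")}


def overlaps(parent, child):
    """Assumed redundancy test: shared term, or child ends with the parent label."""
    return bool(terms(parent) & terms(child)) or child.endswith(" " + parent)


def compress(label, children):
    """Collapse redundant parent/child pairs; return (label, children)."""
    merged = {}
    for child_label, grandchildren in children.items():
        c_label, c_children = compress(child_label, grandchildren)
        if overlaps(label, c_label):
            # Keep the shorter label and lift the child's children up.
            label = min(label, c_label, key=len)
            merged.update(c_children)
        else:
            merged[c_label] = c_children
    return label, merged


before = {
    "chromatic color": {
        "red, redness": {"red": {}},
        "blue, blueness": {"blue": {}},
        "green, greenness": {"green": {}},
    }
}
print(compress("color", before))
```

On this input the result is a "color" node with the three children red, blue, and green, matching the final tree on the slide.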


    5. Divide into Facets


    Disambiguation

    Ambiguity in:
      Word senses
      Paths up the hypernym tree

    Sense 1 for word "tuna"
      organism, being => plant, flora => vascular plant => succulent => cactus => tuna

    Sense 2 for word "tuna" (2 paths for same sense)
      organism, being => fish => food fish => tuna
      organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna

    Senses 1 and 2 together: 2 paths for same word


    How to Select the Right Senses and Paths?

    First: build core tree
      (1) Create paths for words with only one sense
      (2) Use Domains
        WordNet has 212 Domains
          medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
        Automatically scan the collection to see which domains apply
        The user selects which of the suggested domains to use or may add own
        Paths for terms that match the selected domains are added to the core tree

    Then: add remaining terms to the core tree.
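
The core-tree selection can be sketched with a toy sense inventory. Everything here is an assumption for illustration: the `SENSES` table, its domain tags ("plants", "food"), the shortened paths, and the policy of accepting an ambiguous term only when exactly one of its senses matches a selected domain. The slides do not state how Castanet resolves ties.

```python
# Hypothetical sense inventory: term -> list of (domain, hypernym path).
SENSES = {
    "cactus": [("plants", ["organism", "plant", "cactus"])],
    "tuna": [
        ("plants", ["organism", "plant", "cactus", "tuna"]),
        ("food", ["organism", "fish", "food fish", "tuna"]),
    ],
}


def core_paths(senses, selected_domains):
    """Pick one path per term for the core tree, skipping unresolved terms."""
    chosen = {}
    for term, options in senses.items():
        if len(options) == 1:
            # (1) Words with only one sense go straight into the core tree.
            chosen[term] = options[0][1]
            continue
        # (2) Otherwise keep the sense whose domain the user selected.
        matching = [path for domain, path in options if domain in selected_domains]
        if len(matching) == 1:
            chosen[term] = matching[0]
    return chosen


print(core_paths(SENSES, {"food"}))
```

Terms left out of the result correspond to the "remaining terms" that are added to the core tree in the later step.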


    Castanet Evaluation

    This is a tool for information architects, so people of this type did the evaluation

    We compared output on:
      Recipes
      Biomedical journal titles

    We compared to two state-of-the-art algorithms:
      LDA (Blei et al. 04)
      Subsumption (Sanderson & Croft 99)


    Subsumption Output (shown in Flamenco)


    LDA Output (shown in Flamenco)


    Evaluation Method

    Information architects assessed the category systems

    For each of 2 systems' output:
      Examined and commented on top-level
      Examined and commented on two sub-levels

    Then comment on overall properties
      Meaningful?
      Systematic?
      Likely to use in your work?


    Evaluation Results

    Results on recipes collection for "Would you use this system in your work?"
      "Yes in some cases" or "yes definitely":
        Pine (Castanet): 29/34
        Oak (LDA): 0/18
        Birch (Subsumption): 6/16
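
Because the three systems have different denominators, the counts are easier to compare as percentages. The snippet below only redoes the arithmetic on the figures from this slide; the variable names are illustrative.

```python
# (positive answers, total answers) as reported on the slide.
results = {
    "Pine (Castanet)": (29, 34),
    "Oak (LDA)": (0, 18),
    "Birch (Subsumption)": (6, 16),
}

# Share of "yes in some cases" or "yes definitely" answers, in percent.
shares = {name: round(100 * yes / total, 1) for name, (yes, total) in results.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```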

    Results on quality of categories:


    Opportunities for Tagging

    New opportunity: Tagging, folksonomies (Flickr, del.icio.us)
      People are creating facets in a decentralized manner
      They are assigning multiple facets to items
      This is done on a massive scale
      This leads naturally to meaningful associations


    Conclusions

    Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
      Midway in complexity between simple hierarchies and deep knowledge representation.
      Currently in use on e-commerce sites; spreading to other domains

    Systems are needed to help create faceted metadata structures
      Our WordNet-based algorithm, while not perfect, seems like it will be a useful tool for Information Architects.


    For more information:

    flamenco.berkeley.edu

    Thank you!

    Marti Hearst & Emilia Stoica