Approaches to Modeling and Learning User Preferences
Marie desJardins
University of Maryland Baltimore County
Presented at SRI International AI Center
March 10, 2008
Joint work with Fusun Yaman, Michael Littman, and Kiri Wagstaff
Overview
• Representing Preferences
• Learning Planning Preferences
• Preferences over Sets
• Directions / Conclusions
Representing Preferences
What is a Preference?
• A (partial) ordering over outcomes
• Feature vector representation of "outcomes" (aka "objects")
• Example: taking a vacation. Features:
  • Who (alone / family)
  • Where (Orlando / Paris)
  • Flight type (nonstop / one-stop / multi-stop)
  • Cost (low / medium / high)
  • …
• Languages: weighted utility function, CP-net, lexicographic ordering
Weighted Utility Functions
• Each value v_ij of feature f_i has an associated utility u_ij
• Utility U_j of object o_j = <v_1j, v_2j, …, v_kj>: U_j = ∑_i w_i · u_ij
• Commonly used in preference elicitation; easy to model, and the independence of features is convenient
• Flight example (a small sketch follows below):
  U(flight) = .8·u(Who) + .8·u(Cost) + .6·u(Where) + .4·u(Flight Type) + …
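To make the additive model concrete, here is a minimal Python sketch; the feature names, weights, and per-value utilities are illustrative placeholders, not values elicited in the talk.

```python
# Additive (weighted) utility model for the flight example.
# All weights and per-value utilities below are illustrative.
WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}

UTILITIES = {
    "who": {"family": 1.0, "alone": 0.3},
    "cost": {"low": 1.0, "medium": 0.5, "high": 0.1},
    "where": {"Orlando": 0.9, "Paris": 0.6},
    "flight_type": {"nonstop": 1.0, "one-stop": 0.5, "multi-stop": 0.2},
}

def utility(outcome):
    """U_j = sum_i w_i * u_ij: weighted sum of per-feature utilities."""
    return sum(WEIGHTS[f] * UTILITIES[f][v] for f, v in outcome.items())

trip = {"who": "family", "where": "Orlando", "cost": "low", "flight_type": "nonstop"}
print(utility(trip))  # 0.8*1.0 + 0.6*0.9 + 0.8*1.0 + 0.4*1.0 = 2.54
```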
CP-Nets
• Conditional Preference Network: an intuitive, graphical representation of conditional preferences under a ceteris paribus ("all else being equal") assumption
• Example conditional preference tables:
  who: family > alone
  where: family : Orlando > Paris; alone : Paris > Orlando
• In words: I prefer to take a vacation with my family rather than going alone. If I am with my family, I prefer Orlando to Paris; if I am alone, I prefer Paris to Orlando.
• Every CP-net induces a preference graph on outcomes: the partial ordering of outcomes is given by the transitive closure of the preference graph
Induced Preference Graph
[Figure: the preference graph induced by the vacation CP-net over the four outcomes (family, Orlando), (family, Paris), (alone, Paris), (alone, Orlando). Each edge points from a less-preferred outcome to a more-preferred one differing in a single feature; (family, Orlando) is the most preferred outcome and (alone, Orlando) the least. A sketch of the construction follows below.]
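A minimal sketch of how the induced preference graph can be built from the CP-net's conditional preference tables: enumerate the outcomes and add an edge for every single-feature improving flip. The data layout is illustrative.

```python
from itertools import product

# The vacation CP-net as data: feature -> (parents, conditional preference
# table mapping parent values to (better, worse)). Layout is illustrative.
CPTS = {
    "who":   ((), {(): ("family", "alone")}),
    "where": (("who",), {("family",): ("Orlando", "Paris"),
                         ("alone",):  ("Paris", "Orlando")}),
}
DOMAINS = {"who": ("family", "alone"), "where": ("Orlando", "Paris")}

def improving_flips(outcome):
    """Yield outcomes that improve one feature, all else being equal."""
    for feat, (parents, table) in CPTS.items():
        better, worse = table[tuple(outcome[p] for p in parents)]
        if outcome[feat] == worse:
            yield {**outcome, feat: better}

# Preference-graph edges: worse outcome -> better outcome, one flip apart.
for values in product(*DOMAINS.values()):
    outcome = dict(zip(DOMAINS, values))
    for improved in improving_flips(outcome):
        print(tuple(outcome.values()), "->", tuple(improved.values()))
```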
Lexicographic Orderings
• Features are prioritized with a total ordering f_1, …, f_k
• The values of each feature are prioritized with a total ordering v_i1, …, v_im
• To compare o_1 and o_2: find the first feature in the feature ordering on which o_1 and o_2 differ, then choose the outcome with the preferred value for that feature (sketch below)
• Travel example: Who > Where > Cost > Flight-Type > …
  Family > Alone; Orlando > Paris; …; Cheap > Expensive
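A minimal sketch of lexicographic comparison for the travel example; the feature priority and the per-feature value orderings (best first) are illustrative.

```python
# Lexicographic comparison for the travel example. The feature priority
# and the per-feature value orderings (best first) are illustrative.
FEATURE_ORDER = ["who", "where", "cost", "flight_type"]
VALUE_ORDER = {
    "who": ["family", "alone"],
    "where": ["Orlando", "Paris"],
    "cost": ["low", "medium", "high"],
    "flight_type": ["nonstop", "one-stop", "multi-stop"],
}

def lex_compare(o1, o2):
    """Return '>', '<', or '=' comparing outcome o1 to o2."""
    for feat in FEATURE_ORDER:
        if o1[feat] != o2[feat]:  # first differing feature decides
            ranks = VALUE_ORDER[feat]
            return ">" if ranks.index(o1[feat]) < ranks.index(o2[feat]) else "<"
    return "="

a = {"who": "family", "where": "Paris", "cost": "low", "flight_type": "nonstop"}
b = {"who": "family", "where": "Orlando", "cost": "high", "flight_type": "multi-stop"}
print(lex_compare(a, b))  # '<': same 'who', but b has the preferred 'where'
```

Note that b wins even though a is better on cost and flight type: under a lexicographic ordering, the most important differing feature settles the comparison, so no tradeoffs are possible.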
Representation Tradeoffs
• Each representation has limitations:
  • Additive utility functions can't capture conditional preferences, and can't easily represent "hard" constraints or preferences
  • CP-nets, in general, give only a partial ordering, can't easily model integer/real-valued features, and can't capture tradeoffs
  • Lexicographic preferences can't capture tradeoffs, and can't represent conditional preferences
Learning Planning Preferences
Planning Algorithms
• Domain-independent: inputs are an initial state, a goal state, and the possible actions; applicable to any domain, but not efficient
• Domain-specific: works for only one domain; (near-)optimal reasoning; very fast
• Domain-configurable: uses additional planning knowledge to customize the search automatically; broadly applicable and efficient
Domain Knowledge for Planning
• Provides search control information:
  • A hierarchy of abstract actions (HTN operators)
  • Logical formulas (e.g., temporal logic)
• Experts must provide the planning knowledge, which may not be readily available, and such knowledge is difficult to express declaratively
Learning Planning Knowledge
• Alternative: learn planning knowledge by observation (i.e., from example plans), possibly even from a single complex example (DARPA's Integrated Learning Program)
• Our focus: learn preferences at various decision points, using CHARM (the Charming Hybrid Adaptive Ranking Model)
• Currently: learns preferences over variable bindings. Future: learn goal and operator preferences
HTN: Hierarchical Task Network
• Objectives are specified as high-level tasks to be accomplished
• Methods describe how high-level tasks are decomposed into primitive tasks
[Figure: HTN for the high-level task travel(X,Y). A long-distance-travel method decomposes travel(X,Y) into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y); a short-distance-travel method decomposes it into the primitive actions getTaxi(X), rideTaxi(X,Y), payDriver. HTN operators connect high-level tasks to primitive actions. A data sketch follows below.]
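A minimal sketch of how the travel HTN's methods might be represented as data; the task and method names follow the figure, while the representation itself is illustrative.

```python
# The travel HTN from the figure, as data. Task and method names follow
# the slide; the data layout is illustrative.
METHODS = {
    "travel(X,Y)": [
        ("long-distance-travel",
         ["travel(X,Ax)", "buyTicket(Ax,Ay)", "fly(Ax,Ay)", "travel(Ay,Y)"]),
        ("short-distance-travel",
         ["getTaxi(X)", "rideTaxi(X,Y)", "payDriver"]),
    ],
}

# Each choice -- which method to apply, and which objects to bind to
# variables such as Ax -- is a decision point where CHARM's learned
# preferences would be consulted.
for method, subtasks in METHODS["travel(X,Y)"]:
    print(method, "->", " ; ".join(subtasks))
```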
CHARM: Charming Hybrid Adaptive Ranking Model
• Learns preferences in HTN methods:
  • Which objects to choose when using a particular method? (Which flight to take? Which airport to choose?)
  • Which goal to select next during planning?
  • Which method to choose to achieve a task? (By plane or by train?)
• Preferences are expressed as lexicographic orderings, a natural choice for many (though not all) planning domains
Summary of CHARM
• CHARM learns a preference rule for each method
  • Given: an HTN, an initial state, and the plan tree
  • Find: an ordering on variable values for each decision point (planning context)
• CHARM has two modes:
  • Gather training data for each method, e.g., Orlando = (tropical, family-oriented, expensive) is preferred to Boise = (cold, outdoors-oriented, cheap)
  • Learn the preference rule in each method
Preference Rules
• A preference rule is a function that returns <, =, or >, given two objects represented as vectors of attributes
• Assumption: preference rules are lexicographic
  • For every attribute there is a preferred value
  • There is a total order on the attributes representing their order of importance
• Example: a warm destination is preferred to a cold one; among destinations of the same climate, an inexpensive one is better than an expensive one; …
Learning Lexicographic Preference Models
• Existing algorithms return one of the many models consistent with the data
  • The worst-case performance of such algorithms is worse than random selection
  • There is a higher probability of poor performance when there are fewer training observations
• A novel democratic approach: Variable Voting
  • Sample the possible consistent models (implicit sampling: models that satisfy certain properties are permitted to vote)
  • The preference decision is based on the majority of the votes
Variable Voting
• Given a partial order < on the attributes and two objects A and B:
  • D = { attributes on which A and B differ }
  • D* = { most salient attributes in D with respect to < }
  • The object with the larger number of preferred values for the attributes in D* is the preferred object (sketch below)
Example:
    X1  X2  X3  X4  X5
A    1   0   1   0   0
B    0   0   1   1   1
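A minimal sketch of a Variable Voting decision for the example above, assuming importance ranks stand in for the partial order (lower rank = more salient) and that value 1 is the preferred value for every attribute; both assumptions are illustrative.

```python
# Variable Voting decision (sketch). rank[i]: importance of attribute i
# (lower = more salient); preferred[i]: the preferred value of attribute i.
def variable_vote(a, b, rank, preferred):
    diff = [i for i in range(len(a)) if a[i] != b[i]]      # D
    top = min(rank[i] for i in diff)
    salient = [i for i in diff if rank[i] == top]          # D*
    votes_a = sum(a[i] == preferred[i] for i in salient)
    votes_b = sum(b[i] == preferred[i] for i in salient)
    return "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"

A = [1, 0, 1, 0, 0]
B = [0, 0, 1, 1, 1]
rank = [1, 1, 1, 2, 2]       # X1-X3 most salient; X4, X5 less so (illustrative)
preferred = [1, 1, 1, 1, 1]  # assume 1 is preferred for every attribute
# A and B differ on X1, X4, X5; only X1 is maximally salient, and A has
# the preferred value there, so A wins.
print(variable_vote(A, B, rank, preferred))  # -> A
```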
Learning Variable Ranks
• Initially, all attributes are equally important
• Loop until the ranks converge:
  • Given two objects, predict a winner using the current beliefs
  • If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction
• The importance of an attribute never moves beyond its actual place in the order of attributes
• This is a mistake-bound algorithm (it learns from its mistakes), with a mistake bound of O(n²), where n is the number of attributes (sketch below)
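A minimal sketch of the mistake-driven rank update, reusing variable_vote from the previous sketch; the demotion rule and the cap at position n are an illustrative reading of the slide, not CHARM's exact update.

```python
# Mistake-driven rank learning (sketch). Reuses variable_vote() from the
# previous sketch. Each wrong prediction demotes the salient attributes
# that voted for the losing object; an attribute can be demoted at most
# about n times, which is where the O(n^2) mistake bound comes from.
def update_ranks(rank, a, b, true_winner, preferred):
    predicted = variable_vote(a, b, rank, preferred)
    if predicted not in ("tie", true_winner):
        wrong = a if predicted == "A" else b
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        top = min(rank[i] for i in diff)
        for i in diff:
            # Demote salient attributes whose preferred value backed the loser.
            if rank[i] == top and wrong[i] == preferred[i]:
                rank[i] = min(rank[i] + 1, len(a))  # never beyond position n
    return rank
```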
Democracy vs. Autocracy
[Figure: empirical comparison of Variable Voting (the democratic approach) with committing to a single consistent model (the autocratic approach).]
Preferences Over Sets
• Subset selection applications: remote sensing, sports teams, music playlists, planning
• Ranking, as in a search engine? That doesn't capture dependencies between items
• Goal: encode, apply, and learn set-based preferences
[Figure: image pairs illustrating complementarity and redundancy between the members of a set.]
User Preferences
• Depth: utility function (desirable values)
• Diversity: variety and coverage
  • Geologist example: near + far views (context)
• Example: prefer images with more rock than sky
[Figure: two rover images, one with Rock 25%, Soil 75%, Sky 0%, the other with Rock 10%, Soil 50%, Sky 40%.]
Encoding User Preferences
• DD-PREF: a language for expressing preferred depth and diversity for sets
[Figure: DD-PREF components: per-feature utility functions over Sky, Soil, and Rock (depth), and a depth vs. diversity comparison of candidate sets.]
Finding the Best Subset
Maximize the subset valuation V(s), the utility of a candidate subset s:

  V(s) = α · depth(s) + (1 − α) · diversity(s)

where depth(s) is the average per-item utility of the items in s, diversity(s) is the weighted average per-feature diversity of s (with per-feature diversity defined as 1 − skew), and the tradeoff α is part of the subset preference. A greedy selection sketch follows below.
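Since scoring every possible subset is exponential in the set size, a greedy heuristic in the spirit of DD-Select can be sketched as follows; the toy valuation is a stand-in for the DD-PREF V(s) above, not its actual definition.

```python
# Greedy subset selection (sketch): repeatedly add the candidate item
# that most increases the subset valuation V(s).
def greedy_select(items, k, valuation):
    chosen, remaining = [], list(items)
    for _ in range(k):
        best = max(remaining, key=lambda x: valuation(chosen + [x]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy stand-in for V(s): depth = mean item value, diversity = spread.
def toy_valuation(s, alpha=0.6):
    depth = sum(s) / len(s)
    diversity = max(s) - min(s) if len(s) > 1 else 0.0
    return alpha * depth + (1 - alpha) * diversity

# Picks 0.9 for depth, then 0.1 because it adds the most diversity.
print(greedy_select([0.1, 0.4, 0.5, 0.9], k=2, valuation=toy_valuation))
```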
Learning Preferences from Examples
• It is hard for users to specify quantitative values (especially with more general quality functions)
• Instead, adopt a machine learning approach:
1. Users provide example sets with high valuation
2. The system infers:
   • Utility functions
   • Desired diversity
   • Feature weights
• Once trained, the system can select subsets of new data (blocks, images, songs, food)
• Depth (utility functions): probability density estimation via KDE (kernel density estimation) [Duda et al., 01]; see the sketch below
• Diversity: average of the observed diversities
• Feature weights: minimize the difference between the computed valuation and the true valuation, using BFGS bounded optimization [Gill et al., 81]
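A minimal sketch of the depth-learning step using SciPy's gaussian_kde; the data and the use of density as utility illustrate the KDE idea, not the exact DD-PREF estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Depth learning (sketch): fit a density to the feature values observed in
# user-selected example sets, and treat density as utility (values common
# among chosen items get higher utility). The data are hypothetical.
rock_fractions = np.array([0.20, 0.25, 0.30, 0.22, 0.28])

utility = gaussian_kde(rock_fractions)   # the learned utility function u(v)
xs = np.linspace(0.0, 1.0, 5)
for v, u in zip(xs, utility(xs)):
    print(f"u({v:.2f}) = {u:.3f}")
```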
Learning a Preference Model
[Figure: learned utility functions plotted over % Sky and % Rock.]
Results: Blocks World
• Compute the valuation of the sets chosen by the true preference, the learned preference, and random selection
• As more training sets become available, performance increases (the learned preference approximates the true one)
[Figure: valuation vs. number of training sets for the Mosaic and Tower tasks; random selection provides a lower baseline.]
Rover Image Experiments
• Methodology:
  • Six users: 2 geologists, 4 computer scientists
  • Five sets of 20 images each; each user selects a subset of 5 images from each set
• Evaluation:
  • Learn preferences on (up to 4) example sets, then select a new subset from a held-out set
  • Metrics: valuation of the selected subset; functional similarity between learned preferences
Learned Preferences
• Based on a subset of 5 images, chosen by a geologist, from 20 total
• Learned diversities: Rock 0.8, Soil 0.9, Sky 0.5
• Learned feature weights: Rock 0.3, Soil 0.1, Sky 1.0
• Learned utility functions: [Figure: per-feature utility curves over Sky, Soil, and Rock.]
Subset Selection
• 5 images chosen from 20 new images, using greedy DD-Select and the learned preferences
• Compared against the 5 images chosen by the same geologist from the same 20 new images
Future Directions
• Hybrid preference representation: a decision tree with lexicographic orderings at the leaves
  • Permits conditional preferences
  • Open question: how to learn the "splits" in the tree?
• Support operator and goal orderings for planning
• Incorporate the concept of set-based preferences into planning domains