Web-based User Profiles Challenges and Opportunities
Christin Seifert MDPS Workshop, Lyon
2013-12-13 (Friday..)
in a Nutshell
Client Component
Privacy Proxy
Federated Recommender
Source 1: Search System
Source 2: CBR System
...
Visits
Web sites: major web hubs, long-tail content sites
cultural, educational, scientific content
users' regular whereabouts
Web-based User Profiles
Ch1: Visualization -> transparency
-> trust -> adaptability
Ch2: Learning -> avoid manual entry -> fine grained data
-> large-scale usage in applications
O3: Applications ? privacy ? trust
? accuracy
Ch 3: Definition - rich user models
- user context - time framing
Goal: “Saying the ‘right’ thing at the ‘right’ time in the ‘right’ way.” [Fischer, 2001]
User model: “.. is the knowledge about the user, either explicitly or implicitly encoded, that is used by the system to improve the interaction.” [Kass & Finin, 1988]
User profile: “.. is a machine-processable representation of a user model for the purpose of user identification and personalization.” [Carberry et al., 2013]
User models and profiles
“In fuzzy terms, context is [...] the ‘everything else’ of the environment. More precisely, context is the set of features in the environment that are not explicitly intended as input into the system being discussed.” [Rhodes, 2000]
“Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves.” [Dey & Abowd, 1999]
(after reviewing 150 notions of contexts) "A definition of context depends on the field of knowledge that it belongs to" [Bazire and Brézillon, 2005]
Context
Our View
user profile is aggregated information
long-term profile information (long-term interests, knowledge, regular tasks, birthday, name, ..) does not change over a “longer” period of time
short-term profile information (interests, task) is only valid for a “short” period of time [Li et al., 2007, Bennett et al., 2012]
context (device information, physical surroundings, social setting) is not part of the user, cannot be captured, or is not aggregated information
D5.1 Usage Pattern and Context Detection Specification and Analysis
Table 2: Dimensions and attributes of the user profile

attribute                           im-/explicit  expected impact  learning complexity  vocabulary
Interest
  topic                             I             H                M                    skos
  weight                            I             H                M                    wi,wo
Knowledge
  topic                             I             M                H                    skos
  weight                            I             M                H                    wi,wo
Demographics
  profession                        E             M                H                    -
  education level                   E             M                H                    -
  institution                       E             M                H                    -
  first name                        E             L                H                    foaf
  last name                         E             L                H                    foaf
  birthday                          E             L                H                    gumo
  birthplace                        E             L                H                    gumo
  address                           E             L                L                    gumo
    - city
    - country
    - house-nr
    - postal code
    - state
    - street
Social Connections
  connections in social networks    I             M                M                    sioc/foaf
    - strong/weak ties
    - type of connection (groups)
Resource Relations
  resource                          I             M                M                    oa/foaf
    - timestamp
    - annotation
    - been recommended
Behavioural Patterns / Tasks / Goals
  tasks                             I             H                H                    tmo

expected impact - Low (L), Medium (M), High (H): the expected influence of a particular attribute on recommendation quality
learning complexity - Low (L), Medium (M), High (H): the expected complexity to learn a particular feature of the user profile automatically
im-/explicit - Implicit (I), Explicit (E): whether the feature has to be given by the user explicitly or can be mined implicitly (the user may also change implicitly mined features manually)
© EEXCESS consortium: all rights reserved 15
not considered: Individual traits (introvert/extrovert) usually captured by extensive psychological tests [Brusilovsky & Millan, 2004]
User Model
Long-Term Profile
- demographics
- interest
- knowledge
- behavioral patterns, tasks, goals
- social connections
- resource relations
Short-Term Profile
- task
- interest
- session
Context
- physical skills
- physical activities
- social
- location
- focus
- device
- physical surroundings
Semantic Description
Example
Goal: Finding literature about privacy in recommender systems
Task: Literature search
Topics:
http://dbpedia.org/page/Recommender_system
http://dbpedia.org/page/Privacy
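A minimal sketch of how the example above could be held as a machine-processable record. The class name `TaskDescription`, its field names and the helper `topic_labels` are assumptions made for illustration; the slides do not prescribe a concrete data format.

```python
from dataclasses import dataclass, field


@dataclass
class TaskDescription:
    """Hypothetical container for one semantically described task."""
    goal: str   # free-text goal of the user
    task: str   # task label
    topics: list[str] = field(default_factory=list)  # topic URIs (here: DBpedia pages)


example = TaskDescription(
    goal="Finding literature about privacy in recommender systems",
    task="Literature search",
    topics=[
        "http://dbpedia.org/page/Recommender_system",
        "http://dbpedia.org/page/Privacy",
    ],
)


def topic_labels(description: TaskDescription) -> list[str]:
    """Derive readable labels from the last path segment of each topic URI."""
    return [uri.rsplit("/", 1)[-1].replace("_", " ") for uri in description.topics]


print(topic_labels(example))
```

Keeping topics as URIs rather than plain strings is what makes the description "semantic": two profiles that mention the same concept share the same identifier.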
extensive, semantically described user model
open: which parts can be inferred automatically, what can be exploited (for specific applications), how to convey the content to the users
User Model
Information Visualization
“Information visualization (InfoVis) is the communication of abstract data through the use of interactive visual interfaces.”
[Keim et al., 2006]
Hierarchical/network data
Nominal data
Geo-spatial data
Time
Complex multivariate data with different types of variables
Allow interactive adaptations
Hierarchical/network data
TreeMap [Johnson and Shneiderman, 1991]
Marks and Channels: The 8 Visual Channels - Position
Projection examples: Left are geographic projections and right are projections of multidimensional data (i.e. text documents) on a 2D surface while retaining the topical similarity of documents.
Sources: Wikipedia, Granitzer
VA:II-18 Foundations © Granitzer/Seifert 2013
Information Landscapes [Sabol et al., 2009]
DOITrees [Heer & Card, 2004]
Hierarchical/network data
FDP [Fruchterman & Reingold, 1991]
Danny Holten & Jarke J. van Wijk / Force-Directed Edge Bundling for Graph Visualization
Figure 7: US airlines graph (235 nodes, 2101 edges) (a) not bundled and bundled using (b) FDEB with inverse-linear model, (c) GBEB, and (d) FDEB with inverse-quadratic model.
Figure 8: US migration graph (1715 nodes, 9780 edges) (a) not bundled and bundled using (b) FDEB with inverse-linear model, (c) GBEB, and (d) FDEB with inverse-quadratic model. The same migration flow is highlighted in each graph.
Figure 9: A low amount of straightening provides an indication of the number of edges comprising a bundle by widening the bundle. (a) s = 0, (b) s = 10, and (c) s = 40. If s is 0, color more clearly indicates the number of edges comprising a bundle.
[...] we generated use the rendering technique described in Section 4.1. To facilitate the comparison of migration flow in Figure 8, we use a similar rendering technique as the one that Cui et al. [CZQ*08] used to generate Figure 8c.
The airlines graph is comprised of 235 nodes and 2101 edges. It took 19 seconds to calculate the bundled airlines graphs (Figures 7b and 7d) using the calculation scheme presented in Section 3.3. The migration graph is comprised of 1715 nodes and 9780 edges. It took 80 seconds to calculate the bundled migration graphs (Figures 8b and 8d) using the same calculation scheme. All measurements were performed on an Intel Core 2 Duo 2.66GHz PC running Windows XP with 2GB of RAM and a GeForce 8800GT graphics card. Our prototype was implemented in Borland Delphi 7.
© 2009 The Author(s). Journal compilation © 2009 The Eurographics Association and Blackwell Publishing Ltd.
Hierarchical Edge-Bundling [Holten & van Wijk, 2009]
Dot (Graphviz) [Gansner et al., 1993]
dictionaries are used to take into account the vocabulary and semantics of terms (concepts):
• Positive filters (domain concepts): these filters contain the exclusive terms to retain in the document. This may be the projection of text documents on a specific domain ontology to select only specific concepts in this area. For our experimentation, we are not interested in a specific domain, thus we used as positive filters all the concepts which appear more often than a predefined threshold in the document (e.g., more than 2 or 3 times).
• Negative filters (empty concepts): these filters contain concepts having no meaning in the context of the study. Typical examples of empty concepts are articles of any language. Depending on the languages studied, some negative filters already exist and can be reused or enriched.
• Synonym dictionaries: several terms may refer to a single concept of the studied area. Synonym dictionaries link several terms referring to the same concept. Depending on the studied languages, a synonym dictionary can be automatically built and enriched from the document.
Once interests (domain concepts) are discovered from text, they are projected toward users and their dates of use (time) in order to construct a 3-D co-occurrence matrix [17]. Depending on the desired temporal level of granularity in the analysis, it is possible to split the time in different periods (e.g.,
Figure 3. Long-term and short-term interests visualization for a user (“Up”). Time has been divided into 4 semesters based on homogeneity frequency of activities (semester 2 of 2008, semester 1 of 2009, semester 2 of 2009 and semester 1 of 2010). The graph is undirected, node-weighted, and edge-weighted. Each node represents a user (histogram bars with different colors) or an interest (histogram bars with blue color). For one node, each bar represents the frequency of the node for a time period. The succession of the bars for each node is made in a clock-wise representation of time-periods, beginning here from semester 1 of 2009. For instance, for the node “up” the red bar represents his activity frequency in semester 1 of 2009, the orange bar represents his activity frequency in semester 1 of 2010, the yellow bar represents his activity frequency in semester 2 of 2008, and the green bar represents his activity frequency in semester 2 of 2009.
Facebook Profile [Tchuente et al., 2010]
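The filtering and projection steps quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from the cited paper: the word lists, the threshold value, the whitespace tokenisation and the names `extract_interests` and `cooccurrence` are all hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative dictionaries (assumptions, not the resources used in the paper).
NEGATIVE_FILTER = {"the", "a", "an", "of", "and", "in"}   # "empty" concepts
SYNONYMS = {"movies": "film", "films": "film", "soccer": "football"}
THRESHOLD = 1  # positive filter: keep concepts occurring more often than this


def extract_interests(text: str) -> set[str]:
    """Apply the synonym dictionary, the negative filter and the positive filter."""
    terms = [SYNONYMS.get(t, t) for t in text.lower().split()]
    counts = Counter(t for t in terms if t not in NEGATIVE_FILTER)
    return {term for term, n in counts.items() if n > THRESHOLD}


def cooccurrence(posts):
    """Count (interest, user, period) triples, i.e. a sparse 3-D co-occurrence matrix."""
    matrix = defaultdict(int)
    for user, period, text in posts:
        for interest in extract_interests(text):
            matrix[(interest, user, period)] += 1
    return dict(matrix)


posts = [
    ("up", "2009-S1", "football and soccer in the park"),
    ("up", "2009-S1", "films movies and a film festival"),
    ("up", "2009-S2", "football football"),
]
matrix = cooccurrence(posts)
```

Slicing `matrix` by period gives the per-semester frequencies that the histogram bars in the figure represent.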
Infovis and User Profiles
Open Learner Models [Bull & Kay, 2012]
Fig. 1. Skill meters in SQL Tutor [4]
Fig. 2. Skill meters and structured view in OLMlets [20]
Skill meters are the most commonly used simple overviews of the learner model contents, with a meter assigned to each topic or concept, which may include separate skill meters for sub-topics. (The latter allows simple structuring in the model presentation - e.g. [9].) Most skill meters show the level of the user's knowledge, understanding or skill as a subset of expert knowledge [9], [21]. The two examples in Figures 1 and 2 additionally show: (i) level of understanding as a proportion of areas covered [4]; and (ii) the proportion of areas of difficulty that can be attributed to
Fig. 5. The Flexi-OLM concept map [29]
Fig. 6. Tree structures in UM [30] and Flexi-OLM [27]
User Profile
TreeMaps, landscapes and graph layouts are hard to understand and not easily adaptable interactively
The DOITree approach seems most promising, if
we can filter (focus) and
create a tree from the graph
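One simple way to obtain a tree from a profile graph is a breadth-first spanning tree around a focus node, cut off at a chosen depth. The sketch below only illustrates these two preconditions (filter, tree from graph); it is not the DOITree degree-of-interest computation, and the function name, the depth cut-off and the toy graph are assumptions.

```python
from collections import deque


def spanning_tree(graph: dict[str, list[str]], root: str, max_depth: int) -> dict[str, list[str]]:
    """Breadth-first spanning tree of `graph` from `root`, cut off at `max_depth`."""
    tree = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # focus filter: do not expand beyond the chosen depth
        for neighbour in graph.get(node, []):
            if neighbour not in depth:  # the first visit decides the parent
                depth[neighbour] = depth[node] + 1
                tree[node].append(neighbour)
                tree[neighbour] = []
                queue.append(neighbour)
    return tree


# Toy profile graph (hypothetical): "privacy" is reachable via two paths.
profile_graph = {
    "user": ["interests", "knowledge"],
    "interests": ["user", "privacy", "recommender systems"],
    "knowledge": ["user", "privacy"],
    "privacy": ["interests", "knowledge"],
    "recommender systems": ["interests"],
}
tree = spanning_tree(profile_graph, "user", max_depth=2)
```

A node with several parents in the graph ("privacy" here) appears only once in the tree, under the parent reached first; which parent to prefer is a design decision the visualization would have to make.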
Because we automatically infer the user profile (and make mistakes), we should provide explanations.
Learning Tasks
Task detection
Topic detection
Mining Tasks
Resource Relations
Social Connections
Task Detection
On the desktop [Rath et al., 2008]
bag-of-words, VSM, TF-IDF; SVM, NB, KNN
fewer than 10 predefined or user-defined tasks
80% accuracy, most informative feature is the window title
In the Web setting: unsupervised, tasks defined by query similarity or time frame
<1% multitasking sessions with mostly 2 tasks [Buzikashvili, 2006]
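A minimal sketch of unsupervised task segmentation in the Web setting, assuming a time-ordered query log. A new task starts when the time gap is too large or the query is too dissimilar from the previous one. The thresholds, the word-level Jaccard measure and the function names are assumptions chosen for illustration, not values from the cited work.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two queries."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def segment_tasks(log, max_gap_s=1800, min_similarity=0.2):
    """Split a time-ordered log of (timestamp_s, query) pairs into task segments."""
    segments = []
    for timestamp, query in log:
        if segments:
            last_time, last_query = segments[-1][-1]
            same_frame = timestamp - last_time <= max_gap_s
            similar = jaccard(query, last_query) >= min_similarity
            if same_frame and similar:
                segments[-1].append((timestamp, query))
                continue
        # time frame exceeded or query dissimilar: start a new task
        segments.append([(timestamp, query)])
    return segments


log = [
    (0, "privacy recommender systems"),
    (60, "recommender systems survey"),
    (120, "cheap flights lyon"),
    (7200, "flights lyon december"),
]
segments = segment_tasks(log)
```

Because only consecutive queries are compared, this sketch cannot recognise interleaved tasks; given the low share of multitasking sessions reported above, that may be an acceptable simplification.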
Topic Detection
Interests are modelled using dbpedia categories
4 million “things”, 3.2 million classified in ontology
Hierarchical, multi-class, multi-label
persons 832k
places 640k
creative works 370k
organizations 210k
species 226k
Topic Detection
Hierarchical training [Dumais & Chen, 2009]
17k classes, 50k training samples, LibSVM
F1 on top-level 0.57, 2nd level 0.47
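The idea of hierarchical training can be sketched as one model per inner node of the class hierarchy, with classification routed top-down. The slide names LibSVM; the sketch below swaps in a plain word-overlap score so that it stays self-contained, and the class names, samples and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(samples, parent_of):
    """Per inner node, collect word counts for each child from its labelled samples.

    `samples` are (text, leaf_label) pairs; `parent_of` maps label -> parent label.
    """
    models = defaultdict(lambda: defaultdict(Counter))
    for text, label in samples:
        words = text.lower().split()
        child = label
        while child in parent_of:  # propagate the sample up to the root
            parent = parent_of[child]
            models[parent][child].update(words)
            child = parent
    return models


def classify(text, models, root="root"):
    """Route top-down: at each node pick the child sharing most words with the text."""
    words = text.lower().split()
    node, path = root, []
    while node in models:
        children = models[node]
        node = max(children, key=lambda c: sum(children[c][w] for w in words))
        path.append(node)
    return path


# Toy hierarchy and training data (assumptions for illustration only).
parent_of = {"Person": "root", "Place": "root",
             "Scientist": "Person", "Artist": "Person", "City": "Place"}
samples = [
    ("physicist nobel prize theory", "Scientist"),
    ("painter gallery exhibition", "Artist"),
    ("city population river", "City"),
]
models = train(samples, parent_of)
path = classify("a painter born near the river exhibition", models)
```

Each inner node only has to separate its own children, which keeps the individual problems small, but an error at the top level cannot be corrected further down.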
Cluster Analysis Basics
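Let $x_1, \ldots, x_n$ denote the $p$-dimensional feature vectors of $n$ objects:

\[
\begin{array}{c|cccc|c}
 & \text{Feature 1} & \text{Feature 2} & \cdots & \text{Feature } p & \text{Target concept} \\
\hline
x_1 & x_{1_1} & x_{1_2} & \cdots & x_{1_p} & c_1 \\
x_2 & x_{2_1} & x_{2_2} & \cdots & x_{2_p} & c_2 \\
\vdots & & & & & \vdots \\
x_n & x_{n_1} & x_{n_2} & \cdots & x_{n_p} & c_n
\end{array}
\]

no target concept: the column $c_1, \ldots, c_n$ is not given
ML:XI-16 Cluster Analysis © STEIN 2002-2013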
MADANI, CONNOR AND GREINER
[Figure 2 diagram: tripartite graph of features (f1-f4), instances (x1-x5) and classes (C1-C3), from which a bipartite feature-to-class index with weights such as w12, w13 is computed]
Figure 2: A depiction of the problem: the input can be viewed as a tripartite graph, possibly weighted, and perhaps only seen one instance at a time in an online manner. Our goal is to learn an accurate efficient index, that is, a sparse weighted bipartite graph that connects each feature to zero or more classes, such that an adequate level of accuracy is achieved when the index is used for classification. The instances are ephemeral: they serve only as intermediaries in affecting the connections from features to classes. The index to learn is also equivalent to a sparse weight matrix (in which the entries are nonnegative in our current work) (see Sections 3.2 and 3.2.1).
3.1 The Level of Human Involvement in Teaching and Many-Class Learning
Learning under myriad-classes is not confined to a few text-classification problems. There are a number of tasks that could be viewed as problems with many classes and, if effective many-class methods are developed, such an interpretation can be quite useful. In terms of the sources of the classes, we may roughly distinguish supervised learning problems along the following dimensions (the roles of the teacher):
1. The source that defines the classes of interest, that is, the space of the target classes to predict.
2. The source of supervisory feedback, that is, the source or the process that assigns to each instance one or more class labels, using the defined set of classes. This is necessary for the procurement of training data, for supervised learning.
In many familiar cases, the classes are both human-defined and human-assigned. These include typical text classification problems (e.g., see Lewis et al. 2004 and Open Directory Project or Yahoo! directories/topics). In many others, class assignment is achieved by some “natural” or indirect activity, that is, the “labeling” process is not as explicit or controlled. The labeling is a by-product of an activity carried out for other purposes. One example of this case is data sets obtained from news groups postings (e.g., Lang, 1995). In this case, users post or reply to messages, without necessarily verifying whether their message is topically relevant to the group. Another example problem is predicting words using the context that the word appears (the words are the classes). In these problems, the set of the classes of interest may be viewed as human-defined, but the labeling is implicit (collections of written or spoken texts in the word prediction task). The extreme case
Topic Detection
IR based approaches [Madani et al., 2009]
instead of documents, classes are indexed
weights of features are learned
70k instances, 17k classes, 0-1 loss 0.35
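A rough sketch of the "index the classes" idea: each feature points to a small number of classes with a weight, and a new instance is classified by summing the weights of its features. The weights here are simple relative counts, which stands in for (and is much simpler than) the online weight learning of the cited work; the function names, the sparsity limit and the toy data are assumptions.

```python
from collections import defaultdict


def build_index(training, max_classes_per_feature=2):
    """Index classes by feature: weight = share of a feature's occurrences in a class."""
    counts = defaultdict(lambda: defaultdict(int))
    for features, label in training:
        for f in set(features):
            counts[f][label] += 1
    index = {}
    for f, per_class in counts.items():
        total = sum(per_class.values())
        ranked = sorted(per_class.items(), key=lambda kv: (-kv[1], kv[0]))
        # keep the index sparse: only the strongest classes per feature
        index[f] = {c: n / total for c, n in ranked[:max_classes_per_feature]}
    return index


def rank_classes(features, index):
    """Score classes by summed feature weights and return them best-first."""
    scores = defaultdict(float)
    for f in set(features):
        for c, w in index.get(f, {}).items():
            scores[c] += w
    return sorted(scores, key=lambda c: (-scores[c], c))


training = [
    (["goal", "match", "league"], "Sport"),
    (["goal", "striker"], "Sport"),
    (["election", "vote"], "Politics"),
    (["league", "vote"], "Politics"),
]
index = build_index(training)
ranking = rank_classes(["goal", "league"], index)
```

Classification cost depends on the number of features in the instance and the classes they point to, not on the total number of classes, which is what makes this kind of index attractive for many thousands of classes.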
Training Data
We need training data for supervised machine learning
tasks
topics
Web-based training data collection
Training Data
D5.1 Usage Pattern and Context Detection Specification and Analysis, p. 28
Figure 6: screenshot of overlay with additional information for a resource
4.4.4 Adaptions for Test Data Acquisition
This section describes adaptions to the prototype that were performed to meet the test setting requirements as presented in section 4.2.2. Basically, they comprise an extension of the injected widget’s user interface to record the execution of tasks, functionality to prevent user interaction before having started a task, changes to the query process and logging of additional information. An update of the object stores’ structure was necessary alongside with these modifications.
Figure 7: screenshot of task definition user interface
UI controls for task detection: For recording tasks, an additional tab was added to the injected widget’s menu. Figure 7 shows the contents of this additional tab. The task to perform needs to be selected at the input field shown at (02). The user can select one of the predefined tasks ("annotate a webpage" or "write a blog entry") or choose "other". Choosing the latter will prompt to specify a custom label for the task after its execution. When choosing the task "other", an additional checkbox (04) is shown to indicate whether recommendations are desirable for this task or not. The user can adjust his level of expertise on the task at hand with the slider at (05). Possible values range from 0 (lowest) to 10 (highest). The topics related to the specified task are defined at (06). This input field features auto-completion for dbpedia-categories. This means, the user gets suggested dbpedia-
D5.1 Usage Pattern and Context Detection Specification and Analysis, p. 22
Table 4: features collected with user tests along with respective methods

feature                                                                        method                             im-/explicit
predefined task name                                                           UI-control [select field]          E
custom task name                                                               UI-control [input field]           E
task start-time                                                                UI-control [button]                E
task end-time                                                                  UI-control [button]                E
level of expertise                                                             UI-control [slider (range 0-10)]   E
topics relevant to task                                                        UI-control [input field]           E
input language of topics                                                       UI-control [select field]          E
indicator, if recommendations are desirable                                    UI-control [checkbox]              E
search queries                                                                 UI-control [input field]           E
rating of recommendations                                                      UI-control [button (good/bad)]     E
assessment if recommendations & interface were helpful                         question                           E
assessment of sensitivity level of particular personal information             question                           E
disclosure level of personal information (subject to the recommender’s quality)  question                         E
clicked recommendations                                                        implicit                           I
ignored recommendations                                                        implicit                           I
dwell time at recommendation preview                                           implicit                           I
mouse clicks (+target)                                                         implicit                           I
textual input                                                                  implicit                           I
browsing history                                                               implicit                           I
browser profile (plugins, ...)                                                 implicit                           I

The content creation task is to write a blog entry about a given topic. The topics to write about are semi-defined: They comprise an important historical event, a cultural sight of the user’s hometown (or another town of her choice) and a person, who played a significant role in history. The semi-defined topics provide the ability for the user to choose a topic, she already has some knowledge about. The users are instructed to query for additional resources, with which they can enrich their blog post while writing it.
The predetermined tasks alternate with tasks of the user’s own choice, i.e. a possible sequence is (all tasks executed within the browser):
1. annotate a web page
2. read newspaper article
3. annotate a web page
4. write a blog entry
5. watch funny video clips
6. annotate a web page
7. engage in a forum discussion
User Profile
A Note on Privacy
explicit user profile
anonymized user profile
privacy-preserving proxy
user profile visualization
federated recommender
life is easy
.. and hard
embedded learning
exchange of ML models
Recommender
Adaptive Visualization
Personalized Search