Page 1: Roles in social networks: methodologies and research issuesmediamining.univ-lyon2.fr/velcin/public/publis/SNRolesWI... · 2015-07-16 · Identifying roles inside social networks is,

Roles in social networks: methodologies and research issues

Mathilde Forestier*, Anna Stavrianou, Julien Velcin, and Djamel A. Zighed
Laboratoire ERIC, Université Lumière Lyon 2, Université de Lyon, 5 avenue Pierre Mendes France, 69676 Bron Cedex, France
E-mail: {mathilde.forestier, anna.stavrianou, julien.velcin, abdelkader.zighed}@univ-lyon2.fr

Abstract. The expansion of web user roles is nowadays a fact, owing to the ability of users to interact, discuss, exchange ideas and opinions, and form social networks through the web. The level of interaction among users leads to the emergence of several social roles, which can be characterized as positions, behaviors, or virtual identities. These roles may develop within social networks, and they keep changing and evolving over time. In this article, a survey of state-of-the-art approaches to the identification of roles within a social network is presented. It is shown that social roles exist as a function of each other: they appear and evolve through user interaction. Different approaches are analyzed, and additional characteristics that should be taken into account during role analysis are discussed.

Keywords: social network, social role, online discussion

1. Introduction

With the advent of Web 2.0, users have become not only consumers of information but also producers [2]. They interact with each other, participate in online discussions, exchange information and opinions, and form social networks. The level of interaction among users defines social roles, which can be characterized as positions, behaviors, or virtual identities. These roles may develop in social networks formed through email exchanges or discussions in forums and Usenet newsgroups, and they keep changing and evolving over time.

Defining a social role depends on the context of the analysis. Researchers who analyze email exchanges within companies [48] see the role mostly as a position (manager, secretary, etc.). In a web discussion, by contrast, a person's role is closer to a virtual identity: is the person I am talking to an expert? And if so, what level of expertise does she have [76]? Roles may be predefined (popular user, expert, etc.), or their existence may result from the observation of patterns of interaction.

*Corresponding author. E-mail: [email protected]

In this paper, a survey of the state of the art is presented regarding the identification of the roles that users may have or acquire within a social network. As will be shown, social roles exist and develop through user interaction: a role exists as a function of another social role.

Identifying roles inside social networks is nowadays significant. Knowing, for instance, who the expert is in a technical forum facilitates the extraction of the most appropriate answer to a question. Furthermore, identifying the people whose role is to influence others is important, especially in viral marketing [17] (sometimes referred to as "word-of-mouth"), which relies on the diffusion of information through the links connecting people in a network. People who influence the decisions of a community play a significant role in the approval or rejection of preferences and tendencies. Thus, the identification of such roles enables a better understanding and analysis of interactions within social communities. It also helps to understand new social relations within virtual communities, since the behaviors of users reveal, to some extent, a role [72], and users often guide their interaction (the decision of whom to talk to) based on these identifications [69].

In this paper, the different methodologies that have been proposed for role identification inside a social network are discussed. The paper begins by defining the notion of roles inside a social network in Section 2. Section 3 presents existing approaches to the identification of non-predefined roles, while Section 4 deals with approaches concerning predefined roles, such as the expert or the influencer. Section 5 discusses open issues and challenges, followed by the conclusion in Section 6.

2. Social networks and roles

A social network is often represented by a graph whose nodes represent the actors of the network (people, organizations) and whose links represent relationships between them. The graph can be directed or undirected depending on the relations it represents (e.g. friendship, co-authoring, post-replying). Moreover, the nodes may have attributes that characterize them (e.g. the name and sex of an actor), and the edges may be weighted. Figure 1 shows a simple undirected network consisting of four nodes.

Fig. 1. An example of a simple social network.

The social network of Figure 1 represents an email exchange among four actors (John, Mary, Peter and Sarah). The graph reveals that Mary, Peter and Sarah exchange emails with one another, while Mary also exchanges emails with John, who does not correspond by email with Peter or Sarah.

Figure 2 shows a more complex social network extracted from the analysis of a real forum [62]. The actors are people who participate in the discussion, and the links between them represent a “reply” relationship (i.e. if B replies to A during the discussion, a directed link is created from B to A). This makes the interaction among users easier to grasp. Some individuals are clearly not connected to the network: nobody replies to them and they do not reply to anyone else. This kind of graph representation allows us to see the community as a whole, as well as the interactions between the actors.

Fig. 2. A social network extracted from a forum.

Such networks are studied with Social Network Analysis techniques [58] in order to analyze the characteristics of the various actors, detect patterns and identify existing communities. Based on graph theory, several measures are used for this purpose, mainly exploiting the link structure of the network. Among these measures is degree centrality, which for directed graphs is divided into two measures: the in-degree, which is the number of incoming links of a node, and the out-degree, which is the number of outgoing links.
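As a minimal illustration of the two measures, the sketch below computes in-degree and out-degree from a directed edge list in plain Python. The reply graph and the actor names are hypothetical; the edge direction follows the convention used for Figure 2 (a reply from B to A gives a link from B to A).

```python
from collections import Counter

def degree_centrality(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list.

    Each edge (u, v) is a link from u to v. Repeated edges are counted
    repeatedly, and isolated actors cannot appear in an edge list at all.
    """
    nodes = {n for edge in edges for n in edge}
    out_counts = Counter(u for u, _ in edges)
    in_counts = Counter(v for _, v in edges)
    # Counter returns 0 for missing keys, so actors without links get degree 0.
    in_degree = {n: in_counts[n] for n in nodes}
    out_degree = {n: out_counts[n] for n in nodes}
    return in_degree, out_degree

# Hypothetical reply graph: the edge ("Bob", "Alice") means "Bob replied to Alice".
replies = [("Bob", "Alice"), ("Carol", "Alice"), ("Alice", "Bob"), ("Dave", "Carol")]
in_degree, out_degree = degree_centrality(replies)
print(in_degree["Alice"], out_degree["Alice"])  # 2 1
```

In a reply network, a high in-degree points to actors who attract many answers, while a high out-degree points to actors who answer a lot.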

A social network can be analyzed as a whole, in the sense that all of its actors are represented and one attempts to identify patterns that reveal the presence of communities. It may also be seen from the viewpoint of a particular actor, by projecting the network onto that actor together with all neighbors within a distance defined a priori. In the latter case, the network is called an egocentric social network [12], since the focus of the study is the individual rather than the whole network.
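An egocentric network can be extracted with a breadth-first search bounded by the chosen distance. The sketch below is one simple way to do it, assuming an undirected graph stored as an adjacency dictionary; the dictionary encodes the email exchanges of Figure 1 as described in the text.

```python
from collections import deque

def ego_network(adjacency, ego, radius):
    """Return the nodes within `radius` hops of `ego` and the (undirected) edges among them."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # neighbours of this node would lie beyond the chosen radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    nodes = set(dist)
    # Keep every tie between selected nodes, including ties among the ego's neighbours.
    edges = {tuple(sorted((u, v)))
             for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges

# Figure 1: Mary, Peter and Sarah all exchange emails; John only exchanges with Mary.
figure1 = {
    "John": ["Mary"],
    "Mary": ["John", "Peter", "Sarah"],
    "Peter": ["Mary", "Sarah"],
    "Sarah": ["Mary", "Peter"],
}
nodes, edges = ego_network(figure1, "Peter", radius=1)
print(sorted(nodes))  # ['Mary', 'Peter', 'Sarah']
```

With a radius of 1 around Peter, John is left out, whereas a radius of 2 around John recovers the whole network.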


In the rest of the article, roles are divided into two categories: non-explicit roles and explicit roles. Non-explicit roles are social roles that are not defined a priori. Explicit roles, on the other hand, concern predefined types of actors such as “experts” or “influencers”, or certain roles found inside virtual web communities.

Although there is no real consensus on a possibly “global” definition of a role, we can point to some widely accepted interpretations of this concept.

Early research [50], mainly in sociology, defines the notion of a role within a social structure. It clearly differentiates between the notions of ‘role’ and ‘position’, a distinction that has motivated several works [73,74]. The notion of position is examined in depth by Borgatti and Everett [8].

In the following, we summarize the difference they draw between a role and a position.

Definition 1 For an individual, the position is a “well-defined place” in a social structure. Position is usually related to some kind of similarity: attitudes, mental health, production of scientific knowledge, etc.¹

Two actors occupy the same position if they are connected to the rest of the network in the same way. For instance, ‘parent’ and ‘child’ are both possible positions.

Definition 2 In a social structure, a role is a set of expectations that are coupled to positions. For instance, the role ‘parent’ is associated with expectations about what parents should do. The role ‘child’ corresponds to another position, with other expectations about what children should do.

Positions and roles together form a social system that generates social relations. Roles result in specific behaviors and interactions that can be observed (e.g., giving orders, sending emails, etc.). According to these definitions, actors with similar roles share common features and common patterns of relations [57], even if they do not share any direct relationship.

As mentioned in [8], a society is a network among individuals, whereas a social structure is the underlying network describing the relations between positions and roles.

¹For a more complete list, see [8].

3. Non-explicit roles

This section presents roles that emerge from the network structure alone, from the content of the exchanges alone (e.g., emails, posts), or from both. The main idea is to use unsupervised Machine Learning techniques to automatically group the data into (non-predefined) role categories. From a Machine Learning perspective, the data are relational and violate the classical independence or exchangeability assumptions usually made: as noted in [4], “the observations are dependent because of the way they are connected”.

Basically, clustering algorithms based on the graph structure [20,32] and/or on the textual content of the exchanged documents (e.g., emails, posts) [7,64] can be used. This means that little or no background knowledge is used in the definition of the roles. It is assumed that the roles will emerge from the regularities found both in the structure of human relations and in the features describing people, including the textual content they produce (forum messages, emails, etc.). For instance, the position of a “manager” in a company may be identified by the kind of vocabulary she uses in her emails, depending on whether the receiver is a “secretary” or another “manager”. In this context, a specific behavior (e.g. sending emails in which words such as “meeting”, “prepare” and “arrange” appear frequently) constitutes an observation that leads to the identification of the “manager-secretary” role.

The rest of this section is divided into two main approaches: the first is based on blockmodels and mainly uses the structure of human relations together with some predefined features, while the second is based on probabilistic Bayesian models and uses both the structure and (mainly) the textual content.

3.1. Identifying positions using blockmodels

Doreian et al. [20] provide a good overview of blockmodels in social networks. Blockmodeling is an algebraic framework that deals with various issues of social networks, such as the identification of communities and the relations between them, roles, etc. In this respect, the detection of roles and of the relations between them can be seen as an application of this general framework. Blockmodeling focuses mainly on the network structure, but it can also deal with node attributes and multiple relations [32]. Handling multiple relations, in this case, involves dealing with multigraphs.

The positions can either be given by the analyst or be estimated automatically in an unsupervised way. If they are predefined, based for instance on a node attribute (e.g., sex: male or female), the difficulty lies mainly in calculating the ties between these positions from the observed inter-individual relations. This means that the roles are estimated given the positions. As a result, a measure must be used to quantify the equivalence [8,73] between actors and the strength of the relation between their positions. If the positions are not predefined, the algorithm must compute homogeneous clusters and inter-cluster ties simultaneously, that is, positions and roles. In both cases, the notion of equivalence between graph nodes is crucial. This survey focuses on the second case, which seems more appropriate for role detection.

Thus, a crucial point is to define an appropriate equivalence relation for the task at hand. Different kinds of equivalence are proposed in the literature [8,73], such as structural, strong, regular, and automorphic equivalence. Structural equivalence (referring here to the distinction made in [8,24]), commonly used in social network analysis, places people into groups sharing similar interests. The notion of similarity used here relies only on sharing the same neighborhood (we may refer to the motto: “friends of my friends are also my friends”). It leads to the construction of communities of “similar” actors who share common characteristics, such as practising the same hobbies, appreciating similar movies, or playing games.

Moreover, it is closely related to the task of extracting cliques (sets of highly connected nodes) from graphs [21]. Structural equivalence is too restrictive to capture the abstract notion of roles. In contrast, regular and automorphic equivalence try to capture the sociological notion of a relational role. In this case, two actors are considered to have the same role if they are linked to people with similar roles, which allows two equivalent actors to be located in completely different parts of the network. This naturally leads to groups associated with specific roles, such as ‘parent’, ‘child’, ‘manager’, ‘secretary’, etc. Unlike regular equivalence, the clustering principle that keeps structurally equivalent groups together is closer to cohesion or proximity than to similarity [8].
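The difference between the two notions can be checked mechanically on a small directed graph. The sketch below is an illustration rather than the procedure of [8]: structural equivalence compares the exact neighbourhoods of two actors, while a role assignment is treated as a regular equivalence when actors sharing a role reach the same set of roles through their outgoing links and through their incoming links. The two-family ‘parent’ to ‘child’ graph is hypothetical.

```python
def neighbourhoods(edges):
    """Build out- and in-neighbour sets for a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    out_nb = {n: set() for n in nodes}
    in_nb = {n: set() for n in nodes}
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb

def structurally_equivalent(out_nb, in_nb, i, j):
    """True if i and j have the same out- and in-neighbours (ignoring each other)."""
    pair = {i, j}
    return out_nb[i] - pair == out_nb[j] - pair and in_nb[i] - pair == in_nb[j] - pair

def is_regular(out_nb, in_nb, role):
    """True if actors sharing a role reach the same set of roles via outgoing
    links and via incoming links (regular equivalence on a directed graph)."""
    profiles = {}
    for v, r in role.items():
        profile = (frozenset(role[u] for u in out_nb[v]),
                   frozenset(role[u] for u in in_nb[v]))
        profiles.setdefault(r, set()).add(profile)
    return all(len(p) == 1 for p in profiles.values())

# Hypothetical graph: two families, links go from parent to child.
out_nb, in_nb = neighbourhoods([("P1", "C1"), ("P1", "C2"), ("P2", "C3")])
role = {"P1": "parent", "P2": "parent", "C1": "child", "C2": "child", "C3": "child"}
print(structurally_equivalent(out_nb, in_nb, "P1", "P2"))  # False: different children
print(is_regular(out_nb, in_nb, role))                      # True: same kinds of ties
```

The two parents are not structurally equivalent, because they are tied to different children, yet they share the same role, which is exactly the situation regular equivalence is meant to capture.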

Methodologies A blockmodel is a smaller, more comprehensible (graph) structure coded by a square binary 0/1 matrix. This matrix, also called the image, is associated with a type of relation (friendship, work relationship, etc.). Several blockmodels can be built to explain the activity of a complex social network based on relational data. The goal of blockmodeling is to reduce a large, complex relational network to one or several images. Mathematically speaking, the objective is to build a homomorphism corresponding to the chosen equivalence (e.g. regular equivalence).

Fig. 3. The process of blockmodeling.

To give the reader a better insight, Figure 3 shows the process from the original social network of 8 people (i) to the representation of the interactions between roles (v). The 0/1 matrix (ii) is rearranged by permuting both rows and columns so that the data are easier to interpret. Depending on the context and the application task, the permutation can be performed either under supervision (e.g., according to attribute values) or in a completely unsupervised way.

The rearranged matrix (iii) exhibits four main blocks: three zero blocks and one block containing some 1's. Using predefined criteria related to the chosen notion of equivalence, this leads to the blockmodel (iv), which shows two positions A and B related as A → B. This is the case, for instance, with the roles of “father” and “child”, illustrated in the “role network” (v).
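Going from a permuted matrix such as (iii) to an image such as (iv) requires a rule that decides which blocks are coded 1. The sketch below assumes the α-density criterion: a block becomes 1 when its density reaches α, with α defaulting to the overall density of the network (a common heuristic). Other criteria exist; for regular equivalence, for example, a block is often coded 1 when each of its rows and columns contains at least one tie. The 8-person father/child matrix is hypothetical.

```python
def block_image(adj, blocks, alpha=None):
    """Reduce a 0/1 adjacency matrix to a blockmodel image.

    adj: square 0/1 matrix (list of lists); adj[i][j] = 1 if i -> j.
    blocks: list of positions, each a list of node indices.
    alpha: density threshold; defaults to the overall network density.
    """
    n = len(adj)
    if alpha is None:
        alpha = sum(adj[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    image, density = [], []
    for rows in blocks:
        image_row, density_row = [], []
        for cols in blocks:
            # Diagonal cells (self-ties) are ignored.
            cells = [adj[i][j] for i in rows for j in cols if i != j]
            d = sum(cells) / len(cells) if cells else 0.0
            density_row.append(d)
            image_row.append(1 if d >= alpha else 0)
        image.append(image_row)
        density.append(density_row)
    return image, density

# Hypothetical network of 8 people: fathers 0-3, children 4-7.
adj = [[0] * 8 for _ in range(8)]
for father, child in [(0, 4), (0, 5), (1, 5), (2, 6), (3, 7)]:
    adj[father][child] = 1
image, density = block_image(adj, [[0, 1, 2, 3], [4, 5, 6, 7]])
print(image)  # [[0, 1], [0, 0]], i.e. position A -> position B
```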


Whether the positions are given or not, the main task is to enumerate the role graphs (the “image graphs”) and to test them on the raw data. To this end, the atomic operation is to compare them using an appropriate “fitness” measure [35], which evaluates the closeness between the ideal image and the image currently being tested. Another possible method is to use local optimization heuristics and meta-heuristics.
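One way to make the fitness idea concrete, in the spirit of generalized blockmodeling for structural equivalence (an assumption here, not necessarily the measure of [35]), is to count the cells that disagree with the closer ideal block, either all 0 (“null”) or all 1 (“complete”). A partition can then be improved with a simple relocation heuristic that moves one actor at a time while the error decreases. The network and the starting partition are hypothetical.

```python
def block_errors(adj, assign, k):
    """Structural-equivalence error of a partition into k positions.

    For every block, count the cells that disagree with the closer ideal
    block: all 0 ('null') or all 1 ('complete'). Self-ties are ignored.
    """
    n = len(adj)
    ones = [[0] * k for _ in range(k)]
    cells = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if i != j:
                cells[assign[i]][assign[j]] += 1
                ones[assign[i]][assign[j]] += adj[i][j]
    return sum(min(ones[a][b], cells[a][b] - ones[a][b])
               for a in range(k) for b in range(k))

def relocate(adj, assign, k):
    """Greedy local search: move one actor at a time while the error decreases."""
    assign = list(assign)
    best = block_errors(adj, assign, k)
    improved = True
    while improved:
        improved = False
        for v in range(len(adj)):
            for c in range(k):
                if c == assign[v]:
                    continue
                previous = assign[v]
                assign[v] = c
                error = block_errors(adj, assign, k)
                if error < best:
                    best, improved = error, True  # keep the move
                else:
                    assign[v] = previous          # undo the move
    return assign, best

# Hypothetical network: actors 0-3 all send ties to actors 4-7.
adj = [[1 if i < 4 <= j else 0 for j in range(8)] for i in range(8)]
start = [0, 0, 0, 1, 1, 1, 1, 1]  # actor 3 is misplaced
print(block_errors(adj, start, 2), relocate(adj, start, 2))
# 7 ([0, 0, 0, 0, 1, 1, 1, 1], 0)
```

Such greedy relocation only reaches a local optimum, which is why it is usually restarted from several random partitions.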

The CONCOR algorithm [10] builds a hierarchical tree (a dendrogram) by iteratively splitting the data into two blocks. Instead of structural equivalence, REGE [9] uses regular equivalence to build blockmodels. In the end, each original node is associated with exactly one role in the image graph. Some stochastic models were already proposed in these early works [25,38]. However, they mainly focus on structural equivalence and assume that a partition of people is already known. This partition can be given by the attributes that characterize the network nodes (for instance, the social category).
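The core of CONCOR is the observation that repeatedly correlating a correlation matrix drives its entries towards +1 and -1, which splits the actors into two blocks. The sketch below performs one such split in plain Python. Using each actor's outgoing row followed by its incoming column as its profile is an assumption (a common practice), and the sketch assumes that no actor has a constant profile, since the Pearson correlation is undefined in that case. Applying the split recursively to each block yields the dendrogram.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(vectors):
    return [[pearson(u, v) for v in vectors] for u in vectors]

def concor_split(adj, max_iter=50, tol=1e-6):
    """One CONCOR split: iterate correlations until entries are close to +1 or -1."""
    n = len(adj)
    # Profile of actor i: its row (outgoing ties) followed by its column (incoming ties).
    profiles = [list(adj[i]) + [adj[j][i] for j in range(n)] for i in range(n)]
    m = correlation_matrix(profiles)
    for _ in range(max_iter):
        if all(abs(abs(value) - 1.0) < tol for row in m for value in row):
            break
        m = correlation_matrix(m)  # correlate the rows of the previous matrix
    # Actors positively correlated with actor 0 form one block, the others the second block.
    block_a = [i for i in range(n) if m[0][i] > 0]
    block_b = [i for i in range(n) if m[0][i] <= 0]
    return block_a, block_b

# Hypothetical network: actors 0-3 all send ties to actors 4-7.
adj = [[1 if i < 4 <= j else 0 for j in range(8)] for i in range(8)]
print(concor_split(adj))  # ([0, 1, 2, 3], [4, 5, 6, 7])
```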

Fienberg and Wasserman [25] developed a probabilistic model for the structural equivalence of actors in a network, under which the probabilities of relationships with all other actors are the same for all actors in the same class. This can be viewed as a stochastic version of a blockmodel. It can represent clustering, but (again) only when the cluster memberships are known. Wasserman and Anderson [71], as well as Snijders and Nowicki [60], extended these models to latent classes: these latent class models do not assume cluster memberships to be known, but instead estimate them from the data.
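When memberships are known, a stochastic blockmodel reduces to one tie probability per ordered pair of classes, and the maximum likelihood estimate of each probability is simply the observed density of the corresponding block. The sketch below samples a directed network from such a model and recovers the probabilities; the class sizes and the probability matrix are hypothetical.

```python
import random

def sample_sbm(membership, P, seed=0):
    """Directed stochastic blockmodel: i -> j with probability P[class_i][class_j]."""
    rng = random.Random(seed)
    n = len(membership)
    return [[1 if i != j and rng.random() < P[membership[i]][membership[j]] else 0
             for j in range(n)] for i in range(n)]

def estimate_block_probs(adj, membership, k):
    """Maximum likelihood estimate with known memberships: observed / possible ties."""
    ties = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    for i, ci in enumerate(membership):
        for j, cj in enumerate(membership):
            if i != j:
                pairs[ci][cj] += 1
                ties[ci][cj] += adj[i][j]
    return [[ties[a][b] / pairs[a][b] if pairs[a][b] else 0.0 for b in range(k)]
            for a in range(k)]

membership = [0] * 30 + [1] * 30  # hypothetical: two known classes of 30 actors
P = [[0.8, 0.05], [0.1, 0.6]]
adj = sample_sbm(membership, P)
print(estimate_block_probs(adj, membership, 2))  # close to P
```

The latent class extensions mentioned above remove the known `membership` list and estimate it jointly with `P`, which is what makes them harder to fit.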

Holland and Leinhardt [38] propose a probabilistic model, p1, to analyze social networks. This model does not explicitly take the blockmodel effect into account, implicitly defining a kind of “simple” blockmodel [70]. Their assumptions simply correspond to a stochastic version of the notion of structural equivalence. Wang and Wong [70] extend the p1 model with a block structure and propose a stochastic blockmodel. This model uses both intra-node attributes and inter-node relations. The model parameters are estimated, classically, by maximum likelihood. The authors apply their model to two datasets taken from the experiments of Hansell [33] and Sampson [56].

Handcock et al. propose a new latent position cluster model (LPCM) [32]. Contrary to previous approaches, they integrate the distance between individuals into a probabilistic model. In this way, they can take into account the transitivity of the graph, as well as the tendency of actors to relate to each other when they share common attributes (e.g. age, gender, geography, race). The authors propose a Bayesian approach to estimate the best number of clusters. To validate their model, they use a friendship network among a group of 69 students taken from [34,67].
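LPCM builds on the latent distance model, in which each actor i has a position z_i in a latent space and the log-odds of a tie decrease with distance: logit P(tie) = β - ||z_i - z_j||. LPCM additionally draws the positions from a mixture of Gaussian clusters. Transitivity follows from the triangle inequality: if A is close to B and B is close to C, A cannot be far from C. The sketch below only simulates from such a model under hypothetical parameters (two clusters in two dimensions); it does not reproduce the Bayesian estimation of [32].

```python
import math
import random

def sample_positions(n_per_cluster, centers, sd, seed=0):
    """Draw 2-D latent positions from a mixture of spherical Gaussian clusters."""
    rng = random.Random(seed)
    positions, labels = [], []
    for g, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            positions.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(g)
    return positions, labels

def tie_probability(zi, zj, beta):
    """Latent distance model: logit P(tie) = beta - ||zi - zj||."""
    eta = beta - math.dist(zi, zj)
    return 1.0 / (1.0 + math.exp(-eta))

positions, labels = sample_positions(20, centers=[(0.0, 0.0), (5.0, 0.0)], sd=0.5)
within, between = [], []
for i in range(len(positions)):
    for j in range(i + 1, len(positions)):
        p = tie_probability(positions[i], positions[j], beta=1.0)
        (within if labels[i] == labels[j] else between).append(p)
print(sum(within) / len(within) > sum(between) / len(between))  # True
```

Actors in the same cluster end up close to each other in the latent space, so ties concentrate inside clusters without any block matrix being specified.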

Wolfe and Jensen [75] extend previous work on stochastic blockmodels [60,70] by allowing multiple roles. They propose a probabilistic model in which each interaction (i.e., each edge of the graph) is generated by one role at a time. Airoldi et al. [4] improve the previous latent stochastic blockmodel so that each object can belong, to varying degrees, to several clusters. This entails that an individual can play several latent roles at the same time, which is similar, but not identical, to [75].

The authors propose the mixed membership stochastic blockmodel (MMSB) in order to adapt traditional models to social networks. In this proposal, each observation is associated with a membership vector that relates it to the different clusters, similarly to clustering approaches based on mixture models. This mirrors mixed membership models for documents, in which a single document can exhibit several underlying topics. The authors use variational inference to estimate the parameters of the model.
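The generative process of the MMSB, as we read it from [4], can be sketched as follows: each actor p draws a membership vector π_p from a Dirichlet distribution; for every ordered pair (p, q), the sender picks a role from π_p, the receiver picks a role from π_q, and the tie is a Bernoulli draw whose probability is read from a role-by-role matrix B. Only sampling is shown (the variational estimation is out of scope), and all parameter values are hypothetical.

```python
import random

def dirichlet(alpha, rng):
    """Sample a probability vector from Dirichlet(alpha) via normalised Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_mmsb(n, alpha, B, seed=0):
    """Sample membership vectors and a directed 0/1 network from an MMSB-style process."""
    rng = random.Random(seed)
    roles = range(len(alpha))
    pi = [dirichlet(alpha, rng) for _ in range(n)]  # mixed membership of each actor
    Y = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            z_send = rng.choices(roles, weights=pi[p])[0]  # role p takes towards q
            z_recv = rng.choices(roles, weights=pi[q])[0]  # role q takes towards p
            Y[p][q] = 1 if rng.random() < B[z_send][z_recv] else 0
    return pi, Y

# Hypothetical role-to-role tie probabilities for 3 latent roles.
B = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.7]]
pi, Y = sample_mmsb(10, alpha=[0.3, 0.3, 0.3], B=B)
```

Because roles are drawn per interaction rather than per actor, the same person can act in one role towards some actors and in another role towards others.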

Fu et al. [27] take into account the natural evolution of networks by proposing a dynamic mixed membership stochastic blockmodel (dMMSB). This very important characteristic had never really been included in previous models, at least not in blockmodeling. It means that actors can play various roles over time, for different and varying reasons, and that the roles themselves can evolve. The model is based on previous work on dynamic topic modeling [6] and on a static model that captures role correlations [4]. More precisely, it augments the MMSB model with a state space model similar to the one used in Dynamic Topic Models. This new probabilistic model is able to track the evolving roles of the actors across time.

3.2. Estimating roles using probabilistic models on textual content

Another modern approach to role analysis uses unsupervised hierarchical Bayesian models, mainly on textual datasets. The works presented in this section can be seen as a convergence between Bayesian networks and social networks. Their authors argue that the relational structure is not enough when analyzing textual datasets such as emails, blogs or scientific papers. The idea, in this case, is to use the textual content associated with the graph edges (email, post, message) in addition to the relational structure. It relies on previous work on topic models, which extract topics from texts by assuming a probabilistic generative model and associate a mixture of topics with each text [7]. In this context, a topic z is defined as a multinomial distribution p(w/z) over a given vocabulary (for instance, a list of words w).
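With topics defined as distributions p(w/z) and each text carrying a topic mixture p(z/d), the probability of a word in a text is the mixture p(w/d) = Σ_z p(w/z) p(z/d). The sketch below evaluates it for two hypothetical topics over a toy vocabulary; it illustrates the representation only, not how [7] estimates it.

```python
def word_distribution(topic_mixture, topics):
    """p(w/d) = sum over z of p(w/z) * p(z/d), for a text with topic mixture p(z/d)."""
    vocabulary = {w for phi in topics.values() for w in phi}
    return {w: sum(p_z * topics[z].get(w, 0.0) for z, p_z in topic_mixture.items())
            for w in vocabulary}

# Hypothetical topics: each is a multinomial distribution over the vocabulary.
topics = {
    "scheduling": {"meeting": 0.5, "prepare": 0.3, "arrange": 0.2},
    "energy": {"gas": 0.6, "power": 0.4},
}
p = word_distribution({"scheduling": 0.75, "energy": 0.25}, topics)
print(round(p["meeting"], 3), round(sum(p.values()), 3))  # 0.375 1.0
```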

Within this theoretical framework, several models have been proposed that use the notion of role implicitly or explicitly. Following the work of [64] on the Author Topic model (AT), McCallum et al. [48] propose three hierarchical Bayesian models to deal with roles in email datasets. The Author Recipient Topic model (ART) is a directed graphical model in which the words of a message are generated given their author and a set of recipients. In this model, the role is implicit: it lies in the set of topics (in other words, the kind of vocabulary) associated with each (author, recipient) pair. The authors in [48] improve their model with two variants named Role Author Recipient Topic models (RART1 and RART2). In these two models, the role is explicitly modeled in the Bayesian network as a latent random variable. A role is therefore a topic mixture characterizing the relation between two persons (that is, the author and the recipient).

Recent work introduces the task of Conference Mining [14]. In this work, the authors focus on the specific role of experts or mavens (they use both terms interchangeably). Roughly speaking, additional dimensions such as time and the identity of the source are added in order to provide a global analysis of topical trends in the scientific literature. The idea is to capture the semantics-based intrinsic structure of the words found across conferences, from a topic-model point of view.

Methodologies Figure 4 shows the Bayesian generative models underlying the Author Recipient Topic (ART) model and the first Role ART variant (RART1).

These graphical models try to capture the generative process that builds the dataset. Whereas the Author-Topic model estimates a mixture θ of topics z for each author a ∈ A individually, the ART model associates a mixture θ with each author-recipient pair (a, a′). In other words, ART conditions the per-message topic distribution jointly on the author and on the individual recipients. This makes it possible to describe the relation between people through this mixture of topics. An estimation procedure similar to a clustering process then makes it possible to discover people's implicit roles based on this relation. The two RART variants both extend ART with an additional level of latent variables. These additional latent variables g (author's role) and h (recipient's role) represent roles associated with people in the network, and the distributions p(r/a), r ∈ {g, h}, are estimated explicitly. A person a can have multiple roles r simultaneously, and is therefore associated with a mixture ψ over roles. The generation of words w does not depend directly on the authors, but rather on their roles.

Fig. 4. The models ART (left) and RART1 (right) proposed in [48].
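The difference between ART and RART1 lies in what indexes the topic mixture θ. The sketch below generates the words of one message under each model, assuming that for every word a recipient is picked uniformly among the message's recipients and, in the RART1-style function, that the roles g and h are drawn per word from ψ. All names and distributions are hypothetical, and the code only samples; it performs no parameter estimation.

```python
import random

def draw(dist, rng):
    """Sample an outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def generate_art(author, recipients, theta, phi, n_words, rng):
    """ART: the topic mixture is indexed by the (author, recipient) pair."""
    words = []
    for _ in range(n_words):
        x = rng.choice(recipients)           # recipient this word is addressed to
        z = draw(theta[(author, x)], rng)    # topic
        words.append(draw(phi[z], rng))      # word drawn from the topic
    return words

def generate_rart1(author, recipients, psi, theta, phi, n_words, rng):
    """RART1-style sketch: the topic mixture is indexed by (author role, recipient role)."""
    words = []
    for _ in range(n_words):
        x = rng.choice(recipients)
        g = draw(psi[author], rng)           # author's role
        h = draw(psi[x], rng)                # recipient's role
        z = draw(theta[(g, h)], rng)
        words.append(draw(phi[z], rng))
    return words

# Hypothetical parameters: two topics and three roles.
phi = {"scheduling": {"meeting": 0.6, "arrange": 0.4}, "technical": {"server": 0.7, "patch": 0.3}}
psi = {"ann": {"manager": 1.0}, "bob": {"secretary": 0.8, "engineer": 0.2}}
theta_roles = {("manager", "secretary"): {"scheduling": 0.9, "technical": 0.1},
               ("manager", "engineer"): {"scheduling": 0.2, "technical": 0.8}}
print(generate_rart1("ann", ["bob"], psi, theta_roles, phi, 5, random.Random(0)))
```

In ART, θ must be learned for every pair of people, whereas in the RART-style model it is learned only for every pair of roles, which is what lets people with similar communication patterns share parameters.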

Figure 5 shows the output of the learning process of a RART-based model. Given a predefined number k of roles (here, 3 roles) and the textual exchanges, such as emails, we obtain a k × k matrix of topic mixtures. Each element θ_{i,j} of this matrix is a mixture of topics: it can be seen as the kind of subjects a person playing role r_i talks about when she writes to a person playing role r_j. In this model an author can play several roles, depending on the recipient she is writing to. This is why the author a is associated with a mixture ψ over roles. Figure 6 illustrates this multinomial distribution for a given person with 3 roles.
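Under a RART-style model, the topics expected in a message from author a to recipient b combine the role mixtures of both people with the k × k matrix: Σ_i Σ_j ψ_a(r_i) ψ_b(r_j) θ_{i,j}. The sketch below computes this marginal mixture for k = 3 roles and two topics; the numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def pair_topic_mixture(psi_author, psi_recipient, theta):
    """Expected topic mixture from author a to recipient b:
    sum over role pairs (i, j) of psi_a[i] * psi_b[j] * theta[i][j]."""
    k = len(psi_author)
    n_topics = len(theta[0][0])
    mixture = [0.0] * n_topics
    for i in range(k):
        for j in range(k):
            weight = psi_author[i] * psi_recipient[j]
            for t in range(n_topics):
                mixture[t] += weight * theta[i][j][t]
    return mixture

# Hypothetical k = 3 roles and 2 topics; theta[i][j] is the mixture for role pair (r_i, r_j).
theta = [
    [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5], [0.2, 0.8]],
]
psi_a = [0.6, 0.3, 0.1]  # author a mostly plays role r_1
psi_b = [0.0, 0.0, 1.0]  # recipient b always plays role r_3
print([round(x, 2) for x in pair_topic_mixture(psi_a, psi_b, theta)])  # [0.71, 0.29]
```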

Fig. 5. The role network estimated by a RART-based model.

Fig. 6. The distribution of the author a over the 3 roles R = {r1, r2, r3}.

Learning this kind of probabilistic model means estimating the (often numerous) model parameters. There are two main approaches to estimating these parameters, related to the classical inference problem in Bayesian networks. The first is approximate inference with variational methods, used in [6,7]. The second is based on Monte Carlo simulation methods, such as Gibbs sampling [64].

The authors evaluate their models on both the Enron email corpus and McCallum's personal email correspondence. The evaluation combines a qualitative assessment with a quantitative measure of the predictive power of the models. For instance, the predictive power of such models can be estimated with the perplexity measure, which is based on the likelihood the model assigns to held-out data.
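Perplexity is commonly computed as the exponential of the average negative log-likelihood per held-out word, so lower values are better, and a model that spreads its probability uniformly over V words has perplexity V. The sketch below assumes the model is available as a function returning p(w) for each held-out word (for instance, a topic mixture Σ_z p(w/z) p(z/d)); that function and the toy words are hypothetical.

```python
import math

def perplexity(held_out_words, word_prob):
    """exp(-(1/N) * sum of log p(w)) over the N held-out words."""
    log_likelihood = sum(math.log(word_prob(w)) for w in held_out_words)
    return math.exp(-log_likelihood / len(held_out_words))

# A model that is uniform over a 4-word vocabulary has perplexity 4.
uniform = lambda w: 0.25
print(round(perplexity(["meeting", "gas", "power"], uniform), 6))  # 4.0
```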

In the task of Conference Mining, Daud et al. [13] propose an original model for discovering the latent topics between authors, venues (conferences or journals) and time simultaneously. The authors call their model STMS, for “semantics and temporal information-based maven search”. Discovering mavens, i.e. people with a given expertise, is just a by-product of this model. Given a query q composed of a limited number of words w defining the area of expertise, the authors a are ranked by their probability p(a/q). By Bayes' rule, assuming a uniform prior over authors and query words that are independent given the author, p(a/q) is proportional to p(q/a), which is equal to

$$p(q/a) = \prod_{w \in q} p(w/a).$$

The point is that the probability p(w/a) is calculated on the topic basis described by the generative model:

$$p(w/a) = \sum_{z} p(w/z)\, p(z/a).$$

The quantities p(w/z) (the distribution of the words w given the topic z) and p(z/a) (the distribution of the topics z given the author a) are standard quantities already present in previous models, such as LDA [7] and AT-based models [64].
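The ranking formula translates directly into code. The sketch below scores each author by log p(q/a) = Σ_{w∈q} log Σ_z p(w/z) p(z/a), using logarithms to avoid numerical underflow for longer queries. The topics, authors and probabilities are hypothetical, and the sketch makes the same uniform-prior and independence assumptions as the formula.

```python
import math

def rank_authors(query, p_w_given_z, p_z_given_a):
    """Rank authors by log p(q/a) = sum over w in q of log sum_z p(w/z) p(z/a)."""
    scores = {}
    for author, topic_mix in p_z_given_a.items():
        score = 0.0
        for w in query:
            p_w = sum(p_z * p_w_given_z[z].get(w, 0.0) for z, p_z in topic_mix.items())
            score += math.log(p_w) if p_w > 0 else float("-inf")
        scores[author] = score
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical topics and authors.
p_w_given_z = {"machine_learning": {"learning": 0.5, "model": 0.5},
               "databases": {"query": 0.5, "index": 0.5}}
p_z_given_a = {"alice": {"machine_learning": 0.9, "databases": 0.1},
               "bob": {"machine_learning": 0.2, "databases": 0.8}}
ranking, scores = rank_authors(["learning", "model"], p_w_given_z, p_z_given_a)
print(ranking)  # ['alice', 'bob']
```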

3.3. Summary

Traditional blockmodels are more closely related to graph theory and are arguably better suited to social networks. They mainly use the relational structure, but ignore the content of the exchanged messages.

The evolution of such graph-based models towards probabilistic models is a first step towards using more local information. Topic-based Bayesian models do use this crucial textual information, but they lack the global view of blockmodels. Bringing both kinds of models together is nowadays considered a challenge.

4. Explicit roles

In some cases, the role is already predefined, and its identification inside a social network amounts to detecting certain criteria that are satisfied by some users. This section deals with two roles that have received great attention in the existing literature: experts and influencers (or influentials). Other explicit roles that may exist inside online discussion groups are also discussed.

4.1. Identifying experts

Several works deal with the identification of experts inside a social network. These works use various metrics to identify expertise, and they define experts as follows:

Definition 3 An expert is a person who has knowledge about a topic discussed inside a social network, and, as such, her opinions and ideas can be trusted.

Identification of expertise appeared early in the literature, independently of the existence of a social network [49,65]. TREC 2006 [61] also proposed an expert finding task, in which most participants used Information Retrieval techniques to identify experts. In this article, the focus is on the identification of expertise within social networks, and especially within online communities, since experts are the people to whom other members of the social network turn for advice or help. The need for experts is often seen in forums dealing with technical or even health issues. Examples include Microsoft Answers (http://answers.microsoft.com) and the Technology Network Community of Oracle (http://forums.oracle.com/).

Within such online communities, the postings are questions or answers on a certain subject. Knowing the experts facilitates the identification of the answers that are more likely to be correct and/or complete. Moreover, distinguishing the quality replies among hundreds of other postings allows a reader to quickly find the postings worth reading.


Methodologies. Forum question-answering is at the center of the study in [1]. The authors study expertise across various topics by identifying patterns and behaviors in the way people ask and answer questions through postings inside a forum. They point out that the links in such a social network reveal the topics a user is interested in rather than her expertise.

Techniques for the identification of expertise inside forums are proposed in [76]. The post-reply community analyzed is the Java Forum, represented as a directed graph in which the nodes are users and the edges show a “reply” relation between two users. Experts are considered to be the actors who can answer a question appropriately, and the measures used to rank these experts are the following:

1. the “outdegree”, which designates how many replies a user sends,

2. the “indegree”, which shows how many different people a user answers,

3. the variation between asking and replying (for the same user), measured by the z-score,

4. a PageRank-based measure which takes into account the person to whom the answer is sent. For example, a user who replies is generally considered to be more of an expert than the user who receives the reply. Moreover, a person who replies to an expert becomes an expert herself.

Applying these measures showed that the structure of the network affects the expertise rankings they produce.
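The four measures above can be computed on a small reply log, as in the following sketch. It assumes a hypothetical log of (asker, replier) pairs, one per reply, with edges oriented from asker to replier so that PageRank flows towards the people who answer; this reproduces the effect that replying to an expert makes someone an expert. The z-score form (a - q)/sqrt(a + q), with a answers and q questions, is a common formulation assumed here, and replies received serve as a stand-in for questions asked.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Tuple

Reply = Tuple[str, str]  # (asker, replier): one reply sent by replier to asker


def degree_and_zscore(replies: List[Reply]):
    """Measures 1-3: replies sent, distinct people answered, z-score."""
    answers: Dict[str, int] = defaultdict(int)  # replies sent by each user
    asked: Dict[str, int] = defaultdict(int)    # replies received (proxy for questions)
    helped: Dict[str, set] = defaultdict(set)   # distinct people each user answered
    for asker, replier in replies:
        answers[replier] += 1
        asked[asker] += 1
        helped[replier].add(asker)
    users = set(answers) | set(asked)
    z = {u: (answers[u] - asked[u]) / sqrt(answers[u] + asked[u]) for u in users}
    return dict(answers), {u: len(helped[u]) for u in users}, z


def pagerank(replies: List[Reply], d: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Measure 4: power-iteration PageRank on asker -> replier edges."""
    users = sorted({u for edge in replies for u in edge})
    out = defaultdict(list)
    for asker, replier in replies:
        out[asker].append(replier)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in users}
        for u in users:
            if out[u]:
                for v in out[u]:
                    new[v] += d * rank[u] / len(out[u])
            else:  # a user who never asks spreads her rank uniformly
                for v in users:
                    new[v] += d * rank[u] / n
        rank = new
    return rank


reply_log = [("ann", "bob"), ("carl", "bob"), ("dan", "bob"),
             ("bob", "eve"), ("ann", "eve"), ("carl", "dan")]
answers, distinct, z = degree_and_zscore(reply_log)
pr = pagerank(reply_log)
print(max(pr, key=pr.get))  # eve, who answered bob, the busiest answerer
```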

Expertise is usually related to a topic, even if some network actors may be experts in multiple topics. Identifying experts on a specific topic is addressed in [5] by constructing topical profiles that give the probability of a person being an expert in a particular skill. This probability is calculated using vector-based similarity techniques. Each skill is characterized by a vector of keywords. Each person is considered relevant (or not relevant) to a skill according to whether she is present (or absent) in documents related to that specific skill. Presence means having been mentioned in, or having authored, such a document.
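A minimal sketch of such a topical profile follows, assuming a simple scheme that may differ from the exact formulation in [5]. Each document is a bag of words linked to the people who authored it or are mentioned in it; a person's raw score for a skill sums the cosine similarity between the skill's keyword vector and her documents, and the scores are then normalized into a distribution over skills. The skills, documents and names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Set, Tuple


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys count as 0
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topical_profile(person: str,
                    skills: Dict[str, List[str]],
                    docs: List[Tuple[str, Set[str]]]) -> Dict[str, float]:
    """docs: (text, people who authored or are mentioned in the text)."""
    own_docs = [Counter(text.lower().split()) for text, people in docs if person in people]
    raw = {skill: sum(cosine(Counter(words), d) for d in own_docs)
           for skill, words in skills.items()}
    total = sum(raw.values())
    # Normalize into a probability-like profile over skills.
    return {s: v / total for s, v in raw.items()} if total else raw


skills = {"java": ["java", "jvm", "threads"], "databases": ["sql", "index", "join"]}
docs = [("Java generics and Java threads", {"ann"}),
        ("SQL join index tuning", {"bob"}),
        ("JVM garbage collection in Java", {"ann", "bob"})]
print(topical_profile("bob", skills, docs))  # databases outweighs java for bob
```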

In [16], the network actors are people who exchange emails, and the objective of the work is to extract the experts on a certain subject discussed through the emails. The authors deal with the relative rather than the absolute expertise of two individuals, pointing out which of the two has more knowledge than the other on a certain topic. They use PageRank [52] and HITS-based measures [45], and they conclude that PageRank outperforms the other algorithms they consider.
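For comparison with the PageRank sketch, the HITS side of such an approach can be illustrated as follows. The sketch assumes that each pairwise judgment is encoded as an edge (u, v) meaning "v knows more than u on the topic" and reads the authority score as expertise; this encoding and the toy pairs are assumptions, not the setup of [16].

```python
from math import sqrt
from typing import Dict, List, Tuple


def hits(edges: List[Tuple[str, str]], iters: int = 50) -> Dict[str, float]:
    """Authority scores; an edge (u, v) means 'v knows more than u' (assumed)."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # A good authority is pointed to by good hubs...
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {n: x / norm for n, x in auth.items()}
        # ...and a good hub points to good authorities.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {n: x / norm for n, x in hub.items()}
    return auth


pairs = [("ann", "carl"), ("bob", "carl"), ("dan", "carl"), ("ann", "bob")]
auth = hits(pairs)
print(sorted(auth, key=auth.get, reverse=True))  # carl first, then bob
```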

4.2. Identifying influencers

Another role whose identification inside a social network has received a lot of attention is that of an influencer:

Definition 4 An influencer (or influential) is a person who has the ability to influence the decisions or thoughts of other people inside a social network.

Influencers are the right people to market to [17], the "market-movers" [3]. They are the ones who can accelerate the diffusion of innovation, whether this involves the launch of new products or novel marketing, social and political ideas. Knowing the influencers can reduce the lag between knowing (being informed) and doing (accepting and applying a new idea), and, thus, the spread of new ideas becomes quicker and more efficient [68].

In the case of blogs, which are nowadays regarded as a major way of spreading information, identifying influencers can lead to the extraction of the most representative blog posts of a blog site. Additionally, a reader of a community blog (one to whose content many authors contribute) may give priority to the posts written by the influencers instead of reading all posts [2,3].

Influence is often measured by the number of people being influenced. The influenced people are usually linked to the ones who influence them through a relation such as friendship, collaboration, etc. Influence may diminish or increase over time, and, thus, Agarwal et al. [3] define four types of influencers according to the temporal length of their influence (long-term, average-term, transient and burgeoning influencers).

Methodologies. The identification of influencers in social networks is dealt with in several works, which take into consideration various parameters based mainly on the behavior of users inside the network. Influencers may be identified inside community blog sites [3] or even in collaborative systems, via implicit influence between users [18,36].

Domingos [17] points out that influence is asymmetric. This means that a person may influence more than she is influenced, something that may give her priority over others in the influence ranking. The same work notes that the network value of a person depends on the network value of the people she influences. For example, a person who influences people who can, in turn, influence others is considered to have a high influence power. Thus, the importance of an influencer also depends on her indirect influence over people [18,55].
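The recursive idea that a person's value depends on the value of those she influences can be illustrated with a Katz-style recursion. This is an illustrative stand-in, not the model of Domingos [17]: value(p) = 1 + alpha times the sum of the values of the people p directly influences, where the hypothetical discount alpha weights indirect reach less than direct reach, and directed edges keep influence asymmetric.

```python
from typing import Dict, List


def network_value(influences: Dict[str, List[str]], alpha: float = 0.5,
                  iters: int = 100) -> Dict[str, float]:
    """value(p) = 1 + alpha * sum of value(q) over the people q that p influences.

    Converges when alpha is below 1 / (largest eigenvalue of the influence graph).
    """
    people = set(influences) | {q for qs in influences.values() for q in qs}
    value = {p: 1.0 for p in people}
    for _ in range(iters):
        value = {p: 1.0 + alpha * sum(value[q] for q in influences.get(p, []))
                 for p in people}
    return value


# ann and dave each influence two people, but bob (influenced by ann)
# influences two more, so ann's network value is higher.
v = network_value({"ann": ["bob", "carl"], "dave": ["erin", "fay"], "bob": ["gus", "hal"]})
print(v["ann"], v["dave"])  # 2.5 2.0
```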

Several parameters are considered in the existing literature for the identification of influencers. In [3], a community blog influencer is identified by:

– the degree of recognition by others, measured by the number of inlinks pointing to a post,

– the user activity, specified by the number of posted comments,

– the novelty of ideas, measured by the number of outlinks, and

– the length of the posts.
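The four properties above can be combined into a single post score, as in the deliberately simplified sketch below. It is not the recursive influence-flow model of [3]: the linear form, the weights and the toy posts are hypothetical, outlinks lower the score on the reading that a post which mostly links elsewhere is less novel, and a blogger is scored by her best post (also an assumption).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Post:
    author: str
    inlinks: int    # recognition: posts that link to this one
    comments: int   # activity the post generates
    outlinks: int   # links to other posts; many outlinks suggest less novelty
    length: int     # number of words


def post_score(p: Post, w_in: float = 1.0, w_com: float = 0.5,
               w_out: float = 0.5, w_len: float = 0.001) -> float:
    # Hypothetical linear weights; outlinks count against novelty.
    return w_in * p.inlinks + w_com * p.comments - w_out * p.outlinks + w_len * p.length


def blogger_scores(posts: List[Post]) -> Dict[str, float]:
    """A blogger's score is that of her highest-scoring post (assumed)."""
    scores: Dict[str, float] = {}
    for p in posts:
        scores[p.author] = max(scores.get(p.author, float("-inf")), post_score(p))
    return scores


posts = [Post("ann", 10, 4, 1, 800), Post("ann", 0, 0, 3, 200), Post("bob", 2, 1, 6, 300)]
print(blogger_scores(posts))
```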

The degree centrality of users in a network (i.e. how popular they are), as well as their activity history (the number of groups they participate in, the average amount of content updated per day, etc.), are also used in [44] to identify influencers.

Furthermore, the propagation of trust and credibility inside communities [47,51,77] is considered to be related to someone's ability to be an influencer. In [68], the adoption of a new idea is said to be influenced by an individual's direct ties inside the social network and, moreover, by her position in the network (structure/hierarchy). The opinion leaders (i.e. influencers) are identified by extracting the most popular actors of the network (e.g. the ones from whom people seek advice) and matching them against the members who are closest to them (e.g. if B takes advice from A, and C from B, then, by transitivity, C is considered to get advice from A).
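The leader-matching step can be sketched as a breadth-first search along advice ties, under assumptions that are not taken from [68]: leaders are simply the k most sought-after people, and each member is matched to the nearest leader reachable through the advice network, which implements the transitivity rule of the example (C asks B, B asks A, so C is matched to A). The advice network is a hypothetical toy.

```python
from collections import Counter, deque
from typing import Dict, List, Set, Tuple


def match_to_leaders(advice: Dict[str, List[str]],
                     k: int = 1) -> Tuple[Set[str], Dict[str, str]]:
    """advice[p] lists the people p seeks advice from.

    Leaders are the k most sought-after people; every other member is matched
    to the closest leader reachable along advice ties (transitivity).
    """
    sought = Counter(a for sources in advice.values() for a in sources)
    leaders = {p for p, _ in sought.most_common(k)}
    matched = {}
    for person in advice:
        seen, queue = {person}, deque([person])
        while queue:  # breadth-first search: nearest leader first
            cur = queue.popleft()
            if cur in leaders and cur != person:
                matched[person] = cur
                break
            for nxt in advice.get(cur, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return leaders, matched


leaders, matched = match_to_leaders({"B": ["A"], "C": ["B"], "D": ["A"], "E": ["A"], "F": ["E"]})
print(leaders, matched["C"])  # {'A'} A
```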

Rohan et al. [66] focus on the identification of influencers inside communities with the purpose of placing advertisements on their profiles. According to this work, the criteria that can characterize someone as being an influencer of a social network community are:

– the popularity of the network actor within the community,

– the number of friends,

– the group membership,

– the number of user interactions,

– the quality of content in the user profile,

– the common interests with the other community members (based on the user profile),

– whether a dynamic change in the size of the community is involved, and

– the activity inside a user profile.

The users are ranked by influence using a PageRank-based algorithm. Influence is measured according to the weight of the cluster (declared friendships inside the cluster / total friendships inside and outside the cluster). The change in the weight of the cluster when a member is removed reveals that member's influence: the larger the change, the higher the influence.
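The cluster-weight criterion can be sketched as follows; the PageRank-based ranking step is not shown. The sketch assumes that "total friendships in and out of the cluster" means every friendship with at least one endpoint in the cluster, that removing a member also removes her friendships, and that influence is the absolute change in weight. The friendship list is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cluster_weight(cluster: Set[str], friendships: List[Tuple[str, str]]) -> float:
    """Friendships inside the cluster / all friendships touching the cluster."""
    inside = touching = 0
    for u, v in friendships:
        ends_in_cluster = (u in cluster) + (v in cluster)
        if ends_in_cluster:
            touching += 1
            if ends_in_cluster == 2:
                inside += 1
    return inside / touching if touching else 0.0


def removal_influence(cluster: Set[str],
                      friendships: List[Tuple[str, str]]) -> Dict[str, float]:
    """Influence of a member = change in cluster weight when she is removed."""
    base = cluster_weight(cluster, friendships)
    influence = {}
    for m in cluster:
        rest = [(u, v) for u, v in friendships if m not in (u, v)]
        influence[m] = abs(base - cluster_weight(cluster - {m}, rest))
    return influence


friends = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "x"), ("c", "y")]
inf = removal_influence({"a", "b", "c", "d"}, friends)
print(max(inf, key=inf.get))  # a
```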

Influence and communities are also discussed in [59], where the authors focus on influence maximization. They propose the identification of events such as buying a product or adopting a new idea, since such events have the ability to influence neighboring user nodes.

Agarwal et al. [2] differentiate between the identification and the propagation of influence. They point out that the location of a node in a social network reveals how influence can spread rather than which node has the ability to influence. As a result, it is possible to identify nodes that could influence, without being able to show whether they will actually do so. By contrast, influencers are easier to identify through blogs, since these contain user-produced content such as comments.

Based on the different methodologies proposed in the existing literature, researchers interested in the identification of expertise or influence should take into account the properties listed in Table 1.

4.3. Identifying social roles in online discussion groups

Online discussions include forums, blogs, emails, etc. Although they have different formats, they share the characteristic that people interact through them by means of a virtual structure, which makes it possible to measure their participation and behavior.

Sociologists have studied the different kinds of roles that may appear. Golder and Donath [30] carried out an ethnographic study of behavior and social roles on the Internet. They proposed a typology of the different social roles in virtual discussions, which is described in Table 2.

The authors defined a general typology of social roles in online communities regardless of context (i.e. the type of forum: politics, help, question/answer, etc.).

Table 1. Social network properties that should be considered by researchers during expert/influencer identification.

Property | Explanation | Measure
Activity of user in the network | Whether the user is active, e.g. within a community blog. | Updated content, number of groups in which people participate, number of postings sent, community leaders (e.g. the ones who have created the community), maximization of the number of communities influenced [59].
Content of postings | Content needs to be of high quality and relevant to the interests of a community. | Keyword extraction [22]. Sometimes the length of the post indicates quality [2,3].
Expertise (applies only to influencer identification) | People who have knowledge influence more quickly, since they can be trusted and people often ask them for advice. | Expertise is gained and recognized by others. It could be found through expert-finding methods, taking into account the network structure and the activity/interactions of each user inside the network.
Influence | Ability to influence those who can also influence in their turn [17]; degree of influencing versus being influenced. | Reach [2] (how other members of the network can be reached).
Network structure | Not all links have the same importance/weight. | Direct ties inside the network, connectivity, popularity, position inside the network, hierarchy of the network [68], degree, closeness, betweenness, clustering coefficient, etc.; relation with others (friends, family).
Novelty of ideas | Presence of ideas/opinions not discussed before. | Radiality [2], e.g. how many posts does the specific posting refer to?
Recognition by others | Is a certain posting referenced by others? (inlinks) [2] | Count citations (of blog posts, for example): have these people been quoted, and by how many (quality) posts/articles/blogs?
Trust and credibility | Does the information come from someone familiar? Is this person generally trusted inside the community? Are her comments of good quality? Has she always been trusted and reliable in the past? | Quality of response [2], past experience (relative to how long the person has been present in the network; people new to a network are usually trusted less) [2].

According to Welser et al. [72], each social role has a certain “signature” that can be understood as the set of behavioral and structural patterns of people's participation. They identify and study two social roles that appear in newsgroups: the answer people and the discussion people. The role of answer people is to reply to threads initiated by others, while replying to many different people (answer people do not usually reply several times to the same person). By contrast, discussion people belong to a very dense egocentric network and reply to threads initiated either by themselves or by others. These social roles are analyzed with several measures:

– Authorlines represents the volume of contribution of a single actor across all the weeks of a given year. Furthermore, Authorlines distinguishes the threads initiated by the given actor from those that are not [69],

– local network neighborhoods represent the ego network of each conversation participant,

– the distribution of neighbors' degree is a histogram which shows the distribution of the neighbors' degree for each actor,

– coding behavior from message content: questions, answers, answer-related behavior and discussions.
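The two structural measures above can be sketched on a reply network treated as undirected (an assumption of this sketch). The helper names and the toy data are hypothetical: "ans" replies once to three separate askers, giving the pattern described for answer people (neighbors of low degree, no ties among them), while "d1" sits in a denser, discussion-like neighborhood.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def neighbors(replies: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected neighbor sets from (author, replied-to author) pairs."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for u, v in replies:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def neighbor_degree_histogram(nbrs: Dict[str, Set[str]], actor: str) -> Counter:
    """How many of the actor's neighbors have degree 1, 2, 3, ..."""
    return Counter(len(nbrs[n]) for n in nbrs[actor])


def ego_density(nbrs: Dict[str, Set[str]], actor: str) -> float:
    """Share of the possible ties among the actor's neighbors that actually exist."""
    ego = sorted(nbrs[actor])
    k = len(ego)
    if k < 2:
        return 0.0
    ties = sum(1 for i, a in enumerate(ego) for b in ego[i + 1:] if b in nbrs[a])
    return ties / (k * (k - 1) / 2)


g = neighbors([("ans", "q1"), ("ans", "q2"), ("ans", "q3"),
               ("d1", "d2"), ("d2", "d3"), ("d1", "d3"), ("d1", "d4")])
print(neighbor_degree_histogram(g, "ans"), ego_density(g, "ans"))
print(neighbor_degree_histogram(g, "d1"), ego_density(g, "d1"))
```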

Table 2. Typology of social roles [30].

Social role | Description
celebrity | The prototypical central figure: a prolific poster who spends a great deal of time and energy contributing to the newsgroup's community.
newbie (new user) | Little communicative competence and perhaps few similarities with the rest of the group.
lurker | Reads the newsgroup's conversations without participating.
flamer | Key behavioral strategy is intimidation through very aggressive language, yelling and controversial speech.
troll | Pretends to be someone else and makes others believe so.
ranter | Posts with a high frequency and can be confused with a celebrity, with the difference that a ranter does not participate in conversation threads not initiated by herself.

Fisher et al. [26] argue that social roles may depend on the context. Indeed, a user's social behavior can differ depending on whether she posts in a help forum or in a flame forum. Based on this, the authors focus on individual and collective behavior inside different newsgroups. They construct a second-degree egocentric network of the online discussion in order to explore individual behavior, and they calculate the out-degree distribution of each actor's neighbors (the in-degree is the number of actors who replied to an actor and the out-degree is the number of persons to whom an actor has replied). This measure explains how an author responds to an actor who is poorly or well connected to others. In a way, it shows whether the specific author talks to people who reply to a lot of other people. In order to identify these roles, they mainly use the in- and out-degree statistics and the degree distribution coefficient.

Fisher et al. conclude that in a discussion newsgroup, people build a reputation through high participation, while in a question/answer group people build their reputation by sending more answers (rather than questions).

In [37], Himelboim et al. study social roles in political discussions and define the social role of the discussion catalyst. This social role refers to an individual who influences the information that enters a newsgroup and affects the evolution of the discussion within it. They evaluate posting behavior with three measures:

– the reply share: the ratio of the number of replies in the threads initiated by an author to the total number of replies in the newsgroup,

– the replier share: the ratio of the number of newsgroup authors who post messages in an author's threads to the total number of newsgroup participants,

– the success ratio: the proportion of the threads initiated by an author that have received replies from at least two other authors.

A discussion catalyst has a high reply share, replier share and success ratio.
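The three catalyst measures translate directly into code. In this sketch the data format is hypothetical (each thread is its initiator plus the list of reply authors), the author's own replies in her threads count towards her reply share, and she is excluded from her own repliers; these readings of the definitions are assumptions.

```python
from typing import Dict, List, Tuple

Thread = Tuple[str, List[str]]  # (initiator, authors of the replies, in order)


def catalyst_measures(threads: List[Thread]) -> Dict[str, Dict[str, float]]:
    participants = {s for s, _ in threads} | {a for _, rs in threads for a in rs}
    total_replies = sum(len(rs) for _, rs in threads) or 1  # avoid dividing by zero
    measures = {}
    for author in participants:
        own = [rs for s, rs in threads if s == author]
        if not own:
            continue  # the measures are only defined for thread initiators
        repliers = {a for rs in own for a in rs if a != author}
        measures[author] = {
            "reply_share": sum(len(rs) for rs in own) / total_replies,
            "replier_share": len(repliers) / len(participants),
            "success_ratio": sum(1 for rs in own if len(set(rs) - {author}) >= 2) / len(own),
        }
    return measures


threads = [("ann", ["bob", "carl", "bob"]), ("ann", ["dan"]), ("bob", ["ann"]), ("carl", [])]
print(catalyst_measures(threads)["ann"])
```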

Also in the context of political discussions, Kelly et al. [42] explore three social roles: the fighters, the friendlies and the fringe. The fighters represent the great majority of actors, who respond more often to opponents than to allies. The friendlies are a smaller group of actors who respond to allies more often than to opponents. Finally, the fringe represents a marginal group that raises interesting questions for qualitative study.

In order to identify these three social roles, Kelly et al. analyse political newsgroups and focus on the in-degree and out-degree egocentric networks, with each node labeled with the actor's political affiliation.

Viegas and Smith [69] propose a new interface named "Newsgroup Crowd" in order to automatically visualize which actors are important and which seem less important inside newsgroups. Their study involves two levels of the concept of social role: social roles within a newsgroup and social roles across newsgroups. They assess the importance of actors by evaluating the following characteristics:

– the number of days during which an author has been active over a certain time period,

– the author's average number of postings per thread in a newsgroup,

– how recently an author has been active in the newsgroup, and her overall posting activity in Usenet as a whole,

– the author's number of postings during a certain time period in a particular newsgroup,

– the author's total number of postings in the whole set of Usenet newsgroups,

– the first and the last day the author was seen in a specific newsgroup,

– the top five newsgroups in which an author has been active,

– Authorlines.
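The within-newsgroup part of the typology in Table 3 reduces to threshold tests on two of these characteristics: the number of active days and the average number of postings per thread. The sketch below encodes those rules; the three cut-off values are hypothetical placeholders, since the text only speaks of "high", "low" and "very high" values.

```python
def newsgroup_role(active_days: int, posts_per_thread: float,
                   many_days: int = 30, low_ppt: float = 1.5,
                   very_high_ppt: float = 4.0) -> str:
    """Within-newsgroup role from active days and postings per thread.

    The thresholds are hypothetical placeholders, not values from [69].
    """
    if active_days >= many_days:
        if posts_per_thread >= very_high_ppt:
            return "debater"
        if posts_per_thread < low_ppt:
            return "answer person"
        return "unclassified"  # the typology has no role for this combination
    if posts_per_thread >= low_ppt:
        return "bursty contributor"
    return "newcomer / question asker"


for days, ppt in [(120, 1.1), (120, 6.0), (3, 2.5), (2, 1.0)]:
    print(days, ppt, newsgroup_role(days, ppt))
```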

Based on these characteristics, the typology of roles for the authors that participate in newsgroups, as well as across newsgroups (second part of the table), is presented in Table 3. The measures used by the authors to extract social roles from online discussions are summarized in Table 4.

Table 3. Typology of roles for the authors that participate in newsgroups [69].

Social role | Description
Within a newsgroup:
answer person (or “pollinator”) | High number of active days and a low postings-per-thread ratio.
debater | High number of active days and a very high postings-per-thread ratio.
'bursty' contributor | Low number of active days and a moderate to high postings-per-thread ratio.
newcomer and question asker | Low number of active days and a low postings-per-thread ratio.
Across newsgroups:
answer person (or “pollinator”) | High number of active days; mostly responds to threads started by other authors, with one or a few messages sent to each thread.
debater | High number of active days; mostly responds to threads started by other authors, sending a large number of messages per thread.
spammer-like behavior | High number of active days; almost exclusively initiates threads, which then receive no follow-up messages from this author.
balanced conversationalist | Initiates about as many threads as he replies to, and has the same postings-per-thread ratio on both initiated and non-initiated threads.

Table 4. Properties that should be considered for social role identification from online discussions.

Property | Explanation | Measure
Egocentric network | Helps to understand the place of the individual in relation to her neighbors; it gives a more accurate, individual-centered view of the social network. | Degree, distribution of neighbors' degree, etc.
Network structure | The place and importance of the individual in the social network. | Link structure: in-degree and out-degree.
Content of postings | The kind of posting (i.e. question post, answer post, etc.). | Some authors have manually categorized the post content in order to categorize actors.
Thread analysis | Considers the actor's participation in context (i.e. the thread). | Reply share, replier share, success ratio, Authorlines.
Activity of the poster in the discussion | Measures the actor's activity in the discussion and in the group. | Actor's average number of posts per thread in a newsgroup, number of posts in a newsgroup, number of posts in Usenet, number of active days, etc.

5. Discussion

The aforementioned methodologies are summarized in Tables 5 and 6, for non-explicit and explicit roles respectively. Table 5 gives the name of the model proposed by each author and specifies whether the approach is probabilistic, whether it focuses on extracting one or several roles, and whether it takes into consideration the structure, the content and time. The last column adds further information on the particular model. Table 6 concentrates on explicit roles and distinguishes between content analysis and the use of user behavior to identify a role. The terms probabilistic and stochastic are used interchangeably: even if their meanings are not identical, their usage is quite equivalent in our context.

Based on these approaches, it is evident that identifying roles inside social networks is a research issue that still presents several challenges. The complexity of human behavior on the Web, and of the human reactions and interactions within the online discussions that form social networks, makes the extraction and identification of social roles difficult to achieve. However, patterns characterizing each type of role can be identified among individuals with a certain type of behavior. In this section, we discuss the existing approaches and present issues related to the extraction of roles in social networks.

Social roles through interaction. One important notion is that a social role can mainly be identified through the interactions among people [29]. A person has a role in relation to something or someone. Even though some approaches may use additional a priori information (e.g. sex, postings, etc.), most approaches focus on the social network links, which represent the interactions between individuals (email exchanges, postings in forums). In this way, the social role of an actor is analyzed only relative to the other roles; i.e. the expert of a network is, by construction, more expert than the rest of the network actors.

Table 5. Summary of the non-explicit role identification approaches, pointing out the properties on which each method focuses. B stands for generic blockmodel; sB stands for stochastic blockmodel.

Ref | Authors | Year | Model | Structure / Content / Probabilistic / Several roles / Temporal roles | Comment
[46,35,74] | various authors | 1971+ | B | x | Among the first works dedicated to blockmodeling.
[38] | Holland, Leinhardt | 1981 | p1 | x x | Extends the classical blockmodels to a probabilistic framework.
[71] | Wasserman, Anderson | 1987 | sB | x x | Extends the model p1 to include latent classes.
[70] | Wang, Wong | 1987 | sB | x x x | Takes into account the attributes' values.
[24] | Faust | 1988 | B | x | Comparison of several methods for traditional blockmodeling (structural and general equivalence).
[60] | Snijders and Nowicki | 1997 | sB | x x | An alternative model for [71].
[75] | Wolfe and Jensen | 2004 | sB | x x x | Each individual can play several roles.
[48] | McCallum et al. | 2005 | ART, RART1, RART2 | x x x x | Topic models based on both textual content and structure, which embed the notion of role.
[32] | Handcock et al. | 2007 | LPCM | x x x | Takes into account the transitivity within clusters and the homophily on attributes.
[4] | Airoldi et al. | 2008 | MMB | x x x | This work can be viewed as a first attempt to merge topic models and blockmodels.
[13,14] | Daud et al. | 2009 | STMS | x x x x | Topic models for conference mining.
[27] | Fu et al. | 2009 | dMMSB | x x x x | Extends the MMB model to take the temporality of the data into account.

Moreover, interactions include communication codes. For instance, people adapt their vocabulary depending on whom they talk to: they do not speak with their supervisor as they speak with their colleagues. The same also holds for posts sent during an online conversation. In this context, Donath [19] specifies that the text of an author shows how she interacts within an online environment. A community shares some linguistic codes that are difficult for a newcomer to understand. For example, some abbreviations are common to the whole community of writers, while others are specific to a group. These codes allow individuals to recognize each other inside the community and to protect themselves from external attacks (trolls, flamers).

Therefore, it is important to take the interaction content into account for social role extraction. This interaction content is used in some probabilistic and block models, but not yet for social role extraction in web discussions.

Text content. At the moment, the identification of experts and influencers inside online communities is mainly link- or activity-based. The semantic dimension is not really taken into account. However, the content of the posts sent can reveal a lot of information regarding the role of their authors. Text mining [23,41,63] as well as opinion mining techniques [15,28,40] may be applied in order to identify patterns and the evolution of topics and opinions. The quality of the posts [22,39] may reveal the experts, and the evolution of opinions across the whole network may facilitate the identification of the social network actors who influence others.

Table 6. Summary of the explicit role identification approaches, distinguishing between behavior-oriented and content-oriented methods.

Ref | Authors | Year | SNA / Participation behavior / Content analysis | Social role
[16] | Dom et al. | 2003 | x | Expert identification.
[5] | Balog and De Rijke | 2007 | x | Expert identification.
[76] | Zhang et al. | 2007 | x x | Expert ranking.
[1] | Adamic et al. | 2008 | x x | Expert identification.
[68] | Valente et al. | 1999 | x | Influencer.
[17] | Domingos | 2005 | x | Influencer.
[59] | Scripps et al. | 2007 | x | Influencer.
[3] | Agarwal et al. | 2008 | x | Influencer.
[66] | Rohan et al. | 2008 | x x | Influencer.
[30] | Golder and Donath | 2004 | x x x | Celebrity, Newbie, Lurker, Flamer, Troll and Ranter.
[69] | Viegas and Smith | 2004 | x x | Answer person, Debater, 'Bursty' contributor, Newcomers or Question asker, Spammer-like behavior and balanced conversationalist.
[26] | Fisher et al. | 2006 | x x | Questioner and Replier person.
[42] | Kelly et al. | 2006 | x x | Friends, Foes and Fringes in political discussion.
[72] | Welser et al. | 2007 | x x x | Answer and Discussion person.
[37] | Himelboim et al. | 2009 | x x | Discussion Catalysts.

One or several social roles? As mentioned above, some approaches aim at extracting one social role per actor, while others assume several roles per actor. This depends on the type of role to be extracted. Indeed, probabilistic models and blockmodels attempt to identify social roles as positions; e.g. in a company, people can supervise while being supervised, and thus an actor can effectively play several roles. On the other hand, during a web conversation (e.g. in a forum), the social role can be seen as a reputation [19,54]. Therefore, individuals have only one social role, defined by their participation in the relevant discussions. For example, in [76] the identification of the expertise level of each actor leads to a ranking of experts. The same actor cannot be an expert and a non-expert at the same time, even though the same actor may change roles over time. At each time instance, each actor has only one role.

Temporal dimension of social roles. A role may be dynamic, since it may change over time. This has to be taken into account during the role identification process. For instance, an influencer in a domain may stop influencing after a certain time period. Similarly, the identified expert of a technical network may be ranked lower when another expert becomes a member of that network. As a result, topic and temporal criteria need to be incorporated into the proposed approaches. Unlike work on communities [31,53], the articles cited only weakly reflect the evolution of social roles over time. Fu et al. [27] do raise the question of the temporal evolution of social roles, and it seems evident that people do not keep the same social role over time. For example, it could be interesting to analyze the expertise level of a Java forum participant who begins as a novice and gradually becomes a Java professional or expert.

Intra- and inter-community roles. It is worth noting that, apart from the roles described in this article, there exist roles that are defined by the structure of the communities to which the actor belongs [11,59]. For example, Scripps et al. [59] emphasize the position of a node within the community structure of the network. As a consequence, the role of a user depends not only on her behavior towards her neighbors, but also on the communities to which these neighbors belong. The user's behavior is measured by the popularity of the node (degree), social network analysis measures (closeness, betweenness), the rank (PageRank, HITS) and a new measure the authors propose, which gives information about how a node is related to the communities of the network. Depending on her position within the community, an actor may be a kind of “bridge” passing information from one community to another, or someone with many links within the community. Similar approaches use a set of social network analysis metrics (e.g. centrality, betweenness) to extract the individuals who play a role. The identification or extraction of social roles may be enhanced by the use of such measures.


Evaluating approaches. When the roles are explicit and undoubtedly clear (e.g. the role of a mother, a manager, etc.), or when they are based on well-defined criteria (e.g. the maximum number of inlinks in a network), evaluating a role identification technique is quite straightforward. Nevertheless, when the criteria of a role involve subjectivity, as in the case of ranking experts or influencers within a social network, organizing experiments and evaluating results is not obvious [2]. In this context, the evaluation of role identification methodologies presents a research challenge.

The evaluation of influence identification can be done by checking whether the people who are supposed to be influenced are indeed influenced. A few evaluation strategies have been proposed in the literature. For instance, websites that feature user posts liked by others (such as digg.com) have been used [3], on the assumption that posts written by influencers are often liked and may therefore appear on such sites.

Simulations showing that new ideas diffuse more quickly when they are initially directed towards opinion leaders have been proposed [68]. System evaluations have also been carried out by propagating a new game through Facebook [44]. That study focused on the total number of users who played the game and on the number of the influencer's invitations that were accepted. Furthermore, the Independent Cascade Model [43], which is based on the probability with which an activated node will activate each of its neighbors, has been used in [59].
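The Independent Cascade Model [43] can be simulated in a few lines. Each newly activated node gets a single chance to activate each of its inactive neighbors, and each attempt succeeds independently with probability `p`. The expected spread is then estimated by averaging many runs. Below is a minimal Monte Carlo sketch in which the graph, `p` and the number of runs are hypothetical. On a toy star network, it also illustrates the intuition behind [68]: seeding the well-connected center reaches more people than seeding a peripheral node.

```python
import random

def independent_cascade(neighbors, seeds, p, rng):
    """Run one Independent Cascade simulation and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            # u gets exactly one chance to activate each inactive neighbor.
            for v in neighbors.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

def expected_spread(neighbors, seeds, p, runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(neighbors, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Hypothetical star network: node 0 is linked to leaves 1..20 in both directions.
star = {0: list(range(1, 21))}
for leaf_node in range(1, 21):
    star[leaf_node] = [0]

center = expected_spread(star, [0], p=0.5)
leaf = expected_spread(star, [5], p=0.5)
print(f"seed at center: {center:.2f}, seed at leaf: {leaf:.2f}")
```

With `p = 0.5`, the analytic expectations are 11 activated nodes when seeding the center and 6.25 when seeding a leaf. The printed estimates should therefore be close to those values.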

For the identification of expertise, human raters are usually involved [76]: they read forum posts written by users in order to rate the expertise of each user.
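Agreement between an automatic expertise ranking and such human ratings can be quantified with a rank correlation. The sketch below computes Spearman's coefficient and assigns tied values the average of their ranks. This is one common choice, not necessarily the measure used in [76], and the scores and ratings in the example are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)

# Hypothetical data: algorithmic expertise scores and human ratings (1-5) for five users.
algorithm_scores = [0.92, 0.40, 0.75, 0.10, 0.55]
human_ratings = [5, 2, 4, 1, 4]
print(round(spearman(algorithm_scores, human_ratings), 3))
```

A value close to 1 means the algorithm orders users much as the raters do. Values near 0 or below indicate disagreement.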

6. Conclusion

This survey article presents the state of the art of approaches to the identification of roles within a social network. Roles may be predefined, based on certain criteria such as a maximum number of out- or in-links. Roles may also emerge from the link structure of the network. In either case, the extraction of roles is significant for various reasons, ranging from marketing and industrial interests (e.g. viral marketing) to user-oriented ones (e.g. deciding whom to talk to or avoid inside forums).

The status of a role depends on the context: a person has a role in relation to something or someone. Approaches such as the blockmodel and the probabilistic model reflect a more objective reality, in the sense that the role of a “manager” or the role of a “child” is based on definitions accepted by everyone. On the other hand, approaches that aim at identifying roles (e.g. experts or influencers) within online discussions are more subjective, since it is not always straightforward to rank two actors that have similar characteristics. The social role of actors who participate in online discussions depends on their interests, their activity and their recognition by others, and these characteristics are not defined the same way by everyone.

The majority of current approaches whose objective is to identify roles inside communities are based on link analysis of the social network. Future work aiming to enhance such approaches should consider additional dimensions. These include the temporal one and the content of the exchanged messages (expressed opinions, vocabulary, etc.). They also include the presence of actors in communities, together with the influence exerted on them by the communities they do or do not belong to. Text and opinion mining techniques should be involved in the analysis of actor-generated content, the evolution of interactions over time should be taken into account, and the way in which a community may affect the emergence of different roles should be considered. Moreover, benchmarking methodologies should be studied in order to facilitate the evaluation of such methods.

References

[1] L.A. Adamic, J. Zhang, E. Bakshy, and M.S. Ackerman. Knowledge sharing and Yahoo Answers: everyone knows something. In: Proceedings of the International Conference on World Wide Web (WWW’08), pages 665–674, Beijing, China, 2008. ACM Press.

[2] N. Agarwal and H. Liu. Blogosphere: research issues, tools, and applications. SIGKDD Explorations, 10(1):18–31, 2008. IEEE Press.

[3] N. Agarwal, H. Liu, L. Tang, and P.S. Yu. Identifying the influential bloggers in a community. In: Proceedings of the International Conference on Web Search and Web Data Mining (WSDM’08), pages 207–218, Stanford, CA, USA, 2008. ACM Press.

[4] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008. JMLR.

[5] K. Balog and M. De Rijke. Determining expert profiles (with an application to expert finding). In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’07), pages 2657–2662, Hyderabad, India, 2007. AAAI.

[6] D.M. Blei and J.D. Lafferty. Dynamic topic models. In: Proceedings of the International Conference on Machine Learning (ICML’06), pages 113–120, Carnegie Mellon, Pennsylvania, USA, 2006. ACM Press.


[7] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. JMLR.

[8] S.P. Borgatti and M.G. Everett. Notions of position in social network analysis. Journal of Sociological Methodology, 22:1–35, 1992. JSTOR.

[9] S.P. Borgatti and M.G. Everett. Two algorithms for computing regular equivalence. Journal of Social Networks, 15(4):361–376, 1993. Elsevier.

[10] R.L. Breiger, S.A. Boorman, and P. Arabie. An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3):328–383, 1975. Elsevier.

[11] B. Chou and E. Suzuki. Discovering community-oriented roles of nodes in a social network. In: Proceedings of the Data Warehousing and Knowledge Discovery (DaWaK’10), pages 52–64, Bilbao, Spain, 2010. Springer.

[12] K.K.S. Chung, L. Hossain, and J. Davis. Exploring socio-centric and egocentric approaches for social network analysis. In: Proceedings of the International Conference on Knowledge Management (KMAP’05), Wellington, New Zealand, 2005.

[13] A. Daud, J. Li, L. Zhou, and F. Muhammad. A generalized topic modeling approach for maven search. In: Proceedings of the Advances in Data and Web Management (APWeb WAIM’09), pages 138–149, Suzhou, China, 2009. Springer.

[14] A. Daud, J. Li, L. Zhou, and F. Muhammad. Conference mining via generalized topic modeling. In: Proceedings of the Machine Learning and Knowledge Discovery in Databases (ECML PKDD’09), pages 244–259, Bled, Slovenia, 2009. Springer.

[15] X. Ding and B. Liu. The utility of linguistic rules in opinion mining. In: Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR’07), pages 811–812, Amsterdam, The Netherlands, 2007. ACM Press.

[16] B. Dom, I. Eiron, A. Cozzi, and Y. Zhang. Graph-based ranking algorithms for e-mail expertise analysis. In: Proceedings of the SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD’03), pages 42–48, San Diego, California, USA, 2003. ACM Press.

[17] P. Domingos. Mining social networks for viral marketing. Journal of Intelligent Systems, 20(1):80–82, 2005. IEEE Press.

[18] P. Domingos and M. Richardson. Mining the network value of customers. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’01), pages 57–66, San Francisco, CA, USA, 2001. ACM Press.

[19] J.S. Donath. Identity and deception in the virtual community. Communities in cyberspace, pages 29–59, 2009. Psychology Press.

[20] P. Doreian, V. Batagelj, and A. Ferligoj. Generalized blockmodeling. Cambridge University Press, 2005.

[21] J. Edachery, A. Sen, and F. Brandenburg. Graph clustering using distance-k cliques. In: Proceedings of the International Conference on Graph Drawing, pages 98–106, Štiřín Castle, Czech Republic, 1999. Springer.

[22] C. Elkan. Method and system for selecting documents by measuring document quality, US Patent App. 10/004,514, 2007. Google Patents.

[23] W. Fan, L. Wallace, S. Rich, and Z. Zhang. Tapping the power of text mining. Communications of the ACM, 49(9):76–82, 2006. ACM Press.

[24] K. Faust. Comparison of methods for positional analysis: structural and general equivalences. Journal of Social Networks, 10(4):313–341, 1988. Elsevier.

[25] S.E. Fienberg and S.S. Wasserman. Categorical data analysis of single sociometric relations. Journal of Sociological Methodology, 12:156–192, 1981. JSTOR.

[26] D. Fisher, M. Smith, and H.T. Welser. You are who you talk to: detecting roles in Usenet newsgroups. In: Proceedings of the Hawaii International Conference on System Sciences (HICSS’06), pages 59b–59b, Island of Hawaii, USA, 2006. IEEE Press.

[27] W. Fu, L. Song, and E.P. Xing. Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the International Conference on Machine Learning (ICML’09), pages 329–336, Montreal, Canada, 2009. ACM Press.

[28] A. Ghose, P.G. Ipeirotis, and A. Sundararajan. Opinion mining using econometrics: a case study on reputation systems. In: Proceedings of the Association for Computational Linguistics (ACL’07), pages 416–423, Prague, Czech Republic, 2007. ACL.

[29] E. Goffman. The presentation of self in everyday life, Doubleday, 1959.

[30] S.A. Golder and J. Donath. Social roles in electronic communities. In: Proceedings of the International Conference of Internet Research (IR’04), pages 13–22, Brighton, England, 2004. Citeseer.

[31] D. Greene, D. Doyle, and P. Cunningham. Tracking the evolution of communities in dynamic social networks. In: Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM’10), pages 176–183, Odense, Denmark, 2010. IEEE Press.

[32] M.S. Handcock, A.E. Raftery, and J.M. Tantrum. Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(2):301–354, 2007. Wiley Online Library.

[33] S. Hansell. Cooperative groups, weak ties, and the integration of peer friendships. Journal of Social Psychology Quarterly, 47(4):316–328, 1984. JSTOR.

[34] K.M. Harris, F. Florey, J. Tabor, P.S. Bearman, J. Jones, and R.J. Udry. The national longitudinal study of adolescent health: research design. Technical report, Carolina Population Center, USA, 2003.

[35] G.H. Heil and H.C. White. An algorithm for constructing homomorphisms of multiple graphs, Department of Sociology, Harvard University, 1974. Unpublished paper.

[36] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl. Evaluating collaborative filtering recommender systems. Journal of Transactions on Information Systems, 22(1):5–53, 2004. ACM Press.

[37] I. Himelboim, E. Gleave, and M. Smith. Discussion catalysts in online political discussions: content importers and conversation starters. Journal of Computer-Mediated Communication, 14(4):771–789, 2009. Wiley Online Library.

[38] P.W. Holland and S. Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50, 1981. JSTOR.

[39] M. Hu, E-P. Lim, A. Sun, H.W. Lauw, and B-Q. Vuong. Measuring article quality in Wikipedia: Models and evaluation. In: Proceedings of the ACM Conference on Information and Knowledge Management (CIKM’07), pages 243–252, Lisboa, Portugal, 2007. ACM Press.


[40] M. Hu and B. Liu. Mining and summarizing customer reviews. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pages 168–177, Seattle, WA, USA, 2004. ACM Press.

[41] A. Kao and S. Poteet. Text mining and natural language processing: introduction for the special issue. SIGKDD Explorations, 7(1):1–2, 2006. ACM Press.

[42] J.W. Kelly, D. Fisher, and M. Smith. Friends, foes, and fringe: norms and structure in political discussion networks. In: Proceedings of the International Conference on Digital Government Research (DG.O’07), pages 21–24, San Diego, California, USA, 2006. ACM Press.

[43] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), pages 137–146, Washington, DC, USA, 2003. ACM Press.

[44] E.S. Kim and S.S. Han. An analytical way to find influencers on social networks and validate their effects in disseminating social games. In: Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM’09), pages 41–46, Athens, Greece, 2009. IEEE Press.

[45] J.M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999. ACM Press.

[46] F. Lorrain and H.C. White. Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1(1):49–80, 1971. Routledge.

[47] P. Massa and P. Avesani. Trust metrics on controversial users: balancing between tyranny of the majority and echo chambers. Journal on Semantic Web and Information Systems, 3(1):39–64, 2007. Citeseer.

[48] A. McCallum, X. Wang, and A. Corrada-Emmanuel. Topic and role discovery in social networks with experiments on ENRON and academic email. Journal of Artificial Intelligence Research, 30(1):249–272, 2007. AI Access Foundation.

[49] D.W. McDonald and M.S. Ackerman. Expertise recommender: a flexible recommendation system and architecture. In: Proceedings of the ACM International Conference on Computer Supported Cooperative Work (CSCW’00), pages 231–240, Philadelphia, Pennsylvania, USA, 2000. ACM Press.

[50] S.F. Nadel and M. Fortes. The theory of social structure, Free Press, 1957.

[51] J. O’Donovan. Capturing trust in social web applications. Computing with Social Trust, 1:213–257, 2009. Springer.

[52] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the web. Technical Report, Stanford InfoLab, USA, 1998.

[53] G. Palla, A.L. Barabási, and T. Vicsek. Quantifying social group evolution. Nature, 446(7136):664–667, 2007. Nature Publishing Group.

[54] A. Revillard. Les interactions sur l’Internet. Terrains et travaux, 1:108–129, 2000. ENS Cachan.

[55] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02), pages 61–70, New York, NY, USA, 2002. ACM Press.

[56] S.F. Sampson. Crisis in a cloister, unpublished Ph.D. Dissertation, Dept. of Sociology, Cornell University, USA, 1969.

[57] J.E. Schwartz and M. Sprinzen. Structures of connectivity. Journal of Social Networks, 6(2):103–140, 1984. Elsevier.

[58] J. Scott. Social network analysis, Sage Publications, 1988.

[59] J. Scripps, P.N. Tan, and A.H. Esfahanian. Node roles and community structure in networks. In: Proceedings of the Workshop on Web Mining and Social Network Analysis (WebKDD/SNAKDD’07), pages 26–35, San Jose, California, USA, 2007. ACM Press.

[60] T.A.B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75–100, 1997. Springer.

[61] I. Soboroff, A.P. de Vries, and N. Craswell. Overview of the TREC 2006 enterprise track. In: Proceedings of the Text Retrieval Conference (TREC’06), Gaithersburg, MD, USA, 2006. Citeseer.

[62] A. Stavrianou. Modeling and mining of web discussions, PhD Dissertation, University of Lyon, France, 2010.

[63] A. Stavrianou, P. Andritsos, and N. Nicoloyannis. Overview and semantic issues of text mining. SIGMOD Record, 36(3):23–34, 2007. ACM Press.

[64] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pages 306–315, Seattle, WA, USA, 2004. ACM Press.

[65] L. Streeter and K. Lochbaum. Who knows: a system based on automatic representation of semantic structure. In: Conference of Recherche d’Information Assistée par Ordinateur (RIAO’88), pages 380–388, Cambridge, MA, USA, 1988. CID.

[66] S.G. Sheffer, T. Rohan, T.J. Tunguz-Zawislak, and J. Harmsen. Network node ad targeting, United States Patent Application, 2008.

[67] R.J. Udry. The national longitudinal study of adolescent health (Add Health): Waves I and II, 1994–1996; Wave III, 2001–2002. Technical report, University of North Carolina, USA, 2003.

[68] T.W. Valente and R.L. Davis. Accelerating the diffusion of innovations using opinion leaders. The Annals of the American Academy of Political and Social Science, 566(1):55–67, 1999. Sage Publications.

[69] F.B. Viégas and M. Smith. Newsgroup crowds and AuthorLines: visualizing the activity of individuals in conversational cyberspaces. In: Proceedings of the Hawaii International Conference on System Sciences (HICSS’04), Island of Hawaii, USA, 2004. IEEE Press.

[70] Y.J. Wang and G.Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987. JSTOR.

[71] S. Wasserman and C. Anderson. Stochastic a posteriori blockmodels: construction and assessment. Journal of Social Networks, 9(1):1–36, 1987. Elsevier.

[72] H.T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. Journal of Social Structure, 8(2):1–31, 2007.

[73] D.R. White and K.P. Reitz. Graph and semigroup homomorphisms on networks of relations. Journal of Social Networks, 5(2):193–234, 1983. Elsevier.

[74] H.C. White, S.A. Boorman, and R.L. Breiger. Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4):730–780, 1976. JSTOR.

[75] A. Wolfe and D. Jensen. Playing multiple roles: discovering overlapping roles in social networks. In: Proceedings of the Workshop on Statistical Relational Learning and its Connections to Other Fields (ICML-SRL’04), Banff, Alberta, Canada, 2004. ACM Press.

[76] J. Zhang, M.S. Ackerman, and L. Adamic. Expertise networks in online communities: Structure and algorithms. In: Proceedings of the International Conference on World Wide Web (WWW’07), pages 221–230, Banff, Alberta, Canada, 2007. ACM Press.

[77] C.N. Ziegler and J. Golbeck. Investigating interactions of trust and interest similarity. Decision Support Systems, 43(2):460–475, 2007. Elsevier.