
Software Engineering Research Unit
Via Sommarive, 18 - 38123 POVO, Trento - Italy
http://se.fbk.eu/

Semi-Automated Identification of Actors’ Intentions in OSS communities

Itzel Morales-Ramirez and Anna Perini

November 2013

Technical Report


1 Introduction

Open Source Software (OSS) development rests on distributed, collaborative communities made of heterogeneous stakeholders, such as users, developers, and analysts [7]. Open forums and mailing lists are commonly used tools that enable distributed collaboration tasks for solution design, code writing, software deployment, maintenance and evolution. Mailing list discussions are heavily exploited by stakeholders to provide bug reports and feature requests, or simply to ask for clarification. Discussants express their arguments mainly in unstructured natural language (NL) text. The immediacy that this channel of communication offers makes it the preferred one, but it can result in huge threads of e-mails that analysts must carefully check in order to identify information relevant for software development tasks [4].

This calls for automated support for information extraction from these online discussions, in particular to identify bug notifications and new feature requests.

In other domains, which focus on human-computer interaction or computer-mediated human interaction, intent-based information extraction has been proposed. Intent-based information extraction is rooted in Speech Act Theory (SAT), originally formulated by Austin and Searle [3], and in its application in computational linguistics [20]. According to SAT, speakers’ “utterances automatically create expectations which guide the hearer towards the speaker’s meaning” (Grice’s claim [23]), which can aim at persuading, inspiring or getting the hearer to do something.

There is a vast literature on SAT applications, from spoken dialogue analysis to written conversation analysis. In our research we are inspired by recent works on online written conversations, such as chat and e-mail threads, which are considered almost synchronous conversations: for instance [?], in which SAT is applied to identify deceptive participants in online conversations, and [16], in which discussion threads in a student forum are analysed with the objective of identifying student questions left unanswered by the teachers.

In this paper, we focus on mailing list discussions in OSS. They consist of sets of e-mail threads. Each thread is initiated by a mailing list participant, who proposes a topic, usually explicitly stated in the subject of this root e-mail. A chain of reply messages usually follows. We propose to model this type of discussion as a set of linguistic and non-linguistic acts, which we call convers-acts. Following SAT, convers-acts are associated with discussants’ intentions, and these intentions are related to bug notifications and feature requests. We define a structured method to perform manual annotation of convers-acts in OSS mailing list discussions, by adapting and extending SAT catalogues.

Exploiting popular computational linguistic tools, we define a semi-automated process to identify discussants’ intentions. This semi-automated process has been applied to samples of mailing list discussions taken from a real OSS project, thus providing an experimental validation of its effectiveness, measured in terms of precision and recall.

The remainder of the paper is structured as follows. In Section 2 we give some background on SAT and a state-of-the-art NL processing tool. We introduce our modelling of online discussions in Section 3. In Section 4 we describe our approach to automating intention identification. Related work is presented in Section 5.

2 Background

In this section we recall basic definitions from the Philosophy of Language, namely Speech Act Theory (SAT), on which we build, and introduce the General Architecture for Text Engineering (GATE), the information extraction tool used in the work described in this paper.

2.1 Speech Act Theory

When a person says something, s/he attempts to communicate certain things to an addressee by getting him or her to be affected by the speaker’s intention; in other words, each utterance in a conversation corresponds to an action performed by the speaker. This is, in a nutshell, the basic claim of Speech Act Theory (SAT), developed by Austin and Searle in the field of the Philosophy of Language [18, 23].

Specifically, Austin [2] states that an utterance can be associated to locutionary, illocutionary and perlocutionary acts. The locutionary act is the act of “saying something”; the illocutionary act makes reference to the way in which the locutions are used and in which sense; and the perlocutionary act is the effect that may be achieved on the audience. For instance, considering the utterance “Today is Friday”, the locutionary act corresponds to the utterance of this sentence, the illocutionary act corresponds to the speaker’s intention to make the audience aware that she believes that today is Friday, and the effect, i.e. the perlocutionary act, is that the audience becomes convinced of the speaker’s belief. Austin proposes a taxonomy of illocutionary acts, and Searle works further in this direction, considering speech acts as the minimal units of linguistic communication [18]. He also revises Austin’s taxonomy and classifies illocutionary acts into five high-level categories in [19]; moreover, he distinguishes between direct and indirect speech acts, thus recognising the role of non-linguistic acts, besides linguistic ones, in human conversations.

Several works in Computational Linguistics took inspiration from SAT to derive models for human conversations and dialogues, with the purpose of enabling automated classification and retrieval, e.g. [12, 20, 11, 21]. These works consider synchronous spoken conversations or dialogues, as in the case of phone conversations. More recently, computer-mediated written conversations, such as chats and e-mail threads, have also been analysed through SAT-based methods [22, 6]; indeed, they can be considered almost synchronous conversations. In the HCI field, SAT is used to model human-ECA (Embodied Conversational Agent) interactions [15].

For our research purposes we consider a revised version of previous taxonomies of speech acts, proposed by Bach and Harnish in [3]. In their taxonomy there are four main kinds of communicative illocutionary acts, namely constantives, directives, commissives, and acknowledgements. Constantives express the speaker’s belief and her intention or desire that the hearer holds or forms a like belief. Examples are “I have to confess that I love you” or “I admit the pink flower is my favourite”. Directives express the speaker’s attitude toward some prospective action to be performed by the hearer, and her intention that her utterance be taken as a reason for the hearer’s action. For example, if I say to you “Open the door, please!” or “You have to buy milk”, I intend to motivate you to perform those actions. Commissives express the speaker’s intention to commit to doing something. For instance, when I say to you “I am going to repair the chair”, I intend to make you believe that I am committed to repairing the chair. Finally, acknowledgements express feelings regarding the hearer, or the speaker’s intention that her utterance satisfies a social expectation. For instance, “Please excuse me”.

To explain the different types of utterances and how communication is performed, five key elements are proposed in [3]: [S] the speaker; [H] the hearer; [e] an expression (typically a sentence); [L] a language; and [C] the context.


Applying them to one of the examples used above, the directive speech act “Open the door, please!” is analysed as follows: [S] is me; [H], the hearer, is you; [e] is “Open the door, please!”; [L] is English, spoken natural language; and [C] could be a situation in which we are exiting the office and I am carrying a heavy box.

2.2 Information Extraction and the GATE Framework

Information extraction refers to the extraction of relevant information from unstructured text, such as entities and the relationships between them, thus providing facts to feed a knowledge base [8]. GATE [9] is a framework for developing and deploying software components for processing human language, which can support a wide range of NL processing tasks for information extraction. It is widely used in both research and application work in different fields (e.g. cancer research, web mining, law), and it is considered a robust and reliable framework that has evolved over 18 years. GATE provides the facility of using different plugins or components for text processing, such as ontologies, Stanford parsers, WordNet, and others. The core modules of GATE that we used for the work presented in this paper are:
- Sentence splitter, which splits the text into sentences, using a RegEx splitter based on regular expressions.
- Tokeniser, which identifies basic “tokens”, or words, in the text.
- Part-of-speech (POS) tagger, which associates tokens with parts of speech such as noun, verb, and adjective, based on the Hepple tagger (see footnote 1).
- Morphological analyser, which lemmatises the tokens to provide words in their root form.
- Gazetteer, which is a list of lists, where each list comprises words associated with a key domain concept.
- Java Annotation Patterns Engine (JAPE), which enables the creation of rules in the form of regular expressions, written in the left-hand side (LHS) of the rule, with annotations in the right-hand side (RHS).

1 Part-of-speech tags taken from http://gate.ac.uk/sale/tao/splitap7.html#x37-761000G
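To illustrate how modules of this kind compose into a pipeline, the following Python sketch chains simplified stand-ins for three of them (sentence splitter, tokeniser, and gazetteer lookup). This is a minimal approximation for exposition only, not GATE's actual Java API, and the gazetteer entries are hypothetical examples.

```python
import re

# Simplified stand-ins for three GATE modules: sentence splitter,
# tokeniser, and gazetteer lookup (hypothetical example entries).

def split_sentences(text):
    # RegEx-based splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def tokenise(sentence):
    # Identify basic "tokens": words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

# A gazetteer is a list of lists: each list holds words tied to a domain concept.
GAZETTEER = {
    "suggestive_verb": ["suggest", "recommend"],
    "request_verb": ["help", "please"],
}

def gazetteer_lookup(tokens):
    # Tag every token that appears in one of the gazetteer lists.
    hits = []
    for tok in tokens:
        for concept, words in GAZETTEER.items():
            if tok.lower() in words:
                hits.append((tok, concept))
    return hits

text = "I suggest you check the log. Can anyone help me?"
for sent in split_sentences(text):
    print(sent, "->", gazetteer_lookup(tokenise(sent)))
```

In a real GATE pipeline each stage instead attaches stand-off annotations to the document, which downstream components (e.g. the POS tagger and JAPE transducer) consume.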


3 Modelling Online Discussions of OSS communities

In this section, we describe how we applied SAT concepts in the context of online discussions through mailing lists, as used in OSS development. We first define key concepts and then propose a method for modelling a conversation as a set of linguistic and non-linguistic acts.

Online discussions. These discussions are organised as a set of threads of e-mails. A thread is initiated by a member of the mailing list, who proposes a topic to be discussed (i.e. the Subject: field of the e-mail). The discussion develops as a thread of replies by interested people, who write their contributions in NL. Different behaviours emerge among the discussants in the resulting conversation: someone asks about a topic, or states problems related to it; others provide suggestions, answer questions, or simply add details. In Figure 1 we present an excerpt of an online discussion through a mailing list and explain the SAT elements that we have adapted for our research work: [S] represents the writer, i.e. the person in the From: field; [H] is the addressee, i.e. the person in the To: field (there can also be several addressees); each [e] is a sentence in the body of the message; [Lt] is the English language, which can be of type informal, formal, or technical; and [Cs] is the online virtual space, in our case the online discussions through a mailing list.

Discussants’ intentions. In terms of SAT concepts, we can characterise e-mails in discussion threads as a set of linguistic acts (speech acts) and non-linguistic acts, such as e-mail attachments, URL links, code, etc. We use the term convers-act to name both types of acts. In the bottom part of Figure 1, the highlighted sentence shows a convers-act suggesting to look at a URL link, ”See http://. . . ”. Convers-acts, hence, are composed of verbal, syntactic and semantic aspects that reflect the intention of a writer. In this paper we consider that an intention is found in a sentence of a convers-act and is reified by a sequence of specific words. For instance, a sentence like “Please help me . . . ” can be interpreted by an addressee as expressing the intention of requesting.
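The adapted SAT elements ([S], [H], [e], [Lt], [Cs]) of a convers-act can be captured in a small data structure. The following Python sketch is only an illustrative encoding of the model; the class and field names are our own, not part of any tool described in this paper.

```python
from dataclasses import dataclass

@dataclass
class ConversAct:
    """One convers-act extracted from an e-mail in a discussion thread."""
    speaker: str          # [S]  writer, the person in the From: field
    hearer: str           # [H]  addressee(s), the person(s) in the To: field
    expression: str       # [e]  a sentence in the body of the message
    language_type: str    # [Lt] "informal", "formal", or "technical"
    context: str          # [Cs] the online virtual space, e.g. a mailing list
    linguistic: bool      # False for non-linguistic acts (attachment, URL, code)

act = ConversAct(
    speaker="alice@example.org",
    hearer="dev-list@example.org",
    expression="See http://...",
    language_type="technical",
    context="OSS mailing list discussion",
    linguistic=False,  # pointing at a URL is a non-linguistic act
)
print(act.expression, act.linguistic)
```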


Figure 1: Excerpt of an online discussion in OSS.

Convers-acts in online discussions. We have elaborated a taxonomy of categories and subcategories of convers-acts, shown in Table 1, which is adapted from [3]. The column Category refers to the four main kinds of convers-acts, the column Subcategory refers to the specific type of convers-act, and the column Intention explains the way in which a sentence is interpreted, by means of specific words, to identify the intention(s) contained in it. In this last column we use the previously mentioned elements that take part in a convers-act, i.e. [S] speaker, [H] hearer, and so on.

Table 1: Categories of convers-acts

Constantives
- Assertives: In writing e, S asserts that e if S expresses the intention that H believes that e.
- Informatives: In writing e, S informs H that e if S expresses the intention that H forms the belief that e.
- Confirmatives: In writing e, S confirms (the claim) that e if S expresses the intention that H believes that e because S has support for e.
- Concessives: In writing e, S concedes that e if S expresses the intention that H believes that e.
- Descriptives: In writing e, S describes ”something” if S expresses the intention that H believes that e is describing ”something”.
- Suggestives: In writing e, S suggests that e if S expresses the intention that H believes that there is a reason, but not sufficient reason, to believe that e.
- Suppositives: In writing e, S supposes that e if S expresses the intention that H believes that it is worth considering the consequences of e.
- Responsives: In writing e, S responds to e if S expresses the intention that H believes that e.

Directives
- Requestives: In writing e, S requests an action of H if S expresses the intention that H does the action because (at least partly) of S’s desire.
- Questions: In writing e, S questions H as to whether or not e if S expresses the intention that H tells S whether or not e because of S’s desire.
- Requirements: In writing e, S requires an action of H if S expresses the intention that H does the action because of S’s sentence.

Expressives
- Thank: In writing e, S thanks H for ”something” if S expresses the intention that H believes that S is grateful to H for ”something”.
- Accept: In writing e, S accepts a sentence previously claimed by H if S expresses the intention that he or she believes that sentence.
- Reject: In writing e, S rejects a sentence previously claimed by H if S expresses the intention that he or she dissents from that sentence.
- Negative opinion: In writing e, S expresses an opinion using words that can likely convey negative emotions that H must consider.
- Positive opinion: In writing e, S expresses an opinion using words that can likely convey positive emotions that H must consider.

Attach (non-linguistic)
- Link: S expresses an attachment by writing a URL with the intention that H will click on it.
- Code: S expresses an attachment by adding source code with the intention that H reads it to figure out what is happening or to implement it.
- Log: S expresses an attachment by adding a log file with the intention that H reads it to understand what is happening.

Selection of seed words reifying intentions. To find out which words reify the convers-acts we want to identify, we first examined messages crawled from the Apache OpenOffice Bugzilla to find regularities in written messages. We analysed empirical data from another OSS project so as not to be biased in defining the kind of words regularly used by discussants in online discussions. We executed the following process, depicted in Figure 2:

(a) Crawling of data: we used a crawling tool called Teleport Ultra (http://www.tenmax.com/Teleport/ultra/home.htm). We first executed a simple search on the Apache OpenOffice Bugzilla platform (https://issues.apache.org/ooo/) using a word, for example feature, in the search box. Then we took the web link appearing in the web address box and used it as input for the crawling tool.

(b) Parsing of data: we applied an algorithm with random selection to sample 150 HTML files and parsed them into files with the extension .properties, using Jsoup. We also parsed all 3,507 files.

(c) Indexing files: we indexed both sets of files using Lucene, to obtain a corpus of messages (3,507) and a sample set of messages (150).

(d) Setting a reference list: we then elaborated a preliminary reference list of written intentions from the whole corpus.

(e) Searching intentions: we took the reference list and searched for its items using Lucene. The output is a set of messages, which were then analysed.

(f) Analysing messages: we manually read the messages matched in the previous step and counted only those messages in which the seed words had at least three correct occurrences (see footnote 6). We elaborated a final list of seed words and their occurrences; see Table 2. The first and second columns are the Category and Subcategory of convers-acts, the third column shows the Seed words, and the last column the Frequency in terms of messages. For instance, the convers-act of category Directives, subcategory Requirements, has three instances of seed words, namely “I want”, with a frequency of 11 messages, “I would like to have”, found in 7 messages, and “It would be nice”, found in 65 messages.

Table 2: Example of seed words reifying the intention of the convers-act Requirements.

Category     Subcategory (tag)   Seed words               Frequency (in messages)
Directives   Requirements        “I want”                 11
                                 “I would like to have”   7
                                 “It would be nice”       65

4 http://www.jsoup.org
5 http://lucene.apache.org/core/
6 By “correct” we mean that the intention identified in the message refers to one type of intention, as determined by reading the context.
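Step (f) above amounts to counting, per seed word, the number of messages with a correct occurrence. A rough Python sketch of that counting, over a hypothetical toy corpus and with a naive substring match standing in for the manual correctness check, could look like:

```python
# Hypothetical toy corpus; in the study, messages came from the
# Apache OpenOffice Bugzilla crawl and correctness was judged manually.
messages = [
    "I want a dark theme for the editor.",
    "It would be nice to export as PDF.",
    "It would be nice if search were faster.",
    "Thanks for the quick fix!",
]

# Seed words for the convers-act Directives/Requirements (cf. Table 2).
seed_words = ["I want", "I would like to have", "It would be nice"]

# Frequency = number of messages containing the seed word at least once.
frequency = {
    seed: sum(1 for msg in messages if seed.lower() in msg.lower())
    for seed in seed_words
}
print(frequency)
```

The frequencies reported in Table 2 were obtained in this spirit, but with Lucene queries over the indexed corpus and a manual reading step to discard incorrect matches.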


Figure 2: Process to select the seed words.

Analysis of convers-acts. We propose a model to analyse the intentions retrieved from discussion threads. This model is guided by three levels of granularity, presented in Table 3. In this table, the first column refers to the Granularity level, the second to the Aggregation of intentions, and the third column shows an example with the identified convers-act associated. Reading the first row, the granularity level is a sentence; we find that there can be single intentions and nested intentions. For instance, in the sentence “I suggest you to make a copy of your data”, the single intention is “I suggest you”, which is the reification of the convers-act Suggestives. Nested intentions can be expressed as in “Why don’t you try to configure the server with the wizard?”: in this case there are two intentions, one questioning, i.e. “Why don’t you try to. . . ?”, and the other suggesting, i.e. “don’t you try”, representing the convers-acts Questions and Suggestives, respectively. In the second row we present some examples of compound intentions as we consider they can be interpreted. Single intentions can be composed to support the analysis at the message level, in order to infer a type of indicator: either a Bug indicator, a Feature indicator, or Clarification in case the previous two indicators do not apply. The third level of analysis is applied to the whole discussion thread. We have some assumptions, partially originated from the manual examination of empirical data and from related work [16, 6], that a certain sequence of messages containing specific intentions may also be an indicator for discovering that a discussion regards a bug report, or for discarding this hypothesis in favour of a feature request. We also found that the convers-act Informatives can be considered the default convers-act, in case no other act is identified.

Table 3: Granularity of analysis.

Granularity level    Aggregation of intentions   Example
Sentence             Single intention            ”I suggest you” [Suggestives]
                     Nested intentions           ”Why don’t you try . . . ?” [Questions], containing ”don’t you try” [Suggestives]
Message              Compound intentions         Bug indicator = ”There is a problem” [Negative opinion] + ”Can anyone help me?” [Questions]
                                                 Feature indicator = ”I really like the application” [Positive opinion] + ”It would be nice” [Requirement]
Discussion thread7   Related intentions
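The message-level aggregation of Table 3 can be sketched as a simple decision over the set of intentions detected in a message. The rule below (Negative opinion plus Questions yields a Bug indicator; Positive opinion plus Requirement yields a Feature indicator; anything else falls back to Clarification) is a minimal reading of the table, not the full analysis, and the function name is ours.

```python
def message_indicator(intentions):
    """Infer a message-level indicator from the detected intentions.

    Minimal reading of Table 3: compound intentions map to a Bug or
    Feature indicator; anything else falls back to Clarification.
    """
    intents = set(intentions)
    if {"Negative opinion", "Questions"} <= intents:
        return "Bug indicator"
    if {"Positive opinion", "Requirement"} <= intents:
        return "Feature indicator"
    return "Clarification"

# "There is a problem" + "Can anyone help me?"
print(message_indicator(["Negative opinion", "Questions"]))
# "I really like the application" + "It would be nice"
print(message_indicator(["Positive opinion", "Requirement"]))
print(message_indicator(["Informatives"]))
```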

Because of the main objective of this paper, we do not elaborate further on the composition of intentions and on the other types of language that can be used ([Lt] = informal or formal), but we present some ideas regarding the technical type. The type of language used in the discussion, [Lt], is one element that can be an important parameter to consider when the analysis of a whole discussion thread is performed. We consider that, in the domain of OSS development, the type of language is technical. Besides this, we have defined that the compound intentions Negative opinion and Question may represent a Bug indicator.

7 Not fully explained in this paper, but some ideas are presented.

4 Automating Online Discussion Analysis

Figure 3: Overview of the semi-automated process to identify people’s intentions.

Having defined a model for conversations in terms of convers-acts, identified upon manual inspection of samples of conversations, we can exploit computational linguistic techniques to automate conversation analysis. In particular, we are interested in automating the annotation of convers-acts in huge conversation datasets, such as OSS mailing list discussions, and the identification of discussants’ intentions. An overview of our approach is illustrated in Figure 3. The linguistic analysis is executed on discussion threads in the format of txt files, which constitute the input. The GATE framework is used to annotate intentions in the text messages of each thread. The annotated files are then parsed to extract the intentions found in each message. Finally, an analysis of intentions is performed, following the classification of convers-acts by intentions defined in our conversation model.
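The flow just described (threads in, annotation, parsing, per-message intentions out) can be sketched as a short driver script. The annotation step below is a trivial keyword stub standing in for the GATE processing, and the data layout and function names are assumptions for illustration.

```python
import csv
import io

def annotate(message_text):
    # Stub standing in for the GATE annotation step: returns the list of
    # intention tags found in one message (here, a trivial keyword check).
    tags = []
    if "?" in message_text:
        tags.append("Questions")
    if "it would be nice" in message_text.lower():
        tags.append("Requirement")
    return tags or ["Informatives"]  # Informatives is the default convers-act

def process_thread(thread):
    # thread: list of (discussant, message body) pairs parsed from a txt file.
    # Returns one row per message: the discussant plus the intentions found.
    return [(who, ";".join(annotate(body))) for who, body in thread]

thread = [
    ("alice", "Can anyone help me?"),
    ("bob", "It would be nice to have a wizard."),
    ("carol", "The build runs fine here."),
]

# One CSV file per discussion thread (written to a string for the example).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["discussant", "intentions"])
writer.writerows(process_thread(thread))
print(out.getvalue())
```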


4.1 Tool

Our tool is based on a knowledge-heavy approach [1]; that is, it uses a POS tagger, JAPE rules, a tokeniser, a lemmatiser and gazetteers. We make use of the GATE framework, which uses the gazetteers and JAPE rules to annotate the intentions by applying the corresponding tags. The tags used for annotating are the subcategories of convers-acts defined in Table 1. In the bottom part of Figure 3 we present a zoomed view of the tool. We explain the two main resources used to annotate intentions in the following.

In the bottom of the tool view we observe two rectangles: the left-side rectangle refers to the JAPE rules, an example of which is depicted in Figure 4; the right-side rectangle is the list of tags. We have formulated lexico-syntactic rules using the seed words found in the other corpus of messages, inspired by examples given in [12].

The gazetteers used in our approach are lists of verbs, taken mainly from [3], for each subcategory of convers-act. Some JAPE rules use the gazetteers to annotate intentions.

Figure 4: Example of a JAPE rule.

We manually designed the rules using the seed words and the POS-tag codification used by GATE, following a fixed syntax. The design is exemplified in Table 4: the first column is the Category of convers-act, the second column is the tag assigned when the rule is successfully applied to a sentence, and the third column presents the formulation of the rule. The regular expressions <content>, (<POS tag>)* and [Hh] refer, respectively, to a set of words between two seed words or POS tags, to the optional presence of the POS tag, and to the uppercase or lowercase first letter of a word. Each designed rule was then translated into a JAPE rule. The tool uses the rules and tags to process the discussion threads in plain format (i.e. TXT files). After the annotation is executed, the tool parses the files to extract only the intentions found in the text, and a CSV file is generated for each discussion thread. This file contains the names of the discussants and the intention(s) identified in their messages.

Table 4: Example of rules for extracting Directives and Constantives convers-acts.

Category      Tag          Rule
Directives    Questions    <WRB> + <PRP> + <content> + “?”
                           <MD> + “anyone” + “help” + <content> + “?”
Constantives  Suggestives  <PRP> + (<MD>)* + (“try” | “check”)
                           <PRP> + (<MD>)* + (“suggest” | “recommend”)
              Responsives  (<PRP>)* + “[Hh]ope” + <content> + “help”
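To give a concrete feel for such rules outside GATE, the Python sketch below emulates three of the Table 4 patterns with ordinary regular expressions. Real JAPE rules match over POS-tag annotations (e.g. <MD>, <PRP>), which we crudely approximate here with literal word alternations, so this is an illustration rather than the tool's actual rules.

```python
import re

# Crude surface approximations of three Table 4 rules. JAPE matches POS
# annotations (<MD>, <PRP>, ...); here modal verbs and pronouns are
# approximated with literal word lists.
RULES = [
    # <MD> + "anyone" + "help" + <content> + "?"  ->  Questions
    ("Questions", re.compile(r"\b(can|could|would|will)\s+anyone\s+help\b.*\?", re.I)),
    # <PRP> + (<MD>)* + ("suggest"|"recommend")   ->  Suggestives
    ("Suggestives", re.compile(r"\b(i|we)\s+(?:would\s+)?(suggest|recommend)\b", re.I)),
    # (<PRP>)* + "[Hh]ope" + <content> + "help"   ->  Responsives
    ("Responsives", re.compile(r"\b(?:i\s+)?hope\b.+\bhelp", re.I)),
]

def annotate_sentence(sentence):
    # Return the tags of all rules whose pattern matches the sentence.
    return [tag for tag, pattern in RULES if pattern.search(sentence)]

print(annotate_sentence("Can anyone help me with the installer?"))
print(annotate_sentence("I would suggest checking the log file."))
print(annotate_sentence("Hope this helps."))
```

Working over POS tags, as JAPE does, keeps the rules far more compact and general than these surface patterns: a single <MD> matches every modal verb the tagger recognises.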

5 Related work

The analysis of NL textual messages in online discussion forums, bug-tracking systems or mailing lists has been addressed by research in HCI, in computer-mediated business conversation analysis, and more recently in software engineering. We briefly recall relevant works in the following.

An automated identification of intentions is presented in [22]. This investigation proposes a tool, based on SAT, dialogue acts and fuzzy logic, to analyse transcripts of telephone conversations. The goal of this research is to derive participant profiles based on a map of the intentions expressed in the conversation, with the purpose of enabling automatic deception detection. In the domain of our work (i.e. OSS development) we are so far assuming faithful participation, and we want to support the automated identification of discussants’ intentions in this community, such as requesting a feature, reporting bugs, or others. The classification of e-mails using speech acts is investigated in [6]. The authors are interested in classifying e-mails regarding office-related interactions, such as negotiation and delegation of tasks. They introduce the term email acts, which follows a taxonomy of verbs and nouns, and highlight the fact that sequential email acts in a thread of messages contain information useful for classifying certain email acts. Moreover, they consider non-linguistic acts such as deliver. Analogously, we define convers-acts to characterise the communication actions in the context of mailing list discussions in OSS development. In [16], speech acts are investigated on discussion threads in a student forum, with the objective of identifying unanswered questions to be assigned to an instructor for resolution. The authors present some patterns of interaction found in the threads; the patterns correspond to the answer and question speech acts. They also point out the challenges in dealing with discussion threads due to highly noisy data.

Focusing on applications of NLP techniques in Software Engineering and HCI, Ko et al. [14] perform a linguistic analysis of the titles of bug reports to understand how people describe software problems. In their approach they use a part-of-speech tagger to identify nouns, verbs, adjectives, etc., and obtain their frequency. They make reference to the intent of a sentence, or grammatical mood, indicated by its verbs, which can help in distinguishing problem reports from requests; however, they only analyse the titles, and conclude that the use of such mood concepts needs further investigation. Differently, we analyse convers-acts in the body of the e-mails of a discussion thread.

With reference to Requirements Engineering tasks, in [10] NLP techniques are used to support the understanding of stakeholders’ needs and the system’s domain from textual documentation, which is often available. This technique automatically identifies abstractions, i.e. entities or concepts that have a particular significance in the domain, which can be used, for example, to verify the completeness of a set of requirements. Knauss et al. [13] analyse discussion threads for requirements elicitation purposes. They focus on the content of communication between stakeholders, to find patterns of communication used by stakeholders when they are seeking clarification on requirements. Their approach is based on a Naive Bayesian classifier, a classification scheme of clarification, and some heuristics, with interesting results. Worth mentioning is also the work presented in [5], which aims at analysing messages, or comments, from users of software applications. Information extraction techniques and topic modelling are exploited to automatically extract topics, and to provide requirements engineers with a user feedback report, which supports them in identifying candidate new or changed requirements. In [17], users are asked to explicitly indicate the intention behind their online feedback through a smartphone app. Such an intention can be selected according to “modes” and “options”. The authors provide three types of modes, i.e. “compliment”, “complain” and “neutral comment”, and predefined options, for instance “not very usable”.

All the above-mentioned research works in the Requirements Engineering area use NL text messages or documents to discover patterns or relevant topics, or to identify key domain terms, but none of them consider SAT-based techniques to understand the stakeholders’ intentions behind their messages. We consider that the application of SAT in Requirements Engineering can be a powerful strategy to understand stakeholders’ intentions, analysing the messages they exchange in current distributed collaboration settings and deriving requirements knowledge.

References

[1] L. Ahrenberg, M. Andersson, and M. Merkel. A knowledge-lite approach to word alignment. In Parallel Text Processing, pages 97–116. Springer, 2000.

[2] J. Austin. How to Do Things with Words. Oxford, 1962.

[3] K. Bach and R. M. Harnish. Linguistic Communication and Speech Acts. MIT Press, Cambridge, MA, 1979.

[4] B. M. Camino, A. E. Milewski, D. R. Millen, and T. M. Smith. Replying to email with structured responses. International Journal of Human-Computer Studies, 48(6):763–776, 1998.

[5] L. V. G. Carreno and K. Winbladh. Analysis of user comments: an approach for software requirements evolution. In D. Notkin, B. H. C. Cheng, and K. Pohl, editors, ICSE, pages 582–591. IEEE / ACM, 2013.

[6] V. R. Carvalho and W. W. Cohen. On the collective classification of email speech acts. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 345–352. ACM, 2005.

[7] C. Castro-Herrera and J. Cleland-Huang. Utilizing recommender systems to support software requirements elicitation. RSSE'10, pages 6–10, 2010.

[8] J. Cowie and W. Lehnert. Information extraction. Commun. ACM,39(1):80–91, Jan. 1996.


[9] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani, I. Roberts, G. Gorrell, A. Funk, A. Roberts, D. Damljanovic, T. Heitz, M. A. Greenwood, H. Saggion, J. Petrak, Y. Li, and W. Peters. Text Processing with GATE (Version 6). 2011.

[10] R. Gacitua, P. Sawyer, and V. Gervasi. Relevance-based abstraction identification: technique and evaluation. Requir. Eng., 16(3):251–265, 2011.

[11] G. Goldkuhl. Conversational analysis as a theoretical foundation for language action approaches. In Proceedings of the Eighth International Conference on the Language Action Perspective on Communication Modeling, LAP-2003, Tilburg, The Netherlands, pages 51–69, 2003.

[12] D. Jurafsky, L. Shriberg, and D. Biasca. Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual, draft 13. Technical report, University of Colorado at Boulder Technical Report 97-02, 1997.

[13] E. Knauss, D. Damian, G. Poo-Caamano, and J. Cleland-Huang. Detecting and classifying patterns of requirements clarifications. In Requirements Engineering Conference (RE), 2012 20th IEEE International, pages 251–260, 2012.

[14] A. J. Ko, B. A. Myers, and D. H. Chau. A linguistic analysis of how people describe software problems. In Proceedings of the Visual Languages and Human-Centric Computing, VLHCC '06, pages 127–134, Washington, DC, USA, 2006. IEEE Computer Society.

[15] N. Novielli and C. Strapparava. Dialogue act classification exploiting lexical semantics. In D. Perez-Marin and I. Pascual-Nieto, editors, Conversational Agents and Natural Language Interaction: Techniques and Effective Practices, pages 80–106. IGI Global, 2011.

[16] S. Ravi and J. Kim. Profiling student interactions in threaded discussions with speech act classifiers. Frontiers in Artificial Intelligence and Applications, 158:357, 2007.

[17] K. Schneider. Focusing spontaneous feedback to support system evolution. In RE, pages 165–174. IEEE, 2011.


[18] J. R. Searle. Speech acts: An essay in the philosophy of language, volume 626. Cambridge University Press, 1969.

[19] J. R. Searle. A taxonomy of illocutionary acts, pages 334–369. University of Minnesota Press, Minneapolis, 1975.

[20] A. Stolcke, N. Coccaro, R. Bates, P. Taylor, C. Van Ess-Dykema, K. Ries, E. Shriberg, D. Jurafsky, R. Martin, and M. Meteer. Dialogue act modeling for automatic tagging and recognition of conversational speech. Comput. Linguist., 26(3):339–373, Sept. 2000.

[21] D. P. Twitchell, M. Adkins, J. F. Nunamaker, and J. K. Burgoon. Using speech act theory to model conversations for automated classification and retrieval. In Proceedings of the International Working Conference on the Language Action Perspective on Communication Modelling (LAP 2004), pages 121–130, 2004.

[22] D. P. Twitchell and J. F. Nunamaker. Speech act profiling: a probabilistic method for analyzing persistent conversations and their participants. In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on, 10 pp., 2004.

[23] D. Wilson and D. Sperber. Relevance theory. Handbook of pragmatics,2002.