What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

61
What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes. Andrea Sebastiano Carol V. Junji Corrado A. Gerardo Harald Di Sorbo Panichella Alexandru Shimagaki Visaggio Canfora Gall UNIVERSITÀ DEGLI STUDI DEL SANNIO

Transcript of What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

Page 1: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

What Would Users Change in My App? Summarizing App Reviews for

Recommending Software Changes.

Andrea Sebastiano Carol V. Junji Corrado A. Gerardo Harald Di Sorbo Panichella Alexandru Shimagaki Visaggio Canfora Gall

UNIVERSITÀ DEGLI STUDI DEL

SANNIO

Page 2: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

2

OUTLINE

Context: Manual v.s. AutomatedAnalysis of User Reviews

Proposed Solution: Generating Summaries of User Reviews

Case Study: Assessment of the SummariesInvolving 23 Developers

Conclusion and Future Work

Page 3: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

3

Manual v.s. AutomatedAnalysis of User Reviews

V.S.

Page 4: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

4

Maintenance of Mobile Applications

“About one third of app reviews contain useful information for developers”

Pagano et. al. RE2013

Page 5: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

5

Manual Analysis of Reviews

Page 6: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

6

PAST WORKChen et al – ICSE

2014

Text Analysis to filter out non-informative reviews

Topic Analysis to recognize topics treated in the reviews classified as informative

Page 7: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

7

PAST WORK Panichella et al – ICSME

2015

FEATURE REQUESTPROBLEM DISCOVERY

INFORMATION SEEKINGINFORMATION GIVING

OTHER

Sentiment Analysis+

Natural Language Parsing

+Text Analysis

Page 8: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

8

The Problem

Feature Requests Bug Reports

Page 9: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

9

Generating Summaries of User Reviews

SURF (Summarizer of User Review Feedback)

Page 10: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

10

USER REVIEWS MODEL

Page 11: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

11

USER REVIEWS MODEL

I love this app but it crashes my whole iPad and it

has to restart itself•User intention: Problem Discovery

•Review topics: App, Model“…The User Reviews Model proposed by the authors is impressive in how it analyzes a review sentence by sentence and is able to characterize a sentence with multiple labels…” – one of FSE reviewers

Page 12: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

12

SUMMARIZER OF USER REVIEW FEEDBACK

Page 13: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

13

1. Data Collection1

Page 14: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

14

2. Intention Classification2

machinelearning

Page 15: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

15

3. Topics Classification3

Page 16: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

16

3. Topics Classification3

Page 17: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

17

3. Topics Classification3

Can't change position of icons on main screen and can't close bookmarks icon

too.

screen, trajectory, button, white, background, interface, usability, tap, switch, icon, orientation,

position, picture, show, list, category, cover, scroll, touch, website, swipe, sensitive, view, roll, side, sort, click, small, colorful, glitch, page, corner,

bookmark…

GUI-related dictionary

P (SENTENCE, GUI) = 5/14 = 0.357

Page 18: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

18

4. Sentence Scoring

Obs1) User feedback discussing bug reports and feature requests are more important for developers than all other reviews type.

Intention Class ScoreProblem Discovery 3.0Feature Request 3.0

Information Seeking 1.0Information Giving 1.5

Other 0.5

4

IRSSENTENCE = 3.0

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Page 19: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

19

Obs1) User feedback discussing bug reports and feature requests are more important for developers than all other reviews type.Obs2) Developers need reasonably useful sentences discussing specific aspect of an app with respect to other review sentences

P (SENTENCE, GUI) = 5/14 = 0.357

4. Sentence Scoring4

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Page 20: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

20

Obs1) User feedback discussing bug reports and feature requests are more important for developers than all other reviews type.Obs2) Developers need reasonably useful sentences discussing specific aspect of an app with respect to other review sentences.Obs3) Longer sentences are usually more informative than shorter ones.

L SENTENCE = 80

4. Sentence Scoring4

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Page 21: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

21

Obs1) User feedback discussing bug reports and feature requests are more important for developers than all other reviews type.Obs2) Developers need reasonably useful sentences discussing specific aspect of an app with respect to other review sentences.Obs3) Longer sentences are usually more informative than shorter ones.Obs4) Reviews treating frequently discussed features may attract more attention of developers than reviews dealing with features rarely used or discussed by users

MFWR (SENTENCE,GUI) = 2/14 = 0.143

4. Sentence Scoring4

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Page 22: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

23

5. Summary Generation5

Page 23: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

24

Case Study

Involving 23 Developers

Page 24: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

25

Case Study

Involving 23 Developers

3439 Reviews

Page 25: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

26

Case Study

Involving 23 Developers

3439 Reviews

Of17

Apps

Page 26: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

27

Research Questions

RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?

URM

Page 27: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

28

Study Procedure

Page 28: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

29

TWO Experiments

Experiment I Experiment II

ITALY SWITZERLAND

NETHERLAND

JAPAN

Page 29: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

30

TWO Experiments

Experiment I Experiment II

ITALY SWITZERLAND

NETHERLAND

JAPAN

Page 30: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

31

TWO Experiments

Experiment I

ITALY

SWITZERLAND

NETHERLAND

Page 31: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

32

TWO Experiments

Experiment I

ITALY

SWITZERLAND

NETHERLAND

1) Summaries for 15Apps

Page 32: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

33

TWO Experiments

Experiment I

ITALY

SWITZERLAND

NETHERLAND

1) Summaries for 15Apps

2) Involving 16 Developers (6 were the original

developers)

Page 33: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

34

TWO Experiments

Experiment I

ITALY

SWITZERLAND

NETHERLAND

1) Summaries for 15Apps

2) Involving 16 Developers (6 were the original

developers)3) We assigned to each participant an app.

Page 34: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

35

TWO Experiments

Experiment II

JAPAN

Page 35: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

36

TWO Experiments

Experiment II

JAPAN

1) Summaries Of 2Apps

Page 36: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

37

TWO Experiments

Experiment II

JAPAN

1) Summaries Of 2Apps

2) Involving 7 Employers from

Page 37: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

38

TWO Experiments

Experiment IIGroup 1 (3 subjects) Group 2 (4 subjects)

Experiment II-A

Experiment II-B

Page 38: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

39

TWO Experiments

Experiment IIGroup 1 (3 subjects) Group 2 (4 subjects)

Experiment II-A

Experiment II-B

Participants ClassifiedReviews according to URM Participants Classified

Reviews according to URM

Page 39: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

40

TWO Experiments

Experiment IIGroup 1 (3 subjects) Group 2 (4 subjects)

Experiment II-A

Experiment II-B

Participants ClassifiedReviews according to URM Participants Classified

Reviews according to URM

Participants Validatedthe summaries generated

by SURF

Participants Validatedthe summaries generated

by SURF

Page 40: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

41

Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

RQ1

Page 41: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

42

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

Experiment I Experiment II &

Page 42: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

43

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

Experiment I Experiment II &

78.26% of participants declared that URM is not

missing any relevant information and that the topics

considered in URM are EXAUSTIVE.

Page 43: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

44

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

78.26% of participants declared that URM is not

missing any relevant information and that the topics

considered in URM are EXAUSTIVE.

Experiment I Experiment II &

82% of participants declared that the most important topics modeled

in URM are the App, GUI and Feature or Functionality categories.

Page 44: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

45

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

78.26% of participants declared that URM is not

missing any relevant information and that the topics

considered in URM are EXAUSTIVE.

Experiment I Experiment II &

82% of participants declared that the most important topics modeled

in URM are the App, GUI and Feature or Functionality categories.

“I found the classification GUI-BUG, APP-BUG, etc

very useful. . .”

Page 45: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

46

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

78.26% of participants declared that URM is not

missing any relevant information and that the topics

considered in URM are EXAUSTIVE.

Experiment I Experiment II &

82% of participants declared that the most important topics modeled

in URM are the App, GUI and Feature or Functionality categories.

“. . in case I'm searching for BUGs, I can just

look for the category, instead of reading

everything over andover again. . .”

“I found the classification GUI-BUG, APP-BUG, etc

very useful. . .”

Page 46: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

47

RQ1: Is URM a robust and suitable model for representing user needs in meaningful

maintenance tasks for developers?

78.26% of participants declared that URM is not

missing any relevant information and that the topics

considered in URM are EXAUSTIVE.

Experiment I Experiment II &

82% of participants declared that the most important topics modeled

in URM are the App, GUI and Feature or Functionality categories.

“. . in case I'm searching for BUGs, I can just

look for the category, instead of reading

everything over andover again. . .”

“I found the classification GUI-BUG, APP-BUG, etc

very useful. . .”

SUMMARY: Most of participants consider URM as a robust and suitable model for representing user

needs in meaningful maintenance tasks for developers.

Page 47: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

48

To what extent does a summarization technique developed on top of URM help mobile developers better understand the

users' needs?

RQ2

Page 48: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

49

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers

better understand the users' needs?

Page 49: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

50

RQ2:

The validation task performed by the survey participants highlights

the very high classification accuracy of

SURF, which is 91%.

To what extent does a summarization technique developed on top of URM help mobile developers

better understand the users' needs?

Page 50: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

51

RQ2:

The validation task performed by the survey

participants highlights the very high classification

accuracy of SURF, which is 91%.

To what extent does a summarization technique developed on top of URM help mobile developers

better understand the users' needs?

SURF works reasonable well in summarizing user feedback regarding change requests

concerning GUI, APP, FEATURE improvements with the only

exception of the maintenance topic “COMPANY”.

Page 51: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

52

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

Page 52: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

53

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

The time saving capability of

SURF perceived by all developers

Is of at least 50%.

94% of participants believe that the time saving capability ofSURF is of 75%.

Page 53: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

54

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

The time saving capability of

SURF perceived by all developers

Is of at least 50%.

94% of participants believe that the time saving capability ofSURF is of 75%.

Page 54: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

55

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

The time saving capability of

SURF perceived by all developers

Is of at least 50%.

94% of participants believe that the time saving capability ofSURF is of 75%.

SURF helps to prevent more than 50% of the time required by developers for analyzing users

feedback and planning software changes.

Page 55: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

56

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

The time saving capability of

SURF perceived by all developers

Is of at least 50%.

94% of participants believe that the time saving capability ofSURF is of 75%.

SURF helps to prevent more than 50% of the time required by developers for analyzing users

feedback and planning software changes.66% of feedback manually

extracted by the participants also appear in the summaries

automatically generated by SURF.

Page 56: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

57

How do app review summaries generatedby SURF impact the time required by developers to

analyze user reviews?

The time saving capability of

SURF perceived by all developers

Is of at least 50%.

94% of participants believe that the time saving capability ofSURF is of 75%.

SURF helps to prevent more than 50% of the time required by developers for analyzing users

feedback and planning software changes.66% of feedback manually

extracted by the participants also appear in the summaries

automatically generated by SURF.

SUMMARY: 1) SURF helps to prevent more than half of

the time required by developers for analyzing users feedback and planning software changes.

2) 66% of manually extracted feedback appears also in the automatic generated summaries.

Page 57: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

58

Quality of SURF’ Summaries

Page 58: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

59

Quality of SURF’ Summaries

Page 59: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

60

Quality of SURF’ Summaries

Page 60: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

61

Conclusion

1) URM is a robust and suitable model for representing user needs in

meaningful maintenance tasks for developers.

2) SURF helps to prevent more than half of the time required for

analyzing users feedback and planning software changes.

3) 66% of manually extracted feedback appears

also in the automatic generated summaries.

V.S.

4) Summaries generated by SURF are reasonably correct,

adequate, concise, and expressive.

Page 61: What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.

Thanks for the Attention!

Questions?

SURF (Summarizer of User Review Feedback)