Software Analytics
MAC6912 - Ambientes de Desenvolvimento de Software
Professor Marco Aurélio Gerosa
Ana Paula Oliveira Bertholdo
• 4 papers:
– Analytics for Software Development
(Buse & Zimmermann, 2010)
– Software Analytics as a Learning Case in Practice: Approaches and Experiences
(Zhang et al., 2011)
– Analyze This! 145 Questions for Data Scientists in Software Engineering
(Begel & Zimmermann, 2014)
– What’s Next in Software Analytics
(Hassan et al., 2013)
• Software engineering is a data rich
activity.
• Artifacts of a project’s development can be measured with a high degree of
– automation, efficiency, and granularity.
• Projects can be measured throughout
their life-cycle.
• SW development continues to be risky
and unpredictable.
• It is not unusual for major development
efforts to experience large delays or
failures.
• Substantial disconnect between
– (A) the information needed by project managers to make good decisions and
– (B) the information currently delivered by existing tools.
– At its root:
• Problem: real-world information needs of project managers are not well understood by the research community.
• Research has ignored the needs of managers and has instead focused on the information needs of developers.
• When data needs are not met…
– tools are unavailable
– too difficult to use
– too difficult to interpret or
– they simply do not present useful or actionable information
• Managers must primarily rely on past experience and intuition for critical decision making.
• The data-centric style of decision making is
known as analytics.
• The idea is to leverage large amounts of data
into real and actionable insights.
Figure 1: Analytical Questions. The researchers distinguish questions of information, which can be directly measured, from questions of insight, which arise from careful analysis and provide managers with a basis for action.
• Transition isn’t easy!
• Insight necessarily requires
– knowledge of the domain coupled with the
– ability to identify patterns involving multiple
indicators.
• Managers may be too busy or may simply lack
the quantitative skills or analytic expertise to fully
leverage advanced analytical applications.
• One possibility is that tools should be created
with this in mind.
• Another possibility is the addition of an analytic
professional to the software development team.
• Conclusion
– All resources, especially talent, are always constrained.
– This alludes to the importance of careful and deliberate decision making by the managers of software projects.
– The observation that software projects continue to be risky and unpredictable despite being highly measurable implies that more analytic information should be leveraged toward decision making.
– In this paper, the researchers
• described how software analytics can help managers move from low-level measurements to high-level insights about complex projects.
• advocated more research into the information needs and decision process of managers.
• discussed how the complexity of software development suggests that dedicated analytic professionals with both quantitative skills and domain knowledge might provide great benefit to future projects.
• Researchers (Microsoft Research Asia) advocate that when applying analytic technologies in practice one should:
– (1) incorporate a broad spectrum of domain knowledge and expertise,
• e.g., management, machine learning, large-scale data processing and computing, and information visualization; and
– (2) investigate how practitioners take actions on the produced information, and provide effective support for such information-based action taking.
– Various analytic technologies
• (data mining, machine learning, and information visualization).
– Software analytics aims to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information.
– Insightful information
• conveys meaningful and useful understanding or knowledge towards performing the target task.
– Actionable information
• information upon which software practitioners can come up with concrete solutions towards completing the target task.
• Developing a software analytic project
typically goes through iterations of the
life cycle of four phases:
1) task definition,
2) data preparation,
3) analytic-technology development, and
4) deployment and feedback gathering.
• Task definition is to define the target
task to be assisted by software analytics
– pull model: StackMine -> performance analysis
– push model: XIAO -> refactoring and defect
detection
• Data preparation is to collect data to be
analyzed.
– 2 types of infrastructure supports: existing
ones in industry and in-house ones.
– StackMine -> existing Microsoft infrastructure support.
– XIAO -> in-house code analysis.
• Analytic-technology development is to develop problem formulation, algorithms, and systems to explore, understand, and get insights from the data.
– The SA team needs to acquire deep knowledge about the data (including its format and semantics) and target tasks.
– The time this knowledge acquisition takes may be non-trivial.
• Deployment and feedback gathering
involves two typical scenarios.
– 1: the researchers have obtained some
insightful information from the data and they
ask domain experts to review and verify.
– 2: the researchers ask domain experts to use
the analytic tools to obtain insights by
themselves.
• “The more the customers use the tools, the ‘smarter’ the tools become.”
• Domain knowledge and expertise are strongly needed in successfully developing a software analytic project for technology transfer.
• Types of domain knowledge:
– Specific application domain knowledge (customers).
– Common application domain knowledge (family of SW applications).
– Data domain knowledge (data preparation).
• Types of expertise:
– Task expertise
• work with the customers to learn the workflow.
– Management expertise
• good management and communication skills to interact with the customers and manage the team.
– Machine learning expertise
• to develop machine learning algorithms and tools (not just in a black-box way).
– Large-scale data processing/computing expertise
• to design and implement scalable data processing tools and learning tools.
– Information visualization expertise
• to design and implement good user interfaces and visualization for presenting analysis results.
• Conclusion:
– What do developers think about your
result?
– Is it applicable in their context?
– How much would it help them in their daily work?
• Results from 2 surveys on data science applied to SW engineering.
• 1st survey:
– Questions that SW engineers would like data scientists to investigate about SW, SW processes and practices, and SW engineers.
• 2nd survey:
– SW engineers rate the 145 questions and identify the most important ones to work on first.
• Businesses of all types commonly use analytics to better reach and understand their customers.
• Many software engineering researchers have argued for more use of data for decision-making.
• The demand for data scientists in software projects will grow rapidly.
• Harvard Business Review named data scientist the most desired job of the 21st century.
• By 2018, the U.S. may face a shortage of as many as 190,000 people with analytical expertise and of 1.5 million managers and analysts with the skills to make data-driven decisions, according to a report by the McKinsey Global Institute.
• Research goal:
– Presents a ranked list of questions that sw
engineers want to have answered by data
scientists.
– The list was deployed among professional
sw engineers at Microsoft.
• The research:
– provides a catalog of 145 questions that software engineers would like to ask data scientists about software.
– ranks the questions by importance (and opposition) to help researchers, practitioners, and educators focus their efforts on topics of importance to industry.
– calls on other companies and the academic community to replicate its methods and grow the body of knowledge from this start (technical report).
• Initial survey:
– 2 pilot surveys, sent to 25 and 75 Microsoft engineers.
– The pilots demonstrated the need to seed the survey with data analytics questions.
• What impact does code quality have on our ability to monetize a software service?
– 1500 SW engineers in September 2012.
– 36.5% developers, 38.9% testers, 22.7% program managers.
• Rating Survey:
– Split Questionnaire Survey Design
• Component blocks
– 607 responses (2500 engineers)
– 16,705 ratings
– Multiple-choice format
– 29.3% developers, 30.1% testers, and 40.5% program managers.
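A split questionnaire design can be sketched roughly as follows. This is only an illustration: the number of blocks (16), the number of blocks per respondent (3), and the function names are assumptions made for the example, not figures stated on these slides.

```python
import random

def make_blocks(questions, n_blocks):
    """Partition the question list into n_blocks component blocks of near-equal size."""
    return [questions[i::n_blocks] for i in range(n_blocks)]

def assign_blocks(blocks, blocks_per_respondent, rng):
    """Give one respondent a random subset of blocks (sampled without replacement)."""
    chosen = rng.sample(range(len(blocks)), blocks_per_respondent)
    return [question for b in chosen for question in blocks[b]]

# Hypothetical setup: 145 question ids, 16 blocks, 3 blocks per respondent.
questions = list(range(145))
blocks = make_blocks(questions, 16)
one_respondent = assign_blocks(blocks, 3, random.Random(0))
```

Each respondent rates only a fraction of the catalog, which keeps the questionnaire short while every question still collects ratings across the whole population.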
• Of the questions with the most
opposition, the top five are about the
fear that respondents had of being
ranked and rated.
Catalog of 145 questions is relevant for:
• Research:
– the descriptive questions outline opportunities to collaborate with industry and influence their software development processes, practices, and tools.
• Practice:
– the list of questions identifies particular data to collect and analyze to find answers,
– as well as the need to build collection and analysis tools at industrial scale.
• Education:
– the questions provide guidance on what analytical techniques to teach in courses for future data scientists,
– as well as providing instruction on topics of importance to industry (which students always appreciate).
• Conclusion
– Researchers hope that this paper will inspire similar research projects.
– In order to facilitate replication of this work for additional engineering disciplines and companies, they provide the full text of both surveys as well as the 145 questions in a technical report.
– With the growing demand for data scientists, more research is needed to better understand how people make decisions in software projects and what data and tools they need.
– There is also a need to increase the data literacy of future software engineers.
– Lastly, we need to think more about the consumer of analyses and not just the producers of them (data scientists, empirical researchers).
• 6 established experts in SW analytics
• What is the most important aspect of this field?
• 1) SW analytics should go beyond developers.
• 2) Analytics should prove its relevance to
practitioners.
• 3) Mere numbers aren’t enough.
• 4) 3 Questions for analytics.
• 5) Opportunities for natural SW analytics.
• 6) Assistance from Information Analysts.
• SW analytics should go beyond developers
– SA focuses on helping individual developers with coding and bug-fixing decisions
• by mining developer-oriented repositories such as version control systems and bug trackers.
– SA needs to service a project’s various stakeholders
• marketing, sales, support teams – not just developers.
• SW analytics should go beyond
developers
– Artifacts and Knowledge across a project’s
various facets.
– Importance of a piece of code and its impact on user satisfaction and revenue.
• Marketers -> field usage data.
• Sales staff -> inherent value that customers
associate with each feature.
• Proving relevance to Practitioners
– Future -> Layers of context are taken into consideration:
• Domain of SW development
– nonfunctional requirements, environments, tools, idioms, and so on.
• Domain of the software itself
– databases, applications, and so on.
• Context of the overall software project
– requirements, glossary, architecture, community, and so on.
• Proving relevance to Practitioners
– Software analytics has to prove its
relevance by showing its cost effectiveness
versus the alternative, which is doing
nothing.
• Doing nothing can be amazingly efficient.
• We need to evaluate these techniques with
practitioners in mind.
• More meaningful and less superficial software
analytics.
• Mere numbers aren’t enough
– Numbers and equations are important to capture relations in the data,
– For practical use: they must be accompanied with interpretation and visualization.
– It’s a transfer from the quantitative domain to the qualitative domain.
– More research is needed on:
• how to bring the message out of the software analytics to those who make decisions based on them.
• 3 Questions for Analytics:
• 1) How much better is my model performing than a simple strategy, such as guessing?
• 2) How practically significant are the results?
– effect sizes
• 3) How sensitive are the results to small changes in one or more of the inputs?
– uncertain data
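The three questions above can be made concrete with a small sketch. The helper names and toy inputs are illustrative assumptions, not taken from the paper; Cohen's d stands in here as one common effect-size measure, and perturbing a single input is only one simple way to probe sensitivity.

```python
import statistics

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(train_labels, n):
    """Question 1: a trivial strategy that always predicts the most common label."""
    most_common = statistics.mode(train_labels)
    return [most_common] * n

def cohens_d(group_a, group_b):
    """Question 2: standardized mean difference using the pooled sample variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5

def sensitivity(model, inputs, index, delta):
    """Question 3: change in model output when one input is shifted by delta."""
    perturbed = list(inputs)
    perturbed[index] += delta
    return model(perturbed) - model(inputs)
```

A model is only worth reporting if its `accuracy` clearly beats that of `majority_baseline` on the same data, if the effect size is large enough to matter in practice, and if small values of `delta` do not swing the result.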
• Opportunities for natural SW analytics
– using models from statistical natural language processing for a new kind of analytics.
– What most people write and say, most of the time, is highly repeatable and predictable.
– Devices like Google Translate and Siri.
– Code is no different.
• most everyday code is simple and highly predictable.
– Able to adapt standard n-gram models from statistical NLP to code, and train them on hundreds of millions of LOC.
– Code is actually between 8 and 16 times more predictable than English.
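Predictability in this sense is usually measured as cross-entropy under a language model. The following is a minimal sketch, assuming a bigram model with add-one smoothing and whitespace tokenization; the cited work trains larger n-gram models on far more code, and the function names here are hypothetical.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams over a token sequence."""
    contexts = Counter(tokens[:-1])             # how often each token is followed by another
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    vocab = set(tokens)
    return contexts, bigrams, vocab

def cross_entropy(model, tokens):
    """Average bits per token under the model; lower means more predictable."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    total_bits = 0.0
    for prev, cur in pairs:
        # Add-one smoothed estimate of P(cur | prev).
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total_bits -= math.log2(p)
    return total_bits / len(pairs)

# Toy corpus of repetitive "code" tokens (illustrative only).
corpus = "for i in range ( n ) : total += i".split() * 20
model = train_bigram(corpus)
seen_bits = cross_entropy(model, "for i in range ( n ) :".split())
unseen_bits = cross_entropy(model, "lambda yield async await".split())
```

Token sequences that resemble the training corpus score fewer bits per token than unfamiliar ones, which is the property the comparison between code and English relies on.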
• Wanted: Assistance from Information Analysts
– Mission: Impossible and the TV series 24
• Field agents -> heroes -> developers
• We shouldn’t neglect the information analysts (Chloe on 24)
• Information Analysts -> provide critical information
– such as the backgrounds, strengths, and weaknesses of the people, places, and eventualities faced by the field agents.
– Without the information analysts, it’s hard to imagine a successful mission.
– Information analysts = real heroes.
• Wanted: Assistance from Information Analysts
– Developers have to figure out all the necessary information about
• what and where and how to change the software by themselves.
– We need to provide the services of information analysts to developers
• and assist them in making the right decisions.
– SW analytics can continually provide contextual information based on developers’ current tasks.
– Decent information visualization and computer-human interaction technologies
• can help present this information efficiently.
• Papers discuss:
– Context!
– Relevance for practitioners.
– New ways for conducting SW analytics.
– Importance of new studies.
– Addition of an analytic professional to the
software development team.
• Video:
– https://www.youtube.com/watch?v=nO6X0azR0nw
IEEE Software editor in chief Forrest Shull
speaks with Tim Menzies about the growing
importance of software analytics. From IEEE
Software's July/August 2013 issue.
[1] Buse & Zimmermann: Analytics for Software
Development (FoSER 2010).
[2] Zhang et al.: Software Analytics as a Learning Case
in Practice: Approaches and Experiences (MALETS
2011).
[3] Begel & Zimmermann: Analyze This! 145 Questions
for Data Scientists in Software Engineering (ICSE 2014).
[4] Hassan, Hindle, Runeson, Shepperd, Devanbu, &
Kim: What’s Next in Software Analytics (IEEE Software
2013).