©2005 Systeme Evolutif Ltd
Slide 1
Some Thoughts on Metrics
Test Management Forum
Paul Gerrard
Systeme Evolutif Limited
3rd Floor, 9 Cavendish Place
London W1G 0QD
email: [email protected]
http://www.evolutif.co.uk
Slide 2
I’m a Metrics Sceptic
Slide 3
A great book on metrics: “The Tyranny of Numbers – why counting can’t make us happy” by David Boyle
– Nothing to do with software
– More to do with government statistics
– Written in the same spirit as “How to Lie with Statistics” – another counting classic
I’ve appeared to be fairly negative about metrics in the past
Not true – it’s blind faith in metrics I don’t like!
Slide 4
“What matters, we cannot count, so let’s count what does not matter”
A lovely quote from an economist called Chambers (having a dig at economists)
I’ve changed it to reflect a tester’s mantra:
– “Testers have come to feel
What can’t be measured, isn’t real
The truth is always an amount
Count defects, only defects count.”
Slide 5
Some problems with metrics
Too often, numbers, and counting, being incontrovertible, are regarded as ‘absolute truth’
Numbers are incontrovertible, but the object being counted may be subjective
The person who collects the raw data for counting isn’t usually independent (and has an ‘agenda’)
There is a huge amount of filtering going on:
– by individuals
– but also, the processes we use, by definition, are selective.
Slide 6
When I were a lad (development team leader)…
I collected all sorts of metrics to do with code
To help manage my team and their activities, I counted lines of code, code delivered, module size, module rates of change, fault location and type, fan-in, fan-out, and other statically detectable measures, as well as costs allocated to specific tasks, lifted from a simple time-recording system (which I wrote specifically to see where time went).
Slide 7
When I were a lad (development team leader)…2
I used them to justify:
– buying tools, changing my team’s behaviour and attitude to standards and other development practices, as well as justifying my team’s existence through their productivity
– (Other teams didn’t have metrics, so we were, by definition, infinitely more productive)
Metrics are extremely useful as a political tool
Metrics (statistics) are probably the most useful tool of politics. Ask any politician!
I knew I was collecting ‘good stuff’, but some of it was to be taken more seriously than the rest.
Slide 8
Counting Defects is Misleading
Slide 9
My biggest objection… counting defects is misleading, by definition
A gruesome analogy (body count):
– used to measure progress in a military campaign
– not a good measure of how successful your campaign has been
A body count gives you the following information:
– the opponent’s forces have been diminished by a certain number
» But what were they to start with?
» How is the enemy recruiting more participants?
» No one knows
The count represents the number of participants who are no longer in the campaign. So, by definition, they don’t count any more.
Slide 10
Body count
The body count could be used to measure our efficiency at killing
But is killing efficiency a good way to measure progress in a campaign intended to capture territory, enemy assets, hearts and minds?
Hardly – dead people are a consequence, a tragic side issue, not the objective itself.
Slide 11
Defect/bug count
A defect count gives us the following information:
– the number of defects has been diminished by a certain amount
– but what was it to start with?
– how are the developers (the enemy? Hehe) injecting more defects?
– no one knows – they certainly don’t
Predictive metrics are unreliable because of software languages, people, knock-on effects, coupling etc.
Slide 12
Defect/bug count 2
The defect count is a count of defects removed from the system
– By definition… they don’t count any more
– Bugs left in don’t count because they are trivial
The count can be used to measure test “efficiency”
– But is “defect detection efficiency” a good way to measure progress in a project intended to deliver functionality, business benefits, cost savings?
– NO! Defects are a consequence, a tragic side issue and an inevitability, not the end itself
Need I go on?
Slide 13
Counting defects sends the wrong message to testers and management
If the only thing we count and take seriously is defects, we are telling testers that the only thing that counts is defects
All they ever do is look for defects
All management think testers do is find defects
But what do managers want?
Slide 14
Managers want…
To know the status of deliverables: what works, what doesn’t
They want… and they want it NOW:
– demonstration that the software works
– confidence that the software is usable
Defects are an inevitable consequence of development and testing, but not the prime objective
– They are a tactical challenge
Defects are a practitioner issue all the time
– But not a serious management issue, unless defects block release and a decision is required to unblock the project
Most of the time, test metrics are irrelevant.
Slide 15
Purpose of Testing, Purpose of Early Testing… are Flawed
Slide 16
Myers has a lot to answer for
Myers advanced testing a few years when he defined the purpose of testing in 1978
But that flawed definition has held us back since 1983!
The defects we count aren’t representative
Typically, system and acceptance test defects are counted
We recommend that all defects are counted
– But that’s hardly possible
Even if we tried, we couldn’t count them all
The vast majority are corrected as they are created
Finding bugs is a tactical objective, not strategic.
Slide 17
Most defects are corrected before they have an impact
When we write a document, code or a test plan, we correct the vast majority of our mistakes instantly
– they never find their way into the review or test
– the vast majority of defects are not found by “testing” at all
Testing only detects the most obscure faults
But we use metrics based on those obscure defects to generalise and steer our testing activities
Surely this isn’t sensible?
Only if we consider defects in all their various manifestations can we promote general theories of how testing can be improved.
Slide 18
Our approach to testing undermines the data we collect
Textbooks promote the economic view:
– finding defects early is better than finding them later
– the logic is flawless
But this argument only holds if the “absence of defects” is our key objective
Surely, defects are symptoms, not the underlying problem
– Absence of defects is a sign of good work; it’s not the deliverable
– How can the “absence” of anything be a meaningful deliverable, though?
Slide 19
Economic argument for early testing is flawed
It is based on fixing symptoms, not the underlying problem, or improving the end deliverable
The argument for using reviews and inspections has traditionally been ‘defect prevention’
But this is nonsense
– Inspections and reviews find defects, like any other test activity
– The economic argument is based on rework prevention, not defect prevention
Early defects are simply more expensive to correct if left in products.
Slide 20
Testing is a reactive activity, not proactive
Testing cannot prevent defects – it is reactive, never proactive
This is why we still have to convince management that testing is important
Testing actually corrupts the defect data we collect
– If we structure our testing to detect defects before system and acceptance testing, design defects found in system test are A BAD THING
– bad because of the way we approach development and testing
– bad because we need to re-document, redesign and re-test at unit and integration levels, and so on
– a self-fulfilling prophecy
Late testing makes the other guys look bad.
Slide 21
Compare that with RAD, DSDM or Agile methods
Little testing is done by developers
– some might do test-first programming or good unit testing
– but most don’t
Because there is shallow documentation:
– there is little developer testing
– there is an instant response to system/user testing incidents
– the cost of finding ‘serious defects’ in system/acceptance testing is remarkably low.
Slide 22
Economic argument for early testing is smashed
The whole basis for a structured testing discipline is undermined
Traditional metrics don’t support the Agile approach, so we say Agile teams are undisciplined, unprofessional and incompetent
– (It’s hard to sell ISEB courses to these guys!)
Surely we are measuring the wrong things?
The data we collect is corrupted by the processes we follow!
Slide 23
Where to Now with Test Metrics?
Slide 24
Where now with test metrics?
Move away from defects as the principal object of measurement
Move towards ‘information assets’ as the tester’s deliverable
Defects are a part of that information asset
Defect analysis:
– a development task, not a tester’s
– only programmers know how to analyse the cause of defects and see trends
– defect analyses help developers improve, not testers (a white-box metric)
Slide 25
Where now with test metrics? 2
Testing metrics should be aligned with business metrics (more black box)
– Business results/objectives/goals
– Intermediate deliverables/goals
– Risk
– Looking forward to software use, not back into software construction
We need to present metrics in more accessible, graphical ways.
Slide 26
A New Way to Classify Incidents?
Slide 27
Suppose you were asked to carry fruit
How many apples could you carry?
– I can carry 100
How many oranges?
– I can carry 80
How many watermelons?
– I can carry 7
Assuming you have carrier bags, could you carry 40 apples, 25 oranges and 4 watermelons?
How would you work it out?
Slide 28
Can I carry the fruit?
If my carrying capacity is C:
– weight of an apple is C/100
– weight of an orange is C/80
– weight of a watermelon is C/7
So the total weight of the load is:
40C/100 + 25C/80 + 4C/7 = 1.28C
No, I obviously can’t carry that load.
Slide 29
Acceptable load
I don’t know what C is precisely, but that doesn’t matter
– If the load factor is greater than one, I can’t carry the load
Let’s ignore C, then, and just worry about the acceptable load factor L
L must be less than one.
Slide 30
Acceptable load
If L is > 1, let’s try to reduce it
– Removing 1 watermelon makes L = 1.14
– Removing 2 watermelons makes L = 0.998
– I can now carry the reduced load (just)
I have a measure of the load (L) and a threshold of acceptability (less than one)
I know that removing the heavy items will give the biggest improvement.
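The load-factor arithmetic on these slides can be sketched directly (a minimal illustration; the item names and function are mine, the numbers are from the slides):

```python
def load_factor(counts, capacity_per_item):
    """Load factor L: sum of count/capacity over the item types.
    The load is carryable only if L < 1."""
    return sum(counts[item] / capacity_per_item[item] for item in counts)

# From the slides: 100 apples, 80 oranges or 7 watermelons each
# exhaust the (unknown) carrying capacity C on their own.
capacity = {"apple": 100, "orange": 80, "watermelon": 7}

load = {"apple": 40, "orange": 25, "watermelon": 4}
print(round(load_factor(load, capacity), 2))   # 1.28 - too heavy

# Removing the heaviest items reduces L fastest:
load["watermelon"] -= 2
print(round(load_factor(load, capacity), 3))   # 0.998 - just carryable
```

Note that C itself never appears in the calculation: only the ratios matter, which is the point of the next slide.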
Slide 31
Suppose you were asked to accept a system?
How many low-severity bugs could you afford?
– I can accept 100
How many medium?
– I can accept 80
How many high?
– I can accept 7
Could you accept 40 low, 25 medium and 4 high?
Could you work it out?
Slide 32
Can I afford (accept) the system with bugs?
If my “bug budget” is B:
– cost of a LOW is B/100
– cost of a MEDIUM is B/80
– cost of a HIGH is B/7
So the total cost of the bugs is:
40B/100 + 25B/80 + 4B/7 = 1.28B
No, I obviously can’t accept those bugs.
Slide 33
Acceptable bug cost
I don’t know what B is, but that doesn’t matter
– If the total cost of the bugs is greater than B (a factor greater than one), I can’t accept the system
Let’s ignore B, then, and just worry about the bug COST factor: C
C must be less than one.
Slide 34
Calculating the cost of bugs
If C is > 1, let’s try to reduce it
– Removing 1 HIGH makes C = 1.14
– Removing 2 HIGHs makes C = 0.998
– I can now accept the improved system (just)
I have a measure of the cost (C) and a threshold of acceptability (less than one)
I know that removing the HIGH-severity bugs will give the biggest improvement.
Slide 35
A useful metric for developers
Now developers have a numeric score to drive their rework efforts
They can model different change strategies and predict an outcome
They can weigh the cost of correction against the reduction in bug cost.
Slide 36
A useful metric for testers
Bugs get a score that is finer-grained than three- or five-level severities
No need to worry about borderline cases, as the user can adjust the acceptability factor for bugs
Testers should focus on high-COST bugs
– But not to the exclusion of lower-cost bugs.
Slide 37
Proposal
Why not assign THREE classifications:
– Priority
– Severity
– Bug Cost*
And plot the cost of open bugs over time, as well as the number of bugs?
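A minimal sketch of this proposal, assuming the "bug budget" levels from the earlier slides; the weekly snapshot figures below are invented purely for illustration:

```python
# Score each open bug by its share of an assumed bug budget B, then
# track the total cost factor C per snapshot alongside the raw count.
BUDGET = {"low": 100, "medium": 80, "high": 7}  # open bugs per level that exhaust B

def cost_factor(open_bugs):
    """Cost factor C of the open bugs; the system is acceptable only if C < 1."""
    return sum(count / BUDGET[severity] for severity, count in open_bugs.items())

# Hypothetical weekly snapshots of the open-bug list:
weekly_open_bugs = [
    {"low": 40, "medium": 25, "high": 4},
    {"low": 35, "medium": 20, "high": 2},
    {"low": 30, "medium": 15, "high": 0},
]

for week, bugs in enumerate(weekly_open_bugs, start=1):
    total = sum(bugs.values())
    print(f"week {week}: {total} open bugs, cost factor C = {cost_factor(bugs):.3f}")
```

Plotted over time, the two series tell different stories: the raw count falls gently, while C shows the system crossing the acceptability threshold once the HIGH bugs are cleared.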
Slide 38
Some Thoughts on Metrics
Test Management Forum
Paul Gerrard
Systeme Evolutif Limited
3rd Floor, 9 Cavendish Place
London W1G 0QD
email: [email protected]
http://www.evolutif.co.uk