©2005 Systeme Evolutif Ltd
Slide 1
Some Thoughts on Metrics
Test Management Forum
Paul Gerrard
Systeme Evolutif Limited
3rd Floor, 9 Cavendish Place
London W1G 0QD
email: [email protected]
http://www.evolutif.co.uk
Slide 2
I’m a Metrics Sceptic
Slide 3
A great book on metrics: “The Tyranny of Numbers – why counting can’t make us happy” by David Boyle
– Nothing to do with software
– More to do with government statistics
– Written in the same spirit as “How to Lie with Statistics” – another counting classic
I’ve appeared to be fairly negative about metrics in the past
Not true – it’s blind faith in metrics I don’t like!
Slide 4
“What matters, we cannot count, so let’s count what does not matter”
A lovely quote from an economist called Chambers (having a dig at economists)
I’ve changed it to reflect a tester’s mantra:
– “Testers have come to feel
What can’t be measured, isn’t real
The truth is always an amount
Count defects, only defects count.”
Slide 5
Some problems with metrics
Too often, numbers, and counting, being incontrovertible, are regarded as ‘absolute truth’
Numbers are incontrovertible, but the object being counted may be subjective
The person who collects the raw data for counting isn’t usually independent (and has an ‘agenda’)
There is a huge amount of filtering going on:
– by individuals
– but also, the processes we use, by definition, are selective.
Slide 6
When I were a lad (development team leader)…
I collected all sorts of metrics to do with code
To help manage my team and their activities, I counted lines of code, code delivered, module size, module rates of change, fault location and type, fan-in, fan-out, and other statically detectable measures, as well as costs allocated to specific tasks, lifted from a simple time-recording system (which I wrote specifically to see where time went).
Slide 7
When I were a lad (development team leader)…2
I used them to justify:
– buying tools, changing my team’s behaviour and attitude to standards and other development practices, as well as justifying my team’s existence through their productivity
– (Other teams didn’t have metrics, so we were, by definition, infinitely more productive)
Metrics are extremely useful as a political tool
Metrics (statistics) are probably the most useful tool of politics. Ask any politician!
I knew I was collecting ‘good stuff’, but some of it was to be taken more seriously than the rest.
Slide 8
Counting Defects is Misleading
Slide 9
My biggest objection… counting defects is misleading, by definition
A gruesome analogy (body count):
– used to measure progress in a military campaign
– not a good measure of how successful your campaign has been
A body count gives you the following information:
– the opponent’s forces have been diminished by a certain number
» But what were they to start with?
» How is the enemy recruiting more participants?
» No one knows
The count represents the number of participants who are no longer in the campaign. So, by definition, they don’t count any more.
Slide 10
Body count
The body count could be used to measure our efficiency at killing
But is killing efficiency a good way to measure progress in a campaign intended to capture territory, enemy assets, hearts and minds?
Hardly – dead people are a consequence, a tragic side issue, not the objective itself.
Slide 11
Defect/bug count
A defect count gives us the following information:
– the number of defects has been diminished by a certain amount
– but what was it to start with?
– how are the developers (the enemy? Hehe) injecting more defects?
– no one knows – they certainly don’t
Predictive metrics are unreliable because of software languages, people, knock-on effects, coupling etc.
Slide 12
Defect/bug count 2
The defect count is a count of defects removed from the system
– By definition… they don’t count any more
– Bugs left in don’t count because they are trivial
The count can be used to measure test “efficiency”
– But is “defect detection efficiency” a good way to measure progress in a project intended to deliver functionality, business benefits, cost savings?
– NO! Defects are a consequence, a tragic side issue and an inevitability, not the end itself
Need I go on?
Slide 13
Counting defects sends the wrong message to testers and management
If the only thing we count and take seriously is defects, we are telling testers that the only thing that counts is defects
All they ever do is look for defects
All management think testers do is find defects
But what do managers want?
Slide 14
Managers want…
To know the status of deliverables: what works, what doesn’t
They want… and they want it NOW:
– demonstration that the software works
– confidence that the software is usable
Defects are an inevitable consequence of development and testing, but not the prime objective
– They are a tactical challenge
Defects are a practitioner issue all the time
– But not a serious management issue, unless defects block release and a decision is required to unblock the project
Most of the time, test metrics are irrelevant.
Slide 15
Purpose of Testing, Purpose of Early Testing… are Flawed
Slide 16
Myers has a lot to answer for
Myers advanced testing a few years when he defined the purpose of testing in 1978
But that flawed definition has held us back since 1983!
The defects we count aren’t representative
Typically, system and acceptance test defects are counted
We recommend that all defects are counted
– But that’s hardly possible
Even if we tried, we couldn’t count them all
The vast majority are corrected as they are created
Finding bugs is a tactical objective, not strategic.
Slide 17
Most defects are corrected before they have an impact
When we write a document, code or a test plan, we correct the vast majority of our mistakes instantly
– they never find their way into the review or test
– the vast majority of defects are not found by “testing” at all
Testing only detects the most obscure faults
But we use metrics based on those obscure defects to generalise and steer our testing activities
Surely this isn’t sensible?
Only if we consider defects in all their various manifestations can we promote general theories of how testing can be improved.
Slide 18
Our approach to testing undermines the data we collect
Textbooks promote the economic view:
– finding defects early is better than finding them later
– the logic is flawless
But this argument only holds if the “absence of defects” is our key objective
Surely, defects are symptoms, not the underlying problem
– Absence of defects is a sign of good work; it’s not the deliverable
– How can the “absence” of anything be a meaningful deliverable, though?
Slide 19
Economic argument for early testing is flawed
It is based on fixing symptoms, not the underlying problem, or improving the end deliverable
The argument for using reviews and inspections has traditionally been ‘defect prevention’
But this is nonsense
– Inspections and reviews find defects, like any other test activity
– The economic argument is based on rework prevention, not defect prevention
Early defects are simply more expensive to correct if left in products.
Slide 20
Testing is a reactive activity, not proactive
Testing cannot prevent defects – it is reactive, never proactive
This is why we still have to convince management that testing is important
Testing actually corrupts the defect data we collect
– If we structure our testing to detect defects before system and acceptance testing, design defects found in system test are A BAD THING
– bad because of the way we approach development and testing
– bad because we need to re-document, redesign and re-test at unit and integration levels, and so on
– a self-fulfilling prophecy
Late testing makes the other guys look bad.
Slide 21
Compare that with RAD, DSDM or Agile methods
Little testing is done by developers
– some might do test-first programming or good unit testing
– but most don’t
Because there is shallow documentation:
– there is little developer testing
– there is an instant response to system/user testing incidents
– the cost of finding ‘serious defects’ in system/acceptance testing is remarkably low.
Slide 22
Economic argument for early testing is smashed
The whole basis for a structured testing discipline is undermined
Traditional metrics don’t support the Agile approach, so we say Agile teams are undisciplined, unprofessional and incompetent
– (It’s hard to sell ISEB courses to these guys!)
Surely we are measuring the wrong things?
The data we collect is corrupted by the processes we follow!
Slide 23
Where to Now with Test Metrics?
Slide 24
Where now with test metrics?
Move away from defects as the principal object of measurement
Move towards ‘information assets’ as the tester’s deliverable
Defects are a part of that information asset
Defect analysis:
– a development task, not a tester’s
– only programmers know how to analyse the cause of defects and see trends
– defect analyses help developers improve, not testers (a white-box metric)
Slide 25
Where now with test metrics? 2
Testing metrics should be aligned with business metrics (more black box)
– Business results/objectives/goals
– Intermediate deliverables/goals
– Risk
– Looking forward to software use, not back into software construction
We need to present metrics in more accessible, graphical ways.
Slide 26
A New Way to Classify Incidents?
Slide 27
Suppose you were asked to carry fruit
How many apples could you carry?
– I can carry 100
How many oranges?
– I can carry 80
How many watermelons?
– I can carry 7
Assuming you have carrier bags, could you carry 40 apples, 25 oranges and 4 watermelons?
How would you work it out?
Slide 28
Can I carry the fruit?
If my carrying capacity is C:
– weight of an apple is C/100
– weight of an orange is C/80
– weight of a watermelon is C/7
So the total weight of the load is:
40C/100 + 25C/80 + 4C/7 = 1.28C
No, I obviously can’t carry that load.
Slide 29
Acceptable load
I don’t know what C is precisely, but that doesn’t matter
– If the load factor is greater than one, I can’t carry the load
Let’s ignore C, then, and just worry about the acceptable load factor L
L must be less than one.
Slide 30
Acceptable load
If L is > 1, let’s try to reduce it
– Removing 1 watermelon makes L = 1.14
– Removing 2 watermelons makes L = 0.998
– I can now carry the reduced load (just)
I have a measure of the load (L) and a threshold of acceptability (less than one)
I know that removing the heavy items will give the biggest improvement.
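The load-factor arithmetic on these slides can be sketched directly (a minimal illustration; the item names and function are mine, the numbers are from the slides):

```python
def load_factor(counts, capacity_per_item):
    """Load factor L: sum of count/capacity over the item types.
    The load is carryable only if L < 1."""
    return sum(counts[item] / capacity_per_item[item] for item in counts)

# From the slides: 100 apples, 80 oranges or 7 watermelons each
# exhaust the (unknown) carrying capacity C on their own.
capacity = {"apple": 100, "orange": 80, "watermelon": 7}

load = {"apple": 40, "orange": 25, "watermelon": 4}
print(round(load_factor(load, capacity), 2))   # 1.28 - too heavy

# Removing the heaviest items reduces L fastest:
load["watermelon"] -= 2
print(round(load_factor(load, capacity), 3))   # 0.998 - just carryable
```

Note that C itself never appears in the calculation: only the ratios matter, which is the point of the next slide.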
Slide 31
Suppose you were asked to accept a system?
How many low-severity bugs could you afford?
– I can accept 100
How many medium?
– I can accept 80
How many high?
– I can accept 7
Could you accept 40 low, 25 medium and 4 high?
Could you work it out?
Slide 32
Can I afford (accept) the system with bugs?
If my “bug budget” is B:
– cost of a LOW is B/100
– cost of a MEDIUM is B/80
– cost of a HIGH is B/7
So the total cost of the bugs is:
40B/100 + 25B/80 + 4B/7 = 1.28B
No, I obviously can’t accept those bugs.
Slide 33
Acceptable bug cost
I don’t know what B is, but that doesn’t matter
– If the total cost of the bugs is greater than B (a factor greater than one), I can’t accept the system
Let’s ignore B, then, and just worry about the bug COST factor: C
C must be less than one.
Slide 34
Calculating the cost of bugs
If C is > 1, let’s try to reduce it
– Removing 1 HIGH makes C = 1.14
– Removing 2 HIGHs makes C = 0.998
– I can now accept the improved system (just)
I have a measure of the cost (C) and a threshold of acceptability (less than one)
I know that removing the HIGH-severity bugs will give the biggest improvement.
Slide 35
A useful metric for developers
Now developers have a numeric score to drive their rework efforts
They can model different change strategies and predict an outcome
They can weigh the cost of correction against the reduction in bug cost.
Slide 36
A useful metric for testers
Bugs get a score that is finer-grained than three- or five-level severities
No need to worry about borderline cases, as the user can adjust the acceptability factor for bugs
Testers should focus on high-COST bugs
– But not to the exclusion of lower-cost bugs.
Slide 37
Proposal
Why not assign THREE classifications:
– Priority
– Severity
– Bug Cost*
And plot the cost of open bugs over time, as well as the number of bugs?
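A minimal sketch of this proposal, assuming the "bug budget" levels from the earlier slides; the weekly snapshot figures below are invented purely for illustration:

```python
# Score each open bug by its share of an assumed bug budget B, then
# track the total cost factor C per snapshot alongside the raw count.
BUDGET = {"low": 100, "medium": 80, "high": 7}  # open bugs per level that exhaust B

def cost_factor(open_bugs):
    """Cost factor C of the open bugs; the system is acceptable only if C < 1."""
    return sum(count / BUDGET[severity] for severity, count in open_bugs.items())

# Hypothetical weekly snapshots of the open-bug list:
weekly_open_bugs = [
    {"low": 40, "medium": 25, "high": 4},
    {"low": 35, "medium": 20, "high": 2},
    {"low": 30, "medium": 15, "high": 0},
]

for week, bugs in enumerate(weekly_open_bugs, start=1):
    total = sum(bugs.values())
    print(f"week {week}: {total} open bugs, cost factor C = {cost_factor(bugs):.3f}")
```

Plotted over time, the two series tell different stories: the raw count falls gently, while C shows the system crossing the acceptability threshold once the HIGH bugs are cleared.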
Slide 38
Some Thoughts on Metrics
Test Management Forum
Paul Gerrard
Systeme Evolutif Limited
3rd Floor, 9 Cavendish Place
London W1G 0QD
email: [email protected]
http://www.evolutif.co.uk