Discussion Paper Series no. 35

Evaluation, Evaluators, and Evaluation Culture

Gelia T. Castillo

Let me begin with a confession. I am not a professional evaluator in the sense of being schooled in the “core discipline of evaluation” or in any of its branches. I am not an expert in any particular evaluation technique. My only credential is a miscellany of experience in and exposure to several national and international research and development (R&D) projects over a long period. Whatever I have contributed to these evaluation assignments came from the “gut,” not from any formal evaluation training. To make up for my naïveté in the method and subject matter of the evaluation, I take every evaluation assignment as a serious, intensive learning exercise carried out through deep “immersion” in the substance of the review before the formal evaluation starts. This is how I manage to learn something new and different every year. I make no apologies for not being a professional evaluator. After all, there has never been a professional evaluator on the evaluation teams I have been part of.

As an introduction, allow me to share with you some experiences from actual participation in external evaluations of R&D programs in agriculture, health, and social development, particularly with respect to the following aspects: purpose, process, product, evaluation culture, and evaluators and their probity.

Purpose

Every evaluation has a purpose, always set forth in the terms of reference. By the way, evaluations are known by other names, even if the intent is really to evaluate. The other words used are review, assess, audit, scrutinize, analyze, examine, and investigate.

The most “purposeful” purpose of evaluation is articulated by the Consultative Group on International Agricultural Research (CGIAR). Its Technical Advisory Committee (TAC) commissions regular external program and management reviews of each center, in addition to intercenter reviews of particular research areas, systemwide reviews, stripe reviews of generic issues, project reviews, and even reviews of the review process. For this reason, the following statements of purpose may be regarded as applicable to the CGIAR system.

“The purpose of external reviews is to help ensure that the centers continue to implement strategies and programs that are relevant to these goals, that they maintain or enhance their record of achievement, and that they are efficiently managed. In these ways, external reviews reinforce mechanisms of accountability within the system. External program reviews and external management reviews are essential components of the CGIAR’s integrated planning process. As autonomous institutions, centers are not obliged to implement the endorsed recommendations. In practice, however, they implement most, if not all, of them” (CGIAR-TAC 1993).

To illustrate how seriously review recommendations are taken, IRRI’s response to the recommendations of the third review was examined by the fourth review panel, which made the following observation: “Of the 37 program recommendations, IRRI has implemented 30 in full and five partially; IRRI did not implement two. For ten management recommendations, seven were implemented in full and three partially.”

As further instructions, the review panels are reminded that “Recommendations should be justified by the analysis and approved by panel members. Reports should include clear endorsements of center activities where appropriate as well as recommendations and suggestions for change.”

Every single evaluation I have participated in specified and, in fact, required that the evaluation team make recommendations. It is as if an evaluation without a recommendation is not an evaluation.

Frankly, recommendations are not easy to write even after brilliant analysis because they must be actionable. To suggest something implementable often means that the review panel is rich and broad in its knowledge base in addition to being familiar with the structure, operations, and subject matter of the program. Furthermore, some recommendations have financial, organizational, and personnel implications. Without systematic consideration of these implications, the recommendations are likely to be hollow. While the analytical part of the evaluation often requires technical expertise, it is in the recommendations that experience and accumulated and collective wisdom count. Counsels of perfection that come from the ideal world of academe are not likely to be taken seriously because there are not enough resources anywhere to implement perfection. As one senior minister said to us, “The trouble with researchers is that you substitute research for wisdom!”

The terms of reference (TOR) for evaluation are almost always broad and comprehensive in scope and multilevel in coverage: from the network, program, or project level to the household or farmer level, including external linkages; from resource allocation, capability building, and institutional development to the research process; and from research products and their adoption or adaptation to social and economic impact. The task of external evaluation is not for the “faint-hearted” because the demands are great and time is limited.

Even if the TOR do not specify this, sometimes it is important to examine the objectives of the program or project itself because unrealistic program objectives could be the underlying source of “failure.”

When the goals of a research or action program are overambitious or unattainable given the overall resource and institutional constraints, program leaders and even donors often appreciate a candid assessment of “realizability” because it reduces the probability of failure and increases the prospects for reducing the gap between promise and performance. Program implementors are spared the burden of having to admit overpromise. Donors are prepared to be more down-to-earth in their expectations. They should not expect to change the world (only a little bit of it) for $50,000 a year!


In this regard, IRRI should examine the new mission statement approved by the CGIAR at International Centers’ Week 1998 (CGIAR 1998). It says: “To contribute to food security and poverty eradication in developing countries through research, partnership, capacity building, and policy support, promoting sustainable agricultural development based on the environmentally sound management of natural resources.”

The CGIAR also asked the centers “to adopt congruent mission statements emphasizing their functions as global centers of frontier science.”

Research institutions that are usually rigorous in setting their own research priorities often lose that rigor when defining TOR for the review. The TOR become too broad and comprehensive, particularly when those who commission the review add their own questions. It would be interesting to review evaluation reports to determine the extent to which the expected breadth and depth of coverage has been achieved with quality. After all, these reports (often of limited circulation) almost always function as a serious basis for serious courses of action regarding the program that has been evaluated.

Unless an R&D program has an excellent monitoring and evaluation component that provides systematic and relevant information, program documentation, and significant research results on issues to be addressed in the evaluation, the review relies much more on the expert and experienced judgment of the evaluators than on the weight of carefully gathered empirical evidence. Nowadays, when the operative word is impact, the evidence in this regard is rarely, if ever, available. Field visits to obtain a “feel” for impact at the farm, household, and village level sometimes result in a disproportionately significant “assessment” of impact, especially when these field visits are selectively orchestrated. To me, the best parts of a review are the project site visits, interaction with field-level researchers, and information from people who know a great deal about what’s going on but who are not program managers.

Evaluation objectives depend on the stage in the life of the R&D program, but quite often the impact on intended beneficiaries and participants is promised too much and expected too soon. Inevitably, the review panel comes up with “intermediate” impact—which is really impact on its way but which is not yet there and might even be derailed in the process.

The following is an example of an interim verdict on impact at the farmer level:
• Technology introduction—yes
• Trial adoption by farmers—yes
• Continuing adoption—not yet
Therefore, impact is still awaited, even if it is forthcoming.

Achim Dobermann’s IRRI seminar on “Site-specific nutrient management in intensive rice systems: on the road to impact” (1999) mentioned another example of anticipated impact. He learned the lesson of patience because “on-farm research is only successful if it has a long-term framework.” The project, which started with system characterization in 1994, is expected to have extension and adoption at all research sites in 2004, at which time the project will also be transferred to NARS.

Although an assessment of past and present achievements is always called for, the prospective part of the review with an eye toward the future often gets the lion’s share of attention in recommendations. The next phase, the next round of funding, the next set of activities, and the continuity of jobs and of logistical support are contained in the prospective, not retrospective, analysis. Program reviews, to those who understand what they are for, are reasons for anxiety or anticipation even to drivers, research assistants, and secretaries. It is normal for affected individuals to peer into the drafts of the much-awaited review report.

Evaluators are asked to look into many program attributes: effectiveness, usefulness, strengths, weaknesses, achievements, relevance, performance, quality, sustainability, appropriateness, collaborative relationships, linkages, and impact. Every single one of these attributes implies some kind of identifiable indicator. The genius of a review panel lies in the members’ resourcefulness and creativity in coming up with some available information that can qualify as indicators (qualitative, quantitative, or both) that can be cited in the report as “passable” evidence to address the TOR. The question that should be asked is: How adequate is “passable” evidence as evidence in an evaluation when the R&D program being evaluated is rather exacting in its own research undertakings? Does judgment substitute for evidence when the latter is only marginally evident, whether as a function of stage in the life cycle of the program or of program neglect in monitoring impact?

In one illustrative case, the review team found it very difficult to determine impact because the program had not paid any attention to it. The team had lots of information on inputs and outputs but hardly any evidence of impact although the research organization had been in existence for 30 years. After the evaluation, two social scientists were assigned to find farm-level impact of technologies developed through the research organization, but they also came out almost empty-handed. The lack of verifiable impact may be due to a research process that failed to produce an adoptable product that could create an impact or an inability to develop procedures that could capture impact. The former is a failure of the research process and the latter is a failure of monitoring and evaluation (M&E). It is this latter situation which must be addressed in evaluation capacity development.

The terms of reference are the written objectives of the evaluation and the written report addresses these objectives, but it is naïve to regard them as the only objectives. There is an explicit agenda, but there might also be an implicit agenda. That is the unwritten purpose of the evaluation.

Different groups involved in the program or project being evaluated, such as donors, program managers, research and support staff, research collaborators, and intended beneficiaries, have their own vested interests in the outcome of the evaluation and these may not all be in harmony. For example, a donor may want to terminate support for a project and the review could be used as an “objective” rationale for the phase-out. Conversely, it is not unusual for a donor representative to informally say that renewal or extension of a project depends on the review panel’s recommendation. A positive, even glowing, evaluation report gives credit not only to the institute or agency but also to program leaders. Even when a final phase has been agreed upon earlier, project staff often hope that an extension will still be possible if the review panel comes through with a good “report card.”

Evaluations by their very nature are often threatening to job security, program life span, and the professional reputation of program leaders. In an institutional setting of limited resources and competing demands, different stakeholders who want a “piece of the pie” may lobby in subtle and sometimes not too subtle ways to try to influence the results of the evaluation process in their own interest. Hence, there are key informants, volunteer informants, defensive informants, offensive informants, and those who are tactfully but deliberately kept out of the panel’s reach as a form of “damage control.” Sometimes the program leader or senior researchers will dwell on certain research projects but not on others. Which projects or whose projects are omitted in the presentations can be as important as those that are included, especially when the omissions are glaring. In an evaluation, we have to listen not only to what is being said but also to what is not being said.

In the case of research collaborators, if the project being evaluated is of marginal interest to the institution in which it is located, the evaluation will also likely be regarded with some indifference and this in itself would be of substantive significance to the evaluators. One research network that continued for two decades was dropped from an institute’s program when network members chose other networks as more important to them.

Needless to say, evaluations are politically sensitive. How to be negative without being offensive is an art. Some evaluators are more “artful” than others.

Process

The evaluation process has several steps:

1. Identification and definition of the need for an evaluation by donors, program leaders, etc. In many instances, for externally funded projects, periodic reviews are specifically provided for in program documents; hence, it is not a matter of choice but a matter of when, how, how much time, and by whom, and allocating resources for it.

2. Drawing up the terms of reference. Quite often these are seen and/or approved by donors.

3. Choosing the members of the review team and, more importantly, the chairperson or team leader.

4. Organizing the evaluation in terms of documents to be reviewed; itinerary for site visits; schedule of interviews; team meetings; report writing; reporting to management and staff; obtaining feedback, especially with respect to errors of fact; report writing and rewriting; cleanup; and report submission.

5. Formal response to the report by the program or institution being reviewed.

In preparing for the review, the program staff do an in-depth review of their own program and go through a visioning exercise to develop plans for the future.

Other issues are relevant to the review process besides the TOR:

1. Timing of the review refers not only to “when” but also to “how long.”

Reviews are expensive and any additional day adds to their cost. Total time could be 4 to 6 weeks, but spread over a period of 6 months or even 1 year. Some reviews are undertaken intensively over a 2- or 3-week period. Considering the comprehensiveness in scope and thoroughness expected, this means truly intensive work from early morning to late at night, long trips to project sites, and one interview after another. In one review I participated in, we were in a different hotel almost every night in two countries over a 2-week period. In another review, we crossed the Mekong River four times and passed over countless bridges to meet with different groups of farmers and local officials. Such tasks require that evaluators be physically fit and mentally alert so as to read lots of documents, ask intelligent questions, listen carefully to answers, and make a reasonable assessment of the situation. Being with a multidisciplinary team helps a great deal in making observations and judgments.

The timing of the review is important not only because it is better to see crops on the ground but also because sought-after team members are often very busy people. Furthermore, timing could refer to the stage in the life of the project or in the life after project completion. For example, in an evaluation of a root-crop research program, site visits after a devastating typhoon showed how valuable sweet potato can be as a survival crop. But in normal times, sweet potato has a minor role in the diet of this community. Sweet potato is now being considered as a crop having real potential for processed products; therefore, varieties developed for this purpose are likely to be adopted whereas before they were rejected by the fresh market.

In assessing impact, the gestation period is important. Sometimes a project can be judged a success because it has not had a chance to fail and sometimes it can be judged a failure because it has not had a chance to succeed.

2. The TOR are the basic organizing factor in the review process.

A division of labor is almost always arrived at in order to cover all of the tasks enumerated in the TOR. The nature of the group dynamics within the team determines how much the chairperson has to do alone. Now and then, members do not pull their weight, leave early before the report has been written, or are too inexperienced to function at a collegial level with others. Occasionally, we find team members who actually do a great deal of homework before the formal evaluation period begins.

While there is a great deal of merit in functioning as a team during site visits and interviews, the time is so limited that the team has to split up according to expertise and interests. Although team members contribute their share to the writing of the report, the final outcome and quality of the report belongs to the chairperson. In the final analysis, the report will eventually be known by the name of the chairperson.

3. The cost of an evaluation is not insignificant, but R&D without evaluation is even more expensive.

We are aware of R&D and social development programs that started with the noble intention of having a built-in evaluation component in every project, but in many instances the cost of the evaluation in financial and human resource terms was such that it had to be dropped. On the other hand, some programs suffer from overevaluation. One foreign-funded integrated agricultural development program in my country literally had one external evaluation team in the field, one at the airport, one in Manila, and one that had just left. The evaluation reports were not only too many but were also so complicated that another team had to be hired to interpret them. In the meantime, foreign consultants were kept fully employed.

There is also a cost in staff time on the part of the program being evaluated. Staff members have to prepare for the evaluation and spend time anticipating and answering questions. This could have its own benefits because program staff have an opportunity to reflect, consolidate, and plan, provided such disruptions are not too frequent.

Few of the R&D programs we evaluated had an evident, deliberate, and systematic M&E system. Data available in their management information system (if they had this) were mostly financial inputs and training or publication outputs. Incidentally, even financial inputs were not easy to obtain in a manner that would be useful for evaluation. It was most difficult to assess impact except in a very judgmental manner because the empirical basis for it was not there. One program based in an industrialized country did not even have a good computerized database of its grantees. It was only after the evaluation that this was done.

One example of data analysis that is useful for evaluation is the IRRI Library Study, which examined 396 articles published by IRRI scientists from 1993 to 1997 in refereed journals rated by the Institute for Scientific Information. The analysis showed that 85.4% of these articles fell into the first or second quartile journals, while only 15% fell into the last two quartiles. This is one indication that science at IRRI is of good quality.
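The arithmetic behind such a breakdown is simple counting. The sketch below is purely illustrative: the quartile ratings it tallies are hypothetical values standing in for ISI journal quartiles, not the actual Library Study data or procedure.

from collections import Counter

# Hypothetical input: one ISI journal quartile (1 = best) per published article.
article_quartiles = [1, 1, 2, 1, 3, 2, 4, 1, 2, 2]

def quartile_shares(quartiles):
    # Percentage of articles falling in each journal quartile.
    counts = Counter(quartiles)
    total = len(quartiles)
    return {q: 100.0 * counts.get(q, 0) / total for q in (1, 2, 3, 4)}

shares = quartile_shares(article_quartiles)
print("First- or second-quartile journals: %.1f%%" % (shares[1] + shares[2]))
print("Third- or fourth-quartile journals: %.1f%%" % (shares[3] + shares[4]))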

But IRRI cannot rest on these laurels. The new mission statement of the CGIAR requires a different approach if we are to address food security, poverty eradication, and sustainability.

4. To trace the roots of impact, we need to understand both process and product.

Every evaluation implies some measure of what and how much the project or program being evaluated has achieved and what difference it has made, if any, to someone, as articulated in the program objectives. Identifying and specifying indicators of broad concepts such as effectiveness, performance, achievements, etc., is the most daunting aspect of evaluation, particularly because TOR are written with the assumption that the evaluation team would find indicators even if program leaders themselves hardly ever thought of them. Quantitative versus qualitative indicators is not even the issue, but rather what and where can we find passable or defensible indicators?

As a mental guide to sensitize me to likely indicators, I have relied all of these years on three papers on monitoring and evaluation and impact assessment. The following concepts offered in these three papers are particularly appealing to a pedestrian mind like mine because they are simple and straightforward.

According to Hinkle (1961), an adequate theory of change is expected to offer answers to these questions:

1. What is it that has changed?
2. How much has it changed (extent)?
3. How quickly has it changed (rate)?
4. What were the conditions before and after change?
5. What occurred during the transition?
6. What were the stimuli that induced the change?
7. Through what mechanisms did change occur?
8. What brought stabilization at a particular point in change?
9. Can directionality be observed in the change?

Answers to these questions would help us deal with the problem of attribution, or how we can tell whether a perceived impact can be attributed to the supposed source. With the focus on partnership, the problem of attribution becomes more complicated and more political.


Deboeck (1978), in a World Bank document on monitoring and evaluation, provides three concepts that are useful in identifying measurable project objectives:

1. Project outputs are the physical outcome of project activities.

2. Project effects are the outcome of increased use made of project outputs.

3. Project impact is the change in the standard of living or in the increased capacity for self-sustained development of a group of beneficiaries or communities resulting from project effects.

The document emphasizes that “some ultimate goals of a project can only be achieved by successfully implementing a series of detailed implementation objectives regarding project inputs and activities. Thus, a hierarchy of objectives needs to be developed that permits showing of the linkages between project inputs, activities, outputs, and the expected effects and impact resulting from them.”

To relate these concepts to an integrated development program, for example, we can have the following operational indicators:
• Project outputs—hectares of irrigated land, farmers trained, agricultural development teams organized, roads built, irrigation associations established, etc.
• Project effects—increased production, more people employed, higher prices for farm products, more farmers adopting a new technology, lower transportation costs, etc.
• Project impact—higher family income; improved housing, nutrition, and health; a more positive assessment of life conditions; lessened inequality between rural and urban areas; improved income distribution, etc. Where increased yields have been realized and higher incomes achieved although no irrigation associations are functioning, we cannot attribute the impact and the effect to a nonexistent output, that is, the irrigation association. Therefore, it is necessary to show the linkages between these three sets of indicators.
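One simple way to picture this hierarchy of objectives is as a chain of records in which every effect must point back to an actual output and every impact to an actual effect. The sketch below only illustrates that linkage idea, with hypothetical indicator names; it is not drawn from Deboeck's document or from any real monitoring system.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Indicator:
    name: str
    level: str                                     # "output", "effect", or "impact"
    linked_to: List["Indicator"] = field(default_factory=list)

def attributable(ind):
    # An output stands on its own; an effect or impact can be credited to the
    # project only if every upstream indicator it rests on is itself attributable.
    if ind.level == "output":
        return True
    return bool(ind.linked_to) and all(attributable(up) for up in ind.linked_to)

association = Indicator("irrigation associations established", "output")
production = Indicator("increased production", "effect", linked_to=[association])
income = Indicator("higher family income", "impact", linked_to=[production])
orphan_income = Indicator("higher family income", "impact")  # no upstream effect recorded

print(attributable(income))         # True: the output-effect-impact chain is complete
print(attributable(orphan_income))  # False: cannot be credited to a missing link

The same logic is applied informally in a review: before crediting an impact to a program, the panel looks for the intermediate effects and outputs that could plausibly have produced it.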

Horton (1988), in writing about the impact of international agricultural research, introduces the following concepts, types of technology, and types of impact:

1. Production technology refers broadly to all methods that farmers, market agents, and consumers use to cultivate, harvest, store, process, handle, transport, and prepare food crops and livestock for consumption.

2. R&D technology refers to the organizational strategies and methods used by research and extension programs in conducting their work. R&D technologies include scientific procedures for genetic engineering, screening germplasm, disease identification and eradication, and rapid multiplication of vegetatively propagated crops. They also include organizational models such as the integrated commodity program, and institutional strategies for program planning and evaluation, training, networking, on-farm trials, and interdisciplinary team research involving social and biological scientists.

3. Production impact refers to the physical, social, and economic effects of new cultivation and postharvest methods on crop and livestock production, distribution, and use and on human welfare in general (including the effects on employment, nutrition, and income distribution).

4. Institutional impact refers to the effects of new R&D technology on the capacity of research and programs to generate and disseminate new production technology. Institutional impact is easier to produce than production impact.


These concepts were applied in the evaluation of a human reproduction research program by finding analogous indicators in the field of human reproduction research. That such analogies can be made should encourage cross-learning between agricultural research and health research, which at the moment are still very separate worlds, although they have the same donors and the same intended beneficiaries.

Product

Besides the analysis of program or project outputs, effects, and impact, every evaluation exercise carries its own impact, which is seldom “evaluated.” Both the products and by-products of an evaluation must be monitored and assessed.

The immediate product of any evaluation exercise is the evaluation report, its analysis, and recommendations. But what appears on paper is only one part of the evaluation product. Because the evaluation process is interactive, the program acquires greater salience during the review and stakeholders become more aware of the stakes they have. As one program leader said: “When you ask for their participation, they are indifferent and have no time, but when evaluation time comes, they complain about not being included.”

In the review process, forthcoming analysis and recommendations are usually discussed with program management, staff, and program collaborators as “trial balloons” for reaction and validation of first impressions. Sometimes management wants to introduce changes into the program but finds resistance from the staff. The evaluation report, if the panel finds merit, can carry such changes in the recommendations to give them legitimacy or urgency.

Because recommendations are taken seriously, our experience shows that by-products of the evaluation can also be serious or significant, such as:
• Resignation of some key staff
• Merger of programs
• Phasing out of some projects or aspects of a project
• Change in mandate of a research institution, for example, from a council type of organization to a hands-on research center
• Continuation, increase, decrease, or termination of support
• Addition of program coverage in terms of crop, country, or research thrust
• Strengthening of a program with more financial and staff support
• Improvements in program implementation and management
• Demoralization of staff who are negatively affected by the report
• A seal of “good housekeeping” for the institution that has been positively reviewed

There are also consequences for the evaluators. Some are never asked again, others make a name for themselves, so they join the “evaluation circuit,” and still others are invited to join the board, the staff, or the policy advisory committee of the program that has been evaluated. Some experience all of the above after different evaluation assignments.

Repeat evaluators or those who are asked again and again are often those for whom evaluation is not just a contract but a commitment to the goals of the R&D program. Such evaluators become credible advocates because they have had an opportunity to see the program, warts and all, where the good news outweighs the bad news and the cause of human welfare promises to be served. Of course, to an evaluator, the best “ego trip” comes from an evaluation report that is not only accepted but is followed through in all of its recommendations even without pressure from donors. The worst ego trip is when program managers and boards refuse to accept the content of the report, let alone publish it. I’ve seen all of these.

Evaluation culture

Evaluation, by its very nature, particularly external evaluation, is a threatening exercise, no matter how sophisticated the staff of the program being evaluated is. Program leaders hardly smile until they have seen the report. Evaluation can be expensive depending upon how it is done and by whom. But it is difficult to imagine why an R&D program should continuously be funded when it has not been evaluated in some way.

Right from the conceptualization of an R&D program, the idea of evaluation must be built into it. Program goals must be formulated realistically and implemented in an anticipatory way, always asking the question: “How do we know when we have reached our goal?” For example, if poverty eradication is the goal, how do we define poverty? Which poor do we have in mind and how will they be reached, directly or indirectly? How will integrated gene management and integrated natural resources management have an impact on poverty? The road to poverty eradication starts with noble intentions, but the long and tortuous route between the intent and the deed leads to detours, even dead ends, where those in need wait and “those who have, get.” But I believe that rice is the bottom line of food security for the poor, whether at the household or national level. The issue is how to translate the science and practice of integrated gene management and integrated natural resources management into more rice for those who need it most—the poor. Technologies generated but not adopted can never have an impact, so adoption-diffusion studies are important, if we have technologies. Where are the success stories? I believe that if we can be honest about our failures, our successes can be more credible.

An evaluation culture means that attention is being paid to a system of generating information about the program and documenting events and progress so that useful evaluation can be made. Evaluation keeps us focused on our intent and our route. Nowadays, participatory M&E has become fashionable, where stakeholders are involved in developing and choosing indicators from their different perspectives and in assessing what has been achieved. The social learning process supposedly leads to improved program effectiveness and a sense of ownership among stakeholders.

In an R&D environment where even peer review is not well received and criticism is taken as a personal attack against one’s reputation, an evaluation culture is not likely to prosper. I hope, however, that the development of an evaluation culture will be part of capacity building, research partnerships, and training in graduate school. Where there is no evaluation culture, evaluations will not be carried out, evaluation reports will not see the light of day, evaluators will be ostracized in subtle and not so subtle ways, recommendations will be discredited instead of answered, and a defensive rather than constructive attitude toward the report will prevail. These weaknesses, we discovered, also occur in R&D organizations, even in developed countries.

In the past, my hypothesis was that where science flourishes, evaluation is welcome, but where science has to be translated into impact on poverty eradication and food security, even developed-country scientists become insecure.

Evaluators and their probity

If McLuhan says, “The medium is the message,” in evaluation we can almost say, “The evaluator is the evaluation.” For example, as a basis for selecting the chairperson of its review team, one research institute has put out the following criteria:

Essential
• Broad vision
• Knowledge of global issues about the mandate of the institute
• International reputation
• Objectivity
• Knowledge of the system of which the institute is a part
• Leadership
• Review experience
• Productive under stress

Desirable
• Previous review experience
• Good writing and presentation skills
• Developing-country national

Less important
• Disciplinary background

To provide guidance in the selection of evaluators, TOR often include words such as independent, dedicated, scientific honesty, eminent, distinguished, experienced, top-caliber, high-level expertise, respected, etc. Some exclusions are also mentioned, such as those who have received support from the program or are serving or have served on the program’s board or advisory committees.

Some R&D programs insist that the chairperson of a review team must have been a reviewer of a similar institute. Reviewers are “recycled” because of a reputation they have acquired as “tough”; hence, a program that has undergone a “tough” review can be proud of being given a clean bill of health. A “soft” review, especially if very positive, may actually do more damage to the program being reviewed. Many programs would rather not subject themselves to the risks of a “soft” review; therefore, some names appear repeatedly as reviewers.

But in real life, who are the evaluators? They are almost always chosen by the board, the advisory committee, or the management of the program being evaluated, often with the approval of the donors if there are any. Names usually come by word of mouth, through people to whom one is known. In my experience, there have been big-name scientists (biophysical as well as biomedical), well-known socioeconomists, ex-ministers of agriculture, health, finance, and development planning, deans, directors of research institutes, vice-chancellors of universities, an ex-vice-president of the World Bank, an ex-president of a large development agency, an obstetrician-gynecologist of the British royal family, the chairman of the Nobel Prize committee, etc. I’ve had the pleasure of working with two veterinarians who never spoke about animal diseases even when asked to. In other words, most review team members I have worked with have a broad perspective; they think beyond their field of specialization and can function as team members. Rarely did we have someone who could only answer to two pages of his discourse on genetics. Once we had a famous scientist whose specialization was Drosophila (fruit flies). It was a fascinating subject, but it did not do much for our report.

Until now, I cannot tell you how to choose review team members. Quite often, it is their qualities as human beings, on top of their experience and expertise, that really make a difference because working with a review team is an intense interpersonal activity.

An evaluator must be a fast reader, a quick learner, and a patient listener with a great deal of physical stamina because the days are long, the evenings are short, and there are not enough days in the week to do all the things that need to be done. Because of the time and substantive pressures associated with the production of a credible and defensible report, I would add two other qualities of an evaluator: a sense of humor and the humility to say “I don’t know.” Sometimes we take ourselves so seriously that we believe everything we say. I always approach the review job with the expectation that it will be fun, and most reviews have been fun. By the way, these review teams are very male-dominated.

Conclusions

Evaluation is done with a purpose (written and unwritten), often full of earth-shaking expectations. It is a physically, intellectually, and professionally demanding task that requires intense interpersonal relations and the ability to listen not only to what is being said but also to what is not being said. When evaluation culture is underdeveloped, documentation for evaluation is inadequate, and empirical evidence of impact has yet to present itself, evaluations rely mainly on the authoritative judgments of the evaluators. Because evaluations and evaluators wield power through these authoritative judgments, the process and the producers of an evaluation must be characterized by probity, which the dictionary defines as “uncompromising adherence to the highest principles and ideals; unimpeachable integrity.”

This quality becomes all the more important because evaluation results come in the shape of inverted pyramids, some more “inverted” than others. Let me illustrate with the layers of that pyramid, from top to bottom:

• What we need to do
• What we want to do
• What we say we are going to do
• What we can do
• What we are actually doing
• What we have done

References

CGIAR-TAC (Consultative Group on International Agricultural Research-Technical Advisory Committee). 1993. Report of the Fourth External Program and Management Review of the International Rice Research Institute. Technical Advisory Committee Secretariat, FAO, Rome, Italy.

CGIAR (Consultative Group on International Agricultural Research). 1998. Shaping the CGIAR’s future. Washington, D.C. (USA): CGIAR.

Deboeck GJ. 1978. Systems for monitoring and evaluating nutritional interventions. Washington, D.C. (USA): World Bank.

Dobermann A. 1999. Site-specific nutrient management in intensive rice systems: on the road to impact. International Rice Research Institute Seminar, February 1999.

Hinkle RC. 1961. Howard Becker’s approach to social change. Sociol. Quart. 2(3):155-180.



Horton DE. 1988. Assessing the impact of international research: concepts and challenges. Paper presented at ISNAR/Rutgers Agricultural Technology Management Workshop on Methods for Assessing Research Impact and for Diagnosing Research System Constraints, New Brunswick, New Jersey.

Notes

Gelia T. Castillo is a consultant at IRRI; academician, National Academy of Science and Technology; and professor emeritus, University of the Philippines Los Baños.
