Concept of Evaluation and Its Principles

CONCEPT OF EVALUATION AND ITS PRINCIPLES

Concepts of testing, measurement, assessment and evaluation

Testing
A test is an instrument or systematic procedure for measuring a sample of behaviour by posing a set of questions in a uniform manner. Because a test is a form of assessment, tests also answer the question, "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?"

Measurement
Measurement is the process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. Measurement answers the question, "How much?" Example: when the teacher calculates the percentage of problems that the student Geetha has answered correctly and gives her a score of 70/100, that is measurement. A test is used to gather information; that information is presented in the form of a measurement; and that measurement is then used to make an evaluation.

Assessment
Assessment is any of a variety of procedures used to obtain information about student performance. It includes traditional paper-and-pencil tests as well as extended responses (e.g. essays) and performances of authentic tasks (e.g. laboratory experiments). Assessment answers the question, "How well does the individual perform?" Note that the term assessment is often used to mean the same as evaluation.

Concept of Evaluation
Evaluation has a wider meaning: it goes beyond measurement. When we make a judgement from useful information, including measurement, that is evaluation. Example: the teacher may evaluate Geetha as doing well in mathematics because most of the class scored 50/100. This is an example of evaluation using quantitative data (measurable information). The teacher might also make an evaluation based on qualitative data, such as her observations that Geetha works hard, has an enthusiastic attitude towards mathematics and finishes her assignments quickly.

Evaluation is the science of providing information for decision making. It includes measurement, assessment and testing, and it is a process that involves information gathering, information processing, judgement forming and decision making.
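The testing-measurement-evaluation chain described above can be made concrete with a short sketch. This is only an illustration of the Geetha example: the function names are invented, the class scores are hypothetical, and the "doing well" judgement simply mirrors the comparison with the class described in the text.

```python
# A minimal sketch of the measurement-vs-evaluation distinction, using the
# illustrative numbers from the Geetha example (70/100 score, class around 50/100).

def measure(correct_answers: int, total_items: int) -> float:
    """Measurement: obtain a numerical description ("How much?")."""
    return 100.0 * correct_answers / total_items

def evaluate(student_score: float, class_scores: list[float]) -> str:
    """Evaluation: form a value judgement from the measurement plus other information."""
    class_average = sum(class_scores) / len(class_scores)
    if student_score > class_average:
        return "doing well relative to the class"
    return "needs additional support"

geetha_score = measure(correct_answers=70, total_items=100)   # -> 70.0 (measurement)
class_scores = [50.0, 45.0, 55.0, 50.0, 48.0]                 # hypothetical class data
print(evaluate(geetha_score, class_scores))                   # a judgement, not just a number
```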


From the above, we can arrive at the following concept of evaluation. Evaluation has emerged as a prominent process encompassing assessing, testing and measuring, and its main objective is qualitative improvement. Evaluation is a process of making value judgements about a level of performance or achievement; making value judgements in the evaluation process presupposes a set of objectives. Evaluation therefore implies a critical assessment of the educative process and its outcomes in the light of those objectives.

Purposes of Evaluation
Evaluation is the process of determining the extent to which objectives are achieved. It is concerned not only with the appraisal of achievement but also with its improvement. Evaluation is a continuous and dynamic process, and it helps in forming the following decisions.

Types of Decisions
Evaluation informs instructional, curricular, selection, placement or classification, and personal decisions. Among these, we shall learn how evaluation assists a teacher in taking instructional decisions. Evaluation assists in answering questions such as:
1. To what extent are students ready for the learning experience?
2. To what extent can they cope with the pace of the learning experiences provided?
3. How can individual differences within the group be tackled?
4. What are the learning problems of the students?
5. What is the intensity of such problems?
6. What modifications are needed in the instruction to suit the needs of the students?
Evaluation is an integral part of the teaching and learning process. This is explained in the following figure.

What should teachers learn about evaluation?
1. Choosing evaluation methods appropriate for instructional decisions;
2. developing evaluation methods appropriate for instructional decisions;
3. administering, scoring and interpreting the results of both externally produced and teacher-produced evaluation methods;
4. using evaluation results when making decisions about individual students, planning teaching, developing curriculum and working on school improvement;
5. developing valid pupil grading procedures that use pupil assessments;
6. communicating evaluation results to students, parents, other lay audiences, and other educators;
7. recognizing unethical, illegal and otherwise inappropriate evaluation methods and uses of evaluation information.

Definition and Types of Evaluation
Evaluation consists of objective assessment of a project, programme or policy at all of its stages, i.e. planning, implementation and measurement of outcomes. It should provide reliable and useful information, allowing the knowledge thus obtained to be applied in the decision-making process. It often concerns the process of determining the value or importance of a measure, policy or programme.

According to the above-indicated Regulation, the aim of evaluation is to improve the quality, effectiveness and consistency of the assistance from the Funds and the strategy and implementation of operational programmes with respect to the specific structural problems affecting the Member States and regions concerned, while taking account of the objective of sustainable development and the relevant Community legislation concerning environmental impact and strategic environmental assessment.

Evaluation, as a process of systematic assessment of interventions financed from the structural funds, is continuously gaining importance. In the programming period 2007-2013 the results of evaluation studies will play an important role in shaping the cohesion policy of the European Union, and during the debate on the budget following the current financial perspective (after 2013) they will be among the key arguments either for preserving the policy in its existing shape or for revising its assumptions.

Categories of Evaluation
According to the criterion of purpose, evaluation is classified into strategic evaluation (with the purpose of assessing and analysing the evolution of the NSRF and OPs with respect to national and Community priorities) and operational evaluation (with the purpose of supporting the process of NSRF and OP monitoring).

Strategic evaluation concerns mainly the analysis and assessment of interventions at the level of strategic goals. Its object is the analysis and appraisal of the relevance of the general directions of intervention determined at the programming stage. One significant aspect of strategic evaluation is the verification of the adopted strategy against the current and anticipated social and economic situation.

Operational evaluation is closely linked to the process of NSRF and OP management and monitoring. Its purpose is to support the institutions responsible for the implementation of the NSRF and OPs in achieving the assumed operational objectives by providing practically useful conclusions and recommendations. According to Regulation 1083/2006, operational evaluation should be carried out, in particular, when monitoring has revealed significant deviations from the originally assumed objectives or when requests are submitted for the review of an operational programme or a part of it.

From the point of view of timing, evaluation is classified into ex ante evaluation (prior to the launch of NSRF or OP implementation), ongoing evaluation (in the course of NSRF or OP implementation), and ex post evaluation (after completion of NSRF or OP implementation). The process of ex ante evaluation of the NSRF and OPs was completed in 2006.
The results of ex ante evaluations of the NSRF and OPs performed by external evaluators were taken into account in the final version of the National Strategic Reference Framework for 2007-2013 and of the individual Operational Programmes.

Ex post evaluation is carried out by the European Commission in cooperation with the member states and the Management Bodies. Independently of the evaluation conducted by the European Commission, member states may perform ex post evaluation on their own account.

Ongoing evaluation is a process whose purpose is to arrive at a better understanding of the current outcomes of an intervention and to formulate recommendations useful from the point of view of programme implementation. In the next few years ongoing evaluation will become key to effective implementation of the cohesion policy in Poland.

Evaluation Approaches & Types

There are various types of evaluations but two main philosophical approaches: formative and summative. After a brief introduction to these two approaches, we shall share several specific types of evaluations that fall under the formative and summative approaches.

Formative evaluation is an on-going process that allows for feedback to be implemented during a program cycle. Formative evaluations (Boulmetis & Dutwin, 2005):
- concentrate on examining and changing processes as they occur;
- provide timely feedback about program services;
- allow you to make program adjustments on the fly to help achieve program goals.

COMMON TYPES OF FORMATIVE EVALUATION
Needs assessment determines who needs the program, how great the need is, and what might work to meet the need. Structured conceptualization helps stakeholders define the program, the target population, and the possible outcomes. Implementation evaluation monitors the fidelity of the program delivery. Process evaluation investigates the process of delivering the program, including alternative delivery procedures.

Summative evaluation occurs at the end of a program cycle and provides an overall description of program effectiveness; it examines program outcomes to determine overall effectiveness. Summative evaluation is a method for answering questions such as: Were your program objectives met? Will you need to improve and modify the overall structure of the program? What is the overall impact of the program? What resources will you need to address the program's weaknesses? Summative evaluation will enable you to make decisions regarding specific services and the future direction of the program that cannot be made during the middle of a program cycle. Summative evaluations should be provided to funders and constituents with an interest in the program.

COMMON TYPES OF SUMMATIVE EVALUATION
Goal-based evaluation determines if the intended goals of a program were achieved (Has my program accomplished its goals?). Outcome evaluation investigates whether the program caused demonstrable effects on specifically defined target outcomes (What effect does program participation have on students?). Impact evaluation is broader and assesses the overall or net effects, intended or unintended, of the program (What impact does this program have on the larger organization, e.g. a high school or college, community, or system?). Cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values (How efficient is my program with respect to cost?).

Below is a figure depicting the different ways formative and summative evaluation can be utilized.
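To make the last two summative types above more concrete, here is a minimal sketch of the arithmetic behind cost-effectiveness and cost-benefit questions. The figures and function names are hypothetical; the point is only that both analyses standardize outcomes against dollar costs, as described above.

```python
# A minimal sketch of cost-effectiveness and cost-benefit calculations,
# using hypothetical program figures.

def cost_effectiveness(total_cost: float, units_of_outcome: float) -> float:
    """Cost per unit of outcome, e.g. dollars per student who completes the program."""
    return total_cost / units_of_outcome

def benefit_cost_ratio(total_benefits: float, total_cost: float) -> float:
    """Ratio of monetized benefits to costs; values above 1.0 suggest benefits exceed costs."""
    return total_benefits / total_cost

program_cost = 120_000.0        # hypothetical annual program cost in dollars
completers = 300                # hypothetical number of students completing the program
monetized_benefits = 180_000.0  # hypothetical dollar value assigned to the outcomes

print(cost_effectiveness(program_cost, completers))          # 400.0 dollars per completer
print(benefit_cost_ratio(monetized_benefits, program_cost))  # 1.5
```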

Introduction to Evaluation
Evaluation is a methodological area that is closely related to, but distinguishable from, more traditional social research. Evaluation utilizes many of the same methodologies used in traditional social research, but because evaluation takes place within a political and organizational context, it requires group skills, management ability, political dexterity, sensitivity to multiple stakeholders and other skills that social research in general does not rely on as much. Here we introduce the idea of evaluation and some of the major terms and issues in the field.

Definitions of Evaluation
Probably the most frequently given definition is: evaluation is the systematic assessment of the worth or merit of some object. This definition is hardly perfect. There are many types of evaluations that do not necessarily result in an assessment of worth or merit -- descriptive studies, implementation analyses, and formative evaluations, to name a few. Better perhaps is a definition that emphasizes the information-processing and feedback functions of evaluation. For instance, one might say: evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object. Both definitions agree that evaluation is a systematic endeavor, and both use the deliberately ambiguous term 'object', which could refer to a program, policy, technology, person, need, activity, and so on. The latter definition emphasizes acquiring and assessing information rather than assessing worth or merit, because all evaluation work involves collecting and sifting through data and making judgements about the validity of the information and of the inferences we derive from it, whether or not an assessment of worth or merit results.

The Goals of Evaluation
The generic goal of most evaluations is to provide "useful feedback" to a variety of audiences including sponsors, donors, client groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as "useful" if it aids in decision-making. But the relationship between an evaluation and its impact is not a simple one -- studies that seem critical sometimes fail to influence short-term decisions, and studies that initially seem to have no influence can have a delayed impact when more congenial conditions arise. Despite this, there is broad consensus that the major goal of evaluation should be to influence decision-making or policy formulation through the provision of empirically driven feedback.

Evaluation Strategies
'Evaluation strategies' means broad, overarching perspectives on evaluation. They encompass the most general groups or "camps" of evaluators, although, at its best, evaluation work borrows eclectically from the perspectives of all these camps. Four major groups of evaluation strategies are discussed here.

Scientific-experimental models are probably the most historically dominant evaluation strategies. Taking their values and methods from the sciences -- especially the social sciences -- they prioritize impartiality, accuracy, objectivity and the validity of the information generated. Included under scientific-experimental models would be: the tradition of experimental and quasi-experimental designs; objectives-based research that comes from education; econometrically oriented perspectives including cost-effectiveness and cost-benefit analysis; and the recent articulation of theory-driven evaluation.

The second class of strategies are management-oriented systems models.
Two of the most common of these are PERT, the Program Evaluation and Review Technique, and CPM, the Critical Path Method. Both have been widely used in business and government in this country. It would also be legitimate to include the Logical Framework or "Logframe" model developed at the U.S. Agency for International Development, as well as general systems theory and operations research approaches, in this category. Two management-oriented systems models were originated by evaluators: the UTOS model, where U stands for Units, T for Treatments, O for Observing Operations and S for Settings; and the CIPP model, where the C stands for Context, the I for Input, the first P for Process and the second P for Product. These management-oriented systems models emphasize comprehensiveness in evaluation, placing evaluation within a larger framework of organizational activities.

The third class of strategies are the qualitative/anthropological models. They emphasize the importance of observation, the need to retain the phenomenological quality of the evaluation context, and the value of subjective human interpretation in the evaluation process. Included in this category are the approaches known in evaluation as naturalistic or 'Fourth Generation' evaluation; the various qualitative schools; critical theory and art criticism approaches; and the 'grounded theory' approach of Glaser and Strauss, among others.

Finally, a fourth class of strategies is termed participant-oriented models. As the term suggests, they emphasize the central importance of the evaluation participants, especially clients and users of the program or technology. Client-centered and stakeholder approaches are examples of participant-oriented models, as are consumer-oriented evaluation systems.

With all of these strategies to choose from, how does one decide? Debates that rage within the evaluation profession -- and they do rage -- are generally battles between these different strategists, with each claiming the superiority of their position. In reality, most good evaluators are familiar with all four categories and borrow from each as the need arises. There is no inherent incompatibility between these broad strategies -- each of them brings something valuable to the evaluation table. In fact, in recent years attention has increasingly turned to how one might integrate results from evaluations that use different strategies, carried out from different perspectives, and using different methods. Clearly, there are no simple answers here. The problems are complex and the methodologies needed will and should be varied.

Types of Evaluation
There are many different types of evaluations, depending on the object being evaluated and the purpose of the evaluation. Perhaps the most important basic distinction in evaluation types is that between formative and summative evaluation. Formative evaluations strengthen or improve the object being evaluated -- they help form it by examining the delivery of the program or technology, the quality of its implementation, and the assessment of the organizational context, personnel, procedures, inputs, and so on.
Summative evaluations, in contrast, examine the effects or outcomes of some object -- they summarize it by describing what happens subsequent to delivery of the program or technology; assessing whether the object can be said to have caused the outcome; determining the overall impact of the causal factor beyond only the immediate target outcomes; and estimating the relative costs associated with the object.

Formative evaluation includes several evaluation types:
- needs assessment determines who needs the program, how great the need is, and what might work to meet the need;
- evaluability assessment determines whether an evaluation is feasible and how stakeholders can help shape its usefulness;
- structured conceptualization helps stakeholders define the program or technology, the target population, and the possible outcomes;
- implementation evaluation monitors the fidelity of the program or technology delivery;
- process evaluation investigates the process of delivering the program or technology, including alternative delivery procedures.

Summative evaluation can also be subdivided:
- outcome evaluations investigate whether the program or technology caused demonstrable effects on specifically defined target outcomes;
- impact evaluation is broader and assesses the overall or net effects -- intended or unintended -- of the program or technology as a whole;
- cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values;
- secondary analysis reexamines existing data to address new questions or use methods not previously employed;
- meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or summary judgement on an evaluation question.

Evaluation Questions and Methods
Evaluators ask many different kinds of questions and use a variety of methods to address them.
These are considered within the framework of formative and summative evaluation as presented above.

In formative research the major questions and methodologies are:

What is the definition and scope of the problem or issue, or what's the question? Formulating and conceptualizing methods might be used, including brainstorming, focus groups, nominal group techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis, and concept mapping.

Where is the problem and how big or serious is it? The most common method used here is "needs assessment", which can include analysis of existing data sources, sample surveys, interviews of constituent populations, qualitative research, expert testimony, and focus groups.

How should the program or technology be delivered to address the problem? Some of the methods already listed apply here, as do detailing methodologies like simulation techniques; multivariate methods like multiattribute utility theory or exploratory causal modeling; decision-making methods; and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.

How well is the program or technology delivered? Qualitative and quantitative monitoring techniques, the use of management information systems, and implementation assessment would be appropriate methodologies here.

The questions and methods addressed under summative evaluation include:

What type of evaluation is feasible? Evaluability assessment can be used here, as well as standard approaches for selecting an appropriate evaluation design.

What was the effectiveness of the program or technology? One would choose from observational and correlational methods for demonstrating whether desired effects occurred, and quasi-experimental and experimental designs for determining whether observed effects can reasonably be attributed to the intervention and not to other sources.

What is the net impact of the program? Econometric methods for assessing cost-effectiveness and cost/benefits would apply here, along with qualitative methods that enable us to summarize the full range of intended and unintended impacts.

Clearly, this introduction is not meant to be exhaustive. Each of these methods, and the many not mentioned, are supported by an extensive methodological research literature. This is a formidable set of tools. But the need to improve, update and adapt these methods to changing circumstances means that methodological research and development needs to have a major place in evaluation work.

Evaluation Models

You learned the basics of evaluation last week. This week we are going to learn about some of the different evaluation approaches or models or metaphors that different groups of evaluators tend to endorse. I generally use the terms approaches, models, and metaphors as synonyms.

The reading is titled Chapter 4: Evaluation Models, which is from a book by my (Burke Johnson's) major professor at the University of Georgia. Here is the reference:

Payne, D.A. (1994). Designing educational project and program evaluations: A practical overview based on research and experience. Boston: Kluwer Academic Publishers.

The whole book is actually quite good, but we are only using one chapter for our course.

In this chapter, Payne discusses four types of models:
1. Management Models
2. Judicial Models
3. Anthropological Models
4. Consumer Models

You might remember these four types using this mnemonic: MJAC.

Here is how Scriven defines models: a term loosely used to refer to a conception or approach or sometimes even a method (e.g., naturalistic, goal-free) of doing evaluation. Models are to paradigms as hypotheses are to theories, which means less general but with some overlap.

Payne notes (p. 58) that his four metaphors may be helpful in leading to your theory of evaluation. In fact, this is something I want you to think about this semester: what is YOUR theory of evaluation? Note Marvin Alkin's (1969) definition of program theory on p. 58 of Payne's chapter and compare it with Will Shadish's definition of program theory on page 33 in RFL. I suggest that you memorize Shadish's definition of program theory. I am a strong advocate of evaluators being aware of their evaluation theory.

In short, you may wish to pick one model as being of most importance in your theory of evaluation. On the other hand, my own theory of evaluation is a needs-based or contingency theory of evaluation. (By the way, I am probably most strongly influenced by Will Shadish's evaluation writings.) That is, I like to select the model that best fits the specific needs or situational characteristics of the program evaluation I am conducting. Payne makes some similar points in the last section of the chapter, titled Metaphor Selection: In Praise of Eclecticism.

Now I will make some comments about each of the four approaches to evaluation discussed by David Payne. I will also add some thoughts not included by Payne.

1. Management Models

The basic idea of the management approach is that the evaluator's job is to provide information to management to help them in making decisions about programs, products, etc. The evaluator's job, in other words, is to serve managers (or whoever the key decision makers are).

One very popular management model used today is Michael Patton's Utilization-Focused Evaluation. (Note that Patton's model is not discussed in Payne's chapter. You may want to examine the appendix of RFL for pages where Patton's model is briefly discussed.) Basically, Patton wants evaluators to provide information to primary intended users, and not even to conduct an evaluation if it has little or no potential for utilization. He wants evaluators to facilitate use as much as possible. Patton's motto is to focus on intended use by intended users. He recommends that evaluators work closely with primary intended users so that their needs will be met. This requires focusing on stakeholders' key questions, issues, and intended uses. It also requires involving intended users in the interpretation of the findings, and then disseminating those findings so that they can be used. One should also follow up on actual use. It is helpful to develop a utilization plan and to outline what the evaluator and primary users must do to bring about the use of the evaluation findings. Ultimately, evaluations should, according to Patton, be judged by their utility and actual use. Patton's approach is discussed in detail in the following book:

Patton, M.Q. (1997). Utilization-focused evaluation: The new century text. Thousand Oaks, CA: Sage.

The first edition of Patton's Utilization-Focused Evaluation was published in 1978.

Another current giant in evaluation who fits into the management-oriented evaluation camp is Joseph Wholey, but I will not outline his theory here (see, for example, his 1979 book titled Evaluation: Promise and Performance, his 1983 book titled Evaluation and Effective Public Management, or his edited 1994 book titled Handbook of Practical Program Evaluation).

Now I will make a few comments on the only management model discussed by Payne (i.e., the CIPP Model).

Daniel Stufflebeam's CIPP Model has been around for many years (e.g., see Stufflebeam et al., 1971), and it has been very popular in education.

The CIPP Model is a simple systems model applied to program evaluation. A basic open system includes input, process, and output. Stufflebeam added context, retained input and process, and relabeled output with the term product. Hence, CIPP stands for context evaluation, input evaluation, process evaluation, and product evaluation. These types are typically viewed as separate forms of evaluation, but they can also be viewed as steps or stages in a comprehensive evaluation.

Context evaluation includes examining and describing the context of the program you are evaluating, conducting a needs and goals assessment, determining the objectives of the program, and determining whether the proposed objectives will be sufficiently responsive to the identified needs. It helps in making program planning decisions.

Input evaluation includes activities such as a description of the program inputs and resources, a comparison of how the program might perform compared to other programs, a prospective benefit/cost assessment (i.e., decide whether you think the benefits will outweigh the costs of the program, before the program is actually implemented), an evaluation of the proposed design of the program, and an examination of what alternative strategies and procedures for the program should be considered and recommended. In short, this type of evaluation examines what the program plans on doing. It helps in making program structuring decisions.

Process evaluation includes examining how a program is being implemented, monitoring how the program is performing, auditing the program to make sure it is following required legal and ethical guidelines, and identifying defects in the procedural design or in the implementation of the program. It is here that evaluators provide information about what is actually occurring in the program. Evaluators typically provide this kind of feedback to program personnel because it can be helpful in making formative evaluation decisions (i.e., decisions about how to modify or improve the program). In general, process evaluation helps in making implementing decisions.

Product evaluation includes determining and examining the general and specific outcomes of the program (i.e., which requires using impact or outcome assessment techniques), measuring anticipated outcomes, attempting to identify unanticipated outcomes, assessing the merit of the program, conducting a retrospective benefit/cost assessment (to establish the actual worth or value of the program), and/or conducting a cost effectiveness assessment (to determine if the program is cost effective compared to other similar programs). Product evaluation is very helpful in making summative evaluation decisions (e.g., What is the merit and worth of the program? Should the program be continued?)

(By the way, formative evaluation is conducted for the purpose of improving an evaluation object (the evaluand), whereas summative evaluation is conducted for the purpose of accountability, which requires determining the overall effectiveness or merit and worth of an evaluation object. Formative evaluation information tends to be used by program administrators and staff members, whereas summative evaluation information tends to be used by high-level administrators and policy makers to assist them in making funding or program continuation decisions. As I mentioned earlier (in lecture one), the terms formative and summative evaluation were coined by Michael Scriven in the late 1960s.)

Thinking of the CIPP Model, input and process evaluation tend to be very helpful for formative evaluation and product evaluation tends to be especially helpful for summative evaluation. Note, however, that the other parts of the CIPP Model can sometimes be used for formative or summative evaluative decisions. For example, product evaluation may lead to program improvements (i.e., formative), and process evaluation may lead to documentation that the program has met delivery requirements set by law (i.e., summative).

As you can see, the CIPP Evaluation Model is quite comprehensive, and one would often not use every part of the CIPP Model in a single evaluation. On the other hand, it would be fruitful for you to think about a small program (e.g., a training program in a local organization) where you would go through all four steps or parts of the CIPP Model. (Again, there are two different ways to view the CIPP model: first as four distinct kinds of evaluation and second as steps or stages in a comprehensive evaluation model.) The CIPP Model is, in general, quite useful in helping us to focus on some very important evaluation questions and issues and to think about some different types or stages of evaluation.
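As a compact way to review the CIPP discussion above, the sketch below records each component together with what it examines and the kind of decision the text says it primarily supports. The dictionary is just an illustrative summary device, not part of Stufflebeam's model.

```python
# A minimal summary of the CIPP Model as described above: each component,
# what it examines, and the kind of decision it primarily supports.
# The variable name and structure are illustrative, not part of the model itself.

cipp_model = {
    "Context": {
        "examines": "program setting, needs, goals and objectives",
        "supports": "program planning decisions",
    },
    "Input": {
        "examines": "resources, program design and alternative strategies",
        "supports": "program structuring decisions",
    },
    "Process": {
        "examines": "implementation, monitoring and procedural defects",
        "supports": "implementing (largely formative) decisions",
    },
    "Product": {
        "examines": "intended and unintended outcomes, merit and worth, costs",
        "supports": "summative decisions (continue, modify or end the program)",
    },
}

for component, details in cipp_model.items():
    print(f"{component} evaluation examines {details['examines']} "
          f"and supports {details['supports']}.")
```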

Interestingly, Stufflebeam no longer talks about the CIPP Model. He now seems to refer to his approach as Decision/Accountability-Oriented Evaluation (see Stufflebeam, 2001, in the Sage book titled Evaluation Models). (By the way, I generally do not recommend Stufflebeam's recent book titled Evaluation Models because he tends, in my opinion, to denigrate other useful approaches while pushing his own. In contrast, I advocate an eclectic approach to evaluation, or what Will Shadish calls needs-based evaluation; needs-based evaluation rests on contingency theory, because the type of evaluation needed in a particular time and place is said to be contingent upon many factors which must be determined and considered by the evaluator.)

2. Judicial Models

Judicial or adversary-oriented evaluation is based on the judicial metaphor. It is assumed here that the potential for evaluation bias by a single evaluator cannot be ruled out, and, therefore, each side should have a separate evaluator to make their case. For example, one evaluator can examine and present the evidence for terminating a program and another evaluator can examine and present the evidence for continuing the program. A hearing of some sort is conducted where each evaluator makes his or her case regarding the evaluand. In a sense, this approach sets up a system of checks and balances, by ensuring that all sides be heard, including alternative explanations for the data. Obviously the quality of the different evaluators must be equated for fairness. The ultimate decision is made by some judge or arbiter who considers the arguments and the evidence and then renders a decision.

One example that involves multiple experts is the so-called blue-ribbon panel, where experts from different backgrounds argue the merits of some policy or program. Some committees also operate, to some degree, along the lines of the judicial model.

As one set of authors put it, adversary evaluation has a built-in metaevaluation (Worthen and Sanders, 1999). A metaevaluation is simply an evaluation of an evaluation.

By showing the positive and negative aspects of a program, considering alternative interpretations of the data, and examining the strengths and weaknesses of the evaluation report (metaevaluation), the adversary or judicial approach seems to have some potential. On the other hand, it may lead to unnecessary arguing, competition, and an indictment mentality. It can also be quite expensive because of the requirement of multiple evaluators. In general, formal judicial or adversary models are not often used in program evaluation. It is, however, an interesting idea that may be useful on occasion.

3. Anthropological Models

Payne includes under this heading the qualitative approaches to program evaluation. For a review of qualitative research you can review pages 17-21 and Chapter 11 in my research methods book (Educational Research by Johnson and Christensen). (Remember that IDE 510 or a very similar course is a prerequisite for IDE 660.) Briefly, qualitative research tends to be exploratory, to collect a lot of descriptive data, and to take an inductive approach to understanding the world (i.e., looking at specifics and then trying to come up with conclusions or generalizations about what is observed). Payne points out that you may want to view the group of people involved in a program as forming a unique culture that can be systematically studied.

Payne treats several approaches as being very similar and anthropological in nature, including responsive evaluation (Robert Stake's model), goal-free evaluation (developed by Scriven as a supplement to his other evaluation approach), and naturalistic evaluation (which is somewhat attributable to Guba and Lincoln, who wrote a 1985 book titled Naturalistic Inquiry). Again, what all these approaches have in common is that they tend to rely on the qualitative research paradigm.

In all of these approaches the evaluator enters the field and observes what is going on in the program. Participant and nonparticipant observation are commonly used. Additional data are also regularly collected (e.g., focus groups, interviews, questionnaires, and secondary or extant data), especially for the purpose of triangulation.

The key to Scriven's goal-free evaluation is to have an evaluator enter the field and try to learn about a program and its results inductively, without being aware of the specific objectives of the program. Note that Scriven's approach is useful as a supplement to the more traditional goal-oriented evaluation. Goal-free evaluation is done by a separate evaluator, who collects exploratory data to supplement another evaluator's goal-oriented data.

Payne next lists several strengths of qualitative evaluation. This list is from a nice book by Michael Patton (titled How to Use Qualitative Methods in Evaluation). Qualitative methods tend to be useful for describing program implementation, studying process, studying participation, getting program participants' views or opinions about program impact, and identifying program strengths and weaknesses. Another strength is identifying unintended outcomes, which may be missed if you design a study only to measure certain specific objectives.

Next, Payne talks about Robert Stake's specific anthropological model, which is called Responsive Evaluation. (By the way, Robert (Bob) Stake also has a recent book on case study research which I recommend you add to your library sometime (titled The Art of Case Study Research, Sage Publications, 1997).) Stake uses the term responsive because he wants evaluators to be flexible and responsive to the concerns and issues of program stakeholders. He also believes that qualitative methods provide the way to be the most responsive. He uses a somewhat derogatory (I think) term to refer to what he sees as the traditional evaluator. In particular, he labels the traditional evaluation approach preordinate evaluation, which means evaluation that relies only on formal plans and measurement of pre-specified program objectives.

In explaining responsive evaluation, Stake says an educational evaluation is responsive evaluation if it orients more directly to program activities than to program intents; responds to audience requirements for information; and if the different value-perspectives present are referred to in reporting the success and failure of the program (Stake, 1975).

Payne also presents Stake's events clock, which shows the key evaluation activities and events while stressing that they do not have to be done in a predetermined or linear order. Flexibility is the key: go where the data and your emerging conclusions and opportunities lead you. Ultimately, the responsive evaluator prepares a narrative or case study report on what he or she finds, although it is also essential that the responsive evaluator present findings informally to different stakeholders during the conduct of the study to increase their input, participation, buy-in, and use of findings. As you can see, responsive evaluation is very much a participatory evaluation approach.

On page 74 Payne lists some strengths and weaknesses of the anthropological evaluation approach. He also gives a nice real-world example of an evaluation using the responsive approach.

4. Consumer Models

The last approach discussed by Payne is the consumer approach. The primary evaluation theorist behind this approach is Michael Scriven. Obviously this approach is based on the consumer product metaphor. In other words, perhaps evaluators can obtain some useful evaluation ideas from the field of consumer product evaluation (which is exemplified by the magazine Consumer Reports). As Payne mentions, the consumer approach is primarily summative. For example, when you read Consumer Reports, your goal is to learn whether a product is good or not, how well it stacks up against similar products, and whether you want to purchase it. In short, you are looking at the merit and worth (absolute and relative) of a particular product. Note, however, that it is much more difficult to evaluate a social or educational program than it is to evaluate, for example, an automobile or a coffee maker. With an automobile or a coffee maker, you can easily measure its specifications and performance. A social program is a much more complex package that includes many elements and that requires an impact assessment using social science research techniques to determine if the program works and how it works.

Payne includes an excellent checklist (developed by Scriven, and sometimes called an evaluation checklist) that you may want to use when you are evaluating any type of evaluand (i.e., not just consumer products).

As Payne points out, the consumer approach also holds some promise for developing lists of programs that work, which can be used by policy makers and others when developing or selecting programs for specific problems. Payne also discusses the process of how a program could get onto such a list.

Logic model
A logic model (also known as a logical framework, theory of change, or program matrix) is a tool used most often by managers and evaluators of programs to evaluate the effectiveness of a program. Logic models are usually a graphical depiction of the logical relationships between the resources, activities, outputs and outcomes of a program.[1] While there are many ways in which logic models can be presented, the underlying purpose of constructing a logic model is to assess the "if-then" (causal) relationships between the elements of the program: if the resources are available for a program, then the activities can be implemented; if the activities are implemented successfully, then certain outputs and outcomes can be expected. Logic models are most often used in the evaluation stage of a program; they can, however, be used during planning and implementation.[2]

Versions
In its simplest form, a logic model has four components:[3]
Inputs: what resources go into a program (e.g. money, staff, equipment).
Activities: what activities the program undertakes (e.g. development of materials, training programs).
Outputs: what is produced through those activities (e.g. number of booklets produced, workshops held, people trained).
Outcomes/impacts: the changes or benefits that result from the program (e.g. increased skills, knowledge or confidence, leading in the longer term to promotion, a new job, etc.).
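The four components above lend themselves to a simple data-structure sketch that also shows the "if-then" reading of a logic model. The class, field names and the example training program are illustrative assumptions, not part of any standard logic-model notation.

```python
# A minimal sketch of a four-component logic model as a data structure.
# The class name, fields and example entries are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class LogicModel:
    inputs: list[str] = field(default_factory=list)      # resources that go into the program
    activities: list[str] = field(default_factory=list)  # what the program does
    outputs: list[str] = field(default_factory=list)     # what those activities produce
    outcomes: list[str] = field(default_factory=list)    # changes or benefits that result

    def if_then_chain(self) -> str:
        """Render the 'if-then' reading: inputs -> activities -> outputs -> outcomes."""
        return (f"IF {', '.join(self.inputs)} are available, "
                f"THEN {', '.join(self.activities)} can be carried out, "
                f"THEN {', '.join(self.outputs)} are produced, "
                f"THEN {', '.join(self.outcomes)} can be expected.")

# A hypothetical training program, used only to illustrate the structure.
model = LogicModel(
    inputs=["funding", "trainers"],
    activities=["develop materials", "run workshops"],
    outputs=["20 workshops held", "300 people trained"],
    outcomes=["increased skills and knowledge"],
)
print(model.if_then_chain())
```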

Following the early development of the logic model in the 1970s by Carol Weiss, Joseph Wholey and others, many refinements and variations have been added to the basic concept. Many versions of logic models set out a series of outcomes/impacts, explaining in more detail the logic of how an intervention contributes to intended or observed results.[4] This will often include distinguishing between short-term, medium-term and long-term results, and between direct and indirect results. Some logic models also include assumptions (beliefs the prospective grantees have about the program, the people involved, the context, and the way they think the program will work) and external factors (the environment in which the program exists, including a variety of external factors that interact with and influence the program action).

University Cooperative Extension Programs in the US have developed a more elaborate logic model, called the Program Action Logic Model, which includes six steps: Inputs (what we invest); Outputs, comprising Activities (the actual tasks we do), Participation (who we serve: customers and stakeholders) and Engagement (how those we serve engage with the activities); and Outcomes/Impacts, comprising Short Term (learning: awareness, knowledge, skills, motivations), Medium Term (action: behavior, practice, decisions, policies) and Long Term (consequences: social, economic, environmental, etc.). In front of Inputs there is a description of a Situation and Priorities; these are the considerations that determine what Inputs will be needed. The University of Wisconsin Extension offers a series of guidance documents[5] on the use of logic models, and there is also an extensive bibliography[6] of work on this program logic model.

Advantages
By describing work in this way, managers have an easier way to define the work and measure it. Performance measures can be drawn from any of the steps. One of the key insights of the logic model is the importance of measuring final outcomes or results, because it is quite possible to waste time and money (inputs), "spin the wheels" on work activities, or produce outputs without achieving desired outcomes. It is these outcomes (impacts, long-term results) that are the only justification for doing the work in the first place. For commercial organizations, outcomes relate to profit. For not-for-profit or governmental organizations, outcomes relate to successful achievement of mission or program goals.

Uses of the logic model

Program planning
One of the most important uses of the logic model is for program planning. Here it helps managers to 'plan with the end in mind' (Stephen Covey), rather than just consider inputs (e.g. budgets, employees) or just the tasks that must be done. In the past, program logic has been justified by explaining the process from the perspective of an insider. Paul McCawley (no date) outlines how this process was approached:
1. We invest this time/money so that we can generate this activity/product.
2. The activity/product is needed so people will learn how to do this.
3. People need to learn that so they can apply their knowledge to this practice.
4. When that practice is applied, the effect will be to change this condition.
5. When that condition changes, we will no longer be in this situation.
While logic models have been used in this way successfully, Millar et al. (1999) have suggested that following the above sequence, from the inputs through to the outcomes, could limit one's thinking to the existing activities, programs and research questions.
Instead, by using the logic model to focus on the intended outcomes of a particular program, the questions change from "What is being done?" to "What needs to be done?" McCawley (no date) suggests that by using this new reasoning, a logic model for a program can be built by asking the following questions in sequence:
1. What is the current situation that we intend to impact?
2. What will it look like when we achieve the desired situation or outcome?
3. What behaviors need to change for that outcome to be achieved?
4. What knowledge or skills do people need before the behavior will change?
5. What activities need to be performed to cause the necessary learning?
6. What resources will be required to achieve the desired outcome?
By placing the focus on ultimate outcomes or results, planners can think backwards through the logic model to identify how best to achieve the desired results. Planners therefore need to understand the difference between the categories of the logic model.

Performance evaluation
The logic model is often used in government or not-for-profit organizations, where the mission and vision are not aimed at achieving a financial benefit. In such situations, where profit is not the intended result, it may be difficult to monitor progress toward outcomes. A program logic model provides such indicators, in terms of output and outcome measures of performance. It is therefore important in these organizations to carefully specify the desired results and consider how to monitor them over time. Often, such as in education or social programs, the outcomes are long-term and mission success is far in the future. In these cases, intermediate or shorter-term outcomes may be identified that provide an indication of progress toward the ultimate long-term outcome.

Traditionally, government programs were described only in terms of their budgets. It is easy to measure the amount of money spent on a program, but this is a poor indicator of mission success. Likewise, it is relatively easy to measure the amount of work done (e.g. number of workers or number of years spent), but the workers may have just been 'spinning their wheels' without getting very far in terms of ultimate results or outcomes. The production of outputs is a better indicator that something was delivered to customers, but it is still possible that the output did not really meet the customer's needs, was not used, and so on. Therefore, the focus on results or outcomes has become a mantra in government and not-for-profit programs. The President's Management Agenda[7] is an example of the increasing emphasis on results in government management. It states: "Government likes to begin things -- to declare grand new programs and causes. But good beginnings are not the measure of success. What matters in the end is completion. Performance. Results."[8]

However, although outcomes are used as the primary indicators of program success or failure, they are still insufficient. Outcomes may easily be achieved through processes independent of the program, and an evaluation of those outcomes would suggest program success when in fact external factors were responsible for the outcomes (Rossi, Lipsey and Freeman, 2004). In this respect, Rossi, Lipsey and Freeman (2004) suggest that a typical evaluation study should concern itself with measuring how the process indicators (inputs and outputs) have had an effect on the outcome indicators. A program logic model would need to be assessed or designed in order for an evaluation of these standards to be possible.
The logic model can, and indeed should, be used in both formative evaluations (during implementation, to offer the chance to improve the program) and summative evaluations (after the completion of the program).

A FRAMEWORK FOR PROGRAM EVALUATION
Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical, non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

THE FRAMEWORK CONTAINS TWO RELATED DIMENSIONS:
Steps in evaluation practice, and
Standards for "good" evaluation.

The six connected steps of the framework are actions that should be part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence, because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed. However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. The steps are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs:
Engage stakeholders
Describe the program
Focus the evaluation design
Gather credible evidence
Justify conclusions
Ensure use and share lessons learned
Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into four groups: utility, feasibility, propriety, and accuracy. These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of program evaluation efforts.

ENGAGE STAKEHOLDERS
Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and also from what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships - alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted. However, if they are part of the process, people are likely to feel a good deal of ownership of the evaluation process and results; they will probably want to develop it, defend it, and make sure that the evaluation really works. That is why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follows.

THREE PRINCIPAL GROUPS OF STAKEHOLDERS ARE IMPORTANT TO INVOLVE:
People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve; opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility. Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged.
For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.
Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with primary intended users of the program, although some of them should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation, or they can be kept informed about the progress of the evaluation through periodic meetings, reports, and other means of communication. It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of the values held by any specific stakeholder.

DESCRIBE THE PROGRAM
A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description should also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence. Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

THERE ARE SEVERAL SPECIFIC ASPECTS THAT SHOULD BE INCLUDED WHEN DESCRIBING A PROGRAM.

Statement of need
A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big the need is, and whether (and how) it is changing.

Expectations
Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...).
Therefore, they should be organized by time, ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about the program's expectations.

Activities
Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources
Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out those activities. Understanding program costs is a necessity for assessing the cost-benefit ratio as part of the evaluation.

Stage of development
A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context
A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, as well as what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but it would likely not work in a small town on the other side of the country without significant adaptation.

Logic model
A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow chart, map, or table that portrays the sequence of steps leading to program results. Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about the conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
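As a minimal, hypothetical sketch of that estimation idea - the reach, baseline rate, and effectiveness figures below are invented for illustration, not drawn from any real program:

```python
# Hypothetical back-of-the-envelope estimate from a logic model:
# persons reached -> baseline risk -> prior evidence about effectiveness.
persons_reached = 2_000   # people who completed the intervention (assumed)
baseline_rate = 0.05      # expected cases per person without the program (assumed)
effectiveness = 0.30      # relative reduction reported in prior studies (assumed)

expected_cases_without_program = persons_reached * baseline_rate
estimated_cases_averted = expected_cases_without_program * effectiveness

print(f"Expected cases without the program: {expected_cases_without_program:.0f}")
print(f"Estimated cases averted: {estimated_cases_averted:.0f}")
# With these assumed figures: 100 expected cases, roughly 30 averted.
```

Such an estimate is only as good as the prior evidence behind the effectiveness figure, which is one reason the logic model should make that assumption explicit.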
The breadth and depth of a program description will vary for each program evaluation, and so many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what is going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

FOCUS THE EVALUATION DESIGN

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance of being useful, feasible, proper, and accurate.

AMONG THE ISSUES TO CONSIDER WHEN FOCUSING AN EVALUATION ARE:

Purpose
Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will keep your organization from making uninformed decisions about how the evaluation should be conducted and used. There are at least four general purposes for which a community group might conduct an evaluation:

To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.

To improve how things get done. This is appropriate in the implementation stage, when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.

To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program?
Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and to whom it happened. Such evaluations should provide evidence about the program's contribution to reaching longer-term goals, such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.

To affect those who participate in it. The logic and reflection required of evaluation participants can itself be a catalyst for self-directed change. And so one of the purposes of evaluating a program is for the process and results to have a positive influence. Such influences may:

- Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
- Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
- Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
- Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users
Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of the inevitable trade-offs in the evaluation process. For example, one trade-off might be conducting a relatively modest evaluation to fit the budget, with the consequence that the results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these trade-offs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses
Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall into the same four categories as the purposes listed above: to gain insight, to improve how things get done, to determine what the effects of the program are, and to affect participants.
The following list gives examples of uses in each category.

SOME SPECIFIC EXAMPLES OF EVALUATION USES

TO GAIN INSIGHT:
- Assess needs and wants of community members
- Identify barriers to use of the program
- Learn how best to describe and measure program activities

TO IMPROVE HOW THINGS GET DONE:
- Refine plans for introducing a new practice
- Determine the extent to which plans were implemented
- Improve educational materials
- Enhance cultural competence
- Verify that participants' rights are protected
- Set priorities for staff training
- Make mid-course adjustments
- Clarify communication
- Determine whether client satisfaction can be improved
- Compare costs to benefits
- Find out which participants benefit most from the program
- Mobilize community support for the program

TO DETERMINE WHAT THE EFFECTS OF THE PROGRAM ARE:
- Assess skills development by program participants
- Compare changes in behavior over time
- Decide where to allocate new resources
- Document the level of success in accomplishing objectives
- Demonstrate that accountability requirements are fulfilled
- Use information from multiple evaluations to predict the likely effects of similar programs

TO AFFECT PARTICIPANTS:
- Reinforce messages of the program
- Stimulate dialogue and raise awareness about community issues
- Broaden consensus among partners about program goals
- Teach evaluation skills to staff and other stakeholders
- Gather success stories
- Support organizational change and improvement

Questions
The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer - that is, which questions matter most to them. The process of developing evaluation questions further refines the focus of the evaluation.

Methods
The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized: experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't); a minimal sketch of this design appears at the end of this Methods discussion. Quasi-experimental methods make comparisons between groups that aren't equivalent (e.g., program participants vs. those on a waiting list), or comparisons within a group over time, such as an interrupted time series in which the intervention is introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kinds of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.
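As a minimal, hypothetical sketch of the experimental design mentioned above - the group sizes and reading scores are simulated for illustration, not real data:

```python
# Hypothetical experimental design: random assignment to a program group
# and a control group, followed by a simple comparison of group means.
import random
import statistics

random.seed(1)  # reproducible illustration

students = list(range(200))       # 200 hypothetical students
random.shuffle(students)
program_group = students[:100]    # randomly assigned to the reading program
control_group = students[100:]    # randomly assigned to no program

# Simulated end-of-year reading scores; the program group is given a small
# built-in advantage purely so the example has an effect to detect.
scores = {s: random.gauss(70, 10) for s in control_group}
scores.update({s: random.gauss(75, 10) for s in program_group})

program_mean = statistics.mean(scores[s] for s in program_group)
control_mean = statistics.mean(scores[s] for s in control_group)

print(f"Program group mean: {program_mean:.1f}")
print(f"Control group mean: {control_mean:.1f}")
print(f"Estimated effect:   {program_mean - control_mean:.1f} points")
```

Because assignment is random, the two groups are comparable in expectation, so the difference in means can be read as an estimate of the program's effect; a quasi-experimental comparison (e.g., participants vs. a waiting list) would need additional steps to account for pre-existing differences between groups.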
Agreements
Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and a budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

GATHER CREDIBLE EVIDENCE

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answering their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard of credibility could demand the results of a randomized experiment. For another question, a set of well-done, systematic observations, such as interactions between an outreach worker and community residents, will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations it may be necessary to consult evaluation specialists; this may be especially true when concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility.
When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

THE FOLLOWING FEATURES OF EVIDENCE GATHERING TYPICALLY AFFECT HOW CREDIBLE IT IS SEEN AS BEING:

Indicators
Indicators translate general concepts about the program and its expected effects into specific, measurable parts. Examples of indicators include:
- The program's capacity to deliver services
- The participation rate
- The level of client satisfaction
- The amount of intervention exposure (how many people were exposed to the program, and for how long)
- Changes in participant behavior
- Changes in community conditions or norms
- Changes in the environment (e.g., new programs, policies, or practices)
- Longer-term changes in population health status (e.g., the estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program. A minimal sketch of such a scorecard appears at the end of this Indicators discussion.

Another approach to using multiple indicators is based on a program logic model, such as the one discussed earlier in this section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation and should not be mistaken for a sufficient basis for decision making in itself. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator such as a rising rate of unemployment may be falsely assumed to reflect a failing program when it is actually due to changing environmental conditions that are beyond the program's control.
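As a minimal, hypothetical sketch of a balanced scorecard for a single program - the perspective names and indicator entries are invented for illustration:

```python
# Hypothetical balanced scorecard: small groups of complementary indicators,
# each viewing the program from a different perspective.
balanced_scorecard = {
    "delivery": [
        "number of workshop sessions held per month",
        "share of planned outreach visits completed",
    ],
    "participant_perspective": [
        "average satisfaction rating (1-5 survey scale)",
        "share of participants who would recommend the program",
    ],
    "observed_effects": [
        "change in participants' reported behavior since enrollment",
        "share of participants meeting an agreed-upon goal",
    ],
    "environment": [
        "new local policies or practices related to the program's aims",
        "partner organizations actively referring participants",
    ],
}

# A simple review loop an evaluation team might run each quarter.
for perspective, indicators in balanced_scorecard.items():
    print(f"{perspective}:")
    for indicator in indicators:
        print(f"  - {indicator}")
```

Keeping each group small forces a conversation about which few indicators are actually most meaningful to monitor, rather than tracking everything that is easy to count.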
Sources
Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected in internal documents and comments from staff or program managers, whereas clients and those who do not support the program may provide different but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess whether it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality
Quality refers to the appropriateness and integrity of the information gathered in an evaluation. High quality data are reliable and informative, and they are easier to collect when the indicators have been well defined. Other factors that affect quality include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail trade-offs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.

Quantity
Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria for deciding when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.

Logistics
By logistics, we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

JUSTIFY CONCLUSIONS

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

THE PRINCIPAL ELEMENTS INVOLVED IN JUSTIFYING CONCLUSIONS BASED ON EVIDENCE ARE:

Standards
Standards reflect the values held by stakeholders about the program.
They provide the basis for making program judgments. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these values become the standards for judging whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."

Analysis and synthesis
Analysis and synthesis are the methods used to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed-method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.

Interpretation
Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions; the facts must be interpreted to understand their practical significance. For example, the finding "15% of the people in our area witnessed a violent act last year" may be interpreted differently depending on the situation. If 50% of community members surveyed five years ago had witnessed a violent act, the group can suggest that, while still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they need to change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.

Judgments
Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that, despite the improvement, a minimum threshold of access to services has still not been reached; their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst for clarifying values and negotiating the appropriate basis (or bases) on which the program should be judged.
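As a minimal, hypothetical sketch of that point - the outreach figures and the 50% coverage threshold are invented for illustration:

```python
# Hypothetical example: the same finding judged against two different standards.
outreach_last_year = 500      # households reached last year (assumed)
outreach_this_year = 550      # households reached this year (assumed)
households_in_need = 2_000    # estimated households needing the service (assumed)

# Standard 1 (program managers): improvement over time.
improved = outreach_this_year > outreach_last_year

# Standard 2 (community members): a minimum threshold of access, say 50% coverage.
coverage = outreach_this_year / households_in_need
meets_equity_threshold = coverage >= 0.50

print(f"Outreach grew by {(outreach_this_year / outreach_last_year - 1):.0%}: "
      f"{'positive' if improved else 'negative'} judgment under the improvement standard")
print(f"Coverage is {coverage:.0%}: "
      f"{'positive' if meets_equity_threshold else 'negative'} judgment under the equity standard")
```

Neither judgment is wrong; they simply apply different standards, which is exactly the kind of disagreement stakeholders then need to surface and negotiate.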
Recommendations
Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and respond to what users will want to know.

THREE THINGS MIGHT INCREASE THE CHANCES THAT RECOMMENDATIONS WILL BE RELEVANT AND WELL-RECEIVED:
- Sharing draft recommendations
- Soliciting reactions from multiple stakeholders
- Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that can involve several different steps. For instance, conclusions could be strengthened by searching for alternative explanations to the ones you have chosen and then showing why they are unsupported by the evidence. When there are different but equally well-supported conclusions, each could be presented with a summary of its strengths and weaknesses. Techniques for analyzing, synthesizing, and interpreting findings might be agreed upon before data collection begins.

ENSURE USE AND SHARE LESSONS LEARNED

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

THE ELEMENTS OF KEY IMPORTANCE FOR ENSURING THAT THE RECOMMENDATIONS FROM AN EVALUATION ARE USED ARE:

Design
Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.

Preparation
Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice, and building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making. For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified.
Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback
Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders, and it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From the standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and by routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up
Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to keep lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for purposes other than those for which they were developed. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is a misuse of a case study evaluation. Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is applied only to the questions that were the central focus of the evaluation.

Dissemination
Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires co