Write 2 MCQs that test higher-order cognitive skills, based on Bloom’s Taxonomy, about the Expertise-Oriented evaluation approach, using the 2 books I attached. Each MCQ should follow this structure: a stem (opening statement or problem), a lead-in (the question), one right answer, and 3 distractors (wrong answers that distract students from the right answer).
The beginning of the twenty-first century is an exciting time for evaluation. The field is
growing. People (schools, organizations, policy makers, the public at large) are interested
in learning more about how programs work—how they succeed and how they fail. Many
people are interested in accountability from schools, government agencies, nonprofit
organizations, and corporations. They want to know whether these organizations are
doing what they claim to do. Performance measurement and outcome assessments are
expanding around the globe. People who work in organizations are also interested in
evaluation. They want to know how well they’re doing, how to tackle the tough
problems, and how to improve their performance.
Evaluation, today, is changing in a variety of ways to help stakeholders obtain this
information. Many different methods are being developed and used: a wide array of
qualitative and quantitative approaches to design and data collection, increasing
involvement of new and different stakeholders in the evaluation process, expanded
considerations of the uses of evaluation, and more effective and diverse ways to
communicate findings. Evaluation is expanding around the world, and the experience of
adapting evaluation to different settings and different cultures is enriching the field.
We hope to convey the dynamism and creativity involved in conducting evaluation to
you in this new edition. Each of us has many years’ experience in conducting evaluations
in a variety of different settings, in schools, public welfare agencies, mental health
organizations, criminal justice settings, environmental programs, nonprofit organizations,
and corporations. We also have years of experience teaching students how to use
evaluation in their own organizations or communities. Our goal is, and has always been,
to present information that readers can use either to conduct or be a participant in
evaluations that make a difference to their workplace, their clients, and their community.
Let us tell you a bit about how we hope to do that in this new edition.
As in the past, the book is organized in five parts. Part One introduces the reader to key
concepts in evaluation, as well as its history, and some of the current and emerging trends
in the field. (These include high-stakes testing in schools; performance monitoring in
cities, states, and the federal government; and technologically sophisticated means of
collecting evaluation information and reporting results.) Part Two gives the reader a
foundation in the different approaches used to conduct evaluation. (Determining whether
objectives have been achieved isn’t the only way to approach evaluation!) In Parts Three
and Four, the core of the book, we describe how to plan and carry out an evaluation
study. Extensive graphics, lists, and examples are used to help students learn the craft. An
ongoing case study concludes each chapter, illustrating some of the concepts described
therein. Finally, in Part Five we describe some special current issues in evaluation
(evaluating training programs, personnel, and multiple-site programs; planning and
organizational renewal) and the future of the field.
Although we present different approaches to evaluation in Part Two, our goal is to
introduce the reader to many different methods and approaches so that you can select
those most appropriate for collecting valid, reliable, and useful information that meets the
needs of stakeholders, the organization, and society. We are eclectic in our approach to
evaluation. We believe the methods and measures one chooses should match the context,
program, and questions to be answered. Thus, we present both qualitative and
quantitative methods, management-oriented and participant-oriented approaches. We
hope you will do the same.
To facilitate learning, we have continued the consistent pedagogical structure we have
used in past editions. Each chapter presents information on current and foundational
issues in a practical, accessible manner. Tables and figures are used frequently to
summarize or illustrate key points. A case study is introduced in Part Three and
continued at the end of each chapter in Parts Three and Four. All chapters conclude with
Application Exercises and a list of Suggested References for readers to consult on that
topic. In addition, we have added two new pieces to the end of each chapter: (1) “Major
Concepts and Theories” summarizes the key issues in the chapter, and (2) “Discussion
Questions” presents issues for consideration in class discussions.
What have we changed? Each chapter has been revised by considering the most current
books, articles, and reports. Many new references and contemporary examples have been
added. The case study has been updated to reflect some trends in current practices. We
also did some reorganizing of the book to reflect changes in the field and a structure
better fitting the organization of an evaluation study. Specifically, in the previous edition,
we included one chapter on qualitative methods and one on quantitative methods. Today,
the debate over qualitative and quantitative methods is over. Most evaluators recognize
that a mix of methods is necessary to answer the vast majority of important questions in
an evaluation. Therefore, these chapters were updated and reorganized to form two new
chapters: one on issues concerning the design of the study and sampling, and another on
data collection, analysis, and interpretation. Each chapter presents an array of qualitative
and quantitative methods and approaches.
In Part Two, we largely retained our previous organization of chapters, which continue to
reflect major theoretical differences in approaches to evaluation. (A chapter on
adversary-oriented evaluations was eliminated, but its methods of presentation were
described in Chapter 16 on reporting.) Within each chapter, however, we discuss the
current status of long-established approaches and review new approaches that have
evolved from the original approaches. For example, in Chapter 4, concerning
objectives-oriented evaluation approaches, we describe new approaches that make use
of logic models and program theory to understand programs and direct the evaluation.
Perhaps the most change has occurred in participant-oriented evaluation approaches in the
last few years. We review and comment on those approaches, including participatory and
empowerment evaluation, in Chapter 8. Further, applications of different approaches and
their influence on the field are discussed throughout the book. In Chapter 2, performance
measurement is reviewed. Methods for developing program theory and logic models are
discussed in Chapters 11 and 12 as ways to clarify the evaluation request and set
boundaries for the evaluation.
Finally, ways to use technology to improve evaluation are discussed throughout the text.
Software packages for storing, organizing, and analyzing qualitative data are reviewed.
Web-based surveys are described. Ways to use technology to distribute evaluation
findings are discussed. A new appendix has been added that lists Web sites with
evaluation-related information. These sites link readers to evaluation organizations,
reports, texts, software, and listservs for communicating with others interested in evaluation.
We hope this book will inspire you to think in a new way about issues—in a questioning,
exploring, evaluative way—about programs, policy and organizational change. For those
readers who are already evaluators, this book should provide you with new perspectives
and tools for your practice. For those who are new to evaluation, this book will make you
a more informed consumer of or participant in evaluation studies or, perhaps, even
inspire you to undertake your own. Welcome to the new world of evaluation in the
We are grateful for the insightful comments of the following reviewers:
Weldon Beckner, Baylor University
Marcie Boberg, San Diego State University
Jim Connors, The Ohio State University
Traci Webb Dempsey, West Virginia University
Ken Hancock, University of Tulsa
Beverly Irby, Sam Houston State University
Kristen Renn, Southern Illinois University-Carbondale
Susan Twombley, University of Kansas
Wendon Waite, Boise State University
This initial section of our text provides the background necessary for the beginning
student to understand the chapters that follow. In it, we attempt to accomplish three
things: to explore the concept of evaluation and its various meanings, to review the
history of program evaluation and its development as a discipline, and to acquaint the
reader with some of the current controversies and trends in the field.
In Chapter 1, we discuss the basic purposes of evaluation and the varying roles evaluators
play. We define evaluation specifically, and we introduce the reader to several different
concepts and distinctions important to evaluation. In Chapter 2, we summarize the origins
of today’s evaluation tenets and practices and the historical evolution of evaluation as a
growing force in improving our society’s public, nonprofit, and corporate programs. We
then review some recent developments and trends in evaluation that have marked the past
decade. These movements were either not apparent or still too embryonic to deserve
attention when our 1997 text was written. Today, no discussion of evaluation should
overlook movements such as the expanding uses of technology in evaluation, the
emergence of performance measurement and standards-based education and the
widespread attention these indicators receive, current controversies over the role and
meaning of advocacy in evaluation, and the burgeoning growth and practice of evaluation
in other countries.
Our intent in Part One is to provide the reader with information essential to understanding
not only the content of the sections that follow but also the wealth of material that exists
in the literature on program evaluation. Although the content in the remainder of this
book is intended to apply to the evaluation of programs, most of it applies as well to
projects, products, and processes used in those areas, indeed, to any object of an
evaluation. In Part Two we will introduce you to different approaches to evaluation to
enlarge your understanding of the diversity of choices that evaluators and stakeholders
make in undertaking evaluation.
Evaluation’s Basic Purpose, Uses, and
1. How does evaluation serve society? Why is it important?
2. What is the difference between formal and informal evaluation?
3. What are some purposes of evaluation? What roles can the evaluator play? Give some
examples from your experience with evaluation.
4. What are the major differences between formative and summative evaluations?
5. What is an example of an issue an evaluator might address in a needs assessment, a
process evaluation, and an outcome evaluation?
6. Under what circumstances might an external evaluator be preferable to an internal evaluator?
The challenges confronting our society in the twenty-first century are enormous. Few of
them are really new. In the United States and many other countries, the public and
nonprofit sectors are grappling with complex issues: educating children for the new
century; reducing functional illiteracy; strengthening families; training versatile
employees; combating disease and mental illness; fighting discrimination; reducing
crime, drug abuse, and child and spouse abuse. More recently, pursuing and balancing
environmental and economic goals and working to ensure peace and economic growth in
developing countries have become prominent concerns. Each new decade seems to add to
the list of challenges as society and the problems it confronts become increasingly
As society’s concern over these pervasive and perplexing problems has intensified, so
have its efforts to resolve them. Collectively, local, regional, and national agencies have
launched a veritable flotilla of programs aimed at identifying and eliminating the
underlying causes of these problems. Specific programs judged to have been ineffective
have been “mothballed” or sunk outright, usually to be replaced by a new program
designed to attack the problem in a different and, hopefully, more effective manner.
In more recent years, scarce resources and budget deficits have posed still more
challenges as administrators and program managers have had to struggle to keep their
most promising programs afloat. Increasingly, policy makers and managers have been
faced with tough choices, being forced to cancel some programs or program components
to provide sufficient funds to launch new ones or continue others.
To make such choices intelligently, policy makers need good information about the
relative effectiveness of each program. Which programs are working well? Which poorly?
What are the programs’ relative costs and benefits? Similarly, each program manager
needs to know how well different parts of the program are working. Are some parts
contributing more than others? What can be done to improve those parts of the program
that are not contributing what they should? Have all aspects of the program been thought
through carefully at the planning stage, or is more planning needed? What is the theory or
logic model for the program’s effectiveness? What adaptations would make the program
Answering such questions is the major task of program evaluation. The major task of this
book is to introduce you to evaluation and the vital role it plays in virtually every sector
of modern society. However, before we can hope to convince you that good evaluation is
an essential part of good programs, we must help you understand at least the basic
concepts in each of the following areas:
How we, and others, define evaluation
How formal and informal evaluation differ
The basic purposes—and various uses—of formal evaluation
The distinction between basic types of evaluation
The distinction between internal and external evaluators
Evaluation’s importance and its limitations
Covering all of those areas thoroughly could fill a whole book, not just one chapter of an
introductory text. In this chapter, we provide only brief coverage of each of these topics
to orient you to concepts and distinctions necessary to understand the content of later
A Brief Definition of Evaluation
In the previous section, the perceptive reader will have noticed that the term evaluation
has been used rather broadly without definition beyond what was implicit in context. But
the rest of this chapter could be rather confusing if we did not stop briefly to define the
term more precisely. Intuitively, it may not seem difficult to define evaluation. For
example, one typical dictionary definition of evaluation is “to determine or fix the value
of: to examine and judge.” Seems quite straightforward, doesn’t it? Yet among
professional evaluators, there is no uniformly agreed-upon definition of precisely what
the term evaluation means. In fact, in considering the role of language in evaluation,
Michael Scriven, one of the founders of evaluation, recently noted there are nearly sixty
different terms for evaluation that apply to one context or another. These include adjudge,
appraise, analyze, assess, critique, examine, grade, inspect, judge, rate, rank, review,
score, study, test, and so on (cited in Patton, 2000, p. 7). While all these terms may appear
confusing, Scriven (cited in Patton, 2000) notes that the variety of uses of the term
evaluation “reflects not only the immense importance of the process of evaluation in
practical life, but the explosion of a new area of study” (p. 7). This chapter will introduce
the reader to the array of variations in application, but, at this point, we would like to
focus on one definition that encompasses many others.
Early in the development of the field, Scriven (1967) defined evaluation as judging the
worth or merit of something. Many recent definitions encompass this original definition
of the term (Mark, Henry, & Julnes, 1999; Stake, 2000a; Stufflebeam, 2001b). We concur
that evaluation is determining the worth or merit of an evaluation object (whatever is
evaluated). More broadly, we define evaluation as the identification, clarification, and
application of defensible criteria to determine an evaluation object’s value (worth or
merit) in relation to those criteria. Note that this definition requires identifying and
clarifying defensible criteria. Often, in practice, our judgments of evaluation objects
differ because we have failed to identify and clarify the means we, as individuals, use to
judge an object. One educator may value a reading curriculum because of the love it
instills for reading; another may disparage the program because it does not move the
child along as rapidly as other curricula in helping the student to recognize and interpret
letters, words, or meaning. These educators differ in the value they assign to the curricula
because their criteria differ. One important role of an evaluator is to help stakeholders
articulate their criteria and to stimulate dialogue about them. Our definition, then,
emphasizes using those criteria to judge the merit or worth of the product.
Evaluation uses inquiry and judgment methods, including: (1) determining standards for
judging quality and deciding whether those standards should be relative or absolute, (2)
collecting relevant information, and (3) applying the standards to determine value,
quality, utility, effectiveness, or significance. It leads to recommendations intended to
optimize the evaluation object in relation to its intended purpose(s) or to help
stakeholders determine whether the evaluation object is worthy of adoption, continuation,
Differences in Evaluation and Research
It may be important here to distinguish between evaluation and research. While some
methods of evaluation emerged from social science research traditions, there are
important distinctions between evaluation and research. One of those distinctions is
purpose. Research and evaluation seek different ends. The primary purpose of research is
to add to knowledge in a field, to contribute to the growth of theory. While the results of
an evaluation study may contribute to knowledge development (Mark, Henry, & Julnes,
1999), that is a secondary concern in evaluation. Evaluation’s primary purpose is to help
those who hold a stake in whatever is being evaluated (stakeholders), often consisting of
many different groups, make a judgment or decision. Research seeks conclusions;
evaluation leads to judgments. Valuing is the sine qua non of evaluation. A touchstone
for discriminating between an evaluator and a researcher is to ask whether the inquiry he
is conducting would be regarded as a failure if it produced no data on the usefulness of
the thing being studied. A researcher answering strictly as a researcher will probably say
These differing purposes have implications for the approaches one takes. Research is the
quest for laws—statements of relationships among two or more variables. Thus, the
purpose of research is typically to explore and establish causal relationships. Evaluation,
instead, seeks to describe a particular thing. Sometimes, describing that thing involves
examining causal relationships; often, it does not. Whether the evaluation focuses on a
causal issue depends on the needs of the stakeholders.
This highlights another difference between evaluation and research: who sets the agenda. In
research, the hypotheses to be investigated are chosen by the researcher and his
assessment of the appropriate next steps in developing theory in the discipline or field of
knowledge. In evaluation, the questions to be answered are not those of the evaluator, but
rather, come from many sources, including those of significant stakeholders. An
evaluator might suggest questions, but would never determine the focus of the study
without consultation with stakeholders. Such actions, in fact, would be unethical in
Another difference concerns generalizability of results. Given evaluation’s purpose of
describing a particular thing, good evaluation is quite specific to the context in which the
evaluation object rests. Stakeholders are making judgments about a particular evaluation
object and have less desire to generalize to other settings than a researcher would. (Note
that the setting or context may be large, national programs with many sites, or small, a
program in one school.) In contrast, because the purpose of research is to add to general
knowledge, the methods are designed to maximize generalizability to many different
settings. If one’s findings are to add to knowledge in a field, ideally, the results should
transcend the particulars of time and setting.
Research and evaluation differ further in the criteria or standards used to judge their
adequacy. Two important criteria for judging the adequacy of research are internal
validity, or causality, and external validity, or generalizability to other
settings and other times. These criteria, however, are not sufficient, or appropriate, for
judging the quality of an evaluation. Instead, evaluations are typically judged by their
accuracy (the extent to which the information obtained is an accurate reflection of, and in
one-to-one correspondence with, reality), utility (the extent to which the results serve
practical information needs of intended users), feasibility (the extent to which the
evaluation is realistic, prudent, diplomatic, and frugal), and propriety (the extent to which
the evaluation is done legally and ethically, protecting the rights of those involved).
These standards were developed by the Joint Committee on Standards for Evaluation to
help both users of evaluation and evaluators themselves understand what evaluations
should do. (See Chapter 18 for more on the Standards.)
Note: Research itself varies across a wide spectrum, from basic research (which we use here to
highlight the distinction of research and evaluation) to applied research, which sometimes
resembles evaluation in being applied to solve educational, social, and private sector
problems or issues. For a more extended discussion of the differences and similarities of
research and evaluation, see Worthen and Sanders, 1973.
Finally, the preparation of researchers and evaluators differs significantly. Researchers
are trained in depth in a single discipline, their field of inquiry. This approach is
appropriate because the researcher’s work, in almost all cases, will remain within a single
discipline or field. Evaluators, by contrast, are responding to the needs of clients and
stakeholders with many different information needs and operating in many different
settings. As such, evaluators’ education must be interdisciplinary. Only through
interdisciplinary training can evaluators become sensitive to the wide range of
phenomena to which they must attend if they are to properly assess the worth of a
program or policy. Evaluators must be broadly familiar with a wide variety of methods
and techniques so that they can choose those most appropriate for the particular program
and needs of stakeholders. Finally, evaluators differ from researchers in that they must
establish personal working relationships with clients. As a result, they require preparation
in interpersonal and communication skills (Fitzpatrick, 1994).
Sanders (1979) identified several general areas of competence important for evaluators.
These included the ability to describe the object and context of an evaluation; to
conceptualize appropriate purposes and frameworks for the evaluation; to identify and
select appropriate evaluation questions, information needs, and sources of information; to
select means for collecting and analyzing information; to determine the value of the
object of an evaluation; to communicate plans and results effectively to audiences; to
manage the evaluation; to maintain ethical standards; to adjust for external factors
influencing the evaluation; and to evaluate the evaluation (metaevaluation).
In summary, research and evaluation differ in their purposes and, as a result, in the roles
of the evaluator and researcher in their work, their preparation, the generalizability of
their results, and the criteria used to judge their work. These distinctions lead to many
differences in the manner in which research and evaluation are conducted.
Of course, evaluation and research sometimes overlap. An evaluation study may add to
our knowledge of laws or theories in a discipline. Research can inform our judgments and
decisions regarding a program or policy. Yet, fundamental distinctions remain. Our
discussion above highlights these differences to help those new to evaluation to see the
ways in which evaluators behave differently than researchers. Evaluation may add to
knowledge in a field, contribute to theory development, establish causal relationships,
and provide explanations for the relationship between phenomena, but that is not its
primary purpose. Its primary purpose is to assist stakeholders in making value judgments
and decisions about whatever is being evaluated.
We will discuss shortly the matter of how one’s definition of evaluation is the product of
what one believes the purpose of evaluation to be. First, however, we need to distinguish
between systematic, formal evaluation studies—the focus of this book—and the much
more informal, even casual evaluation that is a part of our everyday life.
Informal versus Formal Evaluation
Evaluation is not a new concept. If one focuses on the aspect of “examining and judging,
to determine value,” then the practice of evaluation doubtlessly long preceded its
definition, tracing its roots back to the beginning of human history. Neanderthals
practiced it when determining which types of saplings made the best spears, as did
Persian patriarchs in selecting the most suitable suitors for their daughters, or English
yeomen who abandoned their own crossbows in favor of the Welsh longbow. They had
observed that the longbow could send an arrow through the stoutest armor and was
capable of launching three arrows while the crossbow sent only one. Although no formal
evaluation reports on bow comparisons have been unearthed in English archives, it is
clear that the English evaluated the longbow’s value for their purposes, deciding that its
use would strengthen them in their struggles with the French. So they relinquished their
crossbows, perfected and improved on the Welsh longbow, and the English armies
proved invincible during most of the Hundred Years’ War.
By contrast, French archers experimented briefly with the longbow, then went back to the
crossbow—and continued to lose battles. Such are the perils of poor evaluation!
Unfortunately, the faulty judgment that led the French to persist in using an inferior
weapon represents an informal evaluation pattern that has been repeated too often
As human beings, we evaluate every day. Practitioners, managers, and policy makers make
judgments about students, clients, personnel, programs, and policies. These judgments
lead to choices and decisions. They are a natural part of life. A school principal observes
a teacher working in the classroom and forms some judgments about that teacher’s
effectiveness. A program officer of a foundation visits a substance abuse program and
forms a judgment about the program’s quality and effectiveness. A policy maker hears a
speech about a new method for delivering health care to uninsured children and draws
some conclusions about whether that method would work in his state. Such judgments are
made every day in our work. These judgments, however, are based on informal, or
Informal evaluations can result in faulty or wise judgments. But they are characterized
by an absence of breadth and depth because they lack systematic procedures and formally
collected evidence. As humans, we are limited in making judgments by both the lack of
opportunity to observe many different settings, clients, or students and by our own past
experience, which both informs and biases our judgments. Informal evaluation does not
occur in a vacuum. Experience, instinct, generalization, and reasoning can all influence
the outcome of informal evaluations, and any or all of these may be the basis for sound,
or faulty, judgments. Did we see the teacher on a good day or a bad one? How did our
past experience with similar students, course content, and methods influence our
judgment? When we conduct informal evaluations, we are less cognizant of these
limitations. However, when formal evaluations are not possible, informal evaluation
carried out by knowledgeable, experienced, and fair people can be very useful indeed. It
would be unrealistic to think any individual, group, or organization could evaluate
formally everything it does. Often informal evaluation is the only practical approach. (In
choosing an entrée from a dinner menu, only the most compulsive individual would
conduct exit interviews with restaurant patrons to gather data to guide that choice.)
Informal and formal evaluation, however, form a continuum. Schwandt (2001)
acknowledges the importance and value of everyday judgments and argues that
evaluation is not simply about methods and rules. He sees the evaluator as helping
practitioners to “cultivate critical intelligence.” Evaluation, he notes, forms a middle
ground “between over reliance on and over application of method, general principles, and
rules to making sense of ordinary life on one hand, and advocating trust in personal
inspiration and sheer intuition on the other” (p. 86). Mark, Henry, and Julnes (1999) echo
this concept when they describe evaluation as a form of assisted sense making.
Evaluation, they observe, “has been developed to assist and extend natural human abilities
to observe, understand, and make judgments about policies, programs, and other objects
in evaluation” (p. 179).
Evaluation, then, is a basic form of human behavior. Sometimes it is thorough, structured,
and formal. More often it is impressionistic and private. Our focus is on the more formal,
structured, and public evaluation. We want to inform readers of various approaches and
methods for developing criteria and collecting information about alternatives. For those
readers who aspire to become professional evaluators, we will be introducing you to the
approaches and methods used in these formal studies. For all readers, practitioners and
evaluators, we hope to cultivate that critical intelligence, to make you cognizant of the
factors influencing your more informal judgments and decisions.
Distinguishing between Evaluation’s Purposes
and Evaluators’ Roles and Activities
We mentioned earlier that how one defines evaluation stems from what one perceives
evaluation’s basic purpose to be. We treat that topic in more depth in this section as we
attempt to separate the basic purpose of evaluation from the roles a professional evaluator
can play in different evaluations and the activities undertaken to complete an evaluation
Purposes of Evaluation
Just as evaluators are not all agreed on one final, authoritative definition of evaluation,
they are by no means unanimous in what they believe evaluation’s purpose to be.
Consistent with our earlier definition of evaluation, we believe that the basic purpose of
evaluation is to render judgments about the value of whatever is being evaluated. Many
different uses may be made of those value judgments, as we shall discuss shortly, but in
every instance the central purpose of the evaluative act is the same: to determine the
merit or worth of some thing (in program evaluation, of the program or some part of it).
This view parallels that of Scriven (1967), who was one of the earliest to outline the
purpose of formal evaluation. In his seminal paper, “The Methodology of Evaluation,” he
noted that evaluation plays many roles but argued that it has a single goal: to determine
the worth or merit of whatever is evaluated. He distinguished between the goal of
evaluation, providing answers to significant evaluative questions that are posed, and
evaluation roles, the ways in which those answers are used. According to Scriven,
evaluation’s goal usually relates to value questions, requires judgments of worth or merit,
and is conceptually distinct from its roles. Scriven made the distinction this way:
In terms of goals, we may say that evaluation attempts to answer certain types of
questions about certain entities. The entities are the various . . . instruments
(processes, personnel, procedures, programs, etc.). The types of question include
questions of the form: How well does this instrument perform (with respect to
such-and-such criteria)? Does it perform better than this other instrument? What
merits, or drawbacks does this instrument have . . . ? Is the use of this instrument
worth what it’s costing?
But the roles which evaluation has in a particular . . . context may be enormously
various; it may form part of a . . . training activity, of the process of curriculum
development, of a field experiment, . . . of . . . an executive training program, a
prison, or a classroom (pp. 40-41).
In the decades since this original distinction between evaluation’s basic purpose (goal)
and its diverse uses (roles), Scriven (1980, 1991a, 1991c) has greatly elaborated his view
without abandoning it. While he has more recently added that “evaluation is concerned
with significance, not just merit and worth” (1994, p. 380), he continues to present
powerful philosophical arguments that evaluation of any object (e.g., a marketing plan, a
school curriculum, or a residential treatment facility for drug abusers) is undertaken to
identify and apply defensible criteria to determine its worth, merit, or quality.
This view of evaluation’s basic purpose has been most widely adopted by prominent
evaluators working in the field of education, ultimately being incorporated into the
Program Evaluation Standards developed by the Joint Committee on Standards for
Educational Evaluation (1994). Yet, while this view is broadly held, other articulate
colleagues have argued that evaluation has several purposes. For example, Talmage
(1982) notes that “three purposes appear most frequently in definitions of evaluation: (1)
to render judgments on the worth of a program; (2) to assist decision makers responsible
for deciding policy; and (3) to serve a political function” (p. 594). Talmage also notes
that, while these purposes are not mutually exclusive, they are clearly different. Rallis
and Rossman (2000) have argued that the fundamental purpose of evaluation is
learning, helping practitioners and others better understand and
Some recent discussions of the purposes of evaluation move beyond these more
immediate purposes to evaluation’s ultimate impact on society. Weiss (1998b) and Henry
(2000) have argued that the purpose of evaluation is to bring about social betterment.
Mark, Henry, and Julnes (1999) define achieving social betterment as “the alleviation of
social problems, meeting of human needs” (p. 190). Chelimsky (1997) takes a global
perspective, extending evaluation’s context in the new century to worldwide challenges
rather than domestic ones: new technologies, demographic imbalances across nations,
environmental protection, sustainable development, terrorism, human rights, and other
issues that extend beyond one program or even one country. House and Howe (1999)
argue that the goal of evaluation is to foster deliberative democracy. This goal, which
they recognize as idealistic, calls on the evaluator to work to help less powerful
stakeholders gain a voice and to stimulate dialogue among stakeholders in a democratic
Mark, Henry, and Julnes (1999) have articulated four different purposes for evaluation:
assessment of merit and worth, oversight and compliance, program and organizational
improvement, and knowledge development. They note that oversight and compliance is
often viewed as achieving the purpose of assessing merit and worth, but because such
activities generally focus only on whether the designated services are delivered to the
appropriate clients, Mark and his coauthors do not see them as effectively contributing to
decisions about overall merit and worth. Similarly, they separate program and
organizational improvement from merit and worth because, while such activities can
focus on the merit and worth of subsets of programs, such evaluations do not lead to
overall judgments of merit and worth. They note, as do we, that knowledge development
can be a useful outcome or corollary to evaluation. We would emphasize, however, that it
is not the primary purpose.
We will expand on these differing views of evaluation later in the book. At this point, we
want to present them to introduce the reader to differing views on purposes. These views
are useful in shedding light for the reader new to evaluation on the types of things
evaluation might do and what evaluation means. Determining merit and worth is a quite
abstract concept. The views of these different authors, we would argue, help illustrate
what determining merit and worth means and what it can involve. For this text, we will
continue to define the primary purpose of evaluation as determining merit and worth
because it emphasizes the valuing component of evaluation that we see as critical and
because we believe many, if not most, of these distinctions can be subsumed within
determining merit and worth.
Roles and Activities of Professional Evaluators
Scriven (1967) discusses the roles of evaluation in terms of how evaluation is used, but
evaluators as practitioners play numerous roles and conduct multiple activities in
performing evaluation. Just as discussions on the purposes of evaluation help us to better
understand what we mean by determining merit and worth, a brief discussion of the roles
and activities pursued by evaluators will acquaint the reader with the full scope of
activities that professionals in the field pursue.
A major role of the evaluator that many in the field emphasize and discuss is that of
encouraging use (Patton, 1996; Shadish, 1994). While the means for encouraging use and
the anticipated type of use may differ, considering use of results is a major role of the
evaluator. In Chapter 16, we will elaborate the types of uses of evaluation and ways to
maximize use. Henry (2000), however, has cautioned that focusing primarily on use can
lead to evaluations focused solely on program and organizational improvement and,
ultimately, avoiding final decisions about merit and worth. His concern is appropriate;
however, if the audience for the evaluation is one that is making decisions about the
program’s merit and worth, this problem may be avoided. (See discussion of formative
and summative evaluation in this chapter.) Use is certainly central to evaluation, as
demonstrated by the prominent role it plays in the professional standards and codes of
evaluation (see Chapter 16).
Others’ discussions of the role of the evaluator illuminate the ways in which evaluators
might interact with stakeholders and other users. Rallis and Rossman (2000) see the role
of the evaluator as a critical friend. As noted, they view the primary purpose of
evaluation as learning. They then argue that, for learning to occur, the evaluator has to be
a trusted person, “someone the emperor knows and can listen to. She is more friend than
judge, although she is not afraid to offer judgments” (p. 83). Schwandt (2001) describes
the evaluator in the role of a teacher, helping practitioners develop critical judgment.
Patton (1996) envisions evaluators in many different roles including facilitator,
collaborator, teacher, management consultant, OD specialist, and social-change agent.
These roles reflect his approach to working with organizations to bring about
developmental change. Preskill and Torres (1999) stress the role of the evaluator in bringing
about organizational learning and instilling a learning environment. Mertens (1999),
Chelimsky (1998), and Greene (1997) emphasize the important role of including
stakeholders, who often have been ignored by evaluation (see Chapter 2 on recent
trends). House and Howe (1999) argue that a critical role of the evaluator is stimulating
dialogue among various groups. The evaluator does not merely report information, or
provide it to a limited or designated key stakeholder who may be most likely to use the
information, but instead stimulates dialogue, often bringing in disenfranchised groups to
encourage democratic decision making.
Evaluators also have a role in program planning. Bickman (2001) and Chen (1990)
emphasize the important role evaluators play in helping articulate program theories or
logic models. Wholey (1996) argues that a critical role for evaluators in performance
measurement is helping policymakers and managers select the performance dimensions
to be measured as well as the tools to use in measuring those dimensions.
Certainly, too, evaluators can play the role of the scientific expert. As Lipsey (2000)
notes, practitioners want and often need evaluators with the “expertise to track things
down, systematically observe and measure them, and compare, analyze, and interpret
with a good faith attempt at objectivity” (p. 222). Evaluation emerged from social science
research. While we will describe the growth and emergence of new approaches and
paradigms, and the role of evaluators in educating users to our purposes, stakeholders
typically contract with evaluators to provide technical or “scientific” expertise and/or an
outside “objective” opinion.
Thus, the evaluator takes on many roles. In noting the tension between advocacy and
neutrality, Weiss (1998b) writes that the role(s) evaluators play will depend heavily on
the context of the evaluation. The evaluator may serve as a teacher or critical friend in an
evaluation designed to improve the early stages of a new reading program. The evaluator
may act as a facilitator or collaborator with a community group appointed to explore
solutions to problems of underemployment in the region. In conducting an evaluation on
the employability of new immigrant groups to a state, the evaluator may act to stimulate
dialogue among immigrants, policy makers, and non-immigrant groups competing for
employment. Finally, the evaluator may serve as an outside expert in designing and
conducting a study for Congress on the effectiveness of annual testing in improving
In carrying out these roles, evaluators undertake many activities. These include
negotiating with stakeholder groups to define the purpose of evaluation, developing
contracts, hiring and overseeing staff, managing budgets, identifying disenfranchised or
underrepresented groups, working with advisory panels, collecting and analyzing and
interpreting qualitative and quantitative information, communicating frequently with
various stakeholders to seek input into the evaluation and to report results, writing
reports, considering effective ways to disseminate information, meeting with the press
and other representatives to report on progress and results, and recruiting others to
evaluate the evaluation (meta evaluation). These, and many other activities, constitute the
work of evaluators. Today, in many organizations, that work may be conducted by people
who are formally trained and educated as evaluators, attend professional conferences and
read widely in the field, and identify their professional role as an evaluator or by staff
who have many other responsibilities, some managerial, some direct work with students
or clients, and some evaluation tasks thrown into the mix. Each of these will assume
some of the roles described above and will conduct many of the tasks listed.
Uses and Objects of Evaluation
At this point, it might be useful to describe some of the ways in which evaluation can
potentially be used. An exhaustive list would be prohibitive, filling the rest of this book
and more. Here we provide only a few representative examples of uses made of
evaluation in selected sectors of society.
Examples of Evaluation Use in Education
1. To empower teachers to have more say about how school budgets are allocated
2. To judge the quality of school curricula in specific content areas
3. To accredit schools that meet minimum accreditation standards
4. To determine the value of a middle school’s block scheduling
5. To satisfy an external funding agency’s demands for reports on effectiveness of
school programs it supports
6. To assist parents and students in selecting schools in a district with school choice
7. To help teachers improve their reading program to encourage more voluntary reading
Examples of Evaluation Use in Other Public and Nonprofit Sectors
1. To decide whether to implement an urban development program
2. To establish the value of a job-training program
3. To decide whether to modify a low-cost housing project’s rental policies
4. To improve a recruitment program for blood donors
5. To determine the impact of a prison’s early-release program on recidivism
6. To gauge community reaction to proposed fire-burning restrictions to improve air
7. To determine the cost-benefit contribution of a new sports stadium for a
Examples of Evaluation Use in Business and Industry
1. To improve a commercial product
2. To judge the effectiveness of a corporate training program on teamwork
3. To determine the effect of a new flextime policy on productivity, recruitment, and
4. To identify the contributions of specific programs to corporate profits
5. To determine the public’s perception of a corporation’s environmental image
6. To recommend ways to improve retention among younger employees
7. To study the quality of performance-appraisal feedback
One additional comment about the use of evaluation in business and industry may be
warranted. Evaluators unfamiliar with the private sector are sometimes unaware that
personnel evaluation is not the only use made of evaluation in business and industry
settings. Perhaps that is because the term evaluation has been absent from the descriptors
for many corporate activities and programs that, when examined, are decidedly
evaluative. Activities labeled as quality assurance, quality control, Total Quality
Management (TQM), or Continuous Quality Improvement (CQI) turn out, on closer
inspection, to possess many characteristics of program evaluation. In Chapter 20 we treat
this topic more fully. Suffice it to say here that many uses are made of evaluation
concepts in business and industry.
Uses of Evaluation Are Generally Applicable. As should be obvious by now, uses of
evaluation are clearly portable, if one wishes to use evaluation in the same way in another
arena. The use of evaluation may remain constant, but the entity it is applied to (that
is, the object of the evaluation) may vary widely. Thus, evaluation may be used to
improve a commercial product, a community training program, or a school district’s
student assessment system. It could be used to build organizational capacity in the Xerox
Corporation, the E. F. Lilly Foundation, the Minnesota Department of Education, or the
Utah Division of Family Services. Evaluation can be used to empower parents in the San
Juan County Migrant Education Program, workers in the U.S. Postal Service, employees
of Barclays Bank of England, or residents in east Los Angeles. Evaluation can be used to
provide information for decisions about programs in vocational education centers,
community mental health clinics, university medical schools, or county cooperative
extension offices. Such examples could be multiplied ad infinitum, but these should
suffice to make our point.
A Word about the Objects of Formal Evaluation Studies. As is evident from the previous
discussion, formal evaluation studies have been conducted to answer questions about a
wide variety of entities, which we have referred to as evaluation objects. The evaluation
object is whatever is being evaluated. Like many disciplines, evaluation has developed its
own technical terminology. For example, the word evaluand is sometimes used to refer to
the evaluation object, unless it is a person, who is then an evaluee (Scriven, 1991a).
While we do not mind precise language, we see no need to use new terminology when
familiar terms will do. Thus, except as they may appear in quoted material, we will not
use evaluand or evaluee further, preferring to refer to both as objects of the evaluation.
In some instances, so many evaluations are conducted of the same type of evaluation
object that it prompts suggestions for evaluation techniques found to be particularly
helpful in evaluating something of that particular type. An example would be
Kirkpatrick’s (1983) model for evaluating training efforts. In several areas, concern about
how to evaluate broad categories of objects effectively has led to the development of
various subareas within the field of evaluation, such as product evaluation, personnel
evaluation, program evaluation, policy evaluation, and performance evaluation.
Some Basic Types of Evaluation
Formative and Summative Evaluation
Scriven (1967) first distinguished between the formative and summative roles of
evaluation. Since then, the terms have become almost universally accepted in the field. In
practice, distinctions between these two types of evaluation may blur somewhat, but the
terms serve an important function in highlighting the types of judgments, decisions, or
choices that evaluation can serve. The terms, in fact, contrast two different types of
actions that stakeholders might take as a result of evaluation.
An evaluation is considered to be formative if the primary purpose is to provide
information for program improvement. Often, such evaluations provide information to
judge the merit or worth of a part of a program. Three examples follow:
1. Planning personnel in the central office of Perrymount School District have been asked
by the school board to plan a new, and later, school day for the local high schools based
on research showing adolescents’ biological clocks cause them to be more groggy in the
early morning hours and parental concerns about teenagers being released from school as
early as 2:30 p.m. A formative evaluation will collect information (surveys, interviews,
focus groups) from parents, teachers and school staff, and students regarding their views
on the calendar and visit other schools using similar calendars to provide information for
planning the schedule. The planning staff will give the information to the Late Schedule
Advisory Group, which will make final recommendations for the new schedule.
2. Staff with supervisory responsibilities at the Akron County Human Resources
Department have been trained in a new method for conducting performance appraisals.
One of the purposes of the training is to improve the performance appraisal interview so
that employees receiving the appraisal feel motivated to improve their performance. The
trainers would like to know if the information they are providing on conducting
interviews is useful. They plan to use the results to revise this portion of the training
program. A formative evaluation might observe supervisors conducting actual, or mock,
interviews, as well as interviewing or conducting focus groups with both supervisors who
have been trained and employees who have been receiving feedback. Feedback for the
formative evaluation might also be collected from participants in the training through a
reaction survey delivered either at the conclusion of the training or a few weeks after the
training ends, when trainees have had a chance to practice the interview.
3. A mentoring program has been developed and implemented to help new teachers in the
classroom. New teachers are assigned a mentor, a senior teacher who will provide them
with individualized assistance on issues ranging from discipline to time management. The
focus of the program is on helping mentors learn more about the problems new teachers
are encountering and helping them find solutions. Because the program is so
individualized, the assistant principal responsible for overseeing the program is
concerned with learning whether it is being implemented as planned. Are mentors
developing a trusting relationship with the new teachers and learning about the problems
they encounter? What are the typical problems encountered? The array of problems? For
what types of problems are mentors less likely to be able to provide effective assistance?
Interviews, logs or diaries, and observations will be used to collect data to address these
issues. The assistant principal will use the results to consider how to better train and lead
In contrast to formative evaluations, which focus on program improvement, summative
evaluations are concerned with providing information to serve decisions or assist in
making judgments about program adoption, continuation, or expansion. They assist with
judgments about a program’s overall worth or merit in relation to important criteria. More
recently, Scriven (1991a) has defined summative evaluation as “evaluation done for, or
by, any observers or decision makers (by contrast with developers) who need valuative
conclusions for any other reasons besides development” (p. 20). Robert Stake has
memorably described the distinction between the two in this way: “When the cook tastes
the soup, that’s formative evaluation; when the guest tastes it, that’s summative
evaluation” (cited by Scriven, 1991, p. 19). In the examples below we extend the earlier
formative evaluations into summative evaluations.
1. After the new schedule is developed and implemented, a summative evaluation might
be conducted to determine whether the schedule should be continued and expanded to
other high schools in the district. The school board might be the primary audience for this
information because it is typically in a position to make the judgments concerning
continuation and expansion or termination, but others—central office administrators,
principals, parents, students, and the public at large—might be interested stakeholders as
well. The study might collect information on attendance, grades, and participation in
after-school activities. Other unintended side effects might be examined, such as the
impact of the schedule on delinquency, opportunities for students to work after school,
and other afternoon activities.
2. To determine whether the performance appraisal program should be continued, the
director of the Human Resource Department and his staff might ask for an evaluation of
the impact of the new performance appraisal on job satisfaction and performance.
Surveys of employees and existing records on performance might serve as key methods
of data collection.
3. Now that the mentoring program for new teachers has been “tinkered with” for a
couple of years using the results of the formative evaluation, the principal wants to know
whether the program should be continued. The summative evaluation will focus on
teacher turnover, satisfaction, and performance.
Note that the audiences for formative and summative evaluation are very different. In
formative evaluation, the audience is generally the people delivering the program or
those close to it, in our examples, those responsible for developing the new schedule,
delivering the training program, or managing the mentoring program. Because formative
evaluations are designed to improve programs, it is critical that the primary audience be
people who are in a position to make changes in the program and its day-to-day
operations. Summative evaluation audiences include potential consumers (students,
teachers, employees, managers, or health officials in agencies that could adopt the
program), funding sources (taxpayers or a funding agency), and supervisors and other
officials, as well as program personnel. The audiences for summative evaluations are
often policymakers or administrators, but can, in fact, be any audience with the ability to
make a “go-no go” decision. Teachers make such decisions with curricula. Consumers
(clients, parents, students) make decisions about whether to participate in a program
based on summative information or their judgments about the overall merit or worth of a
A Balance between Formative and Summative. It should be apparent that both formative
and summative evaluation are essential because decisions are needed during the
developmental stages of a program to improve and strengthen it, and again, when it has
stabilized, to judge its final worth or determine its future. Unfortunately, some
organizations focus too much of their work on summative evaluations. This trend is noted
in the emphases of many state departments of education on whether schools achieve
certain standards. An undue emphasis on summative evaluation can be unfortunate
because the development process, without formative evaluation, is incomplete and
inefficient. Consider the foolishness of developing a new aircraft design and submitting it
to a “summative” test flight without first testing it in the “formative” wind tunnel.
Program “test flights” can be expensive, too, especially when we haven’t a clue about the
probability of success.
Failure to use formative evaluation is myopic, for formative data collected early can help
rechannel time, money, and all types of human and material resources into more
productive directions. Evaluation conducted only when a project nears completion may
simply come too late to be of much help. Apparently, many instructional designers and
trainers understand this point. Zemke (1985) surveyed readers of Training magazine and
found that over 60 percent reported that they used formative evaluation in their training
activities. In a later survey of corporate training officials, Tessmer and Wedman (1992)
found that nearly half of their respondents reported that they use formative evaluation.
Conversely, some organizations may avoid summative evaluations. Evaluating for
improvement is critical, but, ultimately, many products and programs should be judged
for their overall merit and worth. Henry (2000) has noted that evaluation’s emphasis on
encouraging use of results can lead us to serving incremental, often formative, decisions
and may steer us away from the overall purpose of evaluation, determining merit and
worth. While organizations may engage in more summative evaluations, Scriven (1996)
has noted that professional evaluators are more frequently involved in the formative role
and often obtain more satisfaction from it. As a result, he has often come to the defense
of summative evaluations for purposes of balance.
Although formative evaluations more often occur in early stages of a program’s
development and summative evaluations more often occur in their later stages, as these
two terms imply, it would be an error to think they are limited to those time frames.
Well-established programs can benefit from formative evaluations. Some new programs
are so problematic that summative decisions are made to discontinue. However, the
relative emphasis on formative and summative evaluation changes throughout the life of
a program, as suggested in Figure 1.1, although this generalized concept obviously may
not precisely fit the evolution of any particular program.
Two important factors that influence the usefulness of formative evaluation are control
and timing. If suggestions for improvement are to be implemented, then it is important
that the formative study collect data on variables over which program administrators have
some control. Also, information that reaches those administrators too late for use in
improving the program is patently useless. Summative evaluations must attend to the
timing of budgetary and legislative decisions that may affect program adoption,
continuation, and expansion.
An effort to distinguish between formative and summative evaluation on several
dimensions appears in Figure 1.2. As with most conceptual distinctions, formative and
summative evaluation are often not as easy to distinguish in the real world as they seem
in these pages. Scriven (1991a) has acknowledged that the two are often profoundly
intertwined. For example, if a program continues beyond a summative evaluation study,
the results of that study may be used for both summative and, later, formative evaluation
purposes. In practice, the line between formative and summative is often rather fuzzy.
Scriven (1986) himself suggested one reason why they sometimes blur, noting that, when
programs have many components, summative evaluations that result in replacing weak
components have played a formative role in improving the program in its entirety.
FIGURE 1.2 Differences between Formative and Summative Evaluation
Formative: To determine value or quality
Summative: To determine value or quality
Formative: To improve the program
Summative: To make decisions about the program’s future or adoption
Formative: Program managers and staff
Summative: Administrators, policy makers, and/or potential consumer or . . .
Formative: Primarily internal evaluators . . .
Summative: Generally external evaluators, supported by internal evaluators in . . .
Major Characteristics
Formative: Provides feedback so program personnel can improve it
Summative: Provides information to enable decision makers to decide whether to continue it,
or consumers to . . .
Formative: What information is needed?
Summative: What evidence is needed for major . . .
Purpose of Data . . .
Formative: Diagnostic
Frequency of Data . . .
Formative: Frequent
Formative: What is working? What needs to be improved? How can it be improved?
Summative: What results occur? Under what conditions? With what training? At what cost?
Needs Assessment, Process, and Outcome Evaluations
The distinctions between formative and summative evaluation are concerned primarily
with the kinds of decisions or judgments to be made with the evaluation results. The
distinction between the relative emphasis on formative or summative evaluation is an
important one to make at the beginning of a study because it informs the evaluator about
the context, intention, and potential use of the study and has implications for the most
appropriate audiences for the study. However, the terms do not dictate the nature of the
questions the study will address. Chen (1996) has proposed a typology to permit
consideration of process and outcome along with the formative and summative dimension.
We will elaborate that typology here, adding needs assessment to the mix.
Some evaluators make use of the terms needs assessment, process, and outcome to refer
to the types of questions the evaluation study will address or the focus of the evaluation.
These terms also help make the reader aware of the full array of issues evaluators
examine. Needs assessment questions are concerned with establishing (a) whether a
problem or need exists and describing that problem, and (b) making recommendations for
ways to reduce the problem, i.e., the potential effectiveness of various interventions.
Process, or monitoring studies, typically describe how the program is delivered. Such
studies may focus on whether the program is being delivered according to some
delineated plan or model or may be more open-ended, simply describing the nature of
delivery and the successes and problems encountered. Process studies can examine a
variety of different issues including characteristics of the clients or students served,
qualifications of the deliverers of the program, characteristics of the delivery environment
(equipment, printed materials, physical plant, and other elements of the context of
delivery), and the actual nature of the activities themselves. Outcome studies are
concerned with describing, exploring, or determining changes that occur in program
recipients, secondary audiences (families of recipients, coworkers, etc.), or communities
as a result of a program. These outcomes can range from immediate impacts (for
example, satisfaction of learners) to final goals and unintended outcomes.
Note these terms do not have implications for how the information will be used. The
terms formative and summative help us distinguish the purposes of the evaluation. Needs
assessment, process, and outcome evaluations refer to the nature of the issues or
questions that will be examined. In the past, people have occasionally misused the term
“formative” to be synonymous with process evaluation and “summative” to be synonymous
with outcome evaluation. However, Scriven (1996) himself notes that “formative
evaluations are not a species of process evaluation…. Conversely, summative evaluation
may be largely or entirely process evaluation” (p. 152).
Figure 1.3 illustrates a typology of evaluation terms building on the typology proposed by
Chen (1996); we add needs assessment to Chen’s typology and label this dimension
“evaluation focus.” (Chen views this dimension as reflecting the stage of the program,
but, while process studies typically precede outcome studies, the choice of focus depends,
not on the stage of the program, but on the information needs of the stakeholders.) As
Figure 1.3 illustrates, an evaluation can be characterized by the action the evaluation will
serve (improvement or otherwise) as well as by the nature of the issues it will address. To
illustrate, a needs assessment study can be summative (Should we adopt this new
program or not?) or formative (How should we modify this program to deliver it in our
school or agency?).
FIGURE 1.3 A Typology of Evaluation Studies
[Matrix crossing evaluation focus (needs assessment, process, outcome) with the judgment or action the evaluation serves: formative (what to revise or change) and summative (what to begin, continue, or expand). For needs assessment, for example, a formative question is “How should we adapt the model we are considering?” and a summative question is “Should we begin a program? Is there sufficient need?”]
A process study often serves formative purposes, providing
information to program providers or managers about how to change activities to improve
the quality of the program, but a process study may serve summative purposes when we
find that the program is too complex or expensive to deliver or that program
recipients (students, trainees, clients) do not enroll as expected. In such cases, a process
study that began as a formative evaluation for program improvement may lead to a
summative decision to discontinue the program. Accountability studies often make use of
process data to make summative decisions. An outcome study can, and often does, serve
formative or summative purposes. Formative purposes may be best served by examining
more immediate outcomes because program deliverers have greater control over the
actions leading to these outcomes. For example, describing whether students are
achieving immediate learning objectives is more useful to teachers in deciding how to
revise their curricula than examining students’ subsequent employment records or
postsecondary performance. Policy makers making summative decisions, however, are often
more concerned with the program’s success at achieving “final” outcomes, e.g.,
employment, health, safety, because their responsibility is with these outcomes. Their
decisions regarding funding concern whether programs achieve these ultimate outcomes.
The fact that a study examines program outcomes, or effects, however, tells us nothing
about whether the study serves formative or summative purposes.
The formative and summative distinction comes first, then, to help focus our attention on
the judgment to be made or the action to be taken. In beginning an evaluation, evaluators
are first concerned with determining this focus and, then, determining the extent to which
the stakeholder can assist in making such judgments. (For example, a school board or
state legislator is generally an inappropriate audience for a formative evaluation because
they are typically too removed from immediate program activity. If the intention of the
evaluation is formative and these are the primary audience, the evaluator should suggest
that he work more closely with those involved in the day-to-day delivery of the program.)
Only after the focus is determined will the evaluator proceed to examining whether the
focus of the evaluation is needs assessment, process, or outcome and to developing the
particular evaluation questions the study will address.
Internal and External Evaluations
The adjectives internal and external distinguish between evaluations conducted by
program employees and those conducted by outsiders. An experimental year-round
education program in the San Francisco public schools might be evaluated by a member
of the school district staff (internal) or by a site-visit team appointed by the California
State Board of Education (external). A large health maintenance organization (HMO)
with facilities in six cities may have a member of each facility’s staff evaluate the utility
of their training of local residents to serve in paraprofessional roles (internal), or the
HMO may hire a consulting firm or university research group to look at that
paraprofessional training program (external).
Seems pretty simple, right? Often it is, but assume that the HMO sends a team out from
their headquarters to evaluate the program in the six separate facilities. Is that an internal
or external evaluation? Actually, the correct answer is “both,” for such an evaluation is
clearly external from the perspective of those in the individual facility, yet it clearly is an
internal evaluation from the perspective of the headquarters administrators who assigned
their staff to evaluate those parts of the parent HMO operation.
There are obvious advantages and disadvantages connected with both internal and
external evaluation roles. Figure 1.4 summarizes some of these. Internal evaluators are
likely to know more about the program, its history, its staff, its clients, and its struggles
than any outsider. They also know more about the organization and its culture and styles
of decision-making. They are familiar with the kinds of information and arguments that
are persuasive, and know who is likely to take action and who is likely to be persuasive to
others. These very advantages, however, are also disadvantages. They may be so close to
the program that they cannot see it clearly. (Note, though, that each evaluator, internal
and external, will bring his or her own history and “biases” to the evaluation, but the
internal evaluators’ closeness may prevent them from seeing solutions or changes that
those newer to the situation might see more readily.)
FIGURE 1.4 Advantages of Internal and External Evaluators
[Internal evaluators: more familiar with the organization; present to remind others of results now and in the future; can communicate technical results more frequently and clearly. External evaluators: can bring greater credibility; typically bring more breadth and depth of technical expertise; have knowledge of how other similar organizations and programs work.]
While successful internal evaluators may overcome the hurdle of perspective, it can be
much more difficult for
them to overcome the barrier of position. If internal evaluators are not provided with
sufficient decision-making power, autonomy, and protection, their evaluation will be
The strengths of external evaluators lie in their distance from the program and, if the
right evaluators are hired, their expertise. External evaluators are perceived as more
credible by the public and, often, by policy makers. In fact, external evaluators typically
do have greater administrative and financial independence. Nevertheless, the
“objectivity” of the external evaluator can be overdone. (Note the role of the external
Arthur Andersen firm in the 2002 Enron bankruptcy and scandal. The lure of obtaining or
keeping a large contract can prompt external parties to “bend the rules” to keep the
contract.) For programs with high visibility or cost or those surrounded by much
controversy, an external evaluator can provide a needed degree of autonomy. External
evaluators, if the search and hiring process is conducted appropriately, can also bring
the specialized skills needed for a particular project. In all but very large organizations,
internal evaluators must be “jacks-of-all-trades” to permit them to address the ongoing
evaluation needs of the organization. When seeking an external evaluator, however, an
organization can pinpoint and seek the types of skills and expertise needed for that time
Possible Role Combinations
The dimensions of formative and summative evaluation can be combined with the
dimensions of internal and external evaluation to form the two-by-two matrix shown in
Figure 1.5. The most common roles in evaluation might be indicated by cells 1 and 4 in
the matrix. Formative evaluations are often conducted by internal evaluators, and there
are clear merits in such an approach. Their knowledge of the program, its history, staff,
and clients is of great value, and credibility is not nearly the problem it would be in a
summative evaluation. Program personnel are often the primary audience, and the
evaluator’s ongoing relationship with them can enhance the use of results in a good
learning organization.
FIGURE 1.5 Combination of Evaluation Roles
[Two-by-two matrix crossing formative and summative evaluation with internal and external evaluators, with cells numbered 1 through 4.]
Summative evaluations are probably most often (and probably best) conducted
by external evaluators. It is difficult, for example, to know how much credibility to
accord a Ford Motor Company evaluation that concludes that a particular Ford
automobile is far better than its competitors in the same price range. The credibility
accorded to an internal summative program evaluation (cell 3) may be no better. In most
organizations, summative evaluation is generally best conducted by an external evaluator
or agency, but there are two circumstances in which we would alter that statement quite
dramatically. First, in some instances, there is simply no possibility of the program’s
obtaining such external help because of financial constraints or absence of competent
personnel willing to do the job. In these cases, the summative evaluation is weakened by
the lack of outside perspective, but it might be possible to retain a semblance of
objectivity and credibility by choosing the internal summative evaluator from among
those who are some distance removed from the actual development of the program or
product being evaluated.
For example, assume that an elementary school in a large (in geography, not budget) rural
district in Saskatchewan needs to have a summative evaluation of an innovative French
language and culture program they have been running. No funds are available to bring an
evaluator in from outside the district, and, because much of the program is oral, it would
be hard to bundle it up and send it off for review. Everyone in the school is either a
zealous enthusiast or a bitter opponent of the program, so there is no way to get an
unbiased internal evaluation. In this context, it is far better to obtain a “quasi-external”
summative evaluation than do none at all. By “quasi-external,” we mean that one should
conduct the evaluation so as to maximize its “externality.” Why not ask a qualified staff
member of another school in the district to evaluate the program in return for helping
with a later task in that school? While still internal to the district, this evaluator would be
external to the school, hence quasi-external. If the evaluation were commissioned with a
strong request for the quasi-outsider to “tell-it-like-it-is,” with no punches pulled and no
weaknesses overlooked, there is good reason to suspect that many of the advantages of a
true external summative evaluation would occur. If one still worried that being in the
same district tainted the outcomes, perhaps someone from an adjacent district, or a school
not too far beyond the province’s boundary, would make it a true external evaluation.
Whatever definitional cutoffs one chooses to use, it is important to remember that there is
a continuum from external to internal; it is a matter of degree, not black or white.
The second circumstance when we might soften our cautions about the biases that can
occur in internal evaluations is when organizations have structured their internal
evaluation unit (and its evaluators) to enhance their ability to be forthright about their
findings. Such structuring can take many forms, but the key is that the internal evaluators
are insulated and shielded from the consequences of displeasure of those whose program
Fortunately, a number of large agencies have structured their internal evaluation function
to give it maximum independence (and avoid evaluators being placed in the untenable
posture of evaluating programs developed by the boss or close associates). The larger the
organization, the more insulated its evaluation staff can be and the fewer problems or
pressures one might expect to be caused by hierarchical or close social relationships.
Indeed, the unit (and its function) may even lose much of its internal flavor and appear
more like a built-in external evaluation unit (if that non sequitur is permitted), free to
pursue evaluations throughout the organization as need demands. Sonnichsen (2000)
writes of the high impact that internal evaluation can have if the organization has
established the conditions that permit the internal evaluator to operate effectively. The
factors that he cites as being associated with evaluation offices that have a strong impact
on the organization include operating as an independent entity, reporting to a top official,
giving high rank to the head of the office, having the authority to self-initiate
evaluations, making recommendations and monitoring their implementation, and
disseminating results widely throughout the organization. He envisions the promise of
internal evaluation, writing, “The practice of internal evaluation can serve as the basis for
organizational learning, detecting and solving problems, acting as a self-correcting
mechanism by stimulating debate and reflection among organizational actors, and
seeking alternative solutions to persistent problems” (Sonnichsen, 2000, p. 78).
Evaluation’s Importance and Its Limitations
Given its many formative and summative uses, it may seem almost axiomatic to assert
that evaluation is not only valuable but essential in any effective system or society.
Scriven (1991b) has said it well:
The process of disciplined evaluation permeates all areas of thought and practice…. It is
found in scholarly book reviews, in engineering quality control procedures, in the
Socratic dialogues, in serious social and moral criticism, in mathematics, and in the
opinions handed down by appellate courts…. It is the process whose duty is the systematic
and objective determination of merit, worth, or value. Without such a process, there is no
way to distinguish the worthwhile from the worthless (p. 4).
Scriven also argues the importance of evaluation in pragmatic terms (“bad products and
services cost lives and health, destroy the quality of life, and waste the resources of those
who cannot afford waste”), ethical terms (“evaluation is a key tool in the service of
justice”), social and business terms (“evaluation directs effort where it is most needed,
and endorses the ‘new and better way’ when it is better than the traditional way, and
the traditional way where it’s better than the new high-tech way”), intellectual terms (“it
refines the tools of thought”), and personal terms (“it provides the only basis for
justifiable self-esteem”) (p. 43). Perhaps for these reasons, evaluation has increasingly
been used as an instrument to pursue goals of organizations and agencies at local,
regional, national, and international levels.
Potential Limitations of Evaluation
The usefulness of evaluation has led some people to look to it as a panacea for all the ills
of society, but evaluation alone cannot solve all the problems of society. One of the
biggest mistakes of evaluators is to promise results that cannot possibly be attained. Even
ardent supporters of evaluation are forced to admit that many evaluation studies fail to
lead to significant improvements in the programs they evaluate. Why? Partly it’s a
question of grave inadequacies in the conceptualization and conduct of many evaluations.
It’s also a question of understanding too little about other factors that affect the use of
evaluation information, even from studies that are well conceptualized and well
conducted. In addition, both evaluators and their clients may have been limited by an
unfortunate tendency to view evaluation as a series of discrete studies rather than a
continuing system of self-renewal. A few poorly planned, badly executed, or
inappropriately ignored evaluations should not surprise us; such failings occur in every
field of human endeavor. This book is intended to help evaluators, and those who use
their results, to improve the practice and utility of evaluation.
A parallel problem exists when those served by evaluation naively assume that its magic
wand need only be waved over an enterprise to correct all its malfunctions and
inadequacies. For example, developing and measuring standards in education or in
nonprofit agencies, as is the current trend, can certainly provide useful information for
judging the quality of programs, but these performance monitoring programs are only the
first step. Formative evaluations, specific to the context of the program, are then needed
to bring about improvement. Though evaluation can be enormously useful, it is generally
counterproductive for evaluators or those who depend on their work to propose
evaluation as the ultimate solution to every problem or, indeed, as any sort of solution,
because evaluation, in and of itself, won’t effect a solution, though it might suggest one.
Evaluation serves to identify strengths and weaknesses, highlight the good, and expose
the faulty, but it cannot singlehandedly correct problems, for that is the role of
management and other stakeholders, using evaluation findings as one tool that will help
them in that process. Evaluation has a role to play in enlightening its consumers, and it
may be used for many other roles, but it is only one of many influences on improving the
policies, practices, and decisions in the institutions that are important to us.
Major Concepts and Theories
1. Evaluation is the identification, clarification, and application of defensible
criteria to determine an evaluation object’s value, its merit or worth, in regard
to those criteria. The specification and use of explicit criteria distinguish
formal evaluation from the informal evaluations most of us make daily.
2. Evaluation differs from research in its purpose, its concern with
generalizability, its involvement of stakeholders, and the breadth of training
those practicing it require.
3. The basic purpose of evaluation is to render judgments about the value of
the object under evaluation. Other purposes include providing information
for program improvement, working to better society, encouraging meaningful
dialogue among many diverse stakeholders, and providing oversight and
compliance for programs.
4. Evaluators play many roles, including scientific expert, facilitator, planner,
collaborator, aid to decision makers, and critical friend.
5. Evaluations can be formative or summative. Formative evaluations are
designed for program improvement and the audience is, most typically,
stakeholders close to the program. Summative evaluations serve decisions
about program adoption, continuation, or expansion. Audiences for these
evaluations must have the ability to make such “go-no go” decisions.
6. Evaluators may be internal or external to the organization. Internal evaluators
know the organizational environment and can facilitate communication and
use of results. External evaluators can provide more credibility in high-profile
evaluations and bring a fresh perspective and different skills to the evaluation.
Consider a program in your organization. If it were to be evaluated, what
might be the purpose of the evaluation? The goal? The role of evaluators in
conducting the evaluation?
What kind of evaluation do you think is most useful, formative or summative?
What kind of evaluation would be most useful to you in your work? To your
school board or elected officials?
Which do you prefer, an external or internal evaluator? Why?
Describe a situation in which an internal evaluator would be more appropriate
than an external evaluator. What is the rationale for your choice? Now
describe a situation in which an external evaluator would be more appropriate.
1. List the types of evaluation studies that have been conducted in an institution or
agency of your acquaintance, noting in each instance whether the evaluator was
internal or external to that institution. Determine whether each study was
formative or summative and focused on needs assessment, process, or outcome
questions. Finally, consider whether the study would have been strengthened by
having someone with the opposite (internal/external) relationship to the institution
conduct the study.
2. Think back to any formal evaluation study you have seen conducted (or if you
have never seen one conducted, find a written evaluation report of one). Identify
three things that make it different from informal evaluations. Then list ten
informal evaluations you have performed so far today. (Oh, yes you have!)
3. Discuss the potential and limitations of program evaluation. Identify some things
evaluation can and cannot do for programs in your field.
4. Within your own organization (if you are a university student, you might choose
your university), identify several evaluation objects that you believe would be
appropriate for study. For each, identify (a) the use the evaluation study would
serve, and (b) the basic focus of the evaluation.
Mark, M. M., Henry, G. T., & Julnes, G. (1999). Toward an integrative framework for evaluation practice. American Journal of Evaluation, 20.
Rallis, S. F., & Rossman, G. B. (2000). Dialogue for learning: Evaluator as critical friend. In R. K. Hopson (Ed.), How and why language matters in evaluation. New Directions for Evaluation, No. 86, 81-92. San Francisco: Jossey-Bass.
Sonnichsen, R. C. (2000). High impact internal evaluation. Thousand Oaks, CA: Sage.
Stake, R. E. (2000). A modest commitment to the promotion of democracy. In K. E. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and deliberation. New Directions for Evaluation, No. 85, 97-106. San Francisco: Jossey-Bass.
Stufflebeam, D. L. (2001). Evaluation models. New Directions for Evaluation, No. 89. San Francisco: Jossey-Bass.
Origins and Current Trends in Modern
1. How did the early stages of evaluation influence practice today?
2. What major political events occurred in the late 1950s and early 1960s that
greatly accelerated the growth of evaluation thought?
3. What significant events precipitated the emergence of modern program
4. Are the current trends in performance measurement and standards-based
education similar to earlier stages of evaluation? If so, how?
5. How has advocacy emerged as a controversial issue in evaluation?
Formal evaluation of educational, social, and private-sector programs is still
maturing as a field, with its most rapid development occurring during the past
four decades. Compared to professions such as law, education, or accounting or
disciplines such as sociology, political science, and psychology, evaluation is still
quite new. In this chapter we will review the history of evaluation and its progress
toward becoming a full-fledged profession and transdiscipline. We will also
introduce the reader to some of the new issues or debates that are central to
evaluation practice and theory as we enter the twenty-first century.
The History and Influence of Evaluation in Society
Early Forms of Formal Evaluation
Some evaluator-humorists have mused that formal evaluation was probably at work in
determining which evasion skills taught in Sabertooth Avoidance 101 had the greatest
survival value. Scriven (1991c) apparently was not tongue-in-cheek in suggesting that
formal evaluation of crafts may reach back to evaluation of early stone-chippers’
products, and he was obviously serious in asserting that it can be traced back to samurai
In the public sector, formal evaluation was evident as early as 2000 B.C., when Chinese
officials conducted civil service examinations to measure the proficiency of applicants
for government positions. And, in education, Socrates used verbally mediated evaluations
as part of the learning process. But centuries passed before formal evaluations began to
compete with religious and political beliefs as the driving force behind social and
Some commentators (e.g., Cronbach, et al., 1980) see the ascendancy of natural science
in the seventeenth century as a necessary precursor to the premium that later came to be
placed on direct observation. Occasional tabulations of mortality, health, and populations
grew into a fledgling tradition of empirical social research that grew until “In 1797,
Encyclopedia Britannica could speak of statistics (“state-istics,” as it were) as a ‘word
lately introduced to express a view or survey of any kingdom, county, or parish’ ” (p. 24).
But quantitative surveys were not the only precursor to modern social research in the
1700s. Rossi and Freeman (1985) give an example of an early British sea captain who
halved his crew into a “treatment group” forced to consume limes, while their control
counterparts consumed the sailors’ normal diet. Not only did the experiment show that
“consuming limes could avert scurvy,” but “British seamen eventually were forced to
consume citrus fruits—this is the derivation of the label ‘limeys,’ which is still sometimes
applied to the English” (pp. 20-21).
Program Evaluation: 1800-1940
During the 1800s, dissatisfaction with educational and social programs in Great Britain
generated reform movements in which government-appointed royal commissions heard
testimony and used other less formal methods to “evaluate” the respective institutions.
This led to still existing systems of external inspectorates for schools in England and
Ireland and parallels the activities of presidential commissions in the United States today.
President George W. Bush’s Commission to Strengthen Social Security, exploring the
economic benefits and risks to permitting people to invest their own social security funds,
is a recent example. These commissions review existing research and hear testimony as a
primary means of collecting information before making judgments and recommendations.
In the United States, educational evaluation in the 1800s took a slightly different bent,
being influenced by Horace Mann’s comprehensive annual, empirical reports on
Massachusetts’s education in the 1840s and the Boston School Committee’s 1845 and
1846 use of printed tests in several subjects (the first instance of wide-scale assessment of
student achievement serving as the basis for school comparisons). These two
developments in Massachusetts were the first attempts at objectively measuring student
achievement to assess the quality of a large school system. They set a precedent seen
today in the standards-based education movement’s use of test scores from students as the
primary means for judging the
effectiveness of schools.
Later, during the late 1800s, liberal reformer Joseph Rice conducted one of the first
comparative studies in education designed to provide information on the quality of
instructional methods. His goal was to “document” his claims that school time was
inefficiently used. To do so, he compared a large number of schools that varied in the
amount of time spent on spelling drills and then examined the students’ spelling ability.
He found negligible differences in students’ spelling performance between schools,
where one had students spend as much as 100 minutes a week on spelling instruction
while another had students spend as little as 10 minutes per week. He used these data to
flog educators into seeing the need to scrutinize their practices empirically.
The late 1800s also saw the beginning of efforts to accredit U.S. universities and
secondary schools, although that movement did not really become a potent force for
evaluating educational institutions until several strong regional accrediting associations
were established in the 1930s. The early 1900s saw another example of accreditation
(broadly defined) in Flexner’s (1910) evaluation (backed by the American Medical
Association and the Carnegie Foundation) of the 155 medical schools then operating in
the United States and Canada. Although based only on one-day site visits to each school
by himself and one colleague, Flexner argued that inferior training was immediately
obvious: “A stroll through the laboratories disclosed the presence or absence of
apparatus, museum specimens, library and students; and a whiff told the inside story
regarding the manner in which anatomy was cultivated” (Flexner, 1960, p. 79). Flexner
was not deterred by lawsuits or death threats from what the medical schools viewed as his
“pitiless exposure” of their medical training practices. He delivered his evaluation
findings in scathing terms (labeling, for example, Chicago’s fifteen medical schools as
“the plague spot of the country in respect to medical education” [p. 84]), and soon
“Schools collapsed to the right and left, usually without a murmur” (p. 87). No one was
ever left to wonder whether Flexner’s reports were evaluative.
Other areas of public interest were also subjected to evaluation in the early 1900s;
Cronbach and his colleagues (1980) cite surveys of slum conditions, management and
efficiency studies in the schools, and investigations of local government corruption as
examples. Rossi, Freeman, and Lipsey (1998) note that evaluation first emerged in the
field of public health, which was concerned with infectious diseases in urban areas, and
in education, where the focus was on literacy and occupational training.
Also in the early 1900s, the educational testing movement began to gain momentum as
measurement technology made rapid advances under E. L. Thorndike and his students,
and by 1918 objective testing was flourishing, pervading the military and private industry
as well as all levels of education. The 1920s saw the rapid emergence of norm-referenced
tests developed for use in measuring individual performance levels. By the mid-1930s,
more than half of the United States had some form of statewide testing, and standardized,
norm-referenced testing, including achievement tests and personality and interest
profiles, became a huge
During this period, educators regarded measurement and evaluation as nearly
synonymous, with the latter usually thought of as summarizing student test performance
and assigning grades. Although the broader concept of evaluation, as we know it today,
was still embryonic, useful measurement tools for the evaluator were proliferating
rapidly, even though very few meaningful, formally published evaluations of school
programs or curricula would appear for another twenty years. One notable exception was
the ambitious landmark Eight Year Study (Smith & Tyler, 1942) that set a new standard
for educational evaluation with its sophisticated methodology and its linkage of outcome
measures to desired learning outcomes. Tyler’s work, in this and subsequent studies (e.g.,
Tyler, 1950), also planted the seeds of criterion-referenced testing as a viable alternative
to norm-referenced testing. (We will return in Chapter 4 to the profound impact that
Tyler and those who followed in his tradition have had on program evaluation, especially
Meanwhile, foundations for evaluation were being laid in fields beyond education,
including human services and the private sector. In the early decades of the 1900s,
Frederick Taylor’s scientific management movement influenced many. His focus was on
systemization and efficiency, discovering the most efficient way to perform a task and
then training all staff to perform it in that way. The emergence of “efficiency experts” in
industry soon permeated the business community and, as Cronbach et al. (1980) noted,
“business executives sitting on the governing boards of social services pressed for greater
efficiency in those services” (p. 27). Some cities and social agencies began to develop
internal research units, and social scientists began to trickle into government service,
where they began to conduct applied social research in specific areas of public health,
housing needs, and work productivity. However, these ancestral, social-research
“precursors to evaluation” were small, isolated activities that exerted little overall impact
on the daily lives of the citizenry or the decisions of the government agencies that served
Then came the Great Depression and the sudden proliferation of government services and
agencies as President Roosevelt’s New Deal programs were implemented to salvage the
U.S. economy. This was the first major growth in the federal government in the 1900s
and its impact was profound. Federal agencies were established to oversee new national
programs in welfare, public works, labor management, urban development, health,
education, and numerous other human service areas, and increasing numbers of social
scientists went to work in these agencies. Applied social research opportunities
abounded, and soon social science academics began to join with their agency-based
colleagues to study a wide variety of variables relating to these programs. While some
scientists called for explicit evaluation of these new social programs (e.g., Stephan,
1935), most pursued applied research at the intersection of their agency’s needs and their
personal interests. Thus, sociologists pursued questions of interest to the discipline of
sociology and the agency, but the…