9 Performance Measurement and Evaluation
- A. Premchand
- Published Date: March 1993
Between vague wavering capability and fixed indubitable performance, what a difference.
The accountability framework described in earlier chapters suggests specifying performance for which expenditures are incurred and providing information on actual performance to the community. Specifically, in terms of the expenditure management cycle illustrated earlier, it would be considered complete only when the specified goals were achieved within the time and cost frame indicated for the purpose. It thus follows this continuum:
Corporate plans → Budgets and expenditure plans → Performance agreements on standards and indicators → Monitoring → Evaluation
In varying degrees, some of the above aspects are performed even now in the expenditure management systems of many governments. Many may be addressed as part of the budgetary process, or in separate but associated processes. But the imperatives of the times and the progress in applying performance measures have been such that monitoring and evaluation are expected to be carried out under the spotlight of the public or its legislative bodies. They have acquired an intrusive urgency of their own in incipient and newly emerging democracies that are eager to compensate for lost time and opportunities. Since it is generally recognized that a democracy has no economic rationale and is even sometimes considered a drag on economic efficiency, an even greater obligation exists to demonstrate that the actions of a democratic form of government do not have to be inefficient. More important, with proper organizations in place and an incentive structure tailored to meet the unique needs of a country, a government can be just as efficient as organizations in the nongovernmental sector. While efficiency is the final result of a variety of efforts in numerous areas including expenditure management, it is also widely believed that a good deal depends on expenditure management, which plays a pivotal role in the whole process as moneys flow through to accomplish the specified objectives.
A long, arduous, and frequently frustrating journey lies behind the currently espoused demands for economy, efficiency, and effectiveness. From the nineteenth century onward, the demand was for legislative accountability that was deemed to consist of the safe custody of moneys, adequate procedures for disbursements to minimize opportunities for leakages, the pursuit of economy by conserving resources (as opposed to profligacy), and the observance of regularity with emphasis on compliance with existing rules and related legislation. No specificity was provided on the content of economy, however. It was more like a commandment to be heeded by the legislator and civil servant alike. Accountability in such a context came to be interpreted as a set of accounts reflecting the completed transactions of the government. These accounts were in turn to be scrutinized by specialized legislative committees. More often than not, however, such committees did not have the expertise to penetrate the mysteries of government accounts and had to depend on the audit office review. This review was (and continues to be in several countries) aimed at providing a financial or appropriation audit and was limited to ascertaining the reasons for the departures from the approved estimates and whether in the process any laws had been broken. This action endowed a kind of legitimacy or stamp of approval on the government’s work. In such a frame of accountability, which was narrow from any point of view, there was no premium or privileged position for efficiency and effectiveness.
With the steady growth in public expenditure—some of it contributed by frequent wars, however—a search began to ascertain whether governments were spending wisely. Attention thus tended to shift from “propriety” to “wisdom.” This approach finally culminated in one of the two leading questions posed by the Hoover Commission in the United States in 1949.1 The Commission asked how efficiently and economically an approved program was being implemented and how the same amount of work could be performed satisfactorily through other arrangements or through improved procedures at less cost. The efforts to find answers to that leading question led to the formulation of the performance budgeting system and its introduction to the U.S. Federal Government in the 1950s. This system, drawn from the experience of local governments, emphasized the measurement of cost and sought to divert the traditional budgeting system’s excessive emphasis on inputs to budgetary outputs. It stressed the improved classification of government transactions, the measurement of costs, and the evaluation of the efficiency and effectiveness of programs and projects after completion. As an integral part of this effort, attempts were also initiated to measure productivity in government departments. But little progress was made in that effort because of the wide-ranging diversity of operations and the difficulties in evolving standard productivity measures. Instead, workload measures were evolved as a substitute for productivity measurement. The main virtue of performance budgeting, despite its generally poor implementation,2 was that it required an expanded budgetary process in which the program of work (and its linkages with goals) and its costs were to be explicitly considered. It sought an equation between inputs and outputs and between financial and physical aspects, and it facilitated the identification and measurement of linkages between manpower, materials, and other factors of production.
The lack of success with performance budgeting was partly attributable to the view that, since most government services and products were unique and were not marketed, inherent difficulties lay in formulating precise goals. Moreover, it was argued (and the argument gained considerable momentum in the 1950–70 period) that measuring costs was difficult as a number of joint products could not be exclusively identified with a single source. Also, the economies to be obtained from strict observance of specified costs were likely to be less, and more was to be gained by improving the allocative process. Accordingly, efforts were directed to systems that emphasized planning, programming, budgeting, zero-based budgeting, and program evaluation (which were more like policy analysis than a review of completed programs). These systems were not very successful in curtailing the rate of growth of expenditure. In that context, the pursuit of efficiency in approved programs received added impetus. This contributed by the early eighties to the recognition that “like any other employer and provider of services, Government must ensure that the people and money in its charge are used efficiently and effectively.”3 It was also recognized that an examination of the obstacles to management was vital. But such an inquiry was by no means a substitute for continuous exercise of good management practices, cost consciousness, and cost responsiveness throughout the management chain. Therefore, emphasis was laid on improving the organizational design and creating the right kind of environment in agencies so that they could manage themselves better. As part of this effort, the responsibilities and goals of agencies were clarified, management information systems (particularly their costs) strengthened, and greater flexibility provided to managers. Simultaneously, ascertaining the continued relevance of programs in the changing economic environment was also emphasized. 
The major issue for the overall welfare of the community was (and is) whether some of the services provided could have been better performed outside the government sphere. Continuing pursuit of these themes contributed on the one hand to contracting out services on a competitive basis, and on the other to specifying an annual performance agreement with the agencies. The culmination of this can be found in New Zealand in the Public Finance Act of 1989, which envisaged replacing a permanent head of an agency by a chief executive who would be responsible for formulating corporate plans and for specifying an annual performance agreement that indicated the outputs for the moneys appropriated.
Why did these efforts fail to become fully operational? Several factors contributed to the gap between what should have been and what was. On the basis of experience in several countries, and for brevity, they can be analyzed in terms of environment and management.
The economic environment of government was undergoing rapid change over this time. Although change itself is a natural part of evolution, the rate of change, with the attendant uncertainty, came to be viewed as a problem. Inflation and stagflation characterized economies, and the manifold increase in the size of government budgets exacerbated the situation. The growing share of entitlements proved resistant to efforts to make government budgets viable instruments of countercyclical policy. Inflationary policies, followed by restrictive, unemployment-creating actions, became the dominant features of macroeconomic management.
To counteract these trends, the institutional improvements enumerated above were introduced. But as they had little impact on expenditure growth, solutions outside the normal budget were envisaged through balanced budget amendments and constitutional limitations. These approaches had the provisional impact of reducing the momentum of the approaches emphasizing efficiency. These diversions helped to weaken efforts and to relegate them to the background.
Some management factors, as will be illustrated below, also proved unsettling. Efficiency as a generic phrase always gained universal lip service, and some support in more substantive terms. But the quantitative approaches, lacking refinement in their early stages (as happens with many new products), were bedeviled by the difficulty of finding acceptable ways of measuring the output of government activities. For example, the U.S. Bureau of the Budget noted in its report, “Measuring Productivity of Federal Government Organizations,” published in 1964, that meaningful measures of productivity could not be obtained because of the difficulties inherent in defining the end product or outputs. In 1972, the U.S. Civil Service Commission noted that while no single measure could capture the wide variety of tasks, a system comprising several indicators proved to be “too technical and mysterious.” The persistent problem of measuring the efficiency of organizations whose outputs were nonhomogeneous and subject to rapid change again dominated the situation.
There was also a lack of support from the spending agencies, which felt that some productivity results were forced out of them by reducing manpower allocations. This resulted in the short term in superficial improvement, but in the medium term it adversely affected the quality of their work and led to their alienation, which, instead of contributing to the cooperative effort, injected an adversarial element into the process. The lack of integration of efficiency measurement with the budgetary process was another problem. Carried out as an independent exercise it tended to be distant and less important. In sum, the ambiguity in concepts, abetted by the apathy of agency management and a lack of firm guidance and inadequate implementation of infrastructure, hindered progress.
But the limitations on the resources and the search for alternative sources of services brought to the fore the twin issues of programs’ costs and their effectiveness. These issues in turn spurred new efforts to refine the available concepts and techniques of measurement and to make them more operational. In addition, the development of value-for-money audits and making them an obligatory function of the audit agency helped to formalize some of these techniques. Although they are not perfect, they provide almost the same assistance as an accounting balance sheet. They tell a tale that is adequate to indicate the broad problems but inadequate to formulate precise remedies without further detailed investigations. This new frontier of expenditure management offers yet another opportunity for stimulus to improved performance and is thus worthy of full consideration.
Any effort aimed at measuring the performance of an organization is intended to provide a basis for assessing the efficiency with which that performance has been accomplished. The crucial questions from the standpoint of expenditure management are: What is efficiency? How is its measurement designed? And how should it be utilized?
So far, efficiency has been used to refer to the utilization of resources in relation to outputs. It referred to the relationship between the output of goods, services, or other results and the resources used to produce them in relation to planned levels. This somewhat narrow approach is considered technical efficiency. It is different from allocative efficiency, which moves from physical measures to costs and is said to be achieved when the cost of any given output is minimized by combining inputs so that one input cannot be substituted for another without raising costs. In an all-inclusive sense of the concept, when attention shifts from the physical aspects to costs and then to utilities, efficiency is considered attained when consumer satisfaction is assured. Thus, allocative inefficiency occurs when inputs are employed in the wrong proportions, given their prices and productivity. Technical inefficiency occurs when the intended level of output cannot be achieved with a given level of inputs; in short, when too little output results from inputs. This measure, which may be viewed as output efficiency, is somewhat different from the measurement of productivity. Efficiency seeks a relationship between an output and a norm under a given level of technology. Productivity measurement, on the other hand, reflects changes in technology and other factors in addition to changes in labor efficiency. It thus emphasizes the relationship between outputs and all resource inputs.
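The distinction between technical and allocative efficiency can be made concrete with a small numerical sketch. The office, the three input mixes, the prices, and the function names below are all hypothetical, chosen only for illustration; the mixes are simply assumed to yield the same output.

```python
# Illustrative sketch only: the office, input mixes, and prices are hypothetical.

def technical_efficiency(actual_output: float, attainable_output: float) -> float:
    """Share of the attainable output actually produced from given inputs."""
    if attainable_output <= 0:
        raise ValueError("attainable_output must be positive")
    return actual_output / attainable_output


def mix_cost(mix: dict, prices: dict) -> float:
    """Total cost of an input mix at the given input prices."""
    return sum(prices[name] * quantity for name, quantity in mix.items())


# Three input mixes assumed to produce the same output (1,000 cases).
mixes = {
    "labor_heavy": {"staff_hours": 400, "machine_hours": 50},
    "balanced": {"staff_hours": 300, "machine_hours": 120},
    "machine_heavy": {"staff_hours": 200, "machine_hours": 260},
}
prices = {"staff_hours": 20.0, "machine_hours": 15.0}

costs = {name: mix_cost(mix, prices) for name, mix in mixes.items()}
cheapest = min(costs, key=costs.get)

# An office that uses the labor-heavy mix and produces 900 of 1,000 attainable cases.
tech_eff = technical_efficiency(actual_output=900, attainable_output=1000)
# Allocative efficiency here: cost of the cheapest mix relative to the mix in use.
alloc_eff = costs[cheapest] / costs["labor_heavy"]

print(f"technical efficiency: {tech_eff:.2f}")
print(f"cheapest mix: {cheapest}, allocative efficiency: {alloc_eff:.2f}")
```

In this sketch the office falls short on both counts: it produces less than its inputs allow, and it employs inputs in proportions that cost more than necessary at the assumed prices.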
Although the term efficiency may have the appearance of being clear and even simple, its practical application is more complex, owing mainly to the ambiguities associated with the multidimensional nature of government organization. In addition, some unintended obfuscation exists in distinguishing the inputs and outputs in governments.
During the last decade, attempts were made to evolve techniques and methods to assess efficiency as well as performance. Efficiency is somewhat narrower in connotation, and, as already indicated, is an expression of the relationship between the resources used and the outputs achieved. Performance encompasses efficiency and includes, in addition, the quality of the goods or the standards of the services and the quality of the organizational contribution. It largely implies a degree of accountability in terms of the results achieved and, more significantly, on the influence that such a measure has over decision making by the executive as well as by the community at large. It has become a shorthand phrase that tends to cover economy, efficiency, and effectiveness. This distinction in coverage of the concepts has also influenced the orientation of the methods and techniques utilized for the purpose.
Efficiency Measurement Techniques
Efficiency measurement broadly covers three techniques.4 These are ratio analysis; regression analysis, which is a statistical approach; and data envelopment analysis. Ratio analysis permits comparisons with the progress made in previous years, with specified targets or goals, with similar organizations, or with alternative approaches for achieving the same goal. For the purpose, unit costs (cost divided by output) may be computed for the service or the output, or productivity factors (the ratio between inputs and outputs) may be derived. Both in conception and execution, ratio analysis is one of the simplest methods and can be implemented at the least cost. But these features also appear to have limited the application of this technique. Its primary limitation is that it cannot capture the factors that affect the management of an organization in comparison with other similar organizations or its own pattern over a period. Some areas may be better than others, but the relative mix of the factors that contributed to different results is less evident from the results. However, the technique has more applicability in the manufacturing sectors of the government than in its service organizations where more than one service is offered.
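As a minimal sketch of ratio analysis, the unit cost of a service can be computed for two years and compared with the previous year and with a target. All figures, including the target, are invented for illustration.

```python
# Illustrative sketch only: yearly costs, outputs, and the target are hypothetical.

def unit_cost(total_cost: float, output_units: float) -> float:
    """Unit cost = cost divided by output."""
    if output_units <= 0:
        raise ValueError("output_units must be positive")
    return total_cost / output_units


def percent_change(current: float, baseline: float) -> float:
    """Change of `current` relative to `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


# year -> (total cost, cases handled)
history = {1990: (500_000, 10_000), 1991: (540_000, 12_000)}
unit_costs = {year: unit_cost(cost, cases) for year, (cost, cases) in history.items()}

change_on_previous_year = percent_change(unit_costs[1991], unit_costs[1990])
target_unit_cost = 46.0  # assumed target, for illustration
met_target = unit_costs[1991] <= target_unit_cost

print(unit_costs)
print(f"change on previous year: {change_on_previous_year:.1f}%")
print(f"target met: {met_target}")
```

The ratios show that unit cost fell even though total spending rose, but, as the text notes, they do not reveal which factors produced the change.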
Regression analysis, which is the statistical approach, has had and continues to have wider application. It seeks to relate an output to a set of inputs and explains the changes in one set with reference to changes in one or more of the ingredients of the other set. Specifically in government organizations, this technique can be and has been applied in the context of management by exception (key variables rather than all variables). In such cases, the explanations of inputs and outputs may be due to structural differences in what is being compared, lack of control over relevant explanatory data, and errors in recorded performance measures or indicators. Here, again, the main limitation is that the results achieved do not offer a meaningful insight into the causal process. Such an understanding is possible only when supplemented by other empirical approaches or subjective understandings. Also, efficiency as measured through this technique is in terms of deviations from an average rather than in terms of best performance.
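The idea of measuring efficiency as a deviation from an average relationship can be sketched with an ordinary least-squares line fitted to one input and one output. The four offices, their figures, and the exception threshold are assumptions made for illustration; an actual study would involve more units and more explanatory variables.

```python
# Illustrative sketch only: offices, data, and threshold are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# office -> (staff employed, cases completed)
offices = {"A": (10, 210), "B": (20, 390), "C": (30, 620), "D": (40, 780)}

staff = [s for s, _ in offices.values()]
cases = [c for _, c in offices.values()]
intercept, slope = fit_line(staff, cases)

# Residual = actual output minus the output predicted by the average relationship.
residuals = {
    name: c - (intercept + slope * s) for name, (s, c) in offices.items()
}

# Management by exception: look only at offices far from the fitted line.
threshold = 20.0  # assumed cut-off, in cases
exceptions = [name for name, r in residuals.items() if abs(r) > threshold]

print(f"slope: {slope:.1f} cases per staff member")
print({name: round(r, 1) for name, r in residuals.items()})
print(f"offices flagged for review: {exceptions}")
```

A flagged office is only a prompt for inquiry. The residual says that the office departs from the average, not why it does so, and not how it compares with the best performer.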
The third approach, data envelopment analysis, which is of relatively recent origin (and still being refined), seeks to examine the input-output relationship of an organization from the vantage point of efficient organizations at the frontier. Organizations are considered inefficient when they operate below the production level frontier or above the cost frontier. The frontier with reference to which the comparison is made is itself a composite construction comprising the weighted average of a number of efficient units. This approach has several problems common to the preceding two techniques. It assumes a type of causation between inputs and outputs that is more like an a priori argument than a comprehensive illumination of the causation process. The choice of the units constituting the frontier cannot be free from bias; more significantly, there is an implicit assumption of constant returns to scale. In government, the issue is not one of assuming such a pattern, for if that were so, there would be overwhelming demands for allocation of increased funds, for each such increment would certainly bring in increased outputs at the prevailing scale. Also, because the technique is somewhat more mathematical, it is remote in both content and application from an expenditure manager. While every expenditure manager is also an analyst of his own organization, not every analyst is an expenditure manager; the manager would benefit from the findings of an analyst, but both the technique utilized and the findings made should be objective, available faster, and cost effective. In short, they have to be accessible to the manager. Of the three techniques, only ratio analysis has the properties that make it accessible. The pursuit of a language and orientation that are closer to the needs of an expenditure manager and the lack of finality about the findings from statistical and mathematical techniques contributed to the development of another set of approaches.
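The frontier idea can be illustrated in its simplest special case: one input, one output, and constant returns to scale. In that case the frontier is set by the unit with the highest output per unit of input, and every other unit is scored against it. This is a deliberate simplification. With several inputs and outputs the technique requires solving a linear program for each unit, which is not shown here. The regional offices and figures are hypothetical.

```python
# Illustrative sketch only: single input, single output, constant returns to scale.
# The units and their figures are hypothetical.

def frontier_scores(units: dict) -> dict:
    """Score each unit against the best observed output-to-input ratio.

    A score of 1.0 places the unit on the frontier; lower scores
    indicate operation below the production frontier.
    """
    ratios = {name: output / inp for name, (inp, output) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# unit -> (spending in thousands, cases completed)
units = {
    "North": (100.0, 800.0),
    "South": (120.0, 840.0),
    "East": (80.0, 720.0),
}

scores = frontier_scores(units)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Unlike the regression residual, the score here is measured against the best performer rather than the average. The sketch also makes the constant-returns assumption visible: the frontier is a straight ray, so doubling spending is presumed to double attainable output.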
In the accountability chain, a manager is endowed with a degree of flexibility to produce a range of outputs considered appropriate for the moneys spent. Those outputs are those that can be expected at a given level of performance. A higher level may be achieved through economy in the use of resources or through a greater production of outputs that in turn would be considered more efficient and, thus, would trigger the operation of incentives. The manager should have control over that for which he is considered to be accountable. Specification of performance measurement or indicators that would reflect the manager’s activities and be a basis for judging whether the effort is reasonable is needed. Three issues arise: What is performance measurement? What does it consist of in government? And what is the basis for judging the reasonableness of the effort—or, in short, the design of the performance assessment system?
The term “measurement” implies a degree of precision and a lack of ambiguity. Such measurement is difficult in most government activities, however. This is not to say that all organizations engaged in processing paper, as are governments, do not have any measurable outputs. For example, in a management accounting and information system division, the work typically involves a caseload (people or paperwork processed by a self-contained unit (office)) and the output (accounting information) is provided to a customer.5 Although only information is provided, it lends itself to measurement for three reasons. Both inputs and outputs are measurable; the process of converting inputs into outputs is easy to define because of the unambiguous character of the work; and a single management largely controls the process. This may not be so in other parts of the government and in those cases performance indicators may be adequate for measurement purposes. Indicators are less precise than measurements but are acceptable substitutes when the search for measurements is long, costly, and often futile.
The indicators should follow the objectives set for the organization. These objectives should be specific, disaggregated, and measurable in one form or other. Indicators in such a context seek to serve as a bridge between the objectives, the resources allocated (inputs), and the organization’s outputs. More specifically, they should be designed to reflect the following aspects.
|Indicator|What it reflects|
|---|---|
|Throughput or volume|Number of cases handled.|
|Productivity|Average output units per person. Weights are used to express different types of work on a comparable basis.|
|Cost|Average cost per unit to indicate the resources both required and utilized.|
|Time target|Time needed for the completion of each caseload.|
|Demand for services|Types, frequency, and quantity of outputs; seasonality in demand should also be indicated.|
|Availability of services|Range of demand and the equity shown in consumer demand.|
|Outcome|Types of results expected; their timing and impact on the overall set of goals of the organization.|
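Three of the indicators listed above (throughput, weighted productivity, and unit cost) can be sketched for a single office. The case types, weights, staffing, and cost figures are hypothetical; the weights in particular stand in for whatever relative effort an agency would assign to each type of work.

```python
# Illustrative sketch only: case types, weights, staff, and cost are hypothetical.

def weighted_output(counts: dict, weights: dict) -> float:
    """Express different types of work on a comparable basis."""
    return sum(counts[kind] * weights[kind] for kind in counts)


counts = {"simple_claim": 600, "complex_claim": 150}   # cases handled
weights = {"simple_claim": 1.0, "complex_claim": 3.0}  # assumed relative effort
staff_employed = 15
total_cost = 210_000.0

throughput = sum(counts.values())                       # number of cases handled
output_units = weighted_output(counts, weights)         # weighted output units
productivity = output_units / staff_employed            # output units per person
cost_per_unit = total_cost / output_units               # average cost per unit

print(f"throughput: {throughput} cases")
print(f"productivity: {productivity:.1f} weighted units per person")
print(f"cost per weighted unit: {cost_per_unit:.2f}")
```

These figures describe what was done. They acquire meaning only when set against a standard, a time series, or a comparable organization.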
The intent behind these indicators is to spur the managers to greater productivity and to perform the task better. The community is interested in ascertaining whether the budgeted capacity has been effectively utilized. In turn, the formulation of anchors with which comparisons may be made is required. Without such standards, a recitation of the above types of indicators merely indicates what has been done without providing any basis for judging the reasonableness of the effort. In the public sector, this is done with reference to the following factors.6
|Basis of comparison|Description|
|---|---|
|Standards or norms|Specification of targets for achievement based on the installed capacity and its utilization potential.|
|Time series|Historical series of data offer meaningful relationships for costs, caseload, number employed, and associated aspects.|
|Alternative approaches|When a similar activity is undertaken either within the government or in a comparable organization outside, the activities of the latter provide a useful benchmark. In some cases interservice (at different locations) comparisons are also helpful.|
|Control groups|Comparison of behavior patterns with others that have a similar input or product and with those whose activities are dissimilar from the one whose performance is to be measured.|
In designing the system of performance indicators, the users’ needs and the relationship to the budgetary process need to be given particular attention. They should reflect the users’ interests. If these interests are perceived to be in the interests of central agencies or outside agencies, those who are generating the data and complying with the selected indicators may treat them as less important to them directly and may perform them ritualistically or perfunctorily. Fudging the data is distinctly possible in such a situation. Once designed to reflect the users’ needs, it would be necessary to specify the types of data to be collected, the intervals at which they are to be collected, the form in which they have to be presented, and the action to be taken on the issues arising from the data. None of these activities should be viewed as isolated from the general needs of the budgetary process. The process is multidimensional, reflecting the needs of the spending agencies as well as the macroeconomic policy responsibilities of the central agencies. From the point of view of the spending agencies, the data collected on performance should be utilized to plan their own activities, to set targets, and to ensure a more efficient allocation of time, manpower, and materials. It should be utilized to improve management awareness, so that problems can be pinpointed for the attention of management to seek continuing improvements.7 From the point of view of central agencies, the progress in performance and its impact on the future resources and the policy options available are immensely important. How performance would be affected if the funds allocated were reduced, if such reductions were mandated by the deteriorating financial situation, is also important. The continued maintenance of a targeted level of performance implies, none too subtly, a degree of stability in the provision of resources. 
But disruptions in allocations are inevitable if external disturbances (presumed to be beyond the reach of a decision maker) have a strong impact on the resource inflows. In such cases, resort may be made to the formulation of low or high values of performance, each level pegged to a specified amount of resources. The levels of effectiveness should correspond to the levels of resources, and these levels should be formulated to cover a range of possibilities.
Performance measurement therefore should be viewed as a value chain, as illustrated in Chart 6, in which each element is as important as any link in a chain.
Issues in Applying Performance Indicators
The application of performance indicators has had a somewhat checkered history. A balanced assessment now will have to take into account some significant improvements in three major areas that on the whole have tended to strengthen this application. The first is the general acceptance of the need for cost computations of government programs; over the years, the development of a running or operational costs profile as an analytical tool has tended to make budgeting and expenditure control more meaningful. Although operational cash costs may not indicate the full accrued cost of a program, and are somewhat more limited in their usefulness, they have a validity of their own in that the trend in operational costs cannot, by its very nature, differ much from the accrued cost. Such a profile reveals the areas that contribute to increased costs and therefore permits an analysis of the policy options. Similarly, project costing has in recent years become more comprehensive and also normal practice through the insistence of the lending institutions on such costs.
Chart 6. Performance Measurement Value Chain
Second, aided by the computer revolution, management information systems, as distinct from the traditional accounting information systems, have become accepted ingredients of governments’ administrative processes. Major spending agencies in industrial and developing countries8 now have information systems to meet their unique needs.9 Since these systems are generally geared to providing a tracking system to conform with the needs of the agencies, and as they usually cover the physical and financial aspects, they proved helpful in monitoring performance also. The financial management initiative in the United Kingdom and similar efforts elsewhere are examples.
Third, the management environment in governments has been transformed. In the sixties, it appears in retrospect that too much credence was given to the view that government was a unique institution and that much of what was happening then in the commercial world had little applicability for the government. The technological revolution and fiscal stress have forced governments to look inward and to gain from some practices in the commercial world. Time and stress have a profound impact on individuals and governments alike.
These facilities, while smoothing the application of performance indicators, do not necessarily compensate for their limitations. Many of the limitations were identified even during the early years of performance budgeting application.10 Some of the indicators communicate more about the inputs and the processes than about the outcomes. The assessment of the results is frequently made difficult when comparisons are made with organizations that are not wholly comparable. In some cases, fortuitous factors, such as the clearance of a backlog, may show higher productivity ratios. In a few others, data were incorrect or were manipulated to cast a favorable light on the management’s actions. Elsewhere, random differences between targets and actual performance may divert attention from the more important to the less important. The quality of services continues to be a matter of judgment influenced by subjective factors. More significant, in analyzing their implications for future policy changes, conclusions should be drawn with caution.
These limitations have been addressed in varying forms in various countries. Data are now largely available in a number of areas that were previously not contemplated. The accumulation of greater evaluative experience in both government and independent audit agencies—in addition to the public’s discerning and critical eye—has helped in evolving more reliable data that are utilized for management purposes. A study undertaken in 1988 in the United Kingdom (Jackson and Palmer, 1989, p. 37) found, for example, that in the National Health Service, indicators were used more for highlighting longer-term trends than for daily management of services because of lags in reporting.
The impact of these indicators on the behavioral patterns of officials and organizations is also important. It is suggested, on the basis of the experience of centrally planned economies (CPEs), that the indicators make officials solely target oriented, so that attainment of the target becomes the only aim. Quality may be sacrificed, and the vital link between objectives and performance itself may not get the attention it deserves. While the possibility of such behavior cannot be entirely ruled out, the change in the management environment and the growth of transparency in public activities may have had the needed dampening effect in curbing the excessive zeal of some organizations.
The final limitation concerns the possibility of achieving performance in an uncertain world in which changes are frequent and unpredictable. The allocation of resources in such circumstances is also subject to frequent changes, and standards and indicators should therefore not be considered immutable. They should have the same flexibility as resource allocation needs. In accountable management, nonachievement is not a crime. It is an opportunity to explore the alternatives and a journey toward discovering a more feasible course of action. The experience of CPEs conclusively illustrates that maintaining standards regardless of resource availability is bound to lead to fiscal insolvency. Performance indicators have a major role in providing consistent and credible information to decision makers, but they are not substitutes for judgment.
Several activities in a government are often viewed as different forms of evaluation. Included in this broad range are auditing (including internal audit), inspection, management analysis, monitoring, planning, policy analysis, and program analysis. Evaluation as used here, however, has a specific connotation. Its objective is to improve program and agency effectiveness, and it is viewed as an aid to decision making and management. It consists of assessing the rationale of programs, their progress and impact, finding areas of success and failure, and deriving lessons for improving and for stimulating performance. It should not be viewed as scientific research aimed at producing definitive conclusions about the programs and their effectiveness but as an input into the complex processes of government decision making. In democracies, which, because of vested interests, are reluctant to give up programs even when they are shown to be ineffective, it has the advantage of showing the costs of continuing these programs for society. If the programs are to be continued, evaluation has the built-in opportunity to show where costs could possibly be minimized.
Evaluation is by no means new. While in the past it had no specific forums or periodicity or was undertaken by citizens’ groups outside the realm of government, it acquired a more systematic form in the developing countries in the early fifties and in the industrial countries in the early seventies. In the anxiety to undertake rapid economic reconstruction and development, a good deal of wastage in the use of resources could occur. The theme in those days was not attaining efficiency (which was not then a goal either for policymakers or for financial managers), but minimizing waste. The annual audit, which mostly consisted of financial audit (in terms of explaining deviations from the approved appropriations), was considered somewhat routine and took too long to complete the annual financial cycle. It was felt that this could be compensated for by the establishment of a separate evaluation organization charged with primary responsibility for evaluating plan programs. India was one such country: it set up a program evaluation organization in 1953 to evaluate river valley projects and agricultural programs that were accorded top priority in its First Five-Year Plan.11 This approach was replicated elsewhere too, but on the whole the impact of such evaluation was not widely felt in government finances.
The theme of evaluation was picked up in industrial countries during the sixties and seventies. Thus, for example, in Canada, ad hoc efforts were made to evaluate specific programs. Similar efforts were also made in the United States. Evaluation was initiated in Canada in 1977 as a government-wide activity and was put on a formal footing by the early eighties. In the United States, however, as a prominent administrator (Richardson, 1991) later noted, “evaluation fell out of favour as the federal initiatives of the 1960s and 1970s proved disappointing and early attempts at evaluation were shown to be flawed or biased” (p. 42). Apart from the flaws in the systems evaluated, the fact that both the executive and legislative branches of government were more concerned with starting new programs than with ensuring that those already in operation were working satisfactorily or could be improved also had an impact. More recent studies on the practices of eight industrial countries show that evaluation has become regular, although in many cases it is associated with the audit of government expenditures.12 Behind this gradual acceptance of evaluation as a necessary and essential part of national expenditure management lies the story of the gradual evolution of evaluation from an attractive idea to an operational framework aimed at facilitating day-to-day management.
Nature of Evaluation
As indicated at the beginning of this chapter, the continuum of expenditure management involves formulating corporate plans and relevant programs and monitoring their progress. This monitoring provides the first insight into the actual developments as well as a qualitative assessment. Evaluation is the next step and is distinct from monitoring. Monitoring is generally associated with the ongoing operation of programs, while evaluation is associated with the impact and effects of completed programs. Evaluation also differs from internal audit. Internal audit, in a narrow context, consists of examining the accuracy and validity of the documents being processed for payment. The role of internal audit is to ensure compliance with all the relevant laws and regulations. In a somewhat broader interpretation, internal audit consists of undertaking a systemic review of all departmental operations to advise the management. In performing this function, internal audit focuses on the interrelationships between inputs, activities, and outputs, while evaluation is more concerned with the impact of the resources used. To some extent, internal audit can serve as a useful precursor to organized evaluation, because the problems highlighted in the internal audit can be followed up through more intensive investigation in the evaluation.
Management-oriented evaluation seeks to provide useful information at a minimum cost to management. Specifically, it seeks to stimulate a heightened awareness in three areas: policy and program formulation, program implementation, and the impact of completed programs. Each aspect has different implications for the type of information collected.
Evaluation of policy formulation seeks to provide an in-depth understanding of the objectives of programs selected for study and to ascertain whether any changes have taken place in the contours of the issue that necessitate a change in the focus and content of programs. To what extent have the programs fulfilled their objectives? If clients have changed, what does that imply for the programs? What additional efforts are needed and where and how can they be fulfilled?
The requirements of program implementation are to ascertain the suitability and effectiveness of the organizations, their methods, and the procedures and schedules used. To what extent have their operations been affected by the changes in budgetary allotments, procurement, and contracting procedures? What are the responsibilities of the various levels of government involved in implementation, and has the existing structure of relationships among the levels contributed to easier implementation of the program or has it proved to be a bottleneck? What is the administrative cost of supervision, and how does it compare with the overall results? What changes would be appropriate in all these areas so that implementation can be improved?
The evaluation of outcomes requires analysis of the flow and distribution of benefits and whether they have accrued to the group targeted. It also seeks to ascertain the factors underlying the different impact on different areas and groups and the wastage of men and materials. The additional issue of accountability also focuses on management supervision of, attention to, and procedures relating to the use of public moneys.
Design of Evaluation13
Not all activities of the government need to be evaluated regularly. In some cases, the cost of evaluation may be more than the benefits that accrue. The areas to be evaluated should therefore be selected carefully and with reference to objective criteria. What is considered important from an agency’s point of view may not carry the same weight from the cabinet’s point of view. It is equally possible that the legislature will also differ in its views of what is important. These differing orientations naturally contribute to more than one type of evaluation being carried out by more than one agency. In Germany,14 for example, evaluation is undertaken by the individual ministries reflecting their priorities, and on behalf of the legislature, a similar evaluation is undertaken for legislated programs. In addition, new policies or social experiments are also evaluated. In each case, however, the formulation of priorities is only an essential first step. These priorities may in turn be subdivided into those to be performed immediately within the fiscal year and those that can be undertaken over the medium term.
Once programs are selected for evaluation, an initial determination has to be made on the kind of information required and the sources of that information. In turn, the relationship of the questions to be raised to the objectives and the feasibility of the program has to be determined, as well as the period of time over which the questions have to be answered. It also has to be decided whether the study will be a pilot study or a full study. In all these areas, the evaluator is also engaging in self-analysis, because his effectiveness depends on the timeliness and the validity of his conclusions.
Determining the design of evaluation involves a choice among a sample survey (random or stratified), a case study approach, a field experiment, or the use of data already available. Once this pre-evaluation planning is completed, the process of evaluation begins with data collection and analysis and concludes with reporting to the relevant authority. The cycle of accountability is completed when decisions based on findings and recommendations of the evaluation study are taken.
An important aspect of the debate on evaluation is who should do it. In answering this question, the conventional wisdom on the traditional roles of the agencies and external audit appears to have had a major impact. For example, when an agency undertakes an evaluation of its own activities, it usually does so to justify the initial policy and to seek additional funding for its support. Self-criticism is not a strong point of any bureaucracy. On the other hand, forced by a resource shortage and given an incentive to be flexible, the management of the agency may be inclined to explore possible improvements. Evaluation by an external auditor is often viewed as a relentless exercise in pursuing the auditor’s role as a critic. It is further suggested that the auditor lacks the expertise and objectivity needed to undertake evaluations. These allegations appear to have lost their sharp edges over time. The experience of several countries suggests that such evaluations have become a routine part of the work of audit institutions. Indeed, the value-for-money audit is largely done through evaluations.
If evaluation is not to be trivialized, it has to focus on areas that are traditionally known for wastage, or where the pay-off is rather high. In turn, greater emphasis should be laid on development projects or on sectors where expenditures tend to be larger. Evaluations of projects and programs undertaken by international financial institutions (such as the World Bank and regional banks) point to the usefulness of this approach. Much, however, remains to be done in strengthening the process of evaluation and in integrating it with the expenditure management process.
Accountability is incomplete if the findings of an evaluation are not made available to the public. Although evaluation agencies have been providing guidelines on how data are to be presented and reports are to be written,15 the user for whom the reports are written has not been precisely identified. Consequently, the reports have a limited shelf life and even more limited use. These aspects underline the importance of systematic and sustained investment in evaluation.
A discussion of the application of performance budgeting to a developing country (India) is available in Premchand (1969). A study of the implementation of the system in a few Asian countries is provided in Dean (1989).
United Kingdom, Privy Council (1981), p. 2. Similar efforts were initiated at that time in Canada for the formal introduction of evaluation of completed programs and for the introduction of the measurement of efficiency (ratio of output to input). See Premchand (1990), p. 227.
A summary discussion of these three techniques is provided in the paper by Diamond (1990), pp. 147 ff.
A survey carried out by Jackson of public sector managers showed that the best features of these indicators were that they forged an ability to make comparisons, to highlight areas of interest, to provide a comprehensive picture of the service, and to gain a perspective of the service or product over time. See Jackson and Palmer (1989), p. 2.
Much progress still has to be made in former CPEs.
This has contributed, in some cases, to the redundancy of a separate reporting system as part of performance budgeting. See Dean (1989) for a case study of India in this regard.
Although CPEs had development plans for a long time, systems of evaluation were not recognized, and no effort was made to establish them. They also had no formal audit. There was, however, a system of financial inspection that looked into allegations of financial irregularities.
A detailed discussion of this aspect is provided in Canada (1981), United States, General Accounting Office (1985), and Wholey, Newcomer, and Associates (1989).
See, for example, United Kingdom, National Audit Office (1991d).