Psychiatric Annals

Evaluation of Outcome

Stephen E Breuning, PhD; Patrick K Ackles, PhD

Abstract

"Clinical freedom is dead, and no one need regret its passing . . . opinion is not enough . . . Clinical freedom should, however have been strangled long ago, for at best it was a cloak for ignorance and at worst an excuse for quackery. "1

It is clear that more stringent requirements for drug evaluations are being called for by professional organizations, individual clinicians and researchers, judges, the FDA, and funding agencies. This article will provide a brief overview of the current status of ways to measure behavioral change within the context of pharmacotherapy, minimum requirements for a quality evaluation, and design strategies applicable to clinical practice.

The purpose of pharmacotherapeutic evaluation is to determine whether administering a medicine significantly improves some targeted aspect of the client's behavior. Improvements in behavior that can be empirically attributed to the medicine are the only meaningful demonstration of clinical efficacy. Evaluating a medicine's effects has two components: measuring the behaviors likely to be affected by the medication, and using assessment techniques that compare performance under conditions presumed to differ only in whether or not the medicine is administered. To obtain useful data, target behaviors must be measured in ways that are reliable (repeatable over time or across observers) and valid (actually measuring what they are intended to measure), using assessment strategies that are sensitive to the changes the medicine produces. Finally, measures of target behaviors should meet other obvious requirements, such as practicality, economy, safety, and ethical acceptability. For further discussion of these topics, the reader is referred to Werry.2

MEASURES OF BEHAVIORAL CHANGE

There are seven frequently used measures of outcome: behavioral observations, rating scales, global impressions, self-reports, standardized tests, learning/performance tasks, and mechanical movement monitors. Perhaps the best measure is direct behavioral observation, in which behavior is recorded as it happens, with frequent reliability checks. These procedures require that the behaviors of interest be clearly observable without reference to inner states and have well-defined boundaries, so that independent observers can reliably record the same behaviors. The major limitation of behavioral observation is that it is expensive in staff time and often not feasible in the natural environment. Several excellent sources discuss specific direct observation procedures in detail.3-5
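
Reliability checks in direct observation are usually reported as interobserver agreement. The sketch below is a minimal illustration, assuming interval recording in which two independent observers score each observation interval as 1 (target behavior occurred) or 0 (did not occur). It computes interval-by-interval percent agreement and Cohen's kappa, which discounts the agreement expected by chance alone. The function names, the 0/1 coding, and the sample record are hypothetical; the sources cited above describe the full range of observation and agreement procedures.

```python
from typing import Sequence


def percent_agreement(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Interval-by-interval agreement: percentage of intervals both observers scored identically."""
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("observer records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * agreements / len(obs_a)


def cohens_kappa(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(obs_a)
    if n != len(obs_b) or n == 0:
        raise ValueError("observer records must be non-empty and of equal length")
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    # Chance agreement: product of each observer's marginal proportions, summed over codes.
    codes = set(obs_a) | set(obs_b)
    p_chance = sum((list(obs_a).count(c) / n) * (list(obs_b).count(c) / n) for c in codes)
    if p_chance == 1.0:
        # Both observers used one identical code throughout: kappa is undefined, agreement is perfect.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical record: 10 intervals, 1 = behavior occurred, 0 = did not occur.
observer_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
observer_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
print(percent_agreement(observer_a, observer_b))        # 80.0
print(round(cohens_kappa(observer_a, observer_b), 2))  # 0.58
```

Here the observers agree on 8 of 10 intervals, but because both scored the behavior as present in most intervals, part of that agreement is expected by chance, so kappa (about 0.58) is noticeably lower than 80%.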

Rating scales are frequently used and can be implemented in virtually all clinical settings. They are generally problem-oriented checklists completed by staff or parents after specified periods of time with the client. Rating scales can be quick, efficient, inexpensive, and practical. However, only rating scales of established reliability, validity, and sensitivity to medication changes should be used. Specific rating scales are discussed elsewhere.2,6,7
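
Whether a scale has established reliability is normally judged from published evidence rather than computed by the clinician, but one common form of it, test-retest reliability, is easy to illustrate: the same scale is completed twice for a group of clients over an interval in which their behavior is not expected to change (here, no medication change), and the two sets of total scores are correlated. The sketch below assumes a Pearson correlation on total scores; the function name and the client scores are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson correlation between paired total scores from two administrations of a scale."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    covariation = sum((x - mean_1) * (y - mean_2) for x, y in zip(first, second))
    spread_1 = sum((x - mean_1) ** 2 for x in first)
    spread_2 = sum((y - mean_2) ** 2 for y in second)
    if spread_1 == 0 or spread_2 == 0:
        raise ValueError("scores must vary across clients to compute a correlation")
    return covariation / math.sqrt(spread_1 * spread_2)


# Hypothetical total scores for six clients, rated twice one week apart
# with no change in medication between the two ratings.
week_1 = [22, 15, 30, 18, 26, 12]
week_2 = [24, 14, 28, 19, 27, 13]
print(round(pearson_r(week_1, week_2), 2))
```

A coefficient near 1 indicates that clients keep roughly the same rank and spacing across the two ratings; a low coefficient would mean that later score changes could not be distinguished from measurement noise when a medication is changed.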

Global impressions of overall client behavior represent the most frequently used and least acceptable method of evaluating outcome. Problems with this type of assessment include: 1) failure to measure more than a gross relative change in behavior, 2) failure to capture day-to-day variability, and 3) failure to be replicable across time or clinicians.2,8

Self-reports are also frequently used, and also troublesome, measures of outcome. Problems include: 1) logistical inconvenience, 2) frequent disagreement between the client's self-report and others' evaluations of the behavior, and 3) great sensitivity to non-drug factors, which makes it difficult to attribute changes to pharmacotherapy.

Standardized intelligence tests have been used to assess outcome but are of limited value, even as adjunct measures. They are generally insensitive to medication changes2,9,10 and are typically only indirectly related to the behaviors of interest.

Measures of learning and performance, such as task completion, rate of performance, and short-term memory, are very useful as adjunct measures of outcome. It is essential to know how a medication affects adaptive and habilitative behaviors, as well as the behaviors that warranted the intervention; this is the only way an adequate risk-benefit analysis can be performed.11,12

Mechanical devices, used primarily to measure activity, have proven useful in laboratory settings.2 However, most such devices are presently too cumbersome or expensive for routine clinical use.

MINIMUM METHODOLOGICAL REQUIREMENTS

A number of general methodological criteria are believed to represent the minimum requirements for a valid scientific study of the efficacy of psychotropic drugs. In particular, Sprague and Werry13 specified six criteria that have been rather widely accepted: 1) placebo control, 2) double-blind conditions, 3) standardized doses, 4) standardized evaluations, 5) appropriate statistical analysis, and 6) random assignment of subjects. In 1978, Sprague and Baxley14 added a seventh criterion: comparison of drug treatment with an alternative treatment. Several of these criteria are applicable to virtually all clinical settings. First, placebo and double-blind conditions are not difficult to implement; they help differentiate drug responders from placebo responders and control for clinician and observer expectations and biases.15-18 Second, standardized doses (eg, mg/kg or mg/ml) are essential for comparisons across clients or research participants. In addition, only one drug or treatment change should occur at a given time. Third, the use of standardized evaluations of established reliability, validity, and sensitivity should go without saying. Fourth, comparisons with alternative treatments are rarely difficult to arrange. Finally, in most clinical applications statistical analyses and random assignment are not essential, although they can occasionally be useful.9,19
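
The point of standardizing doses can be shown with simple arithmetic: the same absolute dose represents very different exposure in clients of different body weight. The sketch below converts between absolute doses and mg/kg. The function names are hypothetical, and the weights and doses are round numbers chosen for illustration only, not dosing guidance for any medication.

```python
def dose_per_kg(dose_mg: float, body_weight_kg: float) -> float:
    """Express an absolute dose (mg) as a weight-standardized dose (mg/kg)."""
    if dose_mg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return dose_mg / body_weight_kg


def absolute_dose(mg_per_kg: float, body_weight_kg: float) -> float:
    """Convert a weight-standardized dose (mg/kg) back to an absolute dose (mg)."""
    if mg_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return mg_per_kg * body_weight_kg


# Hypothetical clients receiving the same 10 mg dose.
clients = {"client 1": 20.0, "client 2": 40.0}  # body weight in kg
for name, weight in clients.items():
    print(name, dose_per_kg(10.0, weight), "mg/kg")
# client 1 0.5 mg/kg
# client 2 0.25 mg/kg
```

Reporting 0.5 mg/kg and 0.25 mg/kg, rather than "10 mg" for both, makes clear that the two clients are not receiving comparable doses.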

EVALUATION DESIGNS

Two major categories of evaluation strategies have been used for testing the effects of pharmacotherapeutics: group designs and single-case designs.8,19-23 Although group designs have been, and continue to be, the most widely used in pediatric psychopharmacological research,24 they have characteristics that preclude their use by clinicians wishing to adopt more rigorous, empirically based evaluation procedures in their everyday practice.20,22 Hence, group designs are not discussed here; instead, the focus is on single-case designs, in which an individual client is repeatedly assessed across several control and treatment conditions. A broad spectrum of studies in psychiatry and clinical psychology attests to the feasibility of these designs for clinical practice.

As in experimental psychopharmacology, the problem of evaluation design concerns how to arrange a sequence of control (no-treatment and placebo) and treatment (drug or dose) conditions so that any therapeutic gains can be unambiguously attributed to the medicine (or dose) and not to extraneous factors (eg, client expectancies, coincidental environmental changes).19,20,22,25,26 The Table lists examples of control and treatment design options using a conventional notation system for single-case designs.20 For example, in the uncontrolled case study (B design) there is only a treatment phase (B) and no baseline (A) or placebo (A1) phase. The lack of a baseline or placebo phase in the B design makes it impossible to determine unequivocally whether, or to what extent, any therapeutic or contratherapeutic changes occur during treatment.
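
The notation can be made concrete with a small helper that reads a design string and reports which of the controls discussed in this section it contains. This is a hypothetical illustration: the function and field names are invented for this sketch, and it recognizes only the three phase codes used here (A for no-treatment baseline, A1 for placebo, B for treatment).

```python
from dataclasses import dataclass

# Phase codes as used in this article's notation.
PHASE_CODES = {"A": "no-treatment baseline", "A1": "placebo", "B": "treatment"}
CONTROL_CODES = {"A", "A1"}


@dataclass
class DesignSummary:
    phases: list[str]
    pretreatment_control: bool    # is there an A or A1 phase before the first B?
    placebo: bool                 # does the design include a placebo (A1) phase?
    withdrawal: bool              # does a control phase ever follow a treatment phase?
    treatment_introductions: int  # how many separate times treatment begins


def summarize_design(notation: str) -> DesignSummary:
    """Summarize a hyphenated design string such as 'A-A1-B-A1-B'."""
    phases = notation.split("-")
    unknown = [p for p in phases if p not in PHASE_CODES]
    if unknown:
        raise ValueError(f"unrecognized phase codes: {unknown}")
    first_treatment = phases.index("B") if "B" in phases else len(phases)
    pairs = list(zip(phases, phases[1:]))
    return DesignSummary(
        phases=phases,
        pretreatment_control=first_treatment > 0,
        placebo="A1" in phases,
        withdrawal=any(prev == "B" and cur in CONTROL_CODES for prev, cur in pairs),
        treatment_introductions=sum(
            1 for i, p in enumerate(phases) if p == "B" and (i == 0 or phases[i - 1] != "B")
        ),
    )


for design in ["B", "A-B", "A-A1-B", "A-B-A", "A-A1-B-A1-B"]:
    s = summarize_design(design)
    print(f"{design:<12} pretreatment={s.pretreatment_control} placebo={s.placebo} "
          f"withdrawal={s.withdrawal} introductions={s.treatment_introductions}")
```

Of the five designs listed, only A-B-A and A-A1-B-A1-B contain a withdrawal, only A-A1-B-A1-B introduces treatment twice (replicating the effect within the client), and the B design has no pretreatment control at all.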

In the A-B design, baseline data are first obtained until stable rates or levels of the target behaviors are observed. The baseline is then followed by the treatment phase (B). The baseline in the A-B design makes it possible to determine whether, and to what extent, changes occur during treatment, but not whether the medication or some extraneous factor produced them. Including a placebo phase (A1), as in the A-A1-B design, helps rule out extraneous factors such as client expectancies about taking a medication and some coincidental environmental changes, provided the target behaviors do not change during the A or A1 phases but do change during treatment. Nevertheless, neither the B nor the A-B designs include sufficient controls for unequivocally determining a medication's efficacy.8,19,20,27
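
The article does not specify how stability should be judged, so the sketch below assumes one simple, hypothetical rule: the baseline is treated as stable once the last five observations all lie within plus or minus 20% of their own mean. Both the window and the tolerance are illustrative choices, and a practical rule would probably also check for trend, since a slow drift can stay inside such a band.

```python
from typing import Sequence


def is_stable(series: Sequence[float], window: int = 5, tolerance: float = 0.20) -> bool:
    """Hypothetical stability rule: the last `window` observations all lie within
    +/- `tolerance` (a fraction) of their own mean. False until enough data exist."""
    if window < 2 or tolerance < 0:
        raise ValueError("window must be at least 2 and tolerance non-negative")
    if len(series) < window:
        return False
    recent = list(series)[-window:]
    center = sum(recent) / window
    if center == 0:
        return all(value == 0 for value in recent)  # a run of zeros counts as stable
    return all(abs(value - center) <= tolerance * abs(center) for value in recent)


# Hypothetical daily counts of a target behavior, added one session at a time during phase A.
baseline = []
for count in [12, 18, 9, 14, 13, 15, 14, 12]:
    baseline.append(count)
    if is_stable(baseline):
        print(f"stable after {len(baseline)} sessions")  # prints "stable after 8 sessions"
        break
```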

TABLE: EXAMPLES OF SINGLE-CASE EVALUATION DESIGNS

Withdrawal Designs. By including a second baseline following treatment, the A-B-A withdrawal design overcomes many of the methodological shortcomings of the B and A-B quasi-experimental designs. The logic of the A-B-A design is straightforward: if the target behaviors change appreciably from the pretreatment baseline during treatment and return to pretreatment levels during the posttreatment baseline, then it is reasonable to conclude that the medicine, and not some extraneous factor, produced the changes.20 Similar logic applies to the B-A-B and A-A1-B-A1 designs. Extensions of the basic A-B-A design (eg, the A-B-A-B and A-A1-B-A1-B designs) and the interaction designs20,27 include additional control and treatment conditions, which control for more extraneous factors and replicate the effect of a medication or dose within the individual client, thereby providing more compelling evidence of the drug's effects. Phases within these designs should be of equal length to reduce the likelihood that chance changes masquerade as treatment effects. The pharmacokinetic and pharmacodynamic properties of the medicine must also be taken into account when determining phase length. Moreover, these designs are appropriate only for medications that do not have irreversible effects, and only if residual and carryover effects are controlled by interposing suitable washout periods between adjacent conditions.
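
Single-case data are traditionally judged by visual inspection of plotted phases, but the A-B-A logic can be summarized numerically for illustration. The sketch below assumes a target behavior the medication is meant to reduce and uses phase means with a hypothetical minimum change (`min_change`): each introduction of treatment must lower the mean by at least that amount, each withdrawal must raise it by at least that amount, and at least one withdrawal must be present. It deliberately ignores trend, variability, and the A to A1 comparison; the function names and data are invented.

```python
from statistics import mean
from typing import Sequence

CONTROL = {"A", "A1"}  # no-treatment baseline and placebo phases
TREATMENT = {"B"}      # medication phase

Phase = tuple[str, Sequence[float]]  # (phase code, observations in that phase)


def phase_means(phases: Sequence[Phase]) -> list[tuple[str, float]]:
    """Mean level of the target behavior in each phase, in order."""
    return [(code, mean(values)) for code, values in phases]


def shows_withdrawal_effect(phases: Sequence[Phase], min_change: float) -> bool:
    """True if every phase change moves the mean in the predicted direction by at least
    `min_change` and the design contains at least one withdrawal (B followed by A or A1)."""
    for code, _ in phases:
        if code not in CONTROL | TREATMENT:
            raise ValueError(f"unrecognized phase code: {code}")
    means = phase_means(phases)
    withdrawals = 0
    for (prev_code, prev_mean), (code, current_mean) in zip(means, means[1:]):
        if prev_code in CONTROL and code in TREATMENT:
            # Treatment introduced: the behavior should drop.
            if prev_mean - current_mean < min_change:
                return False
        elif prev_code in TREATMENT and code in CONTROL:
            # Treatment withdrawn: the behavior should return toward pretreatment levels.
            withdrawals += 1
            if current_mean - prev_mean < min_change:
                return False
        # A -> A1 (baseline to placebo) transitions are not scored in this sketch.
    return withdrawals > 0


# Hypothetical daily self-injury counts in an A-B-A-B design with equal phase lengths.
abab = [
    ("A", [10, 12, 11, 11]),
    ("B", [5, 4, 6, 5]),
    ("A", [10, 11, 9, 10]),
    ("B", [4, 5, 4, 3]),
]
print(phase_means(abab))
print(shows_withdrawal_effect(abab, min_change=3))  # True
```

If the behavior had stayed low after the medication was withdrawn, the third phase would fail the check, and the change could no longer be attributed to the medicine with confidence.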

Multiple Baseline Designs. In multiple baseline designs, baseline (A) data are collected for two or more target behaviors of one client (multiple baseline across behaviors), for one target behavior of one client in two or more settings (multiple baseline across settings), or for one target behavior in two or more clients (multiple baseline across individuals). In each of these designs, the treatment (B) is introduced in a temporally staggered fashion across behaviors, settings, or clients, as if one were conducting a series of simultaneous A-B designs with baselines of different lengths. Because the treatment is introduced at different points in time, if each target behavior changes only after the treatment is introduced for it, the changes may be attributed to the treatment.
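
The staggered logic can be sketched the same way. The example below assumes a multiple baseline across individuals, with three hypothetical clients observed on a common session clock and a target behavior the medication is meant to reduce. Each tier must drop by at least a hypothetical `min_change` once treatment begins for it, and must not drop by that much while it is still in baseline after an earlier tier has started treatment; a drop at that point would suggest an extraneous factor acting on all tiers at once. The `Tier` type, function name, and data are invented for this sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Tier:
    """One leg of a multiple baseline: a behavior, a setting, or a client."""
    name: str
    series: list[float]  # one observation per session, all tiers on the same session clock
    start: int           # index of the first treatment (B) session in this tier


def supports_treatment_effect(tiers: list[Tier], min_change: float) -> bool:
    """Each tier drops by at least `min_change` once treated, and does not drop by that much
    while still in baseline after an earlier tier has begun treatment."""
    starts = [t.start for t in tiers]
    if len(tiers) < 2 or len(set(starts)) != len(starts):
        return False  # needs two or more tiers with staggered start points
    for t in tiers:
        if not 1 <= t.start < len(t.series):
            raise ValueError(f"{t.name}: need at least one baseline and one treatment session")
    first_start = min(starts)
    for t in tiers:
        # 1) Change once the treatment is introduced in this tier.
        if mean(t.series[:t.start]) - mean(t.series[t.start:]) < min_change:
            return False
        # 2) No comparable change while this tier is untreated but an earlier tier is treated.
        if t.start > first_start:
            before = t.series[:first_start]
            waiting = t.series[first_start:t.start]
            if mean(before) - mean(waiting) >= min_change:
                return False
    return True


# Hypothetical counts for three clients over 12 sessions, treatment staggered
# at sessions 4, 7, and 10 (0-based indices).
clients = [
    Tier("client 1", [8, 9, 8, 9, 3, 2, 3, 2, 3, 2, 3, 2], start=4),
    Tier("client 2", [7, 7, 8, 7, 7, 8, 7, 2, 3, 2, 2, 3], start=7),
    Tier("client 3", [6, 6, 5, 6, 6, 5, 6, 6, 5, 6, 1, 1], start=10),
]
print(supports_treatment_effect(clients, min_change=2))  # True
```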

Designs using multiple baselines across behaviors or settings are not feasible when the medication of interest has long-lasting effects (ie, a long half-life), since the drug's effects cannot be limited to only one target behavior or setting at any given time. This, of course, is not a problem for the multiple-baseline-across-individuals design. As in the withdrawal and interaction designs, interposing a placebo phase between the initial no-treatment baseline and the treatment phase greatly strengthens the conclusions that may be drawn from these designs.
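
The problem with long half-lives, and the need for washout periods in withdrawal designs, can be illustrated with the standard first-order elimination model, in which the fraction of drug remaining after time t is 0.5^(t / t½). This is a simplifying pharmacokinetic assumption: plasma half-life is only a proxy, and as noted above, the pharmacodynamic properties of a medicine also bear on how long its effects persist. The function names are hypothetical, and the half-lives below are round numbers, not values for any particular drug.

```python
import math


def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of drug remaining after `elapsed_hours`, assuming first-order elimination."""
    if half_life_hours <= 0 or elapsed_hours < 0:
        raise ValueError("half-life must be positive and elapsed time non-negative")
    return 0.5 ** (elapsed_hours / half_life_hours)


def hours_to_reach(residual_fraction: float, half_life_hours: float) -> float:
    """Time for the remaining fraction to fall to `residual_fraction`, same assumption."""
    if not 0 < residual_fraction < 1 or half_life_hours <= 0:
        raise ValueError("residual fraction must lie strictly between 0 and 1")
    return half_life_hours * math.log2(1 / residual_fraction)


# Hypothetical comparison: a short (4 h) and a long (36 h) half-life, 24 h after the last dose.
for half_life in (4.0, 36.0):
    print(half_life, round(fraction_remaining(24.0, half_life), 3))
```

Twenty-four hours after the last dose, about 1.6% remains with the 4-hour half-life but about 63% with the 36-hour half-life, which illustrates why the effects of a long-acting drug carry over between conditions and why washout periods must be scaled to the half-life. Under the same model, five half-lives (`hours_to_reach(1 / 32, half_life)`) reduce the remaining fraction to about 3%.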

CONCLUSION

This article has briefly presented many of the methodological issues surrounding rigorous, empirically based strategies for evaluating psychopharmacotherapeutic interventions. The references cited provide more detailed discussions of these issues for the interested reader. Virtually all of the measures of behavioral change, methodological requirements, and evaluation designs discussed can be readily incorporated into a clinician's everyday practice. Finally, the evaluation designs presented may be extended to compare more than one drug, or to evaluate the main and interactive effects of drug and non-drug therapies.19

REFERENCES

1. Hampton JR: The end of clinical freedom. Brit Med J 1983; 287:1237-1238.

2. Werry JS: Measures in pediatric psychopharmacology, in Werry JS (ed): Pediatric Psychopharmacology: The Use of Behavior Modifying Drugs in Children. New York, Brunner/Mazel, 1978.

3. Kazdin AE: Behavioral observation, in Hersen M, Bellack AS (eds): Behavioral Assessment, ed 2. New York, Pergamon Press, 1981.

4. Shapiro ES, Barrett RP: Behavioral assessment of the mentally retarded, in Matson JL, Andrasik F (eds): Treatment Issues and Innovations in Mental Retardation. New York, Plenum Press, 1983.

5. Hartman DP, Wood DD: Observational methods, in Bellack AS, Hersen M, Kazdin AE (eds): International Handbook of Behavior Modification and Therapy. New York, Plenum Press, 1982.

6. Early Clinical Drug Evaluation Unit: Assessment Manual. Rockville, MD, National Institute of Mental Health, 1976.

7. Conners CK, Werry JS: Pharmacotherapy, in Quay HC, Werry JS (eds): Psychopathological Disorders of Childhood. New York, John Wiley & Sons, 1979.

8. Sisson LA, Breuning SE: Assessing medication effects, in Matson JL, Breuning SE (eds): Assessing the Mentally Retarded. New York, Grune & Stratton, 1983.

9. Breuning SE, Davidson NA: Effects of psychotropic drugs on the intelligence test performance of mentally retarded adults. Am J Ment Defic 1981; 85:575-579.

10. Breuning SE, Ferguson DG, Davidson NA, et al: Effects of thioridazine on the intelligence test performance of mentally retarded drug responders and nonresponders. Arch Gen Psychiatry 1983; 40:309-313.

11. Breuning SE, Poling A: Pharmacotherapy, in Matson JL, Barrett RP (eds): Psychopathology in the Mentally Retarded. New York, Grune & Stratton, 1982.

12. Sprague RL: Litigation, legislation, and regulations, in Breuning SE, Poling A (eds): Drugs and Mental Retardation. Springfield, IL, Charles C. Thomas, 1982.

13. Sprague RL, Werry JS: Methodology of psychopharmacological studies with the retarded, in Ellis NR (ed): International Review of Research on Mental Retardation. New York, Academic Press, 1971, vol 5.

14. Sprague RL, Baxley GB: Drugs for behavior management with comment on some legal aspects, in Wortis J (ed): Mental Retardation. New York, Brunner/Mazel, 1978, vol 10.

15. Breuning SE, Ferguson DG, Cullari S: Analysis of single-double blind procedures, maintenance of placebo effects, and drug-induced dyskinesia with mentally retarded persons. Appl Res Ment Retard 1980; 1:237-252.

16. Gadow KD, White L, Ferguson DG: Placebo controls and double-blind conditions: Part I. Placebo theory in experimental design, in Breuning SE, Poling A, Matson JL (eds): Applied Psychopharmacology: Methods of Assessing Medication Effects. New York, Grune & Stratton, to be published.

17. Gadow KD, White L, Ferguson DG: Placebo controls and double-blind conditions: Part II. Experimenter bias, conditioned placebo response, and drug-psychotherapy combinations, in Breuning SE, Poling A, Matson JL (eds): Applied Psychopharmacology: Methods of Assessing Medication Effects. New York, Grune & Stratton, to be published.

18. Loney J, Milich R: Development and evaluation of a placebo for studies of operant behavioral intervention. J Behav Ther Exp Psychiatry 1978; 9:327-333.

19. Ackles PK: Evaluating pharmacological-behavioral treatment interactions, in Hersen M, Breuning SE (eds): Pharmacological and Behavioral Treatment: An Integrative Approach. New York, John Wiley & Sons, to be published.

20. Hersen M, Barlow DH: Single Case Experimental Designs: Strategies for Studying Behavior Change. New York, Plenum Press, 1976.

21. Kazdin AE: Research Design in Clinical Psychology. New York, Harper & Row, 1980.

22. Kazdin AE: Single-Case Research Designs: Methods for Clinical and Applied Settings. New York, Oxford University Press, 1982.

23. Whalen CK, Henker B: Group designs in psychopharmacological research, in Breuning SE, Poling AD, Matson JL (eds): Applied Psychopharmacological Research: Methods for Assessing Medication Effects. New York, Grune & Stratton, to be published.

24. Schroeder SR, Lewis MH, Lipton MA: Interactions of pharmacotherapy and behavior therapy among children with learning and behavioral disorders, in Gadow K, Bialer I (eds): Advances in Learning and Behavioral Disabilities. Greenwich, CT, JAI Press, 1981, vol 2.

25. Campbell DT, Stanley JC: Experimental and Quasi-experimental Designs for Research. Chicago, Rand McNally, 1963.

26. Cook TD, Campbell DT: Quasi-experimentation: Design and Analysis Issues for Field Settings. Chicago, Rand McNally, 1979.

27. Hersen M: Single case experimental designs, in Hersen M, Bellack AS, Kazdin AE (eds): International Handbook of Behavior Modification and Therapy. New York, Plenum Press, 1982.

10.3928/0048-5713-19850201-09
