Clinical simulations have been used for years in nursing education (Boreham, 1977; Crawford & Ver Steeg, 1976; Holzemer, Schleutermann, Farrand, & Miller, 1979; McIntyre, 1972; McLaughlin, Cesa, Johnson, Lemons, & Anderson, 1978, 1979; McLaughlin, Carr, & Delucchi, 1981; Sherman, Miller, Farrand, & Holzemer, 1979). Simulation exercises have the potential advantage of presenting lifelike situations without the associated complexities and dangers, while retaining the advantages of conventional instruction (McGuire, Solomon, & Bashook, 1976). Clinical simulations may be used for purposes of either instruction or evaluation. When a paper-and-pencil or computer-assisted clinical simulation is used for instruction, the student receives feedback, including item rationales, depending upon which items were selected. When a simulation is used for evaluation, however, response feedback is designed to be comparable to that of an actual clinical setting; item rationales, corrective feedback, and positive reinforcement are not provided to the student. Simulations in the evaluative mode can provide feedback on performance, similar to an objective test, within a practical time frame. Simulations also offer an objectivity that results from the standardization of the stimulus set, something which cannot be achieved in a clinical setting. In addition, this type of test is believed to be more closely linked with clinical performance than are standard multiple-choice classroom examinations (Boreham, 1977).
Evidence supporting the concurrent and predictive validity (subcategories of criterion-related validity) of clinical simulations is weak, and the relationship between performance on a clinical simulation and performance as measured by direct observation or chart audit is often contradictory. For example, Donnelly and Gallager (1978) found nonsignificant correlations between patient management problems (PMPs) and multiple-choice examinations and negative correlations between PMPs and rating scale scores. Other studies have found modest, positive correlations between similar measures (Farrand, Holzemer, & Schleutermann, 1982; Holzemer et al., 1981; Molidor, Elstein, & King, 1978; Sherman et al., 1979).
If evidence for the criterion-related validity of clinical simulations is weak, perhaps educators ought to use simulations primarily for instructional purposes rather than for evaluation. This study further explored the validity of clinical simulations.
The purpose of this study was to explore the criterion-related validity of a clinical simulation by comparing nurse practitioners' performance as measured by chart audit and direct observation of care with their performance on a clinical simulation. The clinical problem for the simulation, the chart audit, and the direct observation was the management of a hypertensive client. Assessing the validity of an instrument provides evidence that the instrument measures what it purports to measure.
Will nurse practitioners' (NPs') performance in the management of a hypertensive client, as measured by direct observation, chart audit, and clinical simulation, be significantly intercorrelated, thus providing evidence for the criterion-related validity of the simulation?
Clinical Simulation or Patient Management Problem (PMP). The PMP used for this study was reported in the literature by Farrand, Holzemer, and Schleutermann (1982). It is a branched clinical simulation in the management of hypertension and consists of 318 items.
The instrument is scored by calculating proficiency scores for the psychosocial and pathophysiological sections and for the total. Items are assigned values of +1 if appropriate to management, 0 if optional but not essential, and −1 if inappropriate. The proficiency score is the sum of points earned (the +1s minus the −1s) divided by the total number of +1 items (the optimal route), converted to a percentage. Psychosocial items refer to the patient's relationship to his or her social/psychological environment, including patient education. Pathophysiological items are concerned with the disease process itself and refer to objective data such as laboratory findings and physical examination findings.
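Under the scoring rule just described, the proficiency computation can be sketched as follows. This is a minimal illustration: the function name and the item counts in the example are hypothetical, not drawn from the study instrument.

```python
def proficiency_score(selected_values, optimal_route_size):
    """PMP proficiency: (count of +1 items selected minus count of -1
    items selected) divided by the number of +1 items on the optimal
    route, expressed as a percentage."""
    earned = sum(1 for v in selected_values if v == 1)      # appropriate selections
    penalties = sum(1 for v in selected_values if v == -1)  # inappropriate selections
    return 100.0 * (earned - penalties) / optimal_route_size

# Hypothetical example: a candidate selects 40 appropriate (+1),
# 10 optional (0), and 15 inappropriate (-1) items; the optimal
# route contains 80 items -> (40 - 15) / 80 = 31.25%.
score = proficiency_score([1] * 40 + [0] * 10 + [-1] * 15, 80)
```

Note that optional (0) items neither add to nor subtract from the score, so selecting them affects only the time a candidate spends, not the proficiency percentage.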
TABLE 1. DESCRIPTIVE STATISTICS FOR CRITERION-RELATED VALIDITY STUDY (N = 17)
Farrand et al. (1982) reported both content validity, established through literature review and expert clinician reviews, and modest construct validity, demonstrated by significant differences in simulation performance between medical-surgical nurses and nurse practitioners, two groups whose performance was expected to differ.
Chart Audit. The chart audit score consisted of summaries of observations in 16 areas of NP behaviors (Resnik, 1981). These summaries were tallies of noted behaviors from 64 possible activity areas in each of the 16 categories (Table 1). The activity areas included such things as diet, exercise, smoking, sexual activity, and stress. One point was scored for each behavior that took place in an area. No attempt was made in the original instrument or the present scoring procedure to distinguish between desirable and undesirable nursing behaviors, or to assess the quality of interaction between the nurse practitioner and the patient. The items are constructed, however, to reflect favorable practice in the general sense (e.g., referral of a patient to the community social service agency).
Observation Study. The observation study scores consisted of 30 psychosocial behavior items in eight categories (parenting, career, body changes, parent's death/debility, other) and 61 physiological items of two types: information about normal body functioning and secondary prevention (Resnik, 1981). The specific topics regarding body functioning were determined by each patient's report of his or her problems and therefore varied from patient to patient. The secondary prevention items included such things as height/weight, blood pressure, cholesterol, and other indicators of physical status; these items were scored if verbally mentioned by the NP, and again if observed by the evaluator during that visit. All other items, psychosocial and physiological, were scored if they were mentioned, and scored again when included in health information given, in counseling, in establishing an intervention contract with the patient, or as part of a referral. Thus, items were scored if they were included at all in the patient session and given multiple counts if they appeared in more than one way. A high score on the observation instrument therefore indicated a degree of thoroughness in the patient interview but, as with the chart audit, did not necessarily indicate correctness of the nursing behaviors.
The population comprised 30 NPs who had graduated between 1978 and 1981 from the Master's degree Adult Nurse Practitioner program at the School of Nursing, University of California, San Francisco, and who were participating in an extensive evaluative research project (Resnik, 1981).
Trained nurse data collectors observed NPs in practice with a hypertensive client over a two-day period, averaging 19.0 minutes each day. They observed and recorded psychosocial and pathophysiological data-gathering behaviors on the part of the NP. Because each NP was observed at more than 10 points in time, the totals used in this analysis were averages of the number of observations recorded. The same data collectors returned two to four months later and conducted a process chart audit on the same client they had observed with the NP. Data collected for the chart audit and observation study were reported in Resnik (1981).
TABLE 2. SPEARMAN RANK ORDER CORRELATIONS FOR WITHIN AND BETWEEN INSTRUMENTS (N = 17)
The 30 NPs who had participated in the chart audit and observation study were invited by mail to participate in this criterion-related validity study. Twenty-one NPs agreed to participate, and 17 completed the simulation and returned it by mail, for a 57% response rate (17/30).
The descriptive statistics for the three instruments are presented in Table 1. The mean total proficiency score on the simulation was 37%. The average number of behaviors recorded was 30 (12%) on the chart audit and 21 (23%) on the observation study. Subjects were observed to perform more appropriate behaviors than they charted, as indicated by the higher percentage score for the observation study.
The Spearman rank order correlations comparing performance within and between instruments are presented in Table 2. The PMP subscores were highly intercorrelated, with two of the three correlations significant at the .05 level. Neither the chart audit nor the observation study appeared to be highly correlated within itself, as indicated by nonsignificant within-instrument correlations. Across instruments, the chart audit was most clearly associated with the observation study; chart audit subscore correlations with the observation total were: History, r = .77; Physical, r = .52; and Health Information, r = .50. The PMP did not correlate in the predicted direction with either the chart audit or the observation study. In fact, the PMP psychosocial score was negatively correlated with the total observation study score (r = −.52).
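The Spearman rank order statistic used in Table 2 can be sketched in a few lines: rank each score vector (averaging ranks for ties) and compute the Pearson correlation of the ranks. This generic implementation is an illustration only; the score vectors passed to it would be the per-subject instrument scores, and the function names here are hypothetical.

```python
def ranks(xs):
    """Return 1-based ranks of xs, assigning tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

A rank-based coefficient is appropriate here because, with N = 17, there is little basis for assuming the instrument scores are normally distributed, and ranks are robust to the skew typical of tally-based scores.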
No evidence of criterion-related validity was found for the PMP when it was compared with the chart audit or the observation study. This finding supports the work of others (Donnelly & Gallager, 1978; Holzemer et al., 1981; Molidor, Elstein, & King, 1978). Clinical performance appears to be contextually bound, such that performance with a simulated hypertensive patient differs from performance with an actual patient. Page and Fielding (1980) noted that "more behaviors are pursued on PMPs than in the clinical work setting" (p. 536).
Two major limitations of this study are the small sample size and the difficulty of interpreting the chart audit and observation study scores as outcome measures. Correlations are highly affected by sample size, so these findings must be interpreted cautiously. However, because they are consistent with the literature, they provide additional evidence of the difficulty of establishing the criterion-related validity of clinical simulations. Second, it is important to recall that both the chart audit and the observation scores were process measures of behaviors, whereas the simulation score was a weighted scoring of correct and incorrect item selections on the PMP. This study was undertaken because the scores on the chart audit and observation study could be interpreted as measures of good clinical practice.
Three implications for nurse educators are drawn from the findings of this study. First, because of the lack of criterion-related validity evidence reported in the literature and by this study, educators should be cautious in using simulations for evaluation purposes. If the validity of a clinical simulation is questionable, it is difficult to have confidence in assessments of student performance based on it. Evidence for content validity can be assured through rigorous expert reviews of item content; evidence for construct validity or criterion-related validity remains problematic. However, because of the positive student response to clinical simulations and the ability to assure evidence for content validity, educators should consider adopting clinical simulations for instructional purposes.
Second, perhaps the difficulty of establishing the criterion-related validity of clinical simulations has more to do with our modest knowledge of clinical problem solving than with the instruments themselves. Harasym et al. (1979) suggested five possible theoretical explanations of clinical problem solving. One theory posited that problem solving is contextually bound and that it may not be possible to demonstrate a generalizable theory of clinical problem solving. Benner (1984), who applied the Dreyfus model of skill acquisition from novice to expert to nursing, also supports the contextual nature of clinical problem solving.
Finally, clinical simulations are forcing measurement experts to rethink techniques for providing evidence of validity and reliability, because some of the old rules do not work. This problem is most clearly seen in attempts to estimate the reliability of simulations. For example, because each student does not respond to every item, a common assumption of existing techniques for assessing internal consistency reliability is violated. Test-retest reliability estimation also does not apply well to clinical simulations: once a particular simulation has been "solved" by the student, it cannot meaningfully be re-administered at a later date. Assessing test-retest reliability would therefore require an investigator to develop parallel measurement techniques, a difficult task. Techniques for assessing the validity of clinical simulations are as problematic as those for estimating reliability. As discussed above, one theory of clinical problem solving suggests that problem solving may be contextually bound; it may therefore not be possible to develop parallel simulations that could be used to assess concurrent validity within a criterion-related validity framework.
In summary, the findings from this study caution the educator who wishes to use clinical simulations for purposes of evaluation to reexamine this position. These findings support those in the literature that have questioned the evidence for the criterion-related validity of clinical simulations.
- Benner, P. (1984). From novice to expert: Excellence and power in clinical nursing practice. Menlo Park, CA.: Addison-Wesley Publishing Co.
- Boreham, N.C. (1977). The use of case histories to assess nurses' ability to solve clinical problems. Journal of Advanced Nursing, 2(1), 55-56.
- Crawford, W.R., & Ver Steeg, R.F. (1976, March). Immediate and long term changes in simulated clinical performance following didactic instruction and preceptorship. In Research Memorandum Number 20. Los Angeles: U.C.L.A. Center for Health Sciences.
- Donnelly, A.B., & Gallager, R.E. (1978, October). A study of the predictive validity of patient management problems, multiple choice tests, and rating scales. Proceedings of the Seventh Annual Conference on Medical Education, 67-72.
- Farrand, L., Holzemer, W.L., & Schleutermann, J.A. (1982). A study on construct validity: Simulations as a measure of nurse practitioner problem solving. Nursing Research, 31(1), 37-42.
- Harasym, P.H., Alexander, F., Baumber, J.S., Bryant, H., Fundytus, D., MacPhail, I., Preshaw, R., Sosnowski, M., Watanabe, M., & Wyse, G. (1979, November). The underlying structure of clinical problem solving: Process or content. Proceedings of the Eighteenth Annual Conference on Research in Medical Education, 67-72.
- Holzemer, W.L., Schleutermann, J.A., Farrand, L., & Miller, A.G. (1981). A validation study: Simulations as a measure of nurse practitioners' problem solving skills. Nursing Research, 30(3), 139-144.
- McGuire, C.H., Solomon, L.M., & Bashook, P.G. (1976). Construction and use of written simulations. Princeton, NJ: The Psychological Corporation.
- McIntyre, H.N. (1972). A simulated clinical nursing test. Nursing Research, 21, 429-435.
- McLaughlin, F.E., Carr, J., & Delucchi, K. (1981). Measurement properties of clinical simulation tests: Hypertension and chronic obstructive pulmonary disease. Nursing Research, 30(1), 5-9.
- McLaughlin, F.E., Cesa, T., Johnson, H., Lemons, M., Anderson, S., Larson, P., & Gibson, J. (1978, November). Primary care judgments of nurses and physicians, Vol. 1-3. San Francisco: San Francisco Veterans Administration Hospital (NTIS #HRP0900605-7).
- McLaughlin, F.E., Cesa, T., Johnson, H., Lemons, M., Anderson, S., Larson, P., & Gibson, J. (1979). Nurses' and physicians' performance on clinical simulation test: Hypertension. Research in Nursing and Health, 2, 61-72.
- Molidor, J.B., Elstein, A.S., & King, L. (1978, October). Assessment of problem solving skills as a screen for medical school admissions. Proceedings of the Seventeenth Annual Conference on Research in Medical Education, 119-124.
- Page, G.G., & Fielding, D.W. (1980). Performance on PMPs and performance in practice: Are they related? Journal of Medical Education, 55, 529-537.
- Resnik, B. (1981). Final report. Core Pathway Program, HRA, Division of Nursing, NU00123.
- Sherman, J., Miller, A., Farrand, L., & Holzemer, W.L. (1979). A simulated patient encounter for the family nurse practitioner. Journal of Nursing Education, 18(5), 5-15.