Program evaluation is essential for the development and maintenance of quality education programs. Results of the evaluation process are used to determine whether scarce financial resources are being used appropriately, suggest curriculum revisions, identify faculty strengths and development needs, determine whether the graduates have achieved the intended objectives, and facilitate accreditation decisions (Watson & Herbener, 1990).
A variety of frameworks for educational evaluation have been proposed (Whiteley, 1992). While each framework is unique, most focus in some way on the concepts of structure, process and outcome, originally suggested by Donabedian (1979). Although each of these concepts is important, resources do not usually permit simultaneous evaluation of all three. Instead, the purpose of the evaluation should guide the specific approach.
The purpose of this article is to outline an outcome evaluation designed to assess the overall impact of a new educational program. In outcome or impact evaluation, data are collected from graduates to determine the degree to which terminal competency objectives were achieved (Watson & Herbener, 1990). Waddell (1991) explains that impact evaluation is used to measure the learners' practical application of the content of the learning experience.
The Educational Program
In 1986, it was decided to introduce neonatal nurse practitioners (NNPs) into tertiary level neonatal intensive care units (NICUs) in Ontario. The role of the NNP was designed to include complete responsibility for the management of neonates in the delivery room and NICU under the indirect supervision of an attending neonatologist, participation in the education of nursing staff and residents, and active involvement in research projects (Hunsberger, et al., 1992). To prepare nurses for this new role, a neonatal stream was incorporated into the existing Master of Health Sciences (MHSc) program at McMaster University in Hamilton, Ontario, Canada. In September 1986, the first class of NNP students enrolled in the graduate program. Since 1986, 7 classes of NNPs (n = 28) have graduated from the 16-month program with an MHSc degree. The program is designed to provide: 1) a general knowledge base to practice at an advanced level in health care; 2) advanced theoretical knowledge related to health care in general; 3) a specific neonatal-focused theoretical base; 4) advanced clinical practice; and 5) research preparation. It consists of 600 hours of theory-specific problem-based tutorials and 720 hours of supervised clinical practice.
Students are required to complete core MHSc courses which focus on individual, family, and group theory; health/illness models; factors which influence health; and interprofessional team function. In addition, they complete an advanced neonatal concepts course which provides the opportunity to develop problem solving and clinical decision-making skills using selected neonatal problems. This course is designed to assist the student in acquiring advanced theoretical knowledge in the physiologic and behavioral sciences.
As part of the clinical course, the students participate in weekly laboratory sessions to practice delegated medical acts on mannequins and animals prior to actual clinical application. Clinical experiences in the NICU permit the progressive development of skills and knowledge through exposure to a variety of patient and family situations. Under the supervision of a practicing NNP and a neonatologist, they develop a problem-oriented approach to management using various theoretical models and frameworks. The students are required to develop skills in physical, family, and behavioral assessment; problem identification; management of health/ illness needs; and communication with families and colleagues. The learning environment provides the opportunity for the students to practice these skills within the context of an interprofessional health care team in which collaboration is emphasized.
A requirement for all students in the MHSc program is the completion of a research project. Students are responsible for identifying a research question, collecting data on a small sample of clients, analyzing the data, and writing and presenting a summary of their work.
Evaluation of the Educational Program
To evaluate the NNP educational program, we compared first year NNP students to graduating NNPs in terms of knowledge, problem solving, and communication skills.
The study took place in November of 1987, 1988, and 1989 with all 8 first year and 10 graduating NNP students of those years. There were five first year and three graduating NNP students in 1987; three first year and four graduating NNPs in 1988; no first year and three graduating NNPs in 1989. Four of the five first year students in 1987 became the graduating NNP students in 1988 (the fifth first year student dropped out of the program) and the three first year students in 1988 became the graduating NNP students in 1989. To address a separate question related to the evaluation of graduating NNPs, pediatric residents were also included in the study; however, these data have been reported elsewhere (Mitchell, et al., 1991). The NNPs were aware of the overall structure of the evaluation 1 month in advance; no specific preparation was provided.
Participants were assured that faculty would not be informed of their individual performance in the various components of this study, that the scores would not in any way influence their formal program evaluations, nor would their scores be recorded in their student files. For feedback purposes, evaluations were shared with the participants, if they desired, but these were not given to instructors nor anyone else associated with the program.
To ensure that graduating NNPs were not at an advantage because of exposure to the tests the previous year as first year NNP students, the content of each test was changed from year to year. Similarly, to ensure blindness, the same evaluators were not used for consecutive years to avoid their recognizing graduating NNPs from their previous year's involvement as first year NNP students. All evaluations were conducted by experienced NICU clinicians and experts in communication skills who were blinded to group membership. The NICU clinicians were recruited from different universities and communication experts had no affiliation with the NICU. All evaluators were shown a list with the names of candidates to be evaluated to determine whether any individuals were familiar to them.
Multiple-choice questions (MCQ) relating specifically to neonatology were taken from several existing self-assessment tests (Dworkin, 1987; Kravath, 1987; LaCerva, 1984; Lipman, 1985; Lorin, 1983; Pantell, 1987; Sahler, 1986, 1987, 1988) and screened for content by two neonatologists and a neonatal clinical nurse specialist all of whom were familiar with the objectives of the NNP program. One hundred questions were selected for the MCQ examination each year. A radiologist, familiar with the objectives of the program, chose 20 radiographs with corresponding clinical scenarios for students to evaluate. Participants were given 2 hours to complete the MCQ examination and 1 minute to evaluate each radiograph.
Items from the MCQ and radiographs on which all candidates scored poorly were reviewed by a neonatologist. If these questions were found to be inappropriate due either to content or format, they were discarded and then the final scores were compiled. As a result, three of the 100 MCQ and two of the 20 radiographs were withdrawn in 1987; five MCQ were withdrawn in 1988.
A semi-structured oral examination was developed from a pool of clinical scenarios submitted by NICU medical directors who had not been involved in the education of the study participants. Six directors contributed 12 scenarios in 1987 and 8 directors contributed 15 scenarios in 1988. For each scenario, the directors provided a description of the presenting situation, clinical data including a short history and details of physical examination, the results of laboratory investigations, and answers expected from the candidates. These included possible diagnoses, crucial information required to resolve the problem, key underlying pathophysiologic principles, interpretation of investigations and major management decisions.
Eight of the scenarios independently rated as most appropriate by two neonatologists and a neonatal clinical nurse specialist were included in the oral examination each year. Each participant was examined on four problems and given 20 minutes per problem. The method used for assignment of problems to participants ensured that each comparison group was exposed to all eight problems. Each year, an Ontario NICU medical director, with experience as an examiner for the Royal College of Physicians and Surgeons and unfamiliar with any of the study participants, administered the oral examination. All candidates were asked to remove identifying articles such as name tags so that the examiner remained unaware of their group identity.
Scores ranging from 1 (major problems) to 7 (superior performance) were assigned to each of the following five categories: hypothesis generation, data gathering and interpretation, knowledge, interim problem formulation, and issue identification. Total scores out of a maximum of 35 per problem were calculated for each candidate. The examiner was also asked to provide an overall assessment per problem, rating each candidate's performance as satisfactory, borderline, or unsatisfactory.
Three different scenarios were used each year to test the participants' skills in communicating with parents of neonates. The scenarios were developed with input from a NICU parent program coordinator, a follow-up clinic nurse, a neonatal clinical nurse specialist and two neonatologists. Individuals with previous experience as simulated patients were trained to assume the parent roles and rehearsed them with the neonatal clinical nurse specialist and the NICU parent coordinator.
For each scenario, study participants were given a brief written description immediately prior to meeting the respective parents. The candidates spent a maximum of 15 minutes communicating with parents in each of the three scenarios. Each scenario was observed from behind a one-way mirror by a communication expert.
Evaluation of each participant was completed both by the expert observer and the simulated parents, all of whom were unfamiliar with the study participants. The observer scoring form permitted evaluation of 11 criteria each on a scale of 1 signifying major problems to 7 representing superior performance, with a maximum total score of 77. The parent scoring form consisted of 30 statements each rated on a scale of 1 indicating disagreement with the statement to 5 indicating agreement with the statement, with a maximum total score of 150.
Reliability of the test measures was calculated using repeated measures ANOVA and intraclass correlation coefficients (Winer, 1971). To examine the differences between first year and graduating NNPs, multivariate ANOVA was performed, followed by univariate tests. Because there were three graduating NNP students in 1987 for whom there were no first year scores and because one first year student dropped out of the program, unpaired t tests were used to compare scores for the 8 first year and 10 graduating NNPs on the various measures of outcome. Paired t tests were also calculated for the 7 students for whom there were both first year and graduating year scores. Confidence intervals around differences between groups were calculated.
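The unpaired comparison described above can be sketched as a pooled-variance t interval around the difference in group means. The scores below are invented for illustration, and the critical value 2.120 is the two-sided 97.5th t quantile for 16 degrees of freedom (8 + 10 − 2):

```python
from math import sqrt
from statistics import mean, variance

def unpaired_ci(group1, group2, t_crit):
    """CI around the difference in means (pooled-variance unpaired t)."""
    n1, n2 = len(group1), len(group2)
    diff = mean(group2) - mean(group1)
    # Pooled sample variance across the two groups
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical percentage scores: 8 first year vs. 10 graduating students
first_year = [60, 62, 58, 61, 59, 63, 60, 57]
graduating = [70, 72, 68, 71, 69, 73, 70, 67, 74, 66]
low, high = unpaired_ci(first_year, graduating, t_crit=2.120)  # df = 16
print(f"difference {mean(graduating) - mean(first_year):.1f}; 95% CI {low:.1f}, {high:.1f}")
```

An interval that excludes zero corresponds to a statistically significant difference at the 0.05 level, which is how the confidence intervals reported in the results below can be read.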
All NNP students in the 1987, 1988, and 1989 classes agreed to participate in the study. They all had completed a 4-year baccalaureate degree in nursing in a university setting (mean years since graduation: 3.7 for first year NNPs and 6.9 for graduating NNPs). The groups were similar in age (first year NNPs: mean = 29.0 years; graduating NNPs: mean = 30.7 years) and all were women. The number of years they had worked in an NICU as staff nurses was similar (3.2 for first year NNPs and 4.2 for graduating NNPs).
Reliability of Test Measures
Reliability of the test measures was determined prior to examination of group differences. There was overall consistency within test items for each student for the MCQ and radiographs with Cronbach alpha ranging from 0.75 to 0.87 for the MCQ and 0.58 to 0.62 for the radiographs.
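For readers wishing to reproduce this kind of internal-consistency check, Cronbach's alpha can be computed directly from a candidates-by-items score matrix; the 0/1 data below are invented for illustration:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for rows of candidate scores (candidates x items)."""
    n_items = len(scores[0])
    # Sum of sample variances of the individual items (columns)
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of each candidate's total score
    total_var = variance(sum(row) for row in scores)
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 candidates x 4 MCQ items (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(scores), 2))  # → 0.7
```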
In each year of the oral examination, there was only a single examiner and therefore no opportunity to determine interrater reliability. The "intercase" reliability (correlation of category scores across the 4 cases for each candidate) ranged from 0.13 to 0.29 (mean = 0.19). This is consistent with the literature on content specificity (Elstein, Shulman, & Sprafka, 1978; Norman, Tugwell, Feightner, & Muzzin, 1985). However, the reliability of the average score across cases would be expected to be higher, and ranged from 0.37 to 0.63 (mean = 0.47). For the communication skills assessment, the average "interrater" reliability between the two parent ratings on the 30 items for each scenario was 0.44 (range 0.02 to 0.78). This reliability is quite low and likely the result of the small variance in scores across candidates. The low between-subject variance would not affect a comparison between groups.
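The gap between single-case and averaged reliability reported above is what the Spearman-Brown prophecy formula predicts, assuming the four oral-exam cases behave as parallel measures:

```python
def spearman_brown(single_case_r, n_cases):
    """Spearman-Brown prophecy: reliability of the mean of n parallel cases."""
    return n_cases * single_case_r / (1 + (n_cases - 1) * single_case_r)

# A mean single-case ("intercase") reliability of 0.19 over 4 cases predicts
# a reliability for the averaged score close to the observed mean of 0.47:
print(round(spearman_brown(0.19, 4), 2))  # → 0.48
```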
Blindness of Evaluators
The oral examiner evaluated 14 candidates in 1987, 11 in 1988, and 6 in 1989 (including residents). After each of 4 problems, the examiner was asked to note whether the candidate was a first year NNP student, a graduating NNP, or a resident. The candidate's status was correctly identified in 64% of the cases in 1987, 64% in 1988, and 46% in 1989 (kappa = 0.41; 95% CI = 0.29, 0.53). The three observers who rated communication skills correctly identified the participants' status 71% of the time in 1987, 60% in 1988, and 44% in 1989 (kappa = 0.44; 95% CI = 0.30, 0.58). Individuals portraying parents of neonates and rating the participants correctly identified their status 44% of the time in 1987, 38% in 1988, and 42% in 1989 (kappa = 0.11; 95% CI = 0.06, 0.18).
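The kappa statistics above measure chance-corrected agreement between an evaluator's guesses and the candidates' actual group membership. A minimal sketch of Cohen's kappa, with invented labels for illustration:

```python
from collections import Counter

def cohens_kappa(guessed, actual):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    n = len(guessed)
    observed = sum(g == a for g, a in zip(guessed, actual)) / n
    # Agreement expected by chance, from the marginal label frequencies
    g_counts, a_counts = Counter(guessed), Counter(actual)
    expected = sum(g_counts[lab] * a_counts[lab] for lab in a_counts) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical examiner guesses for 6 candidates
actual  = ["first", "first", "grad", "grad", "grad", "resident"]
guessed = ["first", "grad",  "grad", "grad", "first", "resident"]
print(round(cohens_kappa(guessed, actual), 2))  # → 0.45
```

A kappa near zero, as for the simulated parents, indicates that correct identifications occurred at roughly the rate expected by chance alone.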
Comparison of First Year and Graduating NNP Students
The multivariate comparison of graduating and first year NNP students approached significance (F[5,7] = 2.76, p = 0.10). Unpaired t tests showed that graduating NNPs (n = 10) scored significantly higher than first year NNPs (n = 8) on the radiograph test (difference 19.6%; 95% confidence interval [CI] around difference 8.4, 30.8; p = 0.01) and the oral examination (difference 13.7%; 95% CI 0.6, 26.9; p = 0.04), but not on the MCQ (difference 7.1%; 95% CI -2.0, 16.3; p = 0.14) or on communication skills as assessed by either parents (difference 1.1%; 95% CI -6.2, 8.4; p = 0.75) or observers (difference 10.9%; 95% CI -0.02, 21.8; p = 0.05). Paired t tests comparing scores for the 7 students who participated both as first year and as graduating NNP students confirmed the statistically significant difference in radiograph scores but did not show a statistically significant difference in oral examination scores.
In the oral examination, first year NNPs were rated satisfactory on 72% of the scenarios over the 3 years and the graduating NNPs on 86% of the scenarios.
The literature reveals a paucity of studies regarding the development and testing of quantitative evaluation tools in nursing education (Watson & Herbener, 1990). This article describes the evaluation of knowledge, problem solving, and communication in one graduate program. Because the neonatal stream of the MHSc program was new and designed specifically to prepare neonatal nurses for an advanced practice role with complete responsibility for the management of infant care, it was essential that an outcome evaluation be conducted to ensure that terminal competencies had been achieved. In this study, graduating NNPs scored significantly higher than first year NNP students on the radiograph test and on problem-solving measures. When analyses were restricted to the seven students who had participated in both the before and after phases, scores on the radiograph test remained significantly different; however, differences in scores on the problem-solving measure were no longer statistically significant. A larger sample size would have permitted detection of small but important changes in scores.
The study did not include an evaluation of clinical procedures because the first year NNPs had not yet learned how to perform the delegated medical acts. While clinical procedures are an important part of the NNP role, the delegated medical acts are continually evaluated in the educational program and cannot be performed independently until the individual has been certified. Certification for each delegated medical act requires successful completion of the skill a specified number of times under the direct supervision of a neonatologist, neonatal fellow, or certified NNP.
By definition, results of program evaluations are not usually generalizable to other programs (Waddell, 1991). However, this study may be useful to others as a program evaluation model which uses a combination of reliable methods to assess a variety of domains of competence, includes a comparison group and blind assessment of outcomes. The sample size was large enough to detect a statistically significant difference in knowledge regarding radiographs and problem solving. At the same time, the small sample size limits the precision of the study findings as indicated by the wide confidence intervals around the differences. Further work should be done to test the reliability and validity of the evaluation measures.
- Donabedian, A. (1979). Evaluating the quality of medical care. In H.C. Schulberg & F. Baker (Eds.), Program evaluation in the health fields (pp. 186-218). New York: Behavioral Publications.
- Dworkin, P.H. (Ed.). (1987). Pediatrics. New York: John Wiley & Sons.
- Elstein, A.S., Shulman, L.S., & Sprafka, S.A. (1978). Medical problem solving: An analysis of clinical reasoning. Cambridge, MA: Harvard University Press.
- Hunsberger, M., Mitchell, A., Blatz, S., Paes, B., Pinelli, J., Southwell, D., French, S., & Soluk, R. (1992). Definition of an advanced nursing practice role in the NICU: The clinical nurse specialist/neonatal practitioner. Clinical Nurse Specialist, 6, 300-301.
- Kravath, R.E. (Ed.). (1987). Pediatrics: Pretest self-assessment and review. New York: McGraw-Hill Book Company Health Professions Division Pretest Series.
- LaCerva, V. (1984). Medical examination review: Pediatrics. New York: New Hyde Park.
- Lipman, R.P. (Ed.). (1985). Pediatrics: Pretest self-assessment and review. New York: McGraw-Hill Book Company Health Professions Division Pretest Series.
- Lorin, M.I. (1983). Pediatrics review. New York: ARCO Publishing Inc.
- Mitchell, A., Watts, J., Whyte, R., Blatz, S., Norman, G.R., Guyatt, G.H., Southwell, D., Hunsberger, M., & Paes, B. (1991). Evaluation of graduating neonatal nurse practitioners. Pediatrics, 88, 789-794.
- Norman, G.R., Tugwell, P., Feightner, J.W., & Muzzin, L.J. (1985). Knowledge and clinical problem solving. Medical Education, 19, 344-356.
- Pantell, R.H. (Ed.). (1987). Rudolph's pediatrics: A study guide. East Norwalk, CT: Appleton & Lange.
- Sahler, O.J.Z. (Ed.). (1986). PREP self-assessment relating to educational objectives for PREP-2-year two: 1986-1987. Elk Grove Village, IL: American Academy of Pediatrics.
- Sahler, O.J.Z. (Ed.). (1987). PREP self-assessment relating to educational objectives for PREP-2-year two: 1987-1988. Elk Grove Village, IL: American Academy of Pediatrics.
- Sahler, O.J.Z. (Ed.). (1988). PREP self-assessment relating to educational objectives for PREP-2-year two: 1988-1989. Elk Grove Village, IL: American Academy of Pediatrics.
- Waddell, D.L. (1991). Differentiating impact evaluation from evaluation research: One perspective of implications for continuing nursing education. The Journal of Continuing Education in Nursing, 22, 254-258.
- Watson, J.E., & Herbener, D. (1990). Programme evaluation in nursing education: The state of the art. Journal of Advanced Nursing, 15, 316-323.
- Whiteley, S. (1992). Evaluation of nursing education programmes - Theory and practice. International Journal of Nursing Studies, 29, 315-323.
- Winer, B.J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.