Patient safety requires that all nurses be able to provide basic care, yet each year, the state board of nursing investigates approximately 300 cases of reported practice errors. Currently, the National Council of State Boards of Nursing (NCSBN) is exploring practice errors nationally through root cause analysis of cases submitted by at least 20 nursing boards (Zhong & Thomas, 2012). Within the past 2 years, more than 860 substantiated cases of practice breakdown were submitted for analysis. However, case analysis alone may not distinguish whether a reported case was the result of human error, negligence, or willful misconduct, which is a differentiation recommended by the Institute of Medicine (2004).
Beginning in the 1990s, the Arizona Board of Nursing explored various assessment methods in an effort to find or develop an accurate, legally defensible measurement of nursing competency. Recent advances in technology may now support standardized assessment of the nurse in routine practice, providing valuable insights into nursing performance and relative risk. This article describes a collaborative effort among Arizona State University (ASU), Scottsdale Community College (SCC), and the Arizona State Board of Nursing (Board) to develop a valid and reliable process for rating nursing performance in a high-fidelity simulation environment.
Searching for Competency Evaluation Methods
Complaints to the Board about substandard nursing practice have been problematic for Board members and investigators because of the lack of an independent assessment of the nurse’s true competency across multiple dimensions of nursing practice. The practice errors described in a complaint may be one-time human errors, in which case discipline would be inappropriate (Marx, 2001), or they may be a reflection of reckless behavior, in which case the nurse’s practice must be assessed and remediated to ensure public safety. Board investigators currently seek to discover the causes of practice errors by interviewing the nurse respondent, reviewing the nurse’s employment history, and examining other indicators of practice proficiency, such as the supervisor’s perceptions and letters of recommendation. All such data, however, are highly subjective and of questionable reliability.
In the past, the Board has explored the following options for competency assessment: del Bueno’s Performance Based Development System (PBDS) assessment (del Bueno, 2011); Lenburg’s (1999) Competency Outcomes and Performance Assessment (COPA) model; the Clinical Performance in Nursing Examination (CPNE) (Nagelsmith, 2010), administered by Excelsior College; and individualized assessment by an educator.
All of these measures were ultimately deemed unsuitable for Board disciplinary purposes because they lacked sufficient validity and reliability for assessing practice problems (COPA, CPNE) or were not licensed for use in that context (PBDS).
Patient safety requires that all nurses possess basic competencies. Nationally, there is a call for the development of processes to measure clinical competency for both new graduates and experienced nurses (Benner, Sutphen, Leonard, & Day, 2010; Institute of Medicine, 2010). Competent clinical performance is situation specific, requiring the integration of cognitive, psychomotor, interpersonal, and attitudinal skills (Price, 2007). One promising mechanism for developing such a measurement is the Objective Structured Clinical Examination (OSCE), first described in the 1970s (Harden, Stevenson, Downie, & Wilson, 1975). OSCEs have been used to assess high-level clinical performance of health care professionals at the “shows how” level with high validity (Auewarakul, Downing, Jaturatamrong, & Praditsuwan, 2005). However, the reproducibility of OSCEs is dependent on the availability of a cadre of experienced, highly trained and skilled actors (Baid, 2011). New electronic simulation technologies provide an alternative to OSCEs (Birkhoff & Donner, 2010; Fero et al., 2010). Simulation tests, with an instrumented mannequin that mimics physiologic events with high fidelity, may be scored live or on video by multiple raters. Simulation scenarios can be scripted, and this technique is less dependent on the repeated human performance of those involved in crafting the situation (Hatala et al., 2008).
Collaboration for Competency Test Development
In 2006, the education advisory committee of the Board developed guidelines for a nursing practice competency assessment using high-fidelity simulation and incorporating Taxonomy of Error Root Cause Analysis and Practice Responsibility (TERCAP) (Benner et al., 2006) categories. TERCAP is an ongoing effort by nursing regulation to categorize root causes of practice breakdown (Benner et al., 2006). High-fidelity simulation was defined as the use of technology to mimic the clinical environment, where participants can provide comprehensive, realistic patient care, including communication, assessment, clinical reasoning, decision making, procedures, and documentation.
The guidelines were sent to all nursing program directors in the state with a call to propose solutions. Two schools with the resources, expertise, and interest needed to develop a valid and reliable assessment of nursing practice using high-fidelity simulation responded. One school was a research-intensive state university, and the other was a large metropolitan community college. Board staff facilitated discussions between the Board and the two schools to design and seek funding to implement the project. Because this project focused on issues central to regulation, the preferred funding source was the NCSBN Center for Regulatory Excellence (CRE) grant program.
The authors embraced the challenge of developing an evaluation instrument and assessment process that would be general enough to allow evaluation of nursing performance across client conditions yet specific enough to yield a unique profile and a basis for remediation. The authors retained the TERCAP categories that were originally proposed (Benner et al., 2006), with the modification that the categories would be reframed in positive terms. For example, “lack of attentiveness” became “attentiveness.” Eight distinct, although not mutually exclusive, competency categories were identified for initial inclusion: (1) professional responsibility, (2) client advocacy, (3) attentiveness, (4) clinical reasoning, (5) communication, (6) prevention, (7) procedural competency, and (8) documentation.
The authors then reviewed the NCSBN survey tool, Clinical Competency Assessment of Newly Licensed Nurses (CCANLN) (NCSBN, 2007), consisting of 35 items that measure clinical competency, practice errors, and risks for practice breakdown through information reported by nurse-preceptor dyads according to a Likert-type scale. With permission, items contained in the CCANLN were categorized into the modified TERCAP-based categories to develop a new instrument. For example, in the area of attentiveness, the following observable behaviors were adapted from the CCANLN: recognizes changes in client condition necessitating intervention, meets client’s emotional/psychological needs, proactively monitors client to identify changes in a timely manner, and recognizes complications of treatments and procedures. The authors then eliminated the Likert-type scale and adopted a dichotomous scale that clearly distinguished competent versus incompetent performance on an item. A nurse’s performance on the item was considered competent if the “action or intervention is consistent with standards of practice and free of actions that may place the client at risk for harm.” According to this scale, a nurse’s performance was considered incompetent if the nurse “fails to perform or performs in a manner that exposes the client to risk for harm.” Some of the items were separated into two items, and items were added to address gaps. Later in the project, the clinical reasoning category, which contained eight items, was divided into two categories, noticing and understanding. This change resulted in an evaluation instrument consisting of 41 items divided into nine categories. Each category had four to six items. This instrument was used to build a nursing performance profile (NPP), with results mapped across categories to obtain a graphic representation of a nurse’s practice, similar to standardized examinations in education (Figure). 
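The mapping from dichotomous item scores to the graphic profile can be sketched as follows. This is purely illustrative and is not the authors' software: the category names follow the article, but the item counts and sample ratings are hypothetical.

```python
# Illustrative sketch (not the authors' software): rolling dichotomous item
# ratings up into category-level scores for a graphic performance profile.
# Category names follow the article; item counts and scores are hypothetical.

# Each item is scored dichotomously: 1 = competent (action consistent with
# standards of practice and free of risk of harm), 0 = incompetent.
ratings = {
    "attentiveness": [1, 1, 0, 1],                     # 4 hypothetical items
    "clinical reasoning (noticing)": [1, 1, 1, 1, 0],  # 5 hypothetical items
    "documentation": [1, 0, 1, 1, 1],                  # 5 hypothetical items
}

def category_scores(ratings):
    """Proportion of items rated competent within each category."""
    return {cat: sum(items) / len(items) for cat, items in ratings.items()}

profile = category_scores(ratings)
for cat, score in profile.items():
    print(f"{cat}: {score:.0%}")
```

Plotting these category proportions side by side yields the kind of graphic representation shown in the Figure.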
The authors developed a pilot scenario based on an adult patient who was admitted to a medical or surgical unit and then videotaped volunteer nursing students performing the scenario.
Figure. Example of a nursing performance profile.
Content validity of the NPP instrument was evaluated by alternating instrument development with ratings by content experts in nursing practice and education until the experts reached 100% agreement on each item’s representativeness, clarity, and consistency. The instrument was pilot tested by asking different content experts to score videos of volunteer student performances on the simulation scenario. The mean percentage of rater (n = 5) agreement per item (interrater reliability) was 92% across all raters (nurses with supervisory experience) and all items. Reliability testing showed a Cronbach’s alpha of 0.93. A week later, the same experts evaluated the videos a second time. The intrarater reliability (degree of consistency between two ratings by the same individual) ranged from 85% to 97% for individual raters, with an average of 92% across all raters. Refinements to the NPP process and instrument were made based on the inter- and intrarater reliability testing. During pilot testing, it was noted that experienced nurses in current practice tended to rate performance more critically than educators. Therefore, it was decided that rater qualifications in subsequent phases of the project would include current practice, a minimum of 3 years of nursing practice, and experience evaluating nursing performance.
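The two reliability statistics reported above (percent agreement and Cronbach's alpha) can be computed as in the minimal sketch below. The rating vectors are hypothetical; the study data are not reproduced here.

```python
# Minimal sketch of the reliability statistics reported for the NPP pilot.
# The rating vectors below are hypothetical, not the study data.

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cronbach_alpha(items):
    """Cronbach's alpha for item-score columns (one list per item)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

rater_a = [1, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 0, 0, 1, 1]
print(percent_agreement(rater_a, rater_b))  # 7 of 8 items match: 0.875

# Three hypothetical items scored for five nurses.
items = [[1, 0, 1, 1, 0], [1, 0, 1, 0, 0], [1, 1, 1, 1, 0]]
print(round(cronbach_alpha(items), 2))
```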
From Concept to Grant Award
The grant proposal, accompanied by a promising tool and a video illustrating the process and procedures, was approved and funded by the NCSBN CRE grant program in December 2009. Before the project was funded, institutional review board approval was obtained from each respective school’s human subjects committee. Interagency and intergovernmental agreements were crafted for disbursement of project funds from the Board to the schools.
Because of the mutual respect developed during the development of the grant proposal, the roles and tasks of team members were determined by consensus. The Board staff member organized meetings, kept minutes, wrote quarterly reports, and coordinated team member activities and communication. Simulation staff at both schools developed and refined the scenarios. The expert simulation programmer on the team wrote all of the programs. The lead researcher, who used simulation as the basis for her doctoral dissertation, served as the principal author of the grant and subsequent scientific publications. Another doctoral-level nurse with previous research funding provided invaluable consultation and direction. The team was fortunate to recruit a biostatistician to provide technical advice during the experimental design phases and to analyze the data. Other team members contributed to the development of the scenarios, videotaping participants and recruiting subjects. A project secretary scheduled participants and recorded the data.
Scenario development was the first major task. Scenarios were intended to measure basic competency with broad applicability and to provide opportunities for individual nurses to exhibit competency on all nursing performance items. Simulation scenarios were based on simple adult health situations that require action on the part of a nurse to avoid patient harm. This area of focus was chosen because approximately 60% of Arizona nurses work in acute care, the majority in medical-surgical or special care units. Medical-surgical scenarios provide opportunities to incorporate clinical nursing skills and clinical reasoning skills that are foundational to other practice settings and populations. Later projects may include the development of scenarios based on other populations, such as pediatrics, home care, and school nursing.
Each scenario included a conflict situation (e.g., a relative calls for information about a patient, in violation of Health Insurance Portability and Accountability Act regulations) and an opportunity for teaching. Each scenario called for demonstration of at least one basic psychomotor skill (e.g., oxygen administration, administration of intravenous medications) and the provision of basic comfort measures (e.g., positioning, provision of nutrition). The scenarios were designed to be completed in 30 minutes or less. Three sets of three scenarios (nine total) were developed.
Once the scenarios were written, all team members participated in validation day. On this day, the authors took turns performing as the nurse in the scenario and other team members provided patient responses, ran the scenario, observed the scenario for inconsistencies, and verified that all NPP items were included. This validation process resulted in refinement of scenarios before their implementation and led to a greater understanding of the anxiety that subjects may experience during the NPP process. After validation, scripts and set-up processes were refined and camera angles adjusted to provide for the best possible simulation environment.
The next task was to recruit volunteer nurse performers and volunteer nurse raters in three phases. Each volunteer nurse performer completed three scenarios. Some of the volunteers were coached to make errors, others were coached to perform well, and some were instructed to perform at their personal best. Videos of each nurse’s performance were rated by three volunteer nurse raters. With the three sets of three scenarios and 21 unique nurse performers, 63 videos were produced. The 63 videos generated 189 rating instruments. Each instrument had 41 items, generating 7,749 ratings. Videos were rated in random order, and the raters were unaware of which nurses were coached before the scenarios. After the first round of testing, the authors agreed that training was needed to help raters identify key components of the scenarios and provide appropriate ratings. Before the institution of rater training, raters seemed to have difficulty in the following areas: (a) distinguishing adequate but not perfect performance of a procedure from unsafe performance that puts the patient at risk (e.g., scoring as unsafe a nurse who addressed a patient’s concern about a medication but did not extensively educate the patient about side effects); (b) scoring overlapping items (e.g., if two identifiers were not used, scoring all procedures as unsafe); and (c) distinguishing practicing outside of the scope of practice from making an error (e.g., scoring a medication error as a scope of practice violation because the nurse was “prescribing”). Rater training for the last round of rating included verbal instructions, a packet of materials, and a facilitated practice session with three training videos. Raters discussed their scores on the training videos until consensus was reached.
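The rating volume follows directly from the design. As a quick arithmetic check (the variable names are ours, for illustration):

```python
# Quick check of the rating volume described above: 21 performers each
# completed 3 scenarios, each video was scored by 3 raters, and each
# rating instrument contained 41 items. Variable names are illustrative.
performers, scenarios_each, raters_per_video, items = 21, 3, 3, 41

videos = performers * scenarios_each        # 63 videos
instruments = videos * raters_per_video     # 189 rating instruments
total_ratings = instruments * items         # 7,749 individual ratings
print(videos, instruments, total_ratings)
```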
Analysis of statistical data, review by project consultants, and discussion of observations in administering the profile and simulations informed changes to the instrument and process. The following changes were made during the yearlong endeavor:
- Although all scenarios involved coordinating care with the health care team, some raters did not identify this activity as delegation without nursing assistants or other unlicensed personnel represented in the script. Therefore, an item regarding delegation was changed from “Delegates aspects of care appropriately” to “Delegates/coordinates aspects of care appropriately.”
- Raters did not initially identify a difference in goals or plans as a “conflict,” resulting in many blank or “not applicable” scores on that item. As a result, conflict was redefined within the tool as “any situation where the wishes of an individual (patient/family member/health team member) are not consistent with the plan of care or patient safety.” This change resulted in more consistent scoring for that item.
- Blanks in scoring could make reliability and validity difficult to establish. Therefore, instructions for raters were changed to include the additional step of verifying that there were no blank items.
- In the first iteration of the profile, the unequal number of items in each category in effect weighted the categories unevenly. Some categories included more opportunities for passing or failing than others. The weighting of categories was inconsistent with the purpose of the competency testing, the dichotomous nature of the evaluation, and the concept of root causes from which the NPP originated. Although some behaviors may be easier to learn or demonstrate, any behavior that is essential to nursing competency and patient safety was considered as important as any other. Thus, the instrument was revised to include 41 items and nine categories. Each category contained four to six items. To accomplish this goal, the clinical reasoning category was divided into two categories, noticing and understanding, and additional measures were added to the documentation category.
Hinton et al. (2012) reported that the NPP instrument and the processes used in this project resulted in a promising measure of nursing competency. Internal consistency was high (alpha = 0.91). Construct validity was demonstrated because participants scored lower on items known to have high rates of noncompliance in practice, such as infection control and documentation, and higher on items that were easier to demonstrate, such as advocacy. A two-way (2 × 9) mixed analysis of variance was used to determine whether ratings differed based on experience. Nurses with 1 to 3 years of experience performed significantly better than nurses with less than 1 year of experience in six of the nine categories: attentiveness, p = .002; clinical reasoning (noticing), p = .01; clinical reasoning (understanding), p = .005; communication, p = .03; procedural competency, p = .002; and documentation, p = .02 (Table). Results of other analyses using the same statistical process showed that the order of videotaping, the specific scenario, previous experience with simulation, and the location of testing were not significant factors. The process generated a unique competency profile (Figure) for each nurse that was based on a substantive behavior sample (three simulation scenarios each rated by three raters).
Table: Comparison of Scores for Nurses with Different Levels of Experience
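The published comparison used a 2 × 9 mixed analysis of variance; the sketch below computes only the descriptive per-category group means underlying such a table. The scores shown are hypothetical stand-ins, not the study data.

```python
# Descriptive sketch of the Table's structure: mean category scores by
# experience group. The published analysis was a 2 x 9 mixed ANOVA; the
# scores below are hypothetical, not the study data.

scores = {  # category -> (<1 yr group scores, 1-3 yr group scores)
    "attentiveness": ([0.60, 0.75, 0.70], [0.85, 0.90, 0.80]),
    "documentation": ([0.55, 0.65, 0.60], [0.80, 0.85, 0.75]),
}

def group_means(scores):
    """Mean score per category for each experience group."""
    return {cat: (sum(g1) / len(g1), sum(g2) / len(g2))
            for cat, (g1, g2) in scores.items()}

for cat, (m1, m2) in group_means(scores).items():
    print(f"{cat}: <1 yr M={m1:.2f}, 1-3 yr M={m2:.2f}")
```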
During the yearlong process, the authors experienced challenges in the areas of institutional collaboration and volunteer recruitment.
Although the authors worked well together, the coordination of processes and policies across a governmental agency, a state university, and a community college was challenging, and the authors were surprised at the length of time and the difficulty involved in negotiating contracts among the agencies. In one case, executing a subcontract took almost the entire duration of the research project. Illness and leave time in the support offices of two of the institutions delayed processes. The institutional review board process was different at the different campuses. One school declared the project exempt and the other did not, resulting in different levels of supervision of the same project by the two different institutions.
Recruitment of volunteers to perform and rate the scenarios required significant effort. Details of the project were discussed at length with potential volunteers. Three barriers were identified: the time required, the travel required, and the idea of being evaluated in an unfamiliar setting. Each volunteer nurse performed three scenarios that involved at least 3 to 4 hours of participation and travel to one of the schools. Each volunteer rater devoted several hours to traveling to a school and evaluating scenarios. Initially, recruitment of raters was easy because the time commitment was 1 day. However, later in the project, when raters were asked to devote several days to rating, the process of securing raters became more difficult. Committee decisions about the recruitment and retention of volunteers included seeking support from local nurse leaders and posting notices seeking volunteers in Board publications, in the state nurse’s association newsletter, and on the Board website. Volunteers were provided with paid parking, a letter thanking them for participation, and a $30 gift card per day of participation.
Volunteer nurses, although initially nervous about having their performance evaluated, reported that they appreciated the opportunity to practice skills, and some noted that they found the experience to be fun. The authors were impressed by the enthusiasm of the volunteer raters and their commitment to the project. One participant rater stated, “I learned so much about competency.” Some local hospitals credited participation in the project as community service toward promotion on their clinical ladders.
Additional benefits of participation in this endeavor were noted by each participating agency. Participating nursing programs adopted the template that was used to validate scenarios to design and run multidisciplinary simulations for students. Selected competencies and sets of essential behaviors from the NPP instrument were used to assess the performance of nursing students in practicum examinations. Selected NPP performance items were mapped to the nursing program curricula to verify that course competencies were aligned with essential behaviors. The Board benefited by having a process with the potential to assess competency in practicing nurses objectively and fairly.
Future of the Project
Funding by the NCSBN for phase II research was awarded through another CRE grant. Phase II research is designed to provide additional evidence of validity and reliability in a larger and more diverse sample of nurses. During phase II, the NPP process will be used to further explore performance trends, test validity, and examine characteristics related to competent performance during simulation examinations. A phase III project to quantify the sensitivity, diagnostic validity, instructional utility, and treatment validity of the NPP process is planned.
The Board has begun to refer nurses and licensees for a nursing performance evaluation using the NPP process, based on the phase I results. The participating schools both submitted proposals to the Board for evaluation; contracts between each institution and the Board were signed and adopted by the Board. Many questions about liability and legal requirements were raised by the institutional representatives. Board staff provided education and direction on Board processes. The evaluator recommendation is only one piece of information that the Board would consider in determining an action on a nursing license. Referring a nurse for testing is similar to the process for other evaluations that the Board orders, such as substance abuse and psychological evaluations. An important difference in this process is that the individual raters are blind to the reason for the evaluation and would not be identified in the report. Therefore, ratings and recommendations are based solely on the nurse’s activities in the scenario and not on the reason for the referral.
Barriers to more extensive use of this process include cost and the suitability of the existing scenarios for specialty practice areas. The participating schools are also refining processes to accept referrals from other entities interested in nursing competency, such as hospitals and schools of nursing.
The development of the NPP process involved a unique collaboration among a state regulatory agency, a state university, and a community college that yielded a reliable and valid instrument and process for assessing performance in nurses during high-fidelity, mannequin-based simulation testing. The entities are currently engaged in phase II testing of this process and are using the process to evaluate the competency of nurses reported for practice breakdown.
- Auewarakul, C., Downing, S. M., Jaturatamrong, U. & Praditsuwan, R. (2005). Sources of validity evidence for an internal medicine student evaluation system: An evaluative study of assessment methods. Medical Education, 39, 276–283. doi:10.1111/j.1365-2929.2005.02090.x [CrossRef]
- Baid, H. (2011). The objective structured clinical examination within intensive care nursing education. Nursing in Critical Care, 16(2), 99–105. doi:10.1111/j.1478-5153.2010.00396.x [CrossRef]
- Benner, P., Malloch, K., Sheets, V., Bitz, K., Emrich, L., Thomas, M. B., et al. (2006). TERCAP: Creating a national database on nursing errors. Harvard Health Policy Review, 7(1), 48–63.
- Benner, P., Sutphen, M., Leonard, V. & Day, L. (2010). Educating nurses: A call for radical transformation. Stanford, CA: Jossey-Bass.
- Birkhoff, S. D. & Donner, C. (2010). Enhancing pediatric clinical competency with high-fidelity simulation. The Journal of Continuing Education in Nursing, 41(9), 418–423. doi:10.3928/00220124-20100503-03 [CrossRef]
- del Bueno, D. (2011). Performance based development systems. Retrieved from www.pmsi-pbds.com
- Fero, L. J., O’Donnell, J. M., Zullo, T. G., Dabbs, A. V., Kitutu, J., Samosky, J. T., et al. (2010). Critical thinking skills in nursing students: Comparison of simulation-based performance with metrics. Journal of Advanced Nursing, 66(10), 2182–2193. doi:10.1111/j.1365-2648.2010.05385.x [CrossRef]
- Harden, R., Stevenson, M., Downie, W. & Wilson, G. (1975). Assessment of clinical competence using objective structured examination. British Medical Journal, 1, 447–451. doi:10.1136/bmj.1.5955.447 [CrossRef]
- Hatala, R., Issenberg, S. B., Kassen, B., Cole, G., Bacchus, C. M. & Scalese, R. J. (2008). Assessing cardiac physical examination skills using simulation technology and real patients: A comparison study. Medical Education, 42(6), 628–636. doi:10.1111/j.1365-2923.2007.02953.x [CrossRef]
- Hinton, J. E., Mays, M. Z., Hagler, D., Randolph, P., Brooks, R., DeFalco, N., et al. (2012). Measuring post-licensure competence with simulation: The nursing performance profile. Journal of Nursing Regulation, 3(2), 45–53.
- Institute of Medicine. (2004). Keeping patients safe: Transforming the work environment of nurses. Washington, DC: National Academies Press.
- Institute of Medicine. (2010). The future of nursing: Leading change, advancing health. Washington, DC: National Academies Press.
- Lenburg, C. (1999). The framework, concepts and methods of the Competency Outcomes and Performance Assessment (COPA) model. Online Journal of Issues in Nursing, 4(2). Retrieved from www.nursingworld.org/MainMenuCategories/ANAMarketplace/ANAPeriodicalOJIN/TableofContents/Volume41999/No2Sep1999/COPAModel.aspx
- Marx, D. (2001). Patient safety and the “just culture”: A primer for health care executives. New York, NY: Columbia University.
- Nagelsmith, L. (2010). Dependability and accuracy of clinical performance in nursing examination scores. Albany, NY: State University of New York.
- National Council of State Boards of Nursing. (2007). Clinical competency assessment of newly licensed nurses. Retrieved from www.ncsbn.org/07_Final_impact_of_Transition.pdf
- Price, B. (2007). Practice-based assessment: Strategies for mentors. Nursing Standard, 21(36), 49–56.
- Zhong, E. & Thomas, M. (2012). Association between job history and practice error: An analysis of disciplinary cases. Journal of Nursing Regulation, 2(4), 16–18.