Nursing students and instructors feel most rewarded when learning is stimulating and evaluation is fair. The nurse educator competency III, Use Assessment and Evaluation Strategies (National League for Nursing, 2018), encourages use of evidence-based evaluative practices. The purpose of this pilot study was to evaluate the reliability of an undergraduate clinical performance grading rubric that was developed while coordinating several sections of a first-semester geriatric continuum of care experience, and then standardize it for all nursing education practica.
Challenges continue when clinically based behavior objectives (Krautscheid, Moceri, Stragnell, Manthey, & Neal, 2014) are related to educational pedagogy (Benner, 2012; Benner, Sutphen, Leonard, & Day, 2010; Nielsen, 2016). Fair and objective safety evaluation is vital in health care (Bourbonnais, Langford, & Giannantonio, 2008; Heaslip & Scammell, 2012; Tanicala, Scheffer, & Roberts, 2011). Evaluating patient safety with nursing students is generally described through specific behaviors, such as level of supportive cues needed, coordination, time efficiency, and ability to apply theoretical knowledge to clinical decisions (DeBrew & Lewallwen, 2014; Killam, Montgomery, Luhanga, Adamic, & Carter, 2010; Scanlan & Chernomas, 2016; Tanicala et al., 2011). Skill mastery also involves levels of confidence and ability to remain focused on the client (Lasater, 2007). Clearly defined expectations empower instructors and students and may help avoid litigation (Amicucci, 2012; Bofinger & Rizk, 2006; Oermann, Yarbrough, Saewert, Ard, & Charasika, 2009). Finally, Helminen, Coco, Johnson, Turunen, and Tossavainen (2016) reported that there is little evidence to support evaluating clinical performance through written assignments.
Clinical educators must integrate an educational perspective with behavioral objectives. Although it is impossible to remove all subjectivity, research has shown that nonspecific criteria encourage clinical grade inflation (Amicucci, 2012; Isaacson & Stacy, 2009; Seldomridge & Walsh, 2006). One example is the use of broad course objectives, which can result in subjective, inconsistent, and disputable evaluations (DeBrew & Lewallwen, 2014; Tanicala et al., 2011). Clinical instructors and students often lack academic literacy when interpreting comprehensive course objectives as applied to the clinical setting, decreasing face validity (Bowie, 2010; DeVon et al., 2007; Isaacson & Stacy, 2009). For example, a course objective may state “provide safe and effective nursing care.” However, unless there are clearly leveled expectations, behaviors may be inconsistently interpreted and evaluated.
Subjectivity remains an essential theme with all clinical evaluation. The “shades of grey” (Amicucci, 2012, p. 52) encountered when evaluating clinical students create challenges and rewards. The complexity of nursing environments fosters variance in clinical experiences, making standardized clinical evaluation even more perplexing for academic pedagogy. Helminen et al. (2016) provided a literature review indicating that summative prelicensure clinical student assessment methods vary considerably between programs and often contain subjective bias.
Academic grading rubrics can offer a consistent means to bridge criterion-based clinical behaviors with evidence-based teaching (Shipman, Roa, Hooten, & Wang, 2012). According to Isaacson and Stacy (2009), grading rubrics can enhance critical thinking by helping students notice patterns of improvement or decline, thereby encouraging self-assessment and improvement. When grading rubrics are used to facilitate an educational perspective for criterion-referenced clinical behaviors, standardization across a nursing curriculum is possible. Grading rubrics can guide educators to refine teaching methods using learning theory and evidence-based teaching practice (Stevens & Levi, 2012).
Leveling expected clinical performance outcomes within a nursing curriculum reflects adult learning theory (Candela, 2016; Oermann & Gaberson, 2017) and is encouraged in the literature (Helminen et al., 2016; Roberts, 2011; Tanicala et al., 2011). DeBrew and Lewallwen (2014) conducted a qualitative study of critical student incidents with faculty. One of their conclusions was that nursing students need time to make mistakes before meeting accepted competencies. Without proper leveling of clinical expectations, student safety errors may be missed or potentially be interpreted as an automatic fail.
Furthermore, most nursing programs consider written assignments when calculating final clinical grades (Oermann et al., 2009). According to O'Connor (2015), written work can fill evaluation gaps caused by unobserved clinical experiences as the instructor works with other students. Pedagogical approaches encourage multiple methods of evaluation. Written assignments help demonstrate affective learning, problem analysis, and clinical judgment (Bonnel, 2016). However, written assignments exhibit weak predictive power when evaluating clinical performance skills (Terry, Hing, Orr, & Milne, 2017).
This study adds to the reliability assessment literature of clinical performance evaluation methods. The standardized grading rubric tested in this study is meant to complement criterion-referenced clinical behavioral objectives in any setting, regardless of student level in the curriculum.
Between 2011 and 2015, the author collaborated with 23 clinical instructors to create an evaluation tool that would extend the course objectives to more consistent measurements. Several previously published clinical evaluation tools informed this process and guided the development of clinically specific criterion-referenced behavior objectives meant to measure the school's nine program outcomes (Bofinger & Rizk, 2006; Bourbonnais et al., 2008; Clark, 2006; Heaslip & Scammell, 2012; Isaacson & Stacy, 2009; Killam et al., 2010; Lasater, 2007; Seldomridge & Walsh, 2006). The following hypotheses were tested: (a) a reliable assessment method will detect increased scores from midterm to final evaluation, and (b) a reliable assessment method will detect no correlation between written assignment scores and clinical performance scores.
Reliability pertains to how consistently an instrument is able to measure an element, and internal consistency demonstrates congruence of instrument concepts (Grove, Burns, & Gray, 2013). Few studies have assessed reliability of clinical performance assessment tools (Helminen et al., 2016; Shin, Shim, Lee, & Quinn, 2014). Controlled simulation environments have established tool reliability using methods such as interrater and test–retest assessment (Adamson, Gubrud, Sideras, & Lasater, 2012; Adamson & Kardong-Edgren, 2012; Lasater, 2007). At the time of this study, no additional funding was available to employ two clinical instructors for one group of students. According to DeVon et al. (2007), subscale coefficient equivalence reliability is the most widely accepted assessment of internal consistency when only a one-test administration is feasible, such as in clinical settings where one evaluator is the norm.
Design and Participants
This study was an assessment of the grading rubric for measures of equivalence reliability. A convenience sample of 58 traditional undergraduate baccalaureate nursing students in a first-semester geriatric continuum of care practice environment was used. Students were predominately Caucasian and female with an average age of 20 years. Prior to data collection, expedited institutional review board approval was obtained from Concordia University Wisconsin. Students were informed of the study's purpose, that participation was voluntary, that all data would be deidentified, and that their final transcript grade would remain a pass/fail. Students could opt out of the study before the semester ended. All students in the cohort accepted participation. Seven instructors at nine clinical sites used the assessment tool at midterm (7 weeks) and then again at the final evaluation (14 weeks).
Clinical instructors were oriented to the clinical evaluation process prior to the beginning of the semester and then mentored through periodic structured meetings. Student written assignments were graded throughout the 14 weeks, each having its own grading rubric. There is an expectation that all written work must earn a minimum of 79% for a passing grade. Instructors scored the clinical performance grading rubric assessed in this study at 7 weeks and again at 14 weeks. All scores were entered by the clinical instructors into the online learning management system. The school's policy for first-semester nursing students included weighting written assignments as 40% and clinical performance as 60% of final scores. Final grades were entered as pass or fail on student transcripts. Several behavioral objectives, marked as critical safety indicators, prevented students from progressing in the program if deemed unsafe (Table 1). Student clinical scores were calculated by the online learning management system and used to assess the grading rubric after the semester ended.
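The 40/60 weighting described above can be illustrated with a short sketch. This is not the learning management system's actual calculation, which is not described here; the function names are hypothetical, scores are assumed to be proportions between 0 and 1, and the sample inputs are simply the cohort means reported in the results.

```python
"""Illustrative sketch of weighting written and clinical scores (assumed 0-1 scale)."""

WRITTEN_WEIGHT = 0.40    # weight of written assignments (stated school policy)
CLINICAL_WEIGHT = 0.60   # weight of clinical performance (stated school policy)
WRITTEN_MINIMUM = 0.79   # minimum expected on written work (stated expectation)


def weighted_final_score(written: float, clinical: float) -> float:
    """Combine written and clinical proportions into one weighted score.

    Hypothetical helper: the real system's rounding and handling of
    missing scores are unknown and not modeled here.
    """
    for value in (written, clinical):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be proportions between 0 and 1")
    return WRITTEN_WEIGHT * written + CLINICAL_WEIGHT * clinical


def meets_written_minimum(written: float) -> bool:
    """Check the written-work expectation of at least 79%."""
    return written >= WRITTEN_MINIMUM


if __name__ == "__main__":
    # Cohort means from the results section, used only as sample inputs.
    score = weighted_final_score(written=0.973, clinical=0.915)
    print(f"Weighted final score: {score:.3f}")
    print(f"Written minimum met: {meets_written_minimum(0.973)}")
```

Because written work carries 40% of the weight, a high written score can raise a weighted total noticeably above the clinical performance score alone, which is one reason the two components were also examined separately.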
Instrument: Standardized Grading Rubric
Instead of leveling the detailed behavioral objectives, we elected to develop a grading rubric separate from the criteria that classified seven behaviors reflective of any clinical rotation: (a) performance safety and accuracy, (b) supportive cues, (c) coordination, (d) time efficiency, (e) confidence, (f) relating clinical decisions to theoretical knowledge, and (g) client focus versus skill focus (Table 2). Each behavior had four levels marked with a letter grade (A, B, C, or D) using one row of the table. The established criterion-referenced behavioral objectives were measured against the rows in the grading rubric. It was determined that some of the rows in the grading rubric were more appropriate to score certain behavioral objectives. Consequently, behavior objectives were mapped to specific rows in the grading rubric. The online learning system allowed clinical instructors to view only pertinent rows when scoring (Table 1) and select the letter grade using a four-point range. Safety was an essential focus; therefore, critical indicators remained pass/fail by indicating a minimum of a low B level when scored with the rubric.
Data collection occurred by downloading scores from the online learning management system into an Excel® spreadsheet. After the semester had ended and all pass/fail grades were entered, the overall percentage scores calculated by the learning management system were obtained.
Data were analyzed using SPSS® version 24 software, with the significance level set at p ⩽ .05. Differences in the performance scores were compared, as well as differences between written assignment grades and the final clinical performance. Mean scores were calculated by the learning management system for each of the nine performance subscales. Independent sample t tests and Pearson correlation were performed to compare written assignment grades and final performance scores, and analysis of variance was used to compare repeated measures of midterm and final performance. Cronbach's alpha scores were used to analyze rubric consistency when measuring all nine performance outcomes with the grading rubric. In addition, Cohen's d and post hoc power analysis using G*Power were calculated using the pre–post mean scores.
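For readers unfamiliar with the internal consistency statistic named above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of subscale variances / variance of total scores). It is not the SPSS procedure used in this study; the function name and the small data set are hypothetical and serve only to show the calculation.

```python
"""Minimal sketch of Cronbach's alpha using only the Python standard library."""
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Return Cronbach's alpha.

    scores: one row per student, each row holding that student's
    k subscale scores (k must be the same for every row).
    """
    if len(scores) < 2:
        raise ValueError("at least two students are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each row needs the same number (>= 2) of subscales")

    # Sample variance of each subscale (column) across students.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Invented scores for four students on three subscales (illustration only).
    example = [
        [4, 3, 4],
        [3, 3, 3],
        [2, 1, 2],
        [4, 4, 3],
    ]
    print(f"Cronbach's alpha: {cronbach_alpha(example):.3f}")
```

In the study itself, each row would correspond to one student and each column to one of the nine performance outcomes; values closer to 1 indicate that the subscales vary together.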
There was a significant difference between midterm (M = .89) and overall final performance evaluations (M = .94; t = −15.896; p ⩽ .001, two-tailed) showing an increase in final scores in all nine performance outcomes; Cohen's d = .262, 1-β = .628. No correlation was found between written assignments and final performance evaluations, r(56) = .164, p ⩾ .05. There was a significant difference between clinical written work (M = .973) and performance evaluations (M = .915; t = 14.536, p ⩽ .001). The grading rubric produced an overall Cronbach's alpha score of .917 when measured against all nine performance outcomes. The grading rubric produced a normalized bell curve (Figure).
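As an arithmetic check on the reported correlation between written assignments and final performance, the standard conversion of Pearson's r to a t statistic can be applied with r = .164 and 56 degrees of freedom (n = 58). This worked step is added for illustration and is not part of the original SPSS output.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.164\sqrt{56}}{\sqrt{1-0.164^{2}}}
  \approx \frac{1.227}{0.986}
  \approx 1.24
```

With 56 degrees of freedom, the two-tailed critical value at the .05 level is approximately 2.00, so a t of about 1.24 is consistent with the nonsignificant result reported.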
Figure. Nine subscales measuring overall clinical performance without written assignments. Note. Pts = points.
The results of this study supported the hypotheses that a reliable assessment method would detect increased scores from midterm to final evaluation and that no correlation between written assignment scores and clinical performance scores would exist. When using the criterion-referenced behavioral objectives, the standardized grading rubric demonstrated significant improvement in student scores from midterm to final evaluation. Instructors expect to see student clinical performance improve as time and experience progress (DeBrew & Lewallwen, 2014). Internal consistency was found to be excellent within the grading rubric. The normalized bell curve may indicate less grade inflation and therefore less subjective assessment. The lack of correlation between written work and clinical performance supports current literature (Bonnel, 2016; Helminen et al., 2016; O'Connor, 2015; Terry et al., 2017). There is little consistency in how to interpret written work when grading clinical performance, other than addressing diverse learning styles. The clinical instructors acknowledged that the scores accurately measured student performance. When evaluation methods are assessed, as in this study, more effective grading practices are possible.
There were several limitations with this study. The convenience sample (N = 58) of one cohort from one school of nursing limits generalization. We have not yet tested the grading rubric on more than one clinical setting or more than one set of criterion-referenced behavioral objectives. It was fiscally not possible to assess interrater reliability during this study. Finally, it is possible that the lower B score essential on critical indicators may have persuaded clinical instructors to score higher than anticipated on some criterion behaviors, and in future cohorts the requirement will be revised to the C level.
This study supports the paradigm shift toward integrating educational pedagogy with clinical evaluation. Replication of this research in future cohorts, using multiple schools and including interrater assessment, is needed before generalization can be considered. Assuming the small effect size (d = .262) remains consistent, the G*Power analysis revealed that 92 students would be needed to obtain a power of .80. Developing a valid and reliable evaluation method is vital to prevent variability between instructor evaluation approaches and to improve student understanding of clinical expectations (Heaslip & Scammell, 2012). Just as Krautscheid et al. (2014) asserted, clinical faculty mentoring is vital. Clinical instructors require guidance to recognize salience in student performance and interpret patterns of clinical competency to ensure valid and reliable grading (Amicucci, 2012; Isaacson & Stacy, 2009).
One standardized performance grading rubric can be adopted for all clinical experiences when pedagogy is linked with behavior. Criterion-referenced clinical objectives do not require painstaking, subjective leveling. Fair grading can equate to consistency and reliability (Bourbonnais et al., 2008; Heaslip & Scammell, 2012), and this performance rubric has the potential to produce fair scores. Critical indicators help identify safe practitioners and support pass/fail and letter-grade policies. Students deserve clear direction for their learning needs to ultimately provide safe, effective, professional, patient-centered nursing care.
- Adamson, K.A., Gubrud, P., Sideras, S. & Lasater, K. (2012). Assessing the reliability, validity, and use of the Lasater clinical judgment rubric: Three approaches. Journal of Nursing Education, 51, 66–73. doi:10.3928/01484834-20111130-03
- Adamson, K.A. & Kardong-Edgren, S. (2012). A method and resources for assessing the reliability of simulation evaluation instruments. Nursing Education Perspectives, 33, 334–339. doi:10.5480/1536-5026-33.5.334
- Amicucci, B. (2012). What nurse faculty have to say about clinical grading. Teaching and Learning in Nursing, 7, 51–55. doi:10.1016/j.teln.2011.09.002
- Benner, P. (2012). Educating nurses: A call for radical transformation—how far have we come? Journal of Nursing Education, 51, 183–184. doi:10.3928/01484834-20120402-01
- Benner, P., Sutphen, M., Leonard, V. & Day, L. (2010). Educating nurses: A call for radical transformation. San Francisco, CA: Jossey-Bass.
- Bofinger, R. & Rizk, K. (2006). Point systems versus legal system: An innovative approach to clinical evaluation. Nurse Educator, 31, 69–73. doi:10.1097/00006223-200603000-00008
- Bonnel, W. (2016). Clinical performance evaluation. In Billings, D. & Halstead, J. (Eds.), Teaching in nursing: A guide for faculty (5th ed., pp. 443–462). St. Louis, MO: Elsevier.
- Bourbonnais, F.F., Langford, S. & Giannantonio, L. (2008). Development of a clinical evaluation tool for baccalaureate nursing students. Nurse Education in Practice, 8, 62–71. doi:10.1016/j.nepr.2007.06.005
- Bowie, B.H. (2010). Clinical performance expectations: Using the “you-attitude” communication approach. Nurse Educator, 35, 66–68. doi:10.1097/NNE.0b013e3181ced8be
- Candela, L. (2016). Theoretical foundations of teaching and learning. In Billings, D. & Halstead, J. (Eds.), Teaching in nursing: A guide for faculty (5th ed., pp. 211–229). St. Louis, MO: Elsevier.
- Clark, M. (2006). Evaluating an obstetric trauma scenario. Clinical Simulation in Nursing, 2, e75–e77. doi:10.1016/j.ecns.2009.05.028
- DeBrew, J.K. & Lewallwen, L.P. (2014). To pass or fail? Understanding the factors considered by faculty in the clinical evaluation of nursing students. Nurse Education Today, 34, 631–636. doi:10.1016/j.nedt.2013.05.014
- DeVon, H.A., Block, M.E., Moyle-Wright, P., Ernst, D.M., Hayden, S.J., Lazzara, D.J. & Kostas-Polston, E. (2007). A psychometric toolbox for testing validity and reliability. Journal of Nursing Scholarship, 39, 155–164. doi:10.1111/j.1547-5069.2007.00161.x
- Grove, S.K., Burns, N. & Gray, J.R. (2013). The practice of nursing research: Appraisal, synthesis and generation of evidence (7th ed.). St. Louis, MO: Elsevier Saunders.
- Heaslip, V. & Scammell, J.M. (2012). Failing underperforming students: The role of grading in practice assessment. Nurse Education in Practice, 12, 95–100. doi:10.1016/j.nepr.2011.08.003
- Helminen, K., Coco, K., Johnson, M., Turunen, H. & Tossavainen, K. (2016). Summative assessment of clinical practice of student nurses: A review of the literature. International Journal of Nursing Studies, 53, 308–319. doi:10.1016/j.ijnurstu.2015.09.014
- Isaacson, J.J. & Stacy, A.S. (2009). Rubrics for clinical evaluation: Objectifying the subjective experience. Nurse Education in Practice, 9, 134–140. doi:10.1016/j.nepr.2008.10.015
- Killam, L.A., Montgomery, P., Luhanga, F.L., Adamic, P. & Carter, L.M. (2010). Views on unsafe nursing students in clinical learning. International Journal of Nursing Education Scholarship, 7, Article 36. doi:10.2202/1548-923X.2026
- Krautscheid, L., Moceri, J., Stragnell, S., Manthey, L. & Neal, T. (2014). A descriptive study of a clinical evaluation tool and process: Student and faculty perspectives [Supplemental material]. Journal of Nursing Education, 53, S30–S33. doi:10.3928/01484834-20140211-02
- Lasater, K. (2007). Clinical judgment development: Using simulation to create an assessment rubric. Journal of Nursing Education, 46, 496–503.
- National League for Nursing. (2018). Nurse educator core competency. Retrieved from http://www.nln.org/professional-development-programs/competencies-for-nursing-education/nurse-educator-core-competency
- Nielsen, A. (2016). Concept-based learning in clinical experiences: Bringing theory to clinical education for deep learning. Journal of Nursing Education, 55, 365–371. doi:10.3928/01484834-20160615-02
- O'Connor, A.B. (2015). Clinical instruction and evaluation: A teaching resource (3rd ed.). Burlington, MA: Jones & Bartlett Learning.
- Oermann, M.H. & Gaberson, K.B. (2017). Evaluation and testing in nursing education (5th ed.). New York, NY: Springer.
- Oermann, M.H., Yarbrough, S.S., Saewert, K.J., Ard, N. & Charasika, M.E. (2009). Clinical evaluation and grading practices in schools of nursing: National survey findings part II. Nursing Education Perspectives, 30, 352–357.
- Roberts, D. (2011). Grading the performance of clinical skills: Lessons to be learned from performing arts. Nurse Education Today, 31, 607–610. doi:10.1016/j.nedt.2010.10.017
- Scanlan, J.M. & Chernomas, W.M. (2016). Failing clinical practice & the unsafe student: A new perspective. International Journal of Nursing Education Scholarship, 13, 109–116. doi:10.1515/ijnes-2016-0021
- Seldomridge, L.A. & Walsh, C.M. (2006). Evaluating student performance in undergraduate preceptorships. Journal of Nursing Education, 45, 169–176.
- Shin, H., Shim, K., Lee, Y. & Quinn, L. (2014). Validation of a new assessment tool for a pediatric nursing simulation module. Journal of Nursing Education, 53, 623–629. doi:10.3928/01484834-20141023-04
- Shipman, D., Roa, M., Hooten, J. & Wang, Z.J. (2012). Using the analytic rubric as an evaluation tool in nursing education: The positive and the negative. Nurse Education Today, 32, 246–249. doi:10.1016/j.nedt.2011.04.007
- Stevens, D.D. & Levi, A.J. (2012). Introduction to rubrics: An assessment tool to save grading time, convey effective feedback, and promote student learning (2nd ed.). Sterling, VA: Stylus.
- Tanicala, M.L., Scheffer, B.K. & Roberts, M.S. (2011). Defining pass/fail nursing student clinical behaviors phase I: Moving toward a culture of safety. Nursing Education Perspectives, 32, 155–161. doi:10.5480/1536-5026-32.3.155
- Terry, R., Hing, W., Orr, R. & Milne, N. (2017). Do coursework summative assessments predict clinical performance? A systematic review. BMC Medical Education, 17. doi:10.1186/s12909-017-0878-3
Criterion-Referenced Behaviors Applied to One Program Outcome
|1. Independently provides the total ADL daily care of individual patients and residents.||X||X||X||X||X|
|2. Uses appropriate support personnel including RNs, LPNs, CNAs, and peers as needed.||X||X||X||X|
|3. Gives and receives feedback in a constructive manner and accepts responsibilities for own actions.||X||X|
|4. Works effectively with other individuals within the context of the intra- and interprofessional team—as evidenced by staff reports and student journal entries.||X||X||X||X||X|
|5. Prioritizes and participates in clinical unit responsibilities.||X||X||X||X||X|
|6. Within the context of the intra- and interprofessional team—as evidenced by journal entries.||X||X|
Clinical Performance Grading Rubric Rows 1 to 7
| EXEMPLARY - A | ACCOMPLISHED - B | BEGINNING - C | UNSAFE - D |
| --- | --- | --- | --- |
| A1. Performs safely and accurately each time behavior is observed. | B1. Performs safely and accurately most of the time behavior is observed. | C1. Performs safely and accurately with close supervision. | D1. Performs in an unsafe manner, or unable to demonstrate appropriate behavior. |
| A2. Never requires supportive cues. | B2. Occasionally requires supportive cues. | C2. Frequently requires supportive cues. | D2. Requires continuous supportive and directive cues. |
| A3. Always demonstrates coordination. | B3. Demonstrates coordination most of the time. | C3. Occasionally demonstrates coordination. | D3. Consistently lacks coordination; attempts behavior, yet unable to complete. |
| A4. Always utilizes time on activities efficiently. | B4. Spends reasonable time on activities. Able to complete behavior. | C4. Takes longer than reasonable time to complete activities. | D4. Performs activities with considerable delay; activities are disrupted or omitted. |
| A5. Always appears relaxed and confident. Demeanor consistently puts patients or families at ease. | B5. Usually appears relaxed and confident. Occasionally anxious but does not interfere with skills. Patient/family do not question or feel uneasy. | C5. Anxiety occasionally interferes with ability to perform skills; results in questioning or uneasiness in patient/family. | D5. Anxiety interferes with ability to perform skills; results in questioning or uneasiness in patient/family. |
| A6. Applies theoretical knowledge accurately each time while demonstrating critical thinking (making decisions based on client's assessment data). | B6. Applies theoretical knowledge accurately with occasional cues. | C6. Identifies principles of theoretical knowledge, but needs direction to identify application. | D6. Applies theoretical knowledge principles inappropriately. |
| A7. Consistently focuses on client during skills without cues. | B7. Focuses on client initially without cues; as complexity increases, focuses on skills. | C7. Focuses on client initially with cues; as complexity increases, focuses on skills. | D7. Focuses on activities or own behaviors, not on client. |