Journal of Nursing Education

Research Briefs 

Writing Across the Curriculum: Reliability Testing of a Standardized Rubric

Margo Minnich, DNP, RN; Amanda J. Kirkpatrick, MSN, RN; Joely T. Goodman, MSN, RN; Ali Whittaker, EdD, RN; Helen Stanton Chapple, PhD, RN, CT; Anne M. Schoening, PhD, RN, CNE; Maya M. Khanna, PhD

Abstract

Background:

Rubrics positively affect student academic performance; however, the accuracy and consistency of the rubric and its use are imperative. The researchers in this study developed a standardized rubric for use across an undergraduate nursing curriculum, then evaluated the interrater reliability and general usability of the tool.

Method:

Faculty raters graded papers using the standardized rubric, submitted their independent scoring for interrater reliability analyses, then participated in a focus group discussion regarding rubric use experience.

Results:

Quantitative analysis of the data showed a high interrater reliability (α = .998). Content analysis of transcription revealed several positive themes: Consistency, Emphasis on Writing Ability, and Ability to Use the Rubric as a Teaching Tool. Areas for improvement included use of value words and difficulty with point allocation.

Conclusion:

Investigators recommend effective faculty orientation for rubric use and future work in developing a rubric to assess reflective writing. [J Nurs Educ. 2018;57(6):366–370.]


The Essentials of Baccalaureate Education for Professional Nursing Practice indicates that nurses must be able to effectively communicate in a written format (American Association of Colleges of Nursing, 2008). Writing serves as a foundation for critical thinking, assists with exploring and acquiring new knowledge, and engages students in creative thinking and problem solving about specialized topics and areas of interest in the nursing discipline. Therefore, scholarly writing is an important skill that nurse educators must develop and foster throughout the nursing curriculum. To accomplish this, nurse educators must provide students with appropriate and meaningful feedback for all written assignments.

Accurate and consistent evaluation of student written assignments within courses and across the curriculum can be a daunting task, as individual faculty often develop their own criteria for paper content and writing standards. This can lead to a lack of uniform understanding of assignment requirements, inconsistent grading practices, and ineffective feedback to students (O'Donnell, Oakley, Haney, O'Neill, & Taylor, 2011). Unclear expectations and goals can be major barriers to providing students with useful feedback (Archer, 2010). The current literature has focused on the use of grading rubrics in individual courses; however, there is a paucity of research evaluating the use of rubrics to improve student writing across an educational curriculum. In this article, we describe the development, implementation, and evaluation of a standardized rubric for writing assignments in our baccalaureate nursing program.

Background

Grading rubrics are tools used to evaluate student writing (Panadero & Jonsson, 2013) by providing specific criteria for an assignment. A well-written grading rubric includes concise criteria for performance, a rating scale, and a specific description of the expected student performance at each rating level (Greenberg, 2012; Shipman, Roa, Hooten, & Wang, 2012). Rubrics provide clear expectations for both students and faculty and serve as an objective method of grading written work (Howell, 2014).

Direct and meaningful feedback to students using a rubric is critical in improving student writing (Jonsson, 2014; Panadero & Jonsson, 2013; Shipman et al., 2012; Solan & Linardopoulos, 2011). In a systematic review of the use of scoring rubrics in formative assessment, Panadero and Jonsson (2013) concluded that grading rubrics allow students to learn more effectively. Greenberg (2012) found that students in both introductory and advanced courses improved writing performance when rubrics were used for the assignment. Students report that rubrics help them focus their writing and feel less anxious about an assignment, resulting in higher academic performance (Reddy & Andrade, 2010).

Development of a Standardized Rubric

Historically, nursing faculty teaching in our baccalaureate nursing program developed their own criteria for paper content and writing standards. Faculty at different levels of the curriculum used rubrics and grading criteria that provided varying levels of guidance and expectations for students. This resulted in confusion regarding paper expectations, grader bias, and student frustration. Furthermore, this process was ineffective in promoting writing skill as students advanced through the curriculum and made it impossible to assess improvements in writing over time. To address these issues, interested faculty formed a taskforce to improve student feedback and standardize grading criteria across the curriculum.

The taskforce began by collecting the rubrics and grading criteria used to evaluate written assignments in all undergraduate nursing courses. The rubrics varied widely in appearance, level of detail, and standard of achievement. However, five common assessment categories emerged from this review. The categories included:

  • Content.
  • Organization and writing skill.
  • Grammar, punctuation, and spelling.
  • American Psychological Association (APA) formatting style.
  • The effective use of scholarly references.

In addition, the taskforce acknowledged that a standardized rubric must meet several requirements of our nursing program and the university. First, the rubric needed to align with the program and university assessment plans. Second, the rubric needed to be adaptable for use within the university learning management system. Finally, the rubric needed to meet the needs of faculty across all levels and courses, in turn providing flexibility for the variety of courses and assignments within the curriculum.

To meet these needs, the taskforce developed a rubric in a format using the five identified assessment categories and three levels of performance (Table 1). Because the content requirements for each paper vary widely among courses, we left the content section open for faculty to tailor to the specific courses and assignments as necessary. The taskforce considered each of the remaining assessment categories and developed specific criteria to demonstrate achievement at each of the levels.

Table 1: Organization and Writing Skill (full criteria reproduced at the end of this article)
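Because the rubric needed to be adaptable for the university learning management system, its structure lends itself to a simple machine-readable representation. The following is a minimal sketch of that structure, not an artifact produced by the taskforce: the section names, performance bands, and letter grades are taken from this article, the 5- and 10-point section scales come from the reliability analysis reported below, and the helper function is hypothetical.

```python
# Illustrative sketch of the standardized rubric's structure. Section names
# and performance bands are from the article; the point scales for the four
# non-content sections are those reported in the reliability analysis.
# The content section is left open because faculty tailor it per course.

PERFORMANCE_LEVELS = {
    "Outstanding":    (91, 100),  # letter grade A
    "Satisfactory":   (75, 90),   # letter grades B-C
    "Unsatisfactory": (0, 74),    # letter grades D-F
}

RUBRIC_SECTIONS = {
    "Content": None,  # course-specific; point value set by course faculty
    "Organization and writing skill": 5,
    "Grammar, punctuation, and spelling": 5,
    "APA formatting": 10,
    "Effective use of scholarly references": 10,
}

def performance_level(percent: float) -> str:
    """Map a section score (as a percentage) to its performance level."""
    for level, (low, high) in PERFORMANCE_LEVELS.items():
        if low <= percent <= high:
            return level
    raise ValueError(f"percentage out of range: {percent}")
```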

Members of the taskforce piloted the standardized rubric in their courses. The graders who used the rubric were queried using an anonymous online survey assessing ease of use and suggestions for improvement. Overall, the comments were positive and faculty members were receptive to the use of the new rubric, with few suggestions for change. The taskforce made minor edits to the rubric and presented the rubric to the curriculum committee. The curriculum committee then approved the rubric for assessment of all writing assignments in all undergraduate courses.

Evaluation of the Standardized Rubric

After implementation of the rubric into all undergraduate courses, the taskforce set out to determine interrater reliability and general usability of the rubric using a mixed-methods study design. We used quantitative measures to determine if faculty graders used the grading rubric consistently across papers. We used qualitative measures to explore perceived positive and negative aspects of the rubric and to solicit input regarding possible changes, clarification, and training to promote consistent use of the rubric. We obtained institutional review board exempt status for the study.

To evaluate interrater reliability and solicit feedback from faculty, we used papers graded with the standardized rubric in a junior-level nursing course. After completion of the course and posting of final grades, the course coordinator selected six papers representing a range of grades and blinded them, removing all student-identifying information, faculty comments, and scoring. One paper was reserved for rubric calibration, and the remaining five were used to examine interrater reliability. We modified the standardized rubric for use in this study by removing the content portion (as this section varies widely among courses), leaving four sections to be analyzed:

  • Organization and writing skill.
  • Grammar, punctuation, and spelling.
  • APA formatting.
  • The effective use of scholarly references.

We recruited 12 faculty raters for the study. Raters received lunch and a $20 gift card incentive for their voluntary participation in a rubric calibration session, evaluation of five student papers, and a 1-hour focus group. The volunteer raters included faculty from both of our campuses and across all programs: graduate and undergraduate nursing faculty from a variety of specialties, serving in full-time, part-time, or special (adjunct) roles. Some of the volunteers had used the rubric in student assessment and others were unfamiliar with it.

We began with a rubric calibration session. All raters were present during rubric calibration and each rater received printed copies of the six blinded papers, each with an attached standardized rubric. The session began with a comprehensive review of the standardized rubric. Participants then collectively graded the first paper, discussing the point allocation and developing a consensus on the meaning of the criteria and scoring for each level of achievement. The paper used in calibration was excluded from interrater analysis.

The volunteer raters were then dismissed to grade the remaining five papers independently. At the end of a 2-week period, the group reconvened to submit their completed rubrics and participate in a 1-hour focus group. A member of the research team facilitated the focus group, using a list of predetermined questions to generate discussion among the faculty regarding their experiences using the rubric. The discussion was captured with an audio recorder and transcribed verbatim. The focus group data were analyzed using content analysis, following the approach suggested by Creswell (1998).

Quantitative Results

To test whether faculty raters were consistent in their use of the grading rubric across papers, we examined the interrater reliability of the scores given across the four sections of the five sample papers using Cronbach's alpha. An initial summary test across all 12 raters found a very high degree of interrater reliability (α = .998). Separate reliability analyses were then performed for the assessment categories evaluated on different numeric scales: the organization and writing skill and the grammar, punctuation, and spelling components were rated on 5-point scales, whereas the APA formatting and scholarship and references components were rated on 10-point scales, so a separate analysis was conducted to verify the reliability of raters within each scale. These analyses also yielded a high degree of reliability across the 12 raters: α = .912 for the 5-point scales (organization and writing; grammar, punctuation, and spelling) and α = .920 for the 10-point scales (APA formatting; scholarship and references). In addition, there was little difference in the variability of ratings across the five papers and across the four assessment categories of each paper.
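The article does not state which software performed these analyses. As an illustration only, the sketch below computes Cronbach's alpha by treating the 12 raters as "items" and the 20 rated units (five papers × four rubric sections) as observations; the simulated scores and noise level are assumptions, not the study's data.

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha treating raters as 'items'.

    ratings: 2-D array with one row per rated unit (here, a paper
    section) and one column per rater.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                         # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)     # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of the row sums
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Hypothetical data: 5 papers x 4 sections = 20 rated units, 12 raters.
# Each unit has a "true" quality; raters add small independent noise.
rng = np.random.default_rng(0)
true_quality = rng.uniform(2, 5, size=(20, 1))
scores = true_quality + rng.normal(0, 0.15, size=(20, 12))
print(round(cronbach_alpha(scores), 3))  # close agreement -> alpha near 1
```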

We then conducted a correlation analysis to examine whether any rater differed from the others. The pairwise correlations between all raters' scores were very high, indicating that no single rater differed from the others in the values assigned across the assessment categories (Table 2).
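Similarly, a pairwise correlation matrix of the kind shown in Table 2, along with a simple check for outlier raters, can be sketched as follows; again the data are simulated, and the mean-correlation summary is our illustrative device rather than the study's reported procedure.

```python
import numpy as np
import pandas as pd

# Simulated ratings as in the alpha sketch above: 20 rated units
# (5 papers x 4 rubric sections) scored by 12 raters.
rng = np.random.default_rng(0)
true_quality = rng.uniform(2, 5, size=(20, 1))
scores = true_quality + rng.normal(0, 0.15, size=(20, 12))

df = pd.DataFrame(scores, columns=[f"Grader {i}" for i in range(1, 13)])

# Pairwise Pearson correlations between raters (the structure of Table 2).
corr = df.corr()
print(corr.round(3))

# Mean correlation of each rater with the other 11 (excluding the 1.0
# self-correlation); an unusually low value would flag an outlier rater.
mean_corr = (corr.sum() - 1) / (corr.shape[0] - 1)
print(mean_corr.round(3))
```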

Table 2: Interrater Correlations (full matrix reproduced at the end of this article)

Qualitative Analysis

Consistency

The first theme identified by focus group members was that the grading rubric provided a consistent means of grading papers. Members discussed the perceived differences that exist between faculty graders. One faculty member stated:

I've seen it where you got [sic] five graders and [students] are really hoping that they get Grader A because Grader A always gives As.

Group members agreed that using the rubric decreased differences between faculty graders. The consistency of the rubric was viewed as being “more fair” to the students.

Emphasis on Writing Ability

Focus group members found that the grading rubric provided a balanced assessment of all areas of the students' writing. Most agreed that because APA can be objectively measured, it is often emphasized more than actual writing ability. As one faculty member pointed out:

I think we need to be very concerned about organization, writing skill, and grammar more so than APA. I'm very focused on [being able to assess] writing ability because as you develop your writing skill, you develop your oral communication skills.

Use as a Teaching Tool

The third theme identified the value of the rubric as a teaching tool, as students have clear expectations provided to them prior to writing their paper:

It really helps them in their writing skill set and their ability to see what is [an] exemplary paper, what [is] satisfactory or unsatisfactory. So, as a teaching tool, I like it.

Focus group participants also identified how the rubric can be used to guide the faculty–student conversation when students are dissatisfied with their paper grade. The objective rubric criteria can be used to help students understand the writing process and improve their writing ability. As one faculty member suggested:

So you can say to the student, “let's walk through the rubric. Let's walk through your paper. I'll explain what I did. You tell me what you did then let's discuss it.” It gives them some great insight.

Difficulty With Point Allocation

Two themes emerged that suggested limitations of the rubric. First, there was consensus among group members that it was often difficult to determine specific point allocations for each level of performance. Because the rubric provides a range of possible scores, faculty had difficulty determining how many points to award or deduct in a specific section. One rater described this challenge:

I think I struggled most with the points. I had a hard time determining 4.6 versus 4.1 and I don't know...it caused me a little anxiety.

Use of Value Words

Several focus group members pointed out "value words" used in the rubric. Phrases such as "poorly developed" contain an adjective that does not add to the overall clarity of the criteria. One faculty grader summarized the group discussion:

We use lots of adjectives when I don't think they're necessary.... What you want in those areas are clearly identified by a number and I don't think you have to add a value word to it.... You've got other things in there that clearly state what unsatisfactory is.

Discussion

The use of rubrics is thought to make assessment of student writing more objective and consistent; however, prior research has demonstrated limitations associated with the use of rubrics (Oakleaf, 2009; Solan & Linardopoulos, 2011; Stellmack, Konheim-Kalkstein, Manor, Massey, & Schmitz, 2009). Among common limitations, poorly written criteria and unclear rating systems have been identified (Oakleaf, 2009; Shipman et al., 2012). Studies have found significant grader bias and lack of interrater reliability among faculty using rubrics (Oakleaf, 2009; Stellmack et al., 2009).

The current study examined whether a standardized rubric could be used consistently across faculty graders. Nursing faculty were asked to grade sample papers using the standardized rubric we developed. We found that faculty raters were not only able to use the rubric with ease but also did so very consistently, with a high degree of interrater reliability across all assessment categories of the rubric.

Confounding factors and potential bias existed in our study. Our tests for reliability were limited in that the raters in the study knew that their judgments would have no effect on student grades. This fact may have caused them to use the rubric differently in evaluating the sample papers than they might have in a normal grading situation. Further, the raters were not asked to evaluate the content of the student papers, only the remaining assessment categories. Omitting content may have altered the raters' approach to the task and their use of the rubric.

The focus group provided helpful feedback for adjusting the rubric and for developing guidance on point allocation. Reliability of the rubric will be maintained by ongoing calibration within individual course groups, and interrater consistency will be sustained by faculty orientation to the rubric's use along with continued faculty development.

Conclusion

This article describes the development, implementation, and evaluation of a standardized rubric for writing assignments in our baccalaureate nursing program. Specifically, we examined whether the standardized rubric could be used easily and consistently by faculty. There was a general consensus among faculty that the rubric was easy to use and provided a means for objective assessment of student writing. Furthermore, a high degree of interrater reliability was found across all rubric assessment categories. Future development will involve leveling the rubric so that it measures improvement for students as they progress through the program. The rubric will also be adapted for use with other assignments, such as reflective writing.

References

  • American Association of Colleges of Nursing. (2008). The essentials of baccalaureate education for professional nursing practice. Retrieved from http://www.aacn.nche.edu/education-resources/BaccEssentials08.pdf
  • Archer, J. (2010). State of the science in health professional education: Effective feedback. Medical Education, 44, 101–108. doi:10.1111/j.1365-2923.2009.03546.x
  • Creswell, J. (1998). Qualitative inquiry and research design: Choosing among five traditions. London, UK: Sage.
  • Greenberg, K. (2012). A reliable and valid weighted scoring instrument for use in grading APA-style empirical research report. Teaching of Psychology, 39, 17–23. doi:10.1177/0098628311430643
  • Howell, R. (2014). Grading rubrics: Hoopla or help? Innovations in Education and Teaching International, 51, 400–410. doi:10.1080/14703297.2013.785252
  • Jonsson, A. (2014). Rubrics as a way of providing transparency in assessment. Assessment & Evaluation in Higher Education, 39, 840–852. doi:10.1080/02602938.2013.875117
  • Oakleaf, M. (2009). Using rubrics to assess information literacy: An examination of methodology and interrater reliability. Journal of the American Society for Information Science and Technology, 60, 969–983. doi:10.1002/asi.21030
  • O'Donnell, J., Oakley, M., Haney, S., O'Neill, P., & Taylor, D. (2011). Rubrics 101: A primer for rubric development in dental education. Journal of Dental Education, 75, 1163–1175.
  • Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: A review. Educational Research Review, 9, 129–144. doi:10.1016/j.edurev.2013.01.002
  • Reddy, Y., & Andrade, H. (2010). A review of rubric use in higher education. Assessment & Evaluation in Higher Education, 35, 435–448. doi:10.1080/02602930902862859
  • Shipman, D., Roa, M., Hooten, J., & Wang, Z. (2012). Using the analytic rubric as an evaluation tool in nursing education: The positive and negative. Nurse Education Today, 32, 246–249. doi:10.1016/j.nedt.2011.04.007
  • Solan, A., & Linardopoulos, N. (2011). Development, implementation, and evaluation of a grading rubric for online discussions. MERLOT Journal of Online Learning and Teaching, 7, 1–12.
  • Stellmack, M., Konheim-Kalkstein, Y., Manor, J., Massey, A., & Schmitz, J. (2009). An assessment of reliability and validity of a rubric for grading APA-style introductions. Teaching of Psychology, 36, 102–107. doi:10.1080/00986280902739776

Table 1: Organization and Writing Skill

Outstanding: 91% to 100% (A)

Paper flows logically; purpose and development of ideas clear and adequate; includes formal introduction, body, and conclusion.
Conceptual clarity is evident throughout; paragraphs are well focused and organized, allowing for consistency of idea and topic.
Paragraphs demonstrate logical connection between thoughts and ideas, using appropriate transition statements between ideas.

Satisfactory: 75% to 90% (B–C)

Purpose and ideas need further development and articulation; introduction, body, and conclusion not fully developed.
Conceptual clarity is evident; paragraphs focused, but improvement in organization of ideas and topics is needed.
Connection between thoughts and ideas evident, but can be improved; transition statements not always clear.

Unsatisfactory: 0% to 74% (D–F)

Purpose is unclear; ideas poorly developed and articulated; missing conclusion or final summary.
Conceptual clarity absent; paragraphs contain multiple ideas and topics.
Paragraphs do not demonstrate logical connection between thoughts and ideas and lack appropriate transition statements between ideas.

Table 2: Interrater Correlations

           G1    G2    G3    G4    G5    G6    G7    G8    G9    G10   G11   G12
Grader 1   1.000 .978  .971  .934  .972  .972  .961  .972  .954  .976  .981  .978
Grader 2         1.000 .982  .953  .974  .980  .976  .984  .970  .987  .986  .984
Grader 3               1.000 .955  .994  .994  .976  .991  .980  .981  .992  .988
Grader 4                     1.000 .957  .975  .987  .979  .986  .981  .968  .978
Grader 5                           1.000 .993  .978  .988  .982  .983  .993  .989
Grader 6                                 1.000 .991  .997  .991  .992  .995  .995
Grader 7                                       1.000 .991  .995  .994  .989  .993
Grader 8                                             1.000 .993  .994  .991  .997
Grader 9                                                   1.000 .989  .987  .993
Grader 10                                                        1.000 .993  .996
Grader 11                                                              1.000 .994
Grader 12                                                                    1.000

Note. Columns G1 through G12 correspond to Graders 1 through 12; the matrix is symmetric, so only the upper triangle is shown.
Authors

Dr. Minnich, Ms. Kirkpatrick, Ms. Goodman, and Dr. Whittaker are Assistant Professors, Dr. Stanton Chapple and Dr. Schoening are Associate Professors, College of Nursing, and Dr. Khanna is Professor, College of Arts and Sciences, Creighton University, Omaha, Nebraska.

The authors have disclosed no potential conflicts of interest, financial or otherwise.

The authors thank the following colleagues who participated in their research: Marilee Aufdenkamp, Becky Davis, Amy Cosimano, Cindy Costanzo, Megan Gunnell, Ann Harms, Lindsay Iverson, Dana Koziol, Anna Nolette, Meghan Potthoff, and Nancy Shirley. The authors also thank the Office of Academic Excellence and Assessment at Creighton University. Without their generous support, this project would not have been possible.

Address correspondence to Margo Minnich, DNP, RN, Assistant Professor, College of Nursing, Creighton University, 2500 California Plaza, Omaha, NE 68178; e-mail: margominnich@creighton.edu.

Received: November 30, 2017
Accepted: January 17, 2018

doi:10.3928/01484834-20180522-08
