Journal of Nursing Education


EDITORIAL 

Measurement and Evaluation in Nursing Education

Christine A. Tanner, PhD, RN, FAAN

In this issue of JNE you will find articles addressing a potpourri of issues related to measurement and evaluation: using external examiners for quality assurance in graduate education, improving student performance on examinations by providing feedback, developing valid and reliable measures of critical thinking, and some innovative approaches to assessing student learning. These are all critical issues facing faculty in their day-to-day teaching practices. I think that you will find some provocative and useful ideas in this month's JNE.

As the Editor of this Journal, I have the opportunity to review more than 300 manuscripts a year, giving me a bird's-eye view of what is capturing the intellectual energy of nursing faculty and what is perhaps being overlooked. Here are two areas that need additional attention:

Quality of instruments used in nursing education research. Anyone who has completed the basic undergraduate course in nursing research knows that it is important to evaluate the reliability and validity of instruments used to measure key study variables. Yet in the press to study the effectiveness of our educational practices, we seem to overlook some of these basic lessons. I can understand the argument that we may need to sacrifice some degree of precision in measurement in order to evaluate our practices in a timely way. But if we want to disseminate the findings of our studies, we must provide evidence that we are consistently measuring the constructs we intend to measure. In other words, the credibility and usefulness of findings from research studies depend, in great part, on the reliability and validity of the measures used in data collection.
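The editorial does not name a particular statistic, but one common way to show that a set of items is "consistently measuring" a construct is an internal-consistency estimate such as Cronbach's alpha. The short Python sketch below is an illustration added for this point only; the respondent data are simulated and the function name is an assumption.

```python
import numpy as np

def cronbach_alpha(item_scores) -> float:
    """Estimate internal consistency (Cronbach's alpha).

    item_scores: 2-D array-like, rows = respondents, columns = items.
    """
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 50 respondents, 4 items that share a common factor,
# so the items should hang together (alpha well above 0).
rng = np.random.default_rng(0)
common = rng.normal(size=(50, 1))
scores = common + rng.normal(scale=0.5, size=(50, 4))
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```

A reliability estimate of this kind speaks only to consistency; it says nothing about whether the instrument measures the intended construct, which is the validity question taken up below.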

As we go about planning our studies, there are two important questions to pose when selecting or constructing instruments:

1) "To what extent will the interpretation of the scores be appropriate, meaningful, and useful for the intended application of the results?" and

2) "What are the consequences of the particular uses and interpretations that are made of the results?" (Linn & Gronlund, 1995).

All too often, manuscripts reporting important and worthwhile studies fall short of the mark because of failure to answer these questions. Here are some common errors in study design and/or reporting:

* Instruments were selected because of "established" validity, or because "validity was established" by review of an expert panel. The claim of established validity sends our reviewers right up the wall! A widely accepted definition of validity was offered by Messick: "Validity is an overall evaluative judgment, founded on empirical evidence and theoretical rationales of the adequacy and appropriateness of inferences and actions based on test scores" (1988, p. 33). Validity is never established; several kinds of evidence must be accumulated to adequately estimate it. As Goodwin pointed out, "Validity is a matter of degree and largely situation-specific" (1997, p. 107).

* An "investigator-developed" questionnaire was used with inadequate description of the techniques used to develop or evaluate it. An instrument is typically developed to measure a domain of knowledge, attitude, skill, and so forth; the amount of that "domain" an individual possesses is inferred from a sample of items. The domain may be defined through concept analysis or qualitative study; items representing the domain are then generated and evaluated by a panel of experts (one common way to summarize such panel ratings is sketched after this list). While this so-called "content validation" is an important part of instrument development and is frequently reported, it may be inadequate to support the types of inferences the investigators wish to draw from the data. Other kinds of evidence should be provided.

* Factor analysis was used with small sample sizes and/or with dichotomous items. Frequently, because of the difficulty in accumulating other empirical evidence for the validity of a measure, investigators use factor analysis as the sole evidence. Although it provides information about the underlying structure of a measure, factor analysis requires large sample sizes to yield stable results; moreover, it assumes that the items are continuous rather than dichotomous (Goodwin, 1997). A brief sketch illustrating these caveats follows this list.

* Investigators did not acknowledge limitations in the validity evidence for the measures they used. There are, of course, times when we must proceed with studies even before we have what we feel would be adequate evidence for an instrument's validity. That does not preclude publication of the results, provided the limitations in the evidence are acknowledged.
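As a concrete but purely illustrative aside on the expert-panel step described above: one common way to quantify panel judgments is an item-level content validity index (I-CVI), the proportion of experts who rate an item as relevant. The Python sketch below is not drawn from the editorial; the 4-point relevance scale, item names, and ratings are hypothetical.

```python
# Illustrative only: item-level content validity index (I-CVI) from a
# hypothetical expert panel, assuming a 4-point relevance scale on which
# ratings of 3 or 4 count as "relevant".
ratings = {                        # expert ratings per item (made-up data)
    "item_1": [4, 4, 3, 4, 3],
    "item_2": [2, 3, 4, 3, 2],
    "item_3": [4, 4, 4, 4, 3],
}

for item, panel in ratings.items():
    relevant = sum(1 for r in panel if r >= 3)
    i_cvi = relevant / len(panel)  # proportion of experts judging the item relevant
    print(f"{item}: I-CVI = {i_cvi:.2f}")
```

Even a high I-CVI speaks only to content coverage; as the editorial notes, other kinds of validity evidence are still needed.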

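To make the factor-analysis caveats concrete, the sketch below (an illustration added here, not part of the original editorial) fits an exploratory factor model with scikit-learn to simulated continuous item scores. With only a few dozen respondents, or with dichotomous (0/1) items analyzed as if they were continuous, the estimated loadings become far less trustworthy, which is the caution Goodwin (1997) raises.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate continuous responses to 8 items driven by 2 latent factors
# (hypothetical data for illustration only).
rng = np.random.default_rng(0)
n_respondents, n_items, n_factors = 300, 8, 2
latent = rng.normal(size=(n_respondents, n_factors))
true_loadings = rng.normal(size=(n_factors, n_items))
items = latent @ true_loadings + rng.normal(scale=0.5, size=(n_respondents, n_items))

fa = FactorAnalysis(n_components=n_factors, random_state=0)
fa.fit(items)
print(np.round(fa.components_, 2))  # estimated loadings (2 factors x 8 items)

# Caveats echoed in the text: rerunning this with, say, 30 respondents, or
# after dichotomizing the items, yields noticeably less stable loadings;
# dichotomous items generally call for tetrachoric/polychoric approaches
# rather than a Pearson-based factor model.
```
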
Focus on assessment of program outcomes. It is no news that accrediting bodies are increasingly requiring educational programs to provide evidence that their students are meeting the program's specific objectives for student learning. Both the National League for Nursing Accrediting Commission (NLNAC) and the Commission on Collegiate Nursing Education (CCNE) require evidence of student achievement on required and selected outcomes (NLNAC) or expected results (CCNE). Moreover, both accrediting bodies require that there be evidence that the student achievement results be used to change or improve the curriculum. As a result of these requirements, we have received an unprecedented number of manuscripts that address measurement of attainment of learning outcomes, particularly in the areas mandated by NLNAC.

But there is another kind of assessment that could easily be overlooked in our zeal to study learning outcomes: the extent to which the "best practices" that support learning are in place. Of course, accreditation criteria include the traditional input and process standards about organization, administration, faculty, students, and resources, what Wellman (2000) refers to as accountability measures. But what about evaluation of the teaching practices thought to contribute most to the attainment of learning outcomes? To actually use findings from student performance appraisals, some assessment of the teaching practices would seem to be necessary.

Interestingly, there is a move afoot to do just that: to measure student learning experiences and satisfaction, and to assess the extent to which best teaching practices are used, at individual 4-year colleges across the United States. The National Survey of Student Engagement will sample undergraduates from 750 colleges and will provide data to establish national benchmarks for different types of institutions (Gose, 1999).

The survey draws on extensive research by Alexander Astin, Arthur Chickering, and John Gardner on practices that support student learning. For example, the survey asks students how many times they were required to make a class presentation, worked with classmates outside of class to prepare class assignments, talked about career plans with a faculty member or adviser, worked with a faculty member on a research project, or had serious discussions with students of a different race or ethnicity than their own. It also asks whether they would attend the same institution a second time.

JNE receives very few manuscripts addressing important aspects of program evaluation, and even fewer that go beyond what is currently required for accreditation. But if we are truly to use data to improve our educational programs, we need to develop approaches that evaluate which practices are used, how they relate to "best practices," and how they affect learning outcomes.

REFERENCES

  • Goodwin, L.D. (1997). Changing conceptions of measurement validity. Journal of Nursing Education, 36(3), 102-107.
  • Gose, B. (1999, October 22). A new survey of 'good practices' could be an alternative to rankings. The Chronicle of Higher Education, 46, A65. [On-line]. Available: http://www.chronicle.com, accessed November 30, 2000.
  • Linn, R.L., & Gronlund, N.E. (1995). Measurement and assessment in teaching (7th ed.). Englewood Cliffs, NJ: Prentice Hall.
  • Messick, S. (1988). The once and future uses of validity: Assessing the meaning and consequences of validity. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum.
  • Wellman, J. (2000, September 22). Accreditors have to see past "learning outcomes." The Chronicle of Higher Education, 47, B20. [On-line]. Available: http://www.chronicle.com, accessed November 30, 2000.

10.3928/0148-4834-20010101-03
