Journal of Nursing Education


Assessment of Multiple-Choice Questions in Selected Test Banks Accompanying Textbooks Used in Nursing Education

Joan C Masters, MA, MBA, RN; Barbara S Hulsmeyer, EdD, RN; Mary E Pike, MSN, RN; Kathy Leichty, MSN, RN; Margaret T Miller, PhD, RN; Amy L Verst, MSN, PNP

ABSTRACT

The purpose of this study was to assess multiple-choice questions used in test banks accompanying selected nursing textbooks. A random sample of 2913 questions was selected from a convenience sample of 17 test banks. Questions were evaluated on (a) adherence to generally accepted guidelines for writing multiple-choice questions; (b) cognitive level as defined by Bloom's (1961) taxonomy; and (c) distribution of correct answers as A, B, C, or D. The review found 2233 violations of item-writing guidelines, most of them minor but some serious. Nearly half of the questions (47.3%) were written at the knowledge level, and only 6.5% were written at the analysis level. The correct answers were evenly distributed: chi-square values ranged from 0.00 to 4.84, while the value needed to reach .05 probability was 26.30. Faculty are encouraged to evaluate multiple-choice questions from test banks carefully before using them for exams.


When a new faculty member asked two of the authors to review an exam on which many students did poorly, several problems with the multiple-choice test were identified immediately. These included choppy sentence structure, ambiguous questions, two correct answers for some questions, and no apparent correct answer for others. The new faculty member explained that, in trying to write an exam with good questions, she had relied heavily on the test bank that accompanied the text. Experience had taught us that although test banks were useful in generating questions, they were sometimes poorly written. The purpose of this study was to assess multiple-choice questions used in test banks accompanying selected nursing textbooks.

Multiple-choice question tests have several advantages. Multiple-choice questions are efficient, objective, easy to grade, and can be used to test a variety of content. They also help familiarize students with the format of the NCLEX-RN examination (Farley, 1989b). While multiple-choice questions also have some disadvantages (they may encourage guessing and may be prone to misinterpretation), they remain the most common type of exam question used in undergraduate nursing courses (Farley, 1989b). However, multiple-choice questions are tedious to construct; it has been estimated that writing one good multiple-choice question takes an hour (Farley, 1989b). Test banks can therefore save substantial faculty time in preparing exams. It is evident from textbook marketing that publishers perceive that faculty want test banks as part of the textbook package. Some publishers even promote computer programs that allow questions to be run directly from a disk, with no editing by faculty.

REVIEW OF THE LITERATURE

Tests are critical in evaluating student knowledge. Ideally, the use of questions from a test bank would help ensure congruence between the exam and the text. However, the content and structure of test bank questions are, in some cases, questionable (Clute & McGrail, 1989). In a review of the nursing literature using the database CINAHL, we were unable to find any literature that assessed the content and structure of questions from test banks used in nursing. Using the ERIC (education), PSYCHLIT (psychology), and MEDLINE (medicine) databases we did find several articles in other disciplines that addressed this issue.

Principles of Multiple-Choice Item Construction

It is important that test questions are well written. Test grades affect student opportunities for continuation in an educational program, awarding of honors, employment, and graduate school admission (Clute & McGrail, 1989). If tests are biased because of poorly written questions, faculty evaluations of student competency will be distorted. However, there is little empirical support for generally accepted item-writing guidelines. Guidelines for writing test questions are generally based on "experience and wisdom" rather than research (Crehan & Haladyna, 1991, p. 183).

Crehan and Haladyna (1991) examined the validity of the popular recommendation to put the stem in the form of a question rather than in a sentence-completion format. In an experimental design using alternate forms of multiple-choice exams, they found no evidence favoring the question format over the sentence-completion format. Violato (1991) also compared the question format with the sentence-completion format; in an experimental study with alternate forms of questions, no difference in difficulty or discrimination was found.

The use of none of the above and the use of fewer than the traditional four or five options have also been studied. In a repeated-measures design with alternate forms of exam questions, Crehan, Haladyna, and Brewer (1993) found that none of the above increased exam difficulty but did not affect discrimination. Three-option items were no less discriminating than four-option items, although they were less difficult. The use of three options is thus supported: it decreases writing and administration time and reduces the chance of writing weak options that violate good item-writing rules.

Even authors of educational psychology texts that instruct readers on writing test questions may have difficulty writing good test bank questions. Ellsworth, Dunnell, and Duell (1990) reviewed 42 educational psychology textbooks for guidelines on writing test questions; 32 had clear guidelines. From these 32 textbooks they compiled 37 test-question-writing guidelines, which they then reduced to 12. A guideline was retained if it (a) was recommended by more than 50% of authors, (b) did not require excessive textbook review, and (c) was associated with test-wiseness strategies.

Using these guidelines, they reviewed 60 items from each of 18 educational psychology test banks (n = 1,080). Only 40% of items did not violate guidelines. The most common problems were errors in grammar and redundant material in the options. Other problems were overuse of "C" as the correct option, use of all of the above and none of the above, use of negatives, clues to the correct answer in the stem, and use of specific determiners such as always and never.

Levels of Objectives

In a professional discipline such as nursing, students are required to process large amounts of complex information and to act responsibly on that information. Test questions should be written to reflect the level of sophistication at which students are expected to practice; it is unrealistic to think that students who take exams requiring only rote recall (knowledge level) can perform in the clinical area at levels requiring higher-order thinking. Assessment in the form of test questions should be congruent with faculty expectations of student learning (Demetrulias & McCubbin, 1982; King, 1978). Exam questions written only at Bloom's knowledge and comprehension levels cannot adequately assess student ability.

Bloom's taxonomy of educational objectives has been widely used in nursing education as a framework to classify educational objectives and test questions. The six levels in the taxonomy are knowledge, comprehension, application, analysis, synthesis, and evaluation (Bloom, 1961, as cited in Demetrulias & McCubbin, 1982). Only the first four levels can be tested with multiple-choice questions.

We found no research-based guidelines on what percent of questions should be written at each level. Levels of questions should probably vary with the course, with more low-level questions used in lower-level courses and higher-level questions in higher-level courses. The NCLEX-RN test plan uses questions at the knowledge, comprehension, application, and analysis levels, with more questions at the application and analysis levels (Farley, 1989a). NCLEX-RN does not assess graduates at the synthesis and evaluation levels; these cognitive levels cannot be assessed with multiple-choice questions. Good tests, particularly in upper-level courses, should use more high-level questions. Because this cannot be done exclusively with multiple-choice questions, other means of assessment (e.g., essay questions, case studies, and class discussion) should be considered to evaluate higher-level thinking.

Placement of Answers

Clute and McGrail (1989) studied whether there was bias in the placement of correct answers in multiple-choice test bank questions. They reviewed 8 test banks from cost accounting textbooks for bias in the placement of options. Chi-square analysis indicated 3 of the 8 test banks were significantly biased (p = .10). In only 1 of the 8 test banks were answers essentially evenly distributed. In a test bank that used five options, "E" appeared as the correct choice only 5% of the time, when each option would be expected to be used 20% of the time. Overall, 7 of the 8 test banks showed some placement bias (Clute & McGrail, 1989).
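
The statistic behind this kind of bias check is the chi-square goodness of fit, comparing observed placement counts against an even distribution. The worked numbers below are illustrative, not Clute and McGrail's data:

```latex
% Chi-square goodness of fit for option-placement bias
\[
  \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
  \qquad df = k - 1
\]
% Illustrative example (not from the study): in a 100-item bank with
% five options (k = 5), E_i = 20 for each letter. If "E" is correct
% only 5 times, that cell alone contributes (5 - 20)^2 / 20 = 11.25.
```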

METHOD

Sample

Questions from a convenience sample of 17 test banks were reviewed for structure and cognitive complexity. Test banks were selected to cover major content areas in undergraduate baccalaureate nursing courses: communications (1), community health (4), fundamentals (2), health assessment (1), medical-surgical (1), pediatrics (1), psychiatric-mental health (3), management and leadership (2), and research (2). All test banks were from major nursing textbook publishers.

Instrument

A total of 30 guidelines was developed to assess test bank questions: 13 from Ellsworth et al. (1990) and 15 from the nursing literature (Farley, 1989b; Gaberson, 1996; King, 1978; Klisch, 1994), with some overlap among guidelines. It was evident to us that some principles of test-item writing, such as accuracy, are more important than others, such as double-spacing between the stem and the options. However, because there were no published critiques of test banks used in nursing, all published guidelines that made sense educationally and did not require close textbook examination were used (see Appendix). These guidelines include both substantive and structural criteria. Two guidelines covering problems not addressed by Ellsworth et al. or the other literature were added during the course of the study: (a) content should be current, and (b) no important content should be omitted. The cognitive level of questions was evaluated using a table describing Bloom's cognitive taxonomy (Demetrulias & McCubbin, 1982).

Procedure

A random sample of 30% of the chapters from each test bank was selected. Each question in a sampled chapter was evaluated on (a) whether it violated generally accepted item-writing guidelines and (b) its cognitive level. The distribution of letter options for each chapter in the sample was also assessed. The total number of questions evaluated from the 17 test banks was 2913, with a mean of 171 per test bank and a range of 29 to 623.
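
As a sketch of the sampling step, the Python fragment below draws a 30% simple random sample of chapters from one test bank; the function and chapter numbering are illustrative, not the authors' actual procedure:

```python
import random

def sample_chapters(chapter_ids, fraction=0.30, seed=None):
    """Draw a simple random sample of chapters from one test bank."""
    rng = random.Random(seed)
    k = max(1, round(len(chapter_ids) * fraction))
    return sorted(rng.sample(chapter_ids, k))

# Hypothetical test bank with 40 chapters: select 12 at random
print(sample_chapters(list(range(1, 41)), seed=1))
```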

Inter-rater reliability was assessed by having the six reviewers complete a series of practice tests. A practice test of 33 questions was compiled from several different test banks, with an effort made to include a variety of poorly written questions covering different clinical areas. All reviewers then used the 28 guidelines to evaluate the questions. This was repeated with 22 new questions. A third, 15-question exam was then created, and its results were used to calculate inter-rater reliability, which was .97. (None of the questions used to develop inter-rater reliability was included in the study sample.) Reviewers, all nursing faculty, were selected based on a minimum of four years of classroom teaching experience. Each reviewer examined 1 to 5 test banks in her area of teaching and clinical expertise; the number of test banks reviewed depended on their availability and size.
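
The article does not state how the .97 figure was computed. A minimal sketch, assuming simple percent agreement between a pair of reviewers on whether each practice item violated a guideline, might look like this (the ratings are hypothetical):

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of items on which two reviewers gave the same rating
    (1 = guideline violated, 0 = no violation)."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical ratings of the 15-question practice exam by two reviewers
reviewer_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]
reviewer_2 = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
print(round(percent_agreement(reviewer_1, reviewer_2), 2))  # 0.93
```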

LIMITATIONS

A limitation of this study was that test banks from maternity nursing texts were not evaluated, because we did not have the expertise within our work group to evaluate this content. It was initially thought that teaching an NCLEX-RN review would have given one of us sufficient expertise to evaluate maternity content, and there would have been no problem evaluating structural problems, such as combined options and length. However, it soon became obvious that content errors and omissions were often quite subtle and that content mastery, not mere familiarity, was needed to evaluate test questions competently.

RESULTS

Violation of Item-Writing Guidelines

Of the 2913 questions reviewed, there were 2233 violations of item-writing guidelines. (Some questions contained multiple violations.) The most common violations were inadequate spacing (960), uneven length of options (239), negative questions (166), more than one correct answer (120), use of implausible options (98), and errors in grammar (97). Some violations were rarely or never found: overlapping options (6), not using both generic and brand names of medications (6), testing student opinion (6), using a "fill in the blank" format (10), the correct option echoing the stem (10), and use of humor or names of famous people (0) (Table 1).

Levels of Cognition

Of the 2913 questions, 47.3% were written at the knowledge level, 24.8% at the comprehension level, 21.8% at the application level, and only 6.5% at the analysis level. The proportion of analysis questions ranged from zero (5 test banks) to 1% to 5% (7 test banks), 21% to 22% (2 test banks), and 44% (1 test bank). One test bank, with 76% knowledge questions, did contain essay questions to assess higher cognitive levels (Table 2).

Distribution of Correct Options

Similar to the findings of Ellsworth et al. (1990), the correct answer in this study was fairly evenly distributed among the four options. Mean percents (rounded) were: A, 24.6%; B, 29.6%; C, 25.8%; and D, 22.8%. Comparison of the observed with the expected percent frequency in each cell (25%) indicated no significant differences; all chi-square values were less than 1. Within individual test banks, mean percents ranged from 16% (option D in one test bank) to 36% (option D in another); again, chi-square comparisons of observed with expected frequencies showed no significant differences, with values ranging from 0.00 to 4.84 against a critical value of 26.30 at the .05 level. The correct answers were evenly distributed (Table 3).
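
A minimal sketch of this goodness-of-fit calculation, assuming a four-option answer key and an even expected distribution (the key below is hypothetical):

```python
from collections import Counter

def option_chi_square(answer_key, options=("A", "B", "C", "D")):
    """Chi-square goodness-of-fit statistic testing whether correct
    answers are evenly distributed across the option letters."""
    counts = Counter(answer_key)
    expected = len(answer_key) / len(options)
    chi_sq = sum((counts.get(o, 0) - expected) ** 2 / expected for o in options)
    return chi_sq, len(options) - 1  # statistic and degrees of freedom

# Hypothetical 16-item answer key; critical value at p = .05 for df = 3 is 7.81
stat, df = option_chi_square(list("ABACADBBCDDACBAD"))
print(f"chi-square = {stat:.2f}, df = {df}")  # 0.50 here: no evidence of bias
```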

TABLE 1

Frequency of Guideline Violations in Test Bank Items

TABLE 2

Percent Distribution of Levels of Cognitive Complexity in Multiple-Choice Test Bank Items

TABLE 3

Percent Distribution of A, B, C, D Options Among Test Bank Items

While the overall distribution of As, Bs, Cs, and Ds was even, distributions within chapters were often skewed. For example, in one chapter with 5 questions, "A" was the correct answer four times, "B" was correct once, and "C" and "D" were not used at all.

DISCUSSION

Most test bank questions are satisfactory. However, most test banks contain some questions that violate good test-writing practices. Violations generally followed a pattern within each test bank; that is, violations tended to be limited in type but, when present, tended to be pervasive. For example, only one test bank used combined options, but this problem appeared 22 times in the items sampled from it.

The large number of questions written at the knowledge and comprehension levels (72.1%) is of concern, particularly considering the large number of NCLEX-RN questions written at the application and analysis levels. This was also surprising because most of the textbooks reviewed were intended for upper-division courses. Not all authors identified the cognitive levels of questions, and when levels were identified, they were not always identified correctly. A number of questions identified as higher-level questions were actually lower-level questions. (The reverse was not seen.) Also, some authors used taxonomies other than Bloom's without explaining why and without citing the taxonomy used to categorize the questions. In test banks with multiple contributors, the quality of questions and the correct identification of cognitive levels sometimes varied from chapter to chapter.

Although the overall distribution of correct options was not a problem in this sample, it is suggested that faculty check the distribution of correct answers before administering an exam. One suggestion is to arrange options in alphabetical order to help distribute the correct option randomly (Gaberson, 1996).
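
A small sketch of Gaberson's alphabetizing tactic, assuming the options are plain strings (the sample question is invented):

```python
def alphabetize_options(options, correct_index):
    """Sort options alphabetically and return the correct answer's
    new position, so its letter depends on wording, not on habit."""
    correct_text = options[correct_index]
    ordered = sorted(options)
    return ordered, ordered.index(correct_text)

opts = ["Notify the physician", "Assess vital signs",
        "Document the finding", "Reposition the client"]
ordered, idx = alphabetize_options(opts, correct_index=1)
print("ABCD"[idx], ordered)  # correct answer now falls at "A"
```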

Some test banks may be suitable only as a source of ideas, and their questions should not be used as written. This is especially true if the faculty member does not have mastery of the content. More and more MSN programs focus on preparing advanced practice nurses, and fewer programs offer teaching tracks, so fewer nursing faculty will have the opportunity to learn about teaching, including item writing, in a disciplined way (Farley, 1989a). This would seem to increase the risk that new faculty might unknowingly rely on test banks unless cautioned otherwise.

Several problems were found that have not been previously identified in the literature; future research should develop criteria from them. One problem was test questions with obsolete content, such as assuming a psychoanalytic explanation for schizophrenia or defining "family" in terms of blood relationship or adoption. A second was identifying as correct an option that was wrong, for example, stating that lithium, a category D pregnancy risk, can be given to pregnant women. (The way these questions were written clearly indicated that these were not merely typographical errors.) A third was questions suggesting a lack of current clinical knowledge on the writer's part. For example, a question in one test bank asked about allergies to phenothiazines (antipsychotic drugs); true allergies to phenothiazines are unusual, but patients frequently report "allergies" when they experience common adverse reactions such as dystonia. A fourth problem was that important content was sometimes not tested. In one 1997 test bank, there was only one question about clozapine (Clozaril), a highly effective but potentially dangerous and expensive drug that has led to revolutionary changes in the pharmacological treatment of schizophrenia; the question asked which neurotransmitters clozapine acts on while ignoring critical nursing actions associated with administering the drug, patient education, and social policy issues. It is important to note that all of these problems may reflect problems with the textbook rather than with the test bank (Table 4).

TABLE 4

Test Banks and Textbooks Used in Study

Using established item-writing guidelines is necessary to improve test validity, but it is not sufficient. Before the test is written, a test plan mapping out the allocation of questions should be developed (Farley, 1989a). After valid test questions are written, the reliability of each item should be assessed using item analysis. Items should be evaluated for difficulty (the percent of test-takers who answer an item correctly), discrimination (the degree to which an item distinguishes between students who do well on the exam and those who do not), and the effectiveness of distractors (Farley, 1990; Flynn & Reese, 1988; Jenkins & Michael, 1986).
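
As an illustration of the item-analysis step, the sketch below computes difficulty and an upper-lower discrimination index for one item. The 27% grouping rule and the sample responses are conventional illustrative choices, not taken from the article:

```python
def item_analysis(responses, key, item):
    """Difficulty (proportion correct) and upper-lower discrimination
    for one item. responses: list of answer strings, one per student;
    key: answer-key string; item: 0-based item index."""
    def score(r):
        return sum(a == b for a, b in zip(r, key))
    ranked = sorted(responses, key=score, reverse=True)
    n_group = max(1, round(len(responses) * 0.27))  # conventional 27% groups
    difficulty = sum(r[item] == key[item] for r in responses) / len(responses)
    p_upper = sum(r[item] == key[item] for r in ranked[:n_group]) / n_group
    p_lower = sum(r[item] == key[item] for r in ranked[-n_group:]) / n_group
    return difficulty, p_upper - p_lower

# Hypothetical 4-item exam taken by 8 students
key = "ABCD"
responses = ["ABCD", "ABCA", "ACCD", "BBAD", "ABCD", "DBCA", "ABAD", "CBCD"]
print(item_analysis(responses, key, item=0))  # (0.625, 1.0)
```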

RECOMMENDATIONS

Faculty who use test banks should do so with caution; some test bank questions may be useful only as sources of ideas for faculty-written questions. More research should be done on writing test questions. Replications and extensions of Crehan and Haladyna's (1991) and Crehan, Haladyna, and Brewer's (1993) work, using questions from nursing exams, should be done to evaluate the validity of item-writing guidelines.

Further research should also examine (a) more test banks, (b) test banks with maternity nursing and other specialized content, (c) test questions used in nursing service and in credentialing exams, and (d) test banks accompanying practical nursing texts. It is also suggested that publishers carefully evaluate the clinical expertise and academic qualifications of test bank authors.

We have found that our multiple-choice questions have been considerably improved by following the recommendations generated by this study. Revising exams has also made us more confident that the exams we are now using accurately assess student performance.

ACKNOWLEDGMENT

The authors wish to thank Michele Ruby, Academic Resource Center, Bellarmine College, for her expert and generous assistance in editing the manuscript.

REFERENCES

  • Clute, R.C., & McGrail, G.R. (1989). Bias in examination test banks that accompany cost accounting texts. Journal of Education for Business, 64, 245-247.
  • Crehan, K., & Haladyna, T.M. (1991). The validity of two item-writing rules. Journal of Experimental Education, 59, 183-192.
  • Crehan, K., Haladyna, T.M., & Brewer, B.W. (1993). Use of an inclusive option and the optimal number of options for multiple-choice items. Educational and Psychological Measurement, 53, 241-247.
  • Demetrulias, D.A.M., & McCubbin, L.E. (1982). Constructing test questions for higher level thinking. Nurse Educator, 7, 13-17.
  • Ellsworth, R.A., Dunnell, P., & Duell, O.K. (1990). Multiple-choice test items: What are textbook authors telling teachers? Journal of Educational Research, 83, 289-293.
  • Farley, J.K. (1989a). The multiple-choice test: Developing the test blueprint. Nurse Educator, 14(5), 3-5.
  • Farley, J.K. (1989b). The multiple-choice test: Writing the questions. Nurse Educator, 14(6), 10-39.
  • Farley, J.K. (1990). Item analysis. Nurse Educator, 15(2), 8-9.
  • Flynn, M.K., & Reese, J.L. (1988). Development and evaluation of classroom tests: A practical application. Journal of Nursing Education, 27(2), 61-65.
  • Gaberson, K.B. (1996). Test design: Putting all the pieces together. Nurse Educator, 21(4), 28-33.
  • Jenkins, H.M., & Michael, M.M. (1986). Using and interpreting item analysis data. Nurse Educator, 11(1), 10-14.
  • King, E.G. (1978). Constructing classroom achievement tests. Nurse Educator, 3(5), 30-36.
  • Klisch, M.L. (1994). Guidelines for reducing bias in nursing examinations. Nurse Educator, 19(2), 35-39.
  • Violato, C. (1991). Item difficulty and discrimination as a function of stem completeness. Psychological Reports, 69, 739-743.


10.3928/0148-4834-20010101-07
