Dr. Ogrinc is Director, Office of Research and Education in Medical Education (ORIME), Associate Professor, Community and Family Medicine and Medicine, Dartmouth Medical School, and Senior Scholar, VA National Quality Scholars Program, White River Junction VA, White River Junction, Vermont; and Dr. Batalden is Director, Center for Leadership and Improvement, Dartmouth Institute for Health Policy and Clinical Practice, and Professor of Pediatrics, Dartmouth Medical School, Lebanon, New Hampshire.
This material is based on support from and with resources and the use of facilities at the White River Junction VA, White River Junction, Vermont.
The authors have no financial or proprietary interest in the materials presented herein.
Address correspondence to Greg Ogrinc, MD, MS, WRJ VA, 215 North Main Street (170), White River Junction, VT 05009; e-mail: email@example.com.
Evaluation of health professions education is challenging. Although courses and seminars often are evaluated locally, the comprehensive assessment of innovative educational interventions to share the findings across sites does not occur consistently. Some advocate multi-institutional studies as the best way to test educational innovations. Designs such as randomized controlled trials, nonrandomized studies, and even prospective cohort studies are held up as the best way to build knowledge about the development of health professions education.
The design of an educational study often is combined with a familiar educational evaluation technique such as Miller’s framework (1990) for learners to demonstrate what they know, that they know how, that they can show how, and that they can do. This developmental evaluation framework is useful when assessing the conjunction of knowledge and skills for a task. Kirkpatrick’s model (1976) emphasizes the use of knowledge, attitudes, and skills as a foundation for the development of clinical skills, whereas Davis, Thomson, Oxman, and Haynes (1995) emphasized the link between the education and the skills for continuing education. In previous work, we have described the education value compass, which uses learner knowledge, patient clinical outcomes, function, satisfaction, and costs as a framework for evaluating the educational endeavors that occur with clinical care (Ogrinc, Headrick, & Boex, 1999).
Each of these approaches (and others) is useful depending on the focus of the evaluation. Miller’s model is extremely helpful when identifying the level and type of knowledge and skill that learners are achieving. Kirkpatrick’s and Miller’s models together can provide a convenient matrix for identifying evaluation tools and the skills that learners achieve. The education value compass and Davis’s work emphasize a balanced set of measures including the practice level changes and clinical outcomes for patients. However, each model falls short in a key component: none identifies the depth of contextual information about the education site that can be helpful when attempting to replicate findings in another setting. Although these models determine evaluation tools and instruments for educational research, understanding the contextual conditions that exist as part of an educational intervention is also a vital step.
Evaluation frameworks often seek to identify whether an intervention is successful or not. They ask the question, “Was this intervention successful?” which is a basic yes or no proposition. Rarely do educational interventions simply fit into this frame. Educational systems are not simple systems—they are complex, and each educational system exists in a context (Ogrinc et al., 2007). The interaction between the system and the context contributes to the complexity. One does not pull a lever as a teacher and expect a reliable, standard response by the students on the other end. This is particularly the case with clinical learning settings. The complexities of the educational system are commingled with the complexities of the clinical care setting, and this complexity often is underestimated by educational researchers. Complex systems require regular monitoring and evaluation to improve them, for complexity is marked by adaptable interactions that behave in nonlinear ways. It is not possible to “pull the lever” of educational intervention and anticipate an outcome at the other end of the “machine.” This complex level of interaction must be well thought out when evaluating educational interventions, particularly interventions in the clinical setting that teach improvement of care and patient safety.
An emerging model in the social sciences shows promise for the evaluation of educational interventions. This model is the realist evaluation that was developed by Pawson and Tilley (Pawson, 2006; Pawson & Tilley, 1997). The hallmark of this evaluation system is not to ask (and answer) a yes or no proposition, but rather to ask and answer, “What works, for whom, and in what circumstances?”
Overview of the Realist Evaluation
Pawson and Tilley (1997) primarily focus on the evaluation of programs related to crime reduction. They examined programs such as laws to reduce repeat sex offenders and the relationship between education and recidivism rates for prisoners. They were troubled by attempts to answer the basic yes or no question of evaluation and came to the realization that it is an insufficient and incorrect question. This question assumes a clear linear relationship between the intervention and the outcome in the system. Particularly concerning for them was that complex programs and policies in the social sciences do not fit this narrow view of cause and effect relationships, in part due to the “reflexive” nature of the intervention and context into which it is delivered.
Pawson and Tilley (1997) recommend that evaluations focus on an “explanatory quest.” Interventions need to be tested and refined so as to further understand how they work. The basic tenet of realist evaluation, “What works, for whom, and in what circumstances?” is intended to trace the limits of when and where interventions are effective. Sometimes Pawson and Tilley (1997) also add, “…and in what respects and how?” Their aim is the development of explanatory “mid-range” theory that allows hypothesis generation and testing. This is an iterative, continuous quest for explanation and hypothesis generation, which is a different framework than the absolute answers that educational researchers often seek. In this article, we will focus only on the main question.
Realist evaluation is migrating and expanding from its social science roots to the health professions. It was initially described as a method for evaluating social work practice and policy (Kazi, 2003; Sanderson, 2003), and others have described it as a key framework for mixed-method evaluation (McEvoy & Richards, 2006). Recently, the realist framework has been used to describe a cardiac rehabilitation program (Clark, Whelan, Barbour, & MacIntyre, 2005) and to perform a realist synthesis of school feeding programs (Greenhalgh, Kristjansson, & Robinson, 2007). These authors chose this method because it focuses on developing an explanatory model that encompasses the mechanisms of action as well as the context in which these mechanisms are activated. How is this method used?
The basic steps in the realist evaluation are listed in Table 1, and Table 2 shows the basic definitions. To illustrate how these terms are used, we have provided an example of a new clock that is purchased for use as an alarm clock. This is a mechanical clock that requires daily winding. The initial step in the realist evaluation involves eliciting working theories. Theories are defined as any intervention that may be effective, and in social change, interventions are considered “theory incarnate.” On eliciting these working theories, you then select one theory that seems particularly promising, essentially your first hypothesis to test. In this example, the theory is that a new alarm clock will help me wake up on time each morning.
Table 1: Basic Steps in a Realist Evaluation
Table 2: Realist Evaluation Definitions with Examples of a Mechanical Clock
The next step is to formalize the hypothetical context, mechanisms, and outcome patterns (CMOs). The context is a description of the preexisting features of a locality, a situation, and even a system in which an intervention is introduced. The context is not limited to just a description of the locale but is a thorough consideration of any element that might be relevant to the mechanisms. In this case, the alarm clock is brought to my home, which happens to be close to the ocean. The home does not have air conditioning, so the windows are often open in the summer months. Also, the clock needs to be wound each evening, a job favored by my 4-year-old son.
Mechanisms are the processes through which the intervention is implemented. For a social or educational clinical improvement project, this includes individual interpretations. Mechanisms focus on how an intervention works, and often the mechanisms are hidden from (or obscured from) those who observe the outcome patterns. The mechanical clock is a convenient example because of the multitude of processes and mechanisms that occur inside the clock. The gears, springs, rods, and lubricants are combined and act in many processes to produce the time that is seen on the face of the clock and to move the pin that rings the alarm bell each morning. Mechanisms are considered the pivot around which realist evaluation revolves.
Outcomes focus on both the intended and unintended consequences resulting from the activation of different mechanisms in different contexts. The realist evaluation seeks to understand the patterns of outcomes through the relationships between outcomes, mechanisms, and context. Realist evaluation does not explicitly attempt to control the outcomes. In our clock example (Table 2), three outcomes are mentioned: slow hands due to weather conditions, hands that stop due to overwinding, and the ultimate outcome of not waking on time. The pattern of outcomes is the important element. How often do the sea air and the overwinding combine to make me wake up late? How do I make sense of the combination of the sea air and humidity? How are my patterns changed because of these interactions of mechanisms? The mechanisms and context are combined to identify the possible relationship to the outcomes. This is explicitly not a cause-and-effect relationship, but it is used to anticipate and understand what is occurring in this particular situation. The context and mechanisms cannot be controlled out of the evaluation; rather, they are used to explain the outcome patterns.
The third step in the realist evaluation is to test the theory by implementing an intervention (Table 1). The intervention (or theory) is started. As the implementation is initiated, the presupposed context and mechanisms are observed, measured, and evaluated. As this is occurring, data are collected (step 4). Data may be quantitative or qualitative. Realist evaluation does not dictate what type of data should be used nor does it specify how to collect, store, or analyze data; however, it does recommend that combinations of data are best used to evaluate whether the presupposed CMOs hold true. The final step of realist evaluation is to present a refined (and improved) theory for future interventions. It is this step that cycles back to the initial selection of a promising theory. This test and measurement step may be noted as a weakness of the realist evaluation. Again, the realist evaluation does not specify when or how much or how long a theory should be tested or measured, but it does recommend that the information gathered is then used to define and inform the next theory and intervention. In the clock example, evaluating the new clock system might include a daily comparison of the time to a digital clock that is synched to cell phone towers (or some other standard), an assessment of how often I am late to work, and perhaps even an assessment of my nightly routine putting my 4-year-old son to bed.
There are two main useful tools for performing a realist evaluation. The first is a diagram of the relationship between the context, mechanisms, and outcome patterns (Figure 1). This is a simplification of Pawson and Tilley’s diagram (1997) and does not contain all of the elements in their original diagram; however, it captures the essence of the relationship between CMOs. First, note there is a regular pattern of outcomes (O) represented by the horizontal line traveling left to right along the bottom part of the oval. These patterns of outcomes are acted on by some mechanism (M), represented by the diagonal line. This mechanism does not exist in isolation but occurs in clear relation to the outer oval, the context (C). The context oval encompasses both the mechanism and the outcomes. This diagram shows the profound influence of the context on the mechanisms and the outcome patterns.
Figure 1. Graphical Representation of the Context, Mechanisms, and Outcomes During a Test Situation. Refer to the Text for Description of the Relationship Between These Key Elements in a Realist Evaluation.
As described earlier, an understanding of the context, mechanisms, and outcomes is developed prior to the initiation of an intervention. These often are noted using the second tool, a realist hypothesis grid (Table 3). The columns in the hypothesis grid are labeled as “Some Plausible Mechanisms,” “Some Potential Context,” and “Some Possible Outcomes.” The modifier in each of these (plausible, potential, possible) indicates these grids are completed prior to the intervention and the testing of the theory. The strength of this evaluation is that it invites project leaders and stakeholders to consider mechanisms, context, and outcomes at the initiation. This hypothesis grid is a detailed level of hypothesis generation and assists in developing your measures. There is no “right” number of CMOs to complete. Table 3 has six as an example, but a project might require fewer or more. The key element is that the grid initiates the consideration of the relationship between these domains.
Table 3: Realist Hypothesis Grid Used to Link Context, Mechanisms, and Outcomes Before Initiating an Intervention
Using a Realist Approach to Evaluate Teaching About Improvement of Care
Now we will examine how the realist evaluation framework can be used for an educational intervention. This example involves teaching improvement of care to internal medicine resident physicians and medical students through experiential learning at an academic medical center. To understand how the intervention was tested, it is necessary to understand the structure of the inpatient medicine service, an important component of the context. The service consists of four teams that care for general medicine inpatients. Each team is on call every fourth night and consists of one or two third-year medical students, one intern, one resident, and one attending physician. Educational sessions are available for all learners on all teams with a weekly morbidity and mortality conference, weekly grand rounds, and a daily noon conference. Each team also has individualized “attending physician teaching rounds” two times per week.
For several years, one of the authors (G.O.) has been teaching about improvement of care during attending rounds. Many different approaches had been tried: discussing and searching for waste in our work, a mini root cause analysis, standardized discharge planning, and comparing and contrasting evidence-based medicine to improvement of care. These sessions were of variable value, but no detailed evaluation was conducted. Learners were not interested in identifying waste because they had little authority to remove it; discharge planning can be frustrating, and it is a large issue for a single, uniprofessional team to address in just a 4-week rotation; and root cause analysis is a useful technique but it is not part of the daily work for the team members. These prior educational interventions were based on theories that were considered to be helpful. On reflection of these prior attempts, a new theory emerged. Teaching and learning about improvement of care should be focused at the appropriate level of the learner and integrated into the daily workflow. Using the realist approach, a new theory (intervention) to test emerged: Improvement of care can be integrated into the inpatient internal medicine rotation.
With a promising theory ready for testing, the next step is to create a realist hypothesis grid with CMOs (Table 4). The mechanisms focus on the teaching structures and processes that are available. Mechanisms are the true facts or “cause-effect” statements of how the system operates. Therefore, M1 and M2 are realities of the daily work. Learners expect to learn “facts” from the attendings. To focus on the health care system and improvement of care, there would need to be less teaching of medical facts and more discoveries about the patterns and outcomes of care. Also, the white board in the team room is used for tracking individual patient progress. Using it for tracking a group project that focused on our population of patients would be a different mechanism in the routine of the team. M3 focuses on the learners’ schedule. This was an important mechanism for finding an appropriate project. Because the learners’ “on-call shift” is completed at 1 p.m., the attending physician is then responsible to complete any “loose ends” for the patients. The outcome pattern is that there are many more attending-patient encounters on this post-call day. This interaction led the attending physician to notice that influenza vaccines are not easily ordered through the electronic medical record. In fact, the electronic ordering function led to a blind alley where a clinician might try to order a vaccine, but the order would not be executed.
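For readers who keep project notes electronically, a hypothesis grid of this kind can also be recorded as structured data. The following is a minimal sketch only and was not part of the project described here; the class names (`CMORow`, `HypothesisGrid`) and the method `all_outcomes` are hypothetical, and the entries are copied from Table 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CMORow:
    """One row of a realist hypothesis grid (hypothetical structure)."""
    mechanism: str
    context: str
    outcomes: List[str] = field(default_factory=list)


@dataclass
class HypothesisGrid:
    """A theory plus its hypothesized context, mechanism, and outcome rows."""
    theory: str
    rows: List[CMORow] = field(default_factory=list)

    def all_outcomes(self) -> List[str]:
        # Flatten the possible outcomes so each can be matched to a measure.
        return [o for row in self.rows for o in row.outcomes]


# Entries transcribed from Table 4; nothing here is new data.
grid = HypothesisGrid(
    theory="Improvement of care can be integrated into the inpatient "
           "internal medicine rotation",
    rows=[
        CMORow("M1: Less teaching of medical facts",
               "C1: Expectation by the learners to get the facts",
               ["O1: Decreased learner satisfaction"]),
        CMORow("M2: Use white board to monitor progress of improvement",
               "C2: White board normally used for individual patient to-do list",
               ["O2: Daily conversations about the improvement project",
                "O2: Missed action items for other patients"]),
        CMORow("M3: Learners leave hospital at 1 PM after on-call night",
               "C3: Regulations require 80-hour work week",
               ["O3: Attending has more patient interactions"]),
    ],
)

for row in grid.rows:
    print(row.mechanism, "|", row.context, "|", "; ".join(row.outcomes))
```

Listing the outcomes with `grid.all_outcomes()` may help when deciding which measures to collect for each hypothesized outcome, including the unintended ones.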
Table 4: Initial Theory and CMO Hypothesis Grid for Understanding Teaching About Improvement of Care During an Inpatient Internal Medicine Rotation
The CMOs also may be displayed in the CMO oval diagram (Figure 2). The contextual factors surround the mechanisms and outcomes. The mechanisms may be added to the system as represented by the complete arrow (teaching the basics of improvement of care, effect of work hours restrictions on the learners and attending) or may be elements that are removed from the system as represented by the dotted arrow (less teaching about facts, not using the white board for individual patient reminders). This indicates an opportunity to “subtract” or “diminish” the currently operating mechanisms. The outcomes such as learner satisfaction, focus on improvement of care, and patient outcomes (percent of patients who received influenza vaccine on discharge) travel along horizontally near the bottom of the oval.
Figure 2. Hypothesis Generating CMOs (Context, Mechanisms, and Outcomes) for Teaching About Improvement Placed in Pawson and Tilley’s (1997) Oval Diagram.
The theory was implemented and tested for a 4-week period. An introductory lecture and discussion about improvement theory and methods initiated the discussion about improving the rate of influenza vaccines for our patients on discharge. The team initially had 40% influenza vaccine completion, but within 4 weeks, this increased to 100% of patients (Figure 3). This was quite a different focus for these learners. They tried many interventions including increased vigilance (not effective), changing the computer system (many months to occur), creating a standardized order (difficult to coordinate with nursing service), and using a checklist (effective). The learning about improvement of care and systems occurred over time, not necessarily during dedicated “teaching sessions.” When we consider the core question of the realist evaluation (“What works, for whom, and in what circumstances?”), we might arrive at the following conclusions:
- What: A clinically focused project, measurable and changeable at the individual team level.
- Whom: Medical students, residents, and an attending physician.
- Circumstances: Daily attention (almost) made improvement of care a part of the usual work pattern.
Figure 3. Example of the Changed Use of the White Board in the Team Room (M2 from Table 4). Notice the Emphasis on Measurement of the Outcomes, the Annotations to the Chart, and Notes About the Changes that Were Tried.
So the realist evaluation was able to prime the evaluation of this educational intervention. The realist framework did not specify what to measure or when or how often, but taking into consideration the mechanisms and contexts of this intervention led to a deeper understanding of this process. This level of understanding is more informative than a comparison of numbers from learner satisfaction ratings or simple learning outcomes of facts.
This depth of understanding is vital for the iterative nature of realist evaluation. For example, working to improve the influenza vaccine rate would not be possible every month on the inpatient medicine service (i.e., it would not be a reasonable choice for a team from March to September). What were the important elements to include when teaching improvement of care on the inpatient medical service? The intervention was sufficiently focused to include and attract many levels of learners. The improvement focus was clinically relevant and was important to the daily work of the team. These conclusions will inform the development of the next theory to test: pneumococcal vaccine rates, deep venous thrombosis prophylaxis, antibiotic timing, or chronic disease management on the inpatient service. Put together, they form a mid-level theory on attractive, interesting learning about improving patient care as part of “regular” learning opportunities. It arises from and builds a deeper understanding of the “usual” learning setting and mechanisms. It invites attention to the “real” work of change and innovation. The possibilities for clinical focus are almost unlimited, but the realist evaluation framework provides a grounding to develop the basic foundational elements of integrating teaching on an inpatient medical team.
Evaluation of educational interventions is challenging, and doubly so for clinical education evaluation. There are sufficient models to create measures of knowledge and skills, but the realist evaluation offers a framework that complements prior work. The realist evaluation does not give a yes or no answer as to whether a new curriculum is effective; rather, it assists the evaluator in finding the reasons why a program is successful (or not). This framework is vastly more helpful to educators, for we recognize that our educational systems (and our innovations) are often complex. The evaluation framework should be one that focuses on explanation, not adjudication. By understanding what works, for whom, and in what circumstances, we can build the knowledge to continue the development of educational interventions for learners in our own context and share those successes with others.
- Clark, A.M., Whelan, H.K., Barbour, R. & MacIntyre, P.D. (2005). A realist study of the mechanisms of cardiac rehabilitation. Journal of Advanced Nursing, 52, 362–371. doi:10.1111/j.1365-2648.2005.03601.x
- Davis, D.A., Thomson, M.A., Oxman, A.D. & Haynes, R.B. (1995). Changing physician performance: A systematic review of the effect of continuing medical education strategies. The Journal of the American Medical Association, 274, 700–705. doi:10.1001/jama.274.9.700
- Greenhalgh, T., Kristjansson, E. & Robinson, V. (2007). Realist review to understand the efficacy of school feeding programmes. BMJ, 335(7625), 858–861. doi:10.1136/bmj.39359.525174.AD
- Kazi, M. (2003). Realist evaluation for practice. The British Journal of Social Work, 33, 803–818. doi:10.1093/bjsw/33.6.803
- McEvoy, P. & Richards, D. (2006). A critical realist rationale for using a combination of quantitative and qualitative methods. Journal of Research in Nursing, 11, 66–78. doi:10.1177/1744987106060192
- Miller, G.E. (1990). The assessment of clinical skills/competence/performance. Academic Medicine, 65(9 Suppl.), S63–S67. doi:10.1097/00001888-199009000-00045
- Ogrinc, G., Headrick, L. & Boex, J. (1999). Understanding the value added to clinical care by educational activities. Value of Education Research Group. Academic Medicine, 74, 1080–1086. doi:10.1097/00001888-199910000-00009
- Ogrinc, G., West, A., Eliassen, M.S., Liuw, S., Schiffman, J. & Cochran, N. (2007). Integrating practice-based learning and improvement into medical student learning: Evaluating complex curricular innovations. Teaching and Learning in Medicine, 19, 221–229.
- Pawson, R. (2006). Evidence-based policy: A realist perspective. London: Sage.
- Pawson, R. & Tilley, N. (1997). Realistic evaluation. London: Sage.
- Sanderson, I. (2003). Is it ‘what works’ that matters? Evaluation and evidence-based policy-making. Research Papers in Education, 18, 331–345. doi:10.1080/0267152032000176846
Table 1: Basic Steps in a Realist Evaluation
1. Elicit working theories. Select a promising theory.
2. Formalize hypothetical context, mechanisms, and outcome processes (CMOs).
3. Test the theory.
4. Collect data (qualitative, quantitative).
5. Present a refined (and improved) theory to inform future interventions.
Note: These steps are iterative and after the cycle is completed, you start back at Step 1 with a new theory to test. These iterations are particularly fitting for educational programs that teach about the improvement of care.
Table 2: Realist Evaluation Definitions with Examples of a Mechanical Clock
|Term|Definition|An example applied to the workings of a mechanical clock|
|Theory|Interventions are theory incarnate|This new alarm clock will ensure I awake on time|
| |Theories are often multiple| |
|Context|Preexisting features of a locality/situation/microsystem into which interventions are introduced|I live near the ocean|
| |Those elements that are relevant to the operation of the mechanism|It is often humid|
| |Much more than just the locale or the setting|The temperature can be > 90°F for several weeks|
| |Digs deep into those factors that impact the mechanisms|My 4-year-old son likes to wind the clock|
|Mechanism|Describes what it is about the interventions that bring about effects|Fine crafted and tuned parts|
| |The process(es) through which subjects interpret and act on the interventions|Reliable, long-lasting spring|
| |This is the pivot around which realist research revolves|Easy to use crank|
| | |Must be wound each day|
|Outcome patterns|Intended and unintended consequences resulting from the activation of different mechanisms in different contexts|High humidity and sea air cause the clock to run slow|
| |Also includes deciphering the reasons behind why the outcomes occur|My son often overwinds the clock and the hands stop|
| | |I do not awake on time|
Table 3: Realist Hypothesis Grid Used to Link Context, Mechanisms, and Outcomes Before Initiating an Intervention
|Some Plausible Mechanisms|Some Potential Contexts|Some Possible Outcomes|
Table 4: Initial Theory and CMO Hypothesis Grid for Understanding Teaching About Improvement of Care During an Inpatient Internal Medicine Rotation
Theory: Improvement of care can be integrated into the inpatient internal medicine rotation
|Some Plausible Mechanisms|Some Potential Contexts|Some Possible Outcomes|
|M1: Less teaching of medical “facts”|C1: Expectation by the learners to get “the facts”|O1: Decreased learner satisfaction|
|M2: Use white board to monitor progress of improvement|C2: White board normally used for individual patient to-do list|O2: Daily conversations about the improvement project|
| | |O2: Missed action items for other patients|
|M3: Learners leave hospital at 1 PM after on-call night|C3: Regulations require 80-hour work week|O3: Attending has more patient interactions|