When documenting changes in functioning, evaluating the effects of an intervention on an individual’s life, or measuring the effectiveness of a program of care, there are a number of methods to assess specific areas of behavior. These methods include self-report, caregiver report, and family report of person-specific behaviors. Self-report of behaviors by an older adult with dementia is often not feasible in light of significant cognitive decline and communication difficulties, and ratings made by others can introduce measurement errors. These include poorly defined behaviors, unstandardized and variable opportunity for observation of behavior, unclear reference periods in looking retrospectively at an individual’s behavior, and errors in recall of the behavior and its context. Retrospective report is also limited in providing accurate information on the frequency and duration of behaviors.
Review of Direct Observation Measures
A literature review was conducted of direct observation measures of individual behavior for use in assessing individuals with dementia. The literature from 1990 through 2007 in the AgeLine, PsycLIT, PubMed, and Sociological Abstracts databases was searched using the keywords dementia and observation, behavior observation, behavior assessment, behavior evaluation, observational measurement, observational assessment, quality of life assessment, affect observation, pain observation, and pain assessment. In addition, relevant references to observational measures cited in the literature obtained from these searches were gathered.
As mentioned above, methods of behavior observation for use with older adults with dementia are often developed specifically for the purposes of a particular project. A number of project-specific methods of observing behavior may be used to evaluate the effects of an intervention, such as interdisciplinary behavior management (Hughes & Medina-Walpole, 2000), music (Ragneskog, Kihlgren, Karlsson, & Norberg, 1996), psychosocial management (Doyle, Zapparoni, O’Connor, & Runci, 1997), behavioral activity (Swift, Harrigan, Cappelleri, Kramer, & Chandler, 2001), and domus philosophy of care (Dean, Briggs, & Lindesay, 1993; Lindesay, Briggs, Lawes, MacDonald, & Herzberg, 1991). These methods were developed for study-specific purposes, are not easily applicable for generalized use, and do not comprehensively assess methodological rigor. For the purposes of this review, observational measures that can be readily used in other studies, have been evaluated methodologically for use with individuals with dementia, and are in English were included. The measures are described briefly in this article, and their methodological rigor is documented in Tables 1 through 4.
Table 4: Characteristics and Psychometric Properties of Direct Observation Measures of Individuals’ Quality of Life (QOL)
To aid readers, definitions of the kinds of validity assessed are provided. Content validity recognizes the need to ensure that the entire domain of behaviors targeted is represented. Convergent validity confirms predicted relationships between the targeted measure and other methods and measures of related behaviors and constructs. It includes concurrent validity, which focuses on the relationship between two measures that are thought to measure the same behavior. Divergent validity is demonstrated when no relationship is found between measures thought to be unrelated. Construct validity is difficult to define because it is thought of as any process of validation of the theory underlying what is measured. This can be done by looking at the pattern of convergent and divergent validity in an attempt to evaluate whether the tool appears to be measuring the construct it claims to. Determinations of construct validity are often based on cluster and factor analysis, as well as the ability of the measure to discriminate between known groups or conditions in predicted ways.
Behavior observation measures are presented and discussed using the following domains: negative and positive behavior, negative and positive affect, functional behavior, and quality of life.
Negative and Positive Behavior
A number of measures of direct observation of individual behaviors are currently used in older adults who have dementia (Table 1). The Agitation Behavior Mapping Instrument (ABMI) is a measure of direct observation of agitated behaviors in the natural environment (Cohen-Mansfield, Werner, & Marx, 1989, 1992; Cohen-Mansfield & Libin, 2004). The tool includes the dimensions of aggressive, physically nonaggressive, and verbally agitated behaviors. Frequency and intensity of agitated behavior are recorded using either a checklist or laptop computer. Behavioral mapping is used to provide the environmental context and triggers within which to frame the behavior. Raters observe for 3 minutes at preselected times or events, and the authors recommend at least 23 observations per individual being assessed. Intensive training of observers is needed to achieve reliability. Good interobserver agreement was obtained with the measure, as well as convergent validity with the Cohen-Mansfield Agitation Inventory (CMAI) (Cohen-Mansfield, Marx, & Rosenthal, 1989).
The Agitated Behavior Scale (ABS) was originally developed to measure agitation through direct observation in individuals following traumatic brain injury (Corrigan, 1989). The measure was revised for use with individuals with dementia (Bogner, Corrigan, Bode, & Heinemann, 2000; Hall & Hare, 1997) on the basis of the assumption that agitation is the same phenomenon across conditions. Raters record the presence or absence of attentional and mood difficulties, physical and verbal agitation, uncooperative and aggressive behavior, self-abuse, and positive affect and behaviors. Observations occur for 5 seconds of every minute for 10 to 20 minutes at preselected times. Training includes three sessions using videotapes to establish definitions of agitated and positive behaviors. Interrater reliability is well established, and internal consistency is demonstrated in the brain injury population. Item selection was based on item differentiation using ratings of individuals with traumatic brain injury, with no demonstration of content validity in individuals with dementia. Convergent validity was established in those with brain injury. Factor analyses demonstrate a consistent general construct and two latent variables of “directed” and “nondirected” agitation. Construct validity was demonstrated in the dementia population through sensitivity of positive behaviors to a video-respite intervention.
The Computer-Assisted Data Collection (Burgio et al., 1994; Holm, Rogers, Burgio, & McDowell, 1999; Pollock et al., 1997; Rogers et al., 1999) focuses on the effects of multiple environmental factors, including staff behavior, on resident behavior. A laptop computer is used to record the frequency and duration of disruptive vocalizations, duration of activities of daily living (ADLs), number of caregiver assists, requests for help, and variables identifying environmental context. Between 12 and 24 samples of behavior, 30 minutes in duration, are observed over a 2-week to 3-week period, distributed randomly from 8:00 a.m. to 8:00 p.m. or structured to include morning, afternoon, and evening time periods. Intensive training and practice are required to obtain adequate reliability. Interobserver reliability was demonstrated, and convergent validity was established with cognitive functioning (i.e., Mini-Mental State Examination [MMSE]; Folstein, Folstein, & McHugh, 1975) and ADL functioning.
The revised Cohen-Mansfield Agitation Inventory (CMAI-R) adapts the staff-reported CMAI for use in real-time observation of a specific care event (Whall, 1999). The observation measure was developed to better understand and describe need-driven, dementia-compromised behaviors. Observations are videotaped, and raters record the frequency and duration of agitation. Observations of less than 30 minutes (7 to 10 minutes) during a care event are recommended to increase the chance of observing need-driven, dementia-compromised behaviors and avoid rater drift. Training entails the use of adult learning principles to rate videotapes and is accomplished in two to three brief sessions of less than 30 minutes each. Aides and experts are required to reach and maintain a 0.80 level of agreement. Interobserver reliability was demonstrated to be fair. Good convergent validity was established using staff-reported behavior and agitation ratings.
The Disruptive Behavior Scale (DBS) was developed to assess the intensity of observed disruptive behavior in geropsychiatric patients (Beck, Heithoff, et al., 1997; Beck et al., 1998). Disruptive behavior is defined in terms of its result, leading to negative consequences for the resident, caregiver, or another resident. Physically aggressive and nonaggressive, vocally aggressive, and agitated behaviors are included. The frequency of disruptive behaviors is documented each hour during 8-hour shifts over 2 to 7 consecutive days. Intensity of behaviors documented is determined by item weights. Training includes discussion and review of videotapes, 7 days of practice, and weekly retraining sessions. Reliability is monitored every other month by comparing the DBS forms with disruptive behaviors recorded in the medical record. Interrater reliability was good when data were log-transformed to improve agreement on high-frequency behaviors. Content validity was ensured. The combined frequency and intensity score better predicted nurses’ perceived disruptiveness than did the presence or absence of disruptive behaviors, frequency only, or intensity only. Convergent validity was established through a number of significant correlations with other measures. All four categories of disruptive behaviors were significantly intercorrelated.
The Empirical Behavioral Pathology in Alzheimer’s Disease Rating Scale (E-BEHAVE-AD) is an observational scale based on the caregiver report measure BEHAVE-AD (Auer, Monteiro, & Reisberg, 1996; Reisberg et al., 1987). The intensity of agitated behavior is recorded using a checklist that includes categories of paranoid/delusional ideation, hallucinations, activity disturbance, aggressivity, affective disturbance, and anxieties/phobias. Clinicians make observations during an informal dialogue in a 20-minute clinical interview. This is based on the theory that expressed emotion in the caregiver is a predictor of agitated behavior in individuals with dementia. Minimal training is needed because illustrative examples accompany the scale. Interrater reliability was good. Convergent validity was established with caregiver report (BEHAVE-AD; Reisberg et al., 1987). In general, caregivers reported more behavioral symptoms than the clinicians observed.
The Environment-Behavior Interaction Code (EBIC) (Morgan & Stewart, 1998; Stewart, Hiscock, Morgan, Murphy, & Yamamoto, 1999) is based on theories of person-environment interaction and personal space and provides codes for ongoing interaction between the person and environment. Using a handheld computer, a nonparticipant observes and records in real time the frequency, intensity, and location of disruptive (including aversive, harmful, agitated), neutral, and positive behavior, according to topography (i.e., physical social, physical asocial, verbal, vocal, nonverbal) and a priori environmental influence (i.e., positive, neutral, negative). The initiator and recipient, degree of attentiveness, and presence of activity and restraints are also recorded. Observations are made during 4 to 18 periods (25 to 30 minutes in length) during a meal and when interactions are likely. Training is rigorous. Tests of interobserver reliability were good. Moderate test-retest reliability was demonstrated. Content and construct validity were also demonstrated.
The Engagement Scale targets the observation of engagement in older adults with dementia to evaluate the effects of an activities intervention (Judge, Camp, & Orsulic-Jeras, 2000). Engagement is targeted because its absence is the most frequently exhibited problem behavior and can lead to other behavioral problems. The duration of four main categories of engagement is recorded: constructive, passive, non-engagement, and self-engagement. Observations are made for 10 minutes, twice in the morning and twice in the afternoon, during regularly scheduled activities. Good interrater agreement was obtained. The construct was validated, as the scale discriminated between baseline and posttest results of those in a Montessori treatment group and between observations made during a Montessori group and a general activity programming group, and higher constructive engagement was demonstrated during a Montessori activity than in general program activities.
McCann, Gilley, Hebert, Beckett, and Evans (1997) constructed a direct observation measure of behavior to compare with staff-reported behavior in long-term care facilities. Observers record the frequency of the resident’s level of alertness; facial affect; ratings of aggressive, injurious, and disruptive behaviors; and the context surrounding these behaviors. Each resident is observed for 5 consecutive minutes during five randomly chosen times, with each minute considered a discrete interval for recording presence or absence of the behaviors. Training includes didactic sessions and practice observations until good interrater reliability is achieved. The range of behaviors targeted for measurement was developed from clinical and empirical data on psychiatric inpatients and outpatient clinical evaluations of individuals with Alzheimer’s disease, and was revised based on direct observations of nursing home residents with dementia. Convergent validity was established with staff-rated behaviors. Direct observation was demonstrated to be a more sensitive measure of behaviors than are staff ratings for all behaviors except cursing and physical aggression, which staff report at a higher rate, likely because of their effects on staff.
The Natural Observer was created to evaluate the effects of intergenerational programming with nursing home residents who have dementia (Ward, Los Kamp, & Newman, 1996). The system of observation targets positive observable behavior, such as eye contact, smiling, touching, rhythmic hand movement, verbal expression, hand extension, head nodding, and laughing. Raters record the presence of behaviors on a checklist only once during an observation. Observations are made for 30 seconds, either videotaped or recorded by hand, and are conducted every other week for 6 weeks. Some training is required but is not described. Interrater reliability is good. The ratings made in real time and using videotape recordings demonstrated similar effects of intergenerational programming but were not formally compared. Trends were found toward more observed positive behaviors during activities with children, but differences were not significant, other than less head nodding in the presence of children. The authors concluded that the measure needed to include behaviors indicating attentiveness.
The Observational Scale was developed to measure the full range of problematic behaviors in individuals with dementia (Ward, Murphy, Procter, & Weinman, 1992). This scale records the presence of attention to environment, posture, mobility, communication, informal recreation, personal care, eating, aggression, repetitive activity, inappropriate activity, and mood extremes. Observations are made during 10 periods (14 minutes in length) throughout the week during preselected and randomly selected times, alternating 10 seconds of observation and 10 seconds of recording. Training entails agreement of category definitions and criteria when observing videotaped behavior. Agreement between observers was good. Low frequency categories, such as physical aggression and lying down, demonstrated poor agreement. Videotaped recordings of patients on the wards were used to derive the categories of behavior, ensuring content validity. Convergent validity was not established for the full range of behaviors observed, but observations combined to produce a feeding score were associated with physical deterioration, staff-reported feeding score, and physical disability score.
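The sampling arithmetic implied by such interval schedules can be sketched as follows. This is a minimal illustration, assuming that each 10-second observation and the 10-second recording that follows it form one 20-second cycle with no gaps; the function name and return keys are hypothetical and do not come from the scale's authors.

```python
def interval_schedule(period_minutes, observe_seconds, record_seconds, periods):
    """Count observation intervals in an alternating observe/record schedule.

    Assumption (not stated in the source): one cycle is an observation
    window followed immediately by a recording window.
    """
    cycle_seconds = observe_seconds + record_seconds
    intervals_per_period = (period_minutes * 60) // cycle_seconds
    return {
        "intervals_per_period": intervals_per_period,
        "observed_seconds_per_period": intervals_per_period * observe_seconds,
        "total_intervals": intervals_per_period * periods,
    }


# Schedule described for the Observational Scale: 10 periods of 14 minutes,
# alternating 10 seconds of observation and 10 seconds of recording.
schedule = interval_schedule(
    period_minutes=14, observe_seconds=10, record_seconds=10, periods=10
)
```

Under this assumption, only half of each 14-minute period is spent actually watching the individual, which is one reason low-frequency behaviors can be missed or coded inconsistently.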
The Observer computer software system is used to manage the data obtained from recording streams of behaviors on a handheld event recorder (Van Haitsma, Lawton, Kleban, Klapper, & Corn, 1997) to study the ecology of older adults with dementia. ADLs, pathological and social behaviors, and affective states (described under Apparent Affect Rating Scale) are directly observed in real time; their frequency and duration are recorded continuously using a handheld recorder and are identified within social context and physical location. Sixteen observations are made in 10-minute periods over 3 weeks, during early and late morning, late afternoon, and evening, avoiding mealtime and morning and evening care. Training involves monitoring reliability and honing criteria after discussion of discrepancies for 40 to 50 observations. Interobserver agreement was good. Convergent validity was demonstrated with cognitive functioning (Mattis Dementia Rating Scale; Mattis, 1988), Multi-Observational Scale for Elderly Subjects (MOSES; Helmes, Csapo, & Short, 1987) withdrawal and depression scales, activity participation ratings, and staff-rated physical agitation (CMAI; Cohen-Mansfield, Marx, et al., 1989), but not with verbal agitation and aggression. Staff report of depression was negatively related to observed participation in enriching activities but not social interaction. The nurse aide ratings and activity therapist ratings of activity participation differed in terms of convergence with active gazing (aides’ ratings not related) and group activity (therapists’ ratings not related).
The Overt Agitation Severity Scale (OASS) was developed as a rating scale of agitation that would reduce the overlap between agitated behavior and symptoms of anxiety, mood, and other disorders (Yudofsky, Kopecky, Kunik, Silver, & Endicott, 1997). The frequency and intensity of body movements and vocalizations are recorded. Observations are made for 15 minutes from a distance of 20 or more feet at four preselected times of agitation and nonagitation. Interobserver reliability is good, and internal consistency was demonstrated across times and raters. Content validity was established. There was strong convergence with the Pittsburgh Agitation Scale (Rosen et al., 1994). Divergent validity was established through low but significant correlations with staff-rated aggression.
The Pittsburgh Agitation Scale (PAS) was developed as a concise, easy-to-use rating scale of agitation that can be completed by clinical staff on the basis of direct observation (Rosen et al., 1994). Observers rate aberrant vocalizations, motor agitation, aggressiveness, and resisting care on a scale of 0 to 4, according to intensity, disruptiveness within the environment, and ease with which the behavior can be redirected. The scale does not describe specific behaviors or provide a complete profile of behavior problems. Scores reflect the most aberrant behavior during the period of observation, which is typically the target of clinical intervention. The intensity score is specific to the environment, meaning a behavior may be scored differently based on the environment in which it was observed. Two to five people are rated simultaneously for 1 to 8 hours. For the study, observations were made during three 8-hour shifts per day over 7 days. The PAS takes less than 1 minute to complete and can be administered by clinical staff with no training other than reading brief instructions. Interrater reliability was obtained. Convergent validity was established with real-time, computer-recorded behaviors of aberrant vocalizations, motor agitation, and aggressiveness in a restricted sample of individuals with dementia and behavior disturbances. Construct validity was demonstrated through its sensitivity to behaviors that lead to physical restraint or as-needed medication.
The Patient Behaviour Observation Instrument (PBOI) (Bowie & Mountain, 1993, 1997) was aimed at improving the categorization and measurement of behavior in individuals with dementia. Using a handheld computer, observers record the duration of ADLs; participation in/reception of care; social engagement; abnormal motor activity; and antisocial, inappropriate, and neutral behavior. Each individual is observed for 5 minutes every hour between 8:00 a.m. and 9:00 p.m. An unspecified but “extensive” amount of practice is needed for familiarization. Interobserver reliability and content validity were established. Construct validity was established through its sensitivity in distinguishing between wards rated on environmental quality.
The Resistiveness to Care Scale in individuals with dementia of the Alzheimer’s type (RTC-DAT) was created to evaluate behavioral approaches for reducing resistiveness to care, the most frequent disruptive behavior evoked in those with dementia by caregiving encounters (Mahoney et al., 1999). The RTC-DAT is a context-sensitive instrument measuring rapid changes in individual behavior. It records the duration and intensity of each behavior and is completed by someone who is not a direct caregiver. Observations are typically made during the first 5 minutes of 4 to 12 instances of care provision; 5 minutes being the average duration of the shortest ADL: toilet use. Observers received intensive training until percentage agreement exceeded 95%. Interrater reliability and internal consistency were established. Content validity was ensured a priori. Convergent validity was established with the Discomfort Scale for individuals with dementia of the Alzheimer’s type (DS-DAT; Hurley, Volicer, Hanrahan, Houde, & Volicer, 1992), observer analogue rating of resistiveness, speech assessment, nurse-rated dementia severity, and the Katz Index of ADLs (Katz, Ford, Moskowitz, Jackson, & Jaffee, 1963), but not with cognitive functioning (MMSE; Folstein et al., 1975). Three factors were found, but the authors recommended using one score, as the factors do not have satisfactory reliability to be used as subscales.
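A training criterion expressed as percentage agreement can be checked with a simple interval-by-interval comparison. The sketch below is illustrative only: the source does not say how agreement was computed for the RTC-DAT, and the function name and the observer codes are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of intervals on which two observers recorded the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both observers must code the same number of intervals")
    if not rater_a:
        raise ValueError("At least one interval is required")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical codes for 20 intervals (1 = behavior present, 0 = absent).
observer_1 = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
observer_2 = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

agreement = percent_agreement(observer_1, observer_2)
# The criterion described above requires agreement to exceed 95%.
meets_criterion = agreement > 95.0
```

Percentage agreement does not correct for chance, so it can look high when a behavior is rare and both observers mostly record its absence.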
The Scale for Observation of Agitation in Persons with Dementia of the Alzheimer’s type (SOAPD) is a measure of observed agitation designed to be uncomplicated to administer (Camberg et al., 1999; Hurley et al., 1999). The SOAPD targets behaviors that reflect an unpleasant state of excitement and communication of personal discomfort, are unrelated to known physical needs, and remain after reducing internal and external stimuli, making a distinction between agitation and the evoked behavior of resistiveness. The duration of agitated body movements and vocalizations is recorded by hand, and intensity is rated for two items. Observations are made during four 5-minute rating periods at 20-minute intervals over 3 or more hours at preselected times, such as after morning care, before lunch, after lunch, and before dinner for 3 days. The SOAPD requires intensive manualized training and practice observations. Interobserver agreement was adequate for the total score and ranged from fair to good for the seven items. Internal consistency was adequate. Content validity was ensured, and convergent validity was established with staff analogue rating of agitation and the CMAI (Cohen-Mansfield, Marx, et al., 1989). Two factors emerged: physical and verbal agitation. However, the scale was not sensitive to kind of activity intervention.
Smallwood, Irvine, Coulter, and Connery (2001) developed a Short Observational Tool (SOT) to assess five behavior dimensions of individuals with dementia, including neutral behavior, motor behavior, self-care, external behavior, and inappropriate behavior. Fifteen-minute videotaped observations are sampled twice during preselected times of the day, and behavior is recorded every 30 seconds. The tool demonstrated good interrater reliability. Divergent validity between three of the five behavioral dimensions of the tool was established, as 8 of the 10 correlations between dimensions were not significant, and motor and inappropriate behavior were the only dimensions positively correlated.
Timewand observation (Bridges-Parlet, Knopman, & Thompson, 1994) is a method that focuses on assessment of physically aggressive behavior, distinguishing it from agitation in terms of its unique therapeutic and safety issues and separate treatment indicators. A handheld bar code reader is used to record the frequency and sequential order of behaviors and environmental characteristics. The frequency of aggressive and other idiosyncratic (person-specific) behaviors, ADLs, and the environmental context in which the behaviors occur are recorded every minute as present or absent. Observations are typically made during four 2-hour intervals within a 1-week period during times when aggressive behavior typically did and did not occur. Tests of interobserver reliability were adequate to good. Convergent validity was loosely established with behaviors recorded in a staff diary.
Negative and Positive Affect
The direct observation method has also been devoted to measuring a specific aspect of behavior—the nonverbal communication of affect. Traditionally, self-report is used to assess emotional experience, but this is difficult to undertake in individuals with severe cognitive impairment who do not reliably report internal states verbally. The direct observation of nonverbal behaviors and facial expressions can be used to measure emotion in these individuals (Table 2).
The Apparent Affect Rating Scale (AARS) was constructed to assess the full range of affect in individuals with dementia (Lawton, Van Haitsma, & Klapper, 1996; Lawton, Van Haitsma, Perkinson, & Ruckdeschel, 1999). Observers record the duration of positive and negative emotional states, targeting facial expressions, body language, and nonword vocalizations. Ratings are made using either a checklist or a handheld event recorder. The authors recommend 9 to 19 observations (5 minutes in length) over 2 or more days. Specified times or events are chosen for observation on the basis of the specific aims for its use. One month of training is required, although less training (i.e., 30 minutes plus nine practice observations) can be provided to certified nurse assistants (CNAs) using the checklist format. Interobserver reliability was good, although lower between research assistants and CNAs. Convergent and divergent validity were demonstrated through significance in 13 of the 15 hypothesized relationships for negative affect, 9 of the 13 hypothesized relationships for positive affect; of the relationships hypothesized not to differ from zero, none of the 15 involving negative affect and only 2 of the 20 involving positive affect were significantly related. Construct validity was established by demonstrating significant differences in observation of affect states in various settings, and some support was demonstrated for the two-factor structure of affect without the inclusion of anger.
The Apparent Emotion Rating Scale (AERS) was constructed to assess positive and negative emotional states in individuals with cognitive impairment (Snyder et al., 1998). It is similar in conceptualization of emotion to the AARS but differs in format and scoring. Observations are made in one 5-minute to 10-minute observation during a predetermined time of day. A small amount of training is needed. Interrater reliability estimates were demonstrated for all emotions except anxiety; test-retest reliability was adequate, but internal consistency was lacking. Convergent validity was demonstrated with depression and morale. Construct validity was established through patterns of correlations among the emotions rated and negative relationship with cognitive impairment.
The Checklist of Nonverbal Pain Indicators (CNPI) provides a user-friendly way of documenting nonverbal behavioral pain indicators. It is based on the University of Alabama-Birmingham (UAB) Pain Behavior Scale (Richards, Nepomuceno, Riles, & Suer, 1982) and modified for nonverbal older adults with cognitive impairment, eliminating items that require ambulation and standing (Feldt, 2000; Nygaard & Jarland, 2006). Training includes instructions from nursing staff, which are not detailed. Observations are made during a period of rest and transfer over a 2-day period, although the period of rest was dropped due to infrequent pain behaviors. Interrater reliability and test-retest reliability were established. Content validity was ensured through an extensive literature review of existing pain scales that focus on pain behaviors typical in older adults with dementia, including vocalizations, facial grimacing, bracing, rubbing, restlessness, and vocal complaints. CNPI scores at rest and during movement were correlated with a self-report pain scale, and scores during movement were correlated with staff analogue rating of agitation, but its relationship with an interview-based assessment of pain was not reported.
The Discomfort Scale for persons with Dementia of the Alzheimer’s Type (DS-DAT), also known as the Hospice Approach to Discomfort Scale or the “Hurley Scale” (Hurley et al., 1992; Krulewitch et al., 2000; Volicer et al., 1994), documents the intensity of the presence of discomfort during direct observation, resulting in a numerical score of 0 to 27. The scale has been adapted to help identify when to begin the Assessment of Discomfort in Dementia protocol (Kovach, Weissman, Griffie, Matson, & Muchka, 1999), which is a systematic tool to help nurses make a differential assessment and develop a treatment plan for both physical pain and affective discomfort in individuals with dementia. Those administering the DS-DAT memorize the items during rigorous training sessions. Training for use of the entire protocol includes skills training and a practice period of 1 to 2 months. Observations are made for 5-minute intervals during a specific time or event that might cause discomfort. Good interrater reliability and internal consistency were documented. Content validity was ensured. Convergent validity was established with disease progression, physical disability, speech assessment, and cognitive functioning. Construct validity was demonstrated through the scale’s ability to distinguish between a fever state and health and through decline in discomfort following the use of the treatment protocol. However, the raters were not blind, as they were the same nurses who administered the protocol.
The Feeling-Tone Questionnaire (FTQ) was designed to detect symptoms of depression among older adults with communication impairment, specifically those with severe dementia, and assesses a broad range of feelings, including positive, neutral, and negative (Toner, Teresi, Gurland, & Tirumalasetti, 1999). The developers took a multisource approach; information is obtained through self-report and direct observation. The FTQ is administered during any 5-minute to 10-minute period, and display of emotion and interaction style is observed and rated according to intensity. Interobserver reliability and test-retest reliability were adequate, and internal consistency was good. Content validity was established. Convergent validity was demonstrated with an observation measure of affective behavior designed as a validation measure specifically for the study, and with physicians’ ratings of depressive symptoms.
The Maximally Discriminative Facial Movement Coding System (MAX) uses computerized technology to record affect (Magai, Cohen, Culver, Gomberg, & Malatesta, 1997; Magai, Cohen, Gomberg, Malatesta, & Culver, 1996). This system uses an anatomically and theoretically based coding scheme of facial movement, recorded on a continuing, real-time basis. The duration and frequency of the targeted affective states of positive and negative affect are recorded using a laptop computer. It is notable that anxiety/fear is not assessed. Two observers are needed: one to code the brow/eye region and the other to code the mouth region. Observations are made during a 20-minute family visit. Training with practice videotapes is required until the two raters achieve a kappa of at least 0.80. Good interrater reliability was demonstrated for each emotion. Convergent validity was established with family ratings of interest, joy, and anger; caregiver ratings of interest and joy; between observations of joy and premorbid attachment security; and between observations of sadness and premorbid hostility. However, the predicted relationships between premorbid hostility and expressions of anger, contempt, and disgust were not confirmed. The communication value of facial expressions was validated, as no significant differences were found across cognitive stages of functioning for all emotions, except a significant reduction of joy in the last stage of cognitive decline.
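A kappa criterion such as this one can be illustrated with Cohen's kappa, which adjusts observed agreement for the agreement expected by chance. The following is a minimal sketch under stated assumptions: the source does not specify which kappa variant was used in MAX training, and the function name and the emotion codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per segment."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("Raters must code the same non-empty set of segments")
    # Proportion of segments on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two trainees rating the same 10 video segments.
trainee_1 = ["joy", "joy", "interest", "interest", "sadness",
             "joy", "interest", "sadness", "joy", "interest"]
trainee_2 = ["joy", "joy", "interest", "interest", "sadness",
             "joy", "interest", "sadness", "interest", "interest"]

kappa = cohens_kappa(trainee_1, trainee_2)
training_complete = kappa >= 0.80
```

In this made-up example the raters disagree on one of ten segments, so raw agreement is 90%, while kappa is lower because some of that agreement would be expected by chance alone.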
The Modified Behavior Rating Scale (MBRS) was developed to directly observe several classes of emotional responses in individuals with severe dementia (Spaull, Leach, & Frampton, 1998). Raters record the amount of passive contentment, attending to the environment, happy mood, interest level, time spent showing interest, bodily movement, and interaction with others on the basis of video recordings. Thirty-second observations are typically made, following 30 seconds of recording, during a 10-minute to 20-minute period. Significant interrater reliability was obtained for all responses. The MBRS was sensitive to changes in behavior before, during, and after a Snoezelen intervention (Chung & Lai, 2002). In contrast, comparisons of Dementia Care Mapping (DCM) (Bredin, Kitwood, & Wattis, 1995) scores before and after the intervention did not demonstrate a significant change in well-being.
The Non-communicative Patient’s Pain Assessment Instrument (NOPPAIN) (Horgas, Nichols, Schapson, & Vietes, 2007; Snow et al., 2004) is an observational measure of pain designed to be brief, to require little training, and to be administered by a nursing assistant. It includes intensity ratings of pain words, pain facial expressions, pain noises, bracing, rubbing, and restlessness. Pain is observed during a 10-minute period of rest and movement. Training includes six videotapes and practice observations to achieve rater agreement. Interrater reliability and test-retest reliability were established. Content validity was ensured. Convergent validity was established with other pain behavior ratings, and with self-reported pain only in cognitively intact individuals, not in cognitively impaired individuals. Construct validity was demonstrated through discrimination of videotapes demonstrating different levels of pain as determined by experts.
The Pain Assessment in Advanced Dementia (PAINAD) scale was developed as a clinically relevant and easy-to-use pain assessment tool for individuals with advanced dementia (Lane et al., 2003; Leong, Chong, & Gibson, 2006; Warden, Hurley, & Volicer, 2003). The scale rates five pain affect and behavior states on an intensity scale of 0 to 2, and it was later transposed to a categorical scale on the basis of its relationship with other pain measures. A 2-hour training curriculum for licensed practical nurses and nursing assistants is used. Observations are made on three preselected occasions during 5-minute intervals (no activity, pleasant activity, and caregiving activity). Interrater reliability was demonstrated using nonsignificant t tests to compare mean ratings among raters. Cronbach’s alpha coefficient was low, demonstrating poor internal consistency, which is not unusual for a 5-item scale. Content validity was ensured. Convergent validity was documented with the DS-DAT, visual analogue scales, and nurse ratings of pain. The PAINAD was able to discriminate between no activity, pleasant activity, and unpleasant activity, and it detected a decrease in pain from before to after administration of pain medication. Factor analysis confirmed that the PAINAD measures a single construct.
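For readers less familiar with the internal consistency statistic reported for several of these measures, the following minimal sketch computes Cronbach's alpha using the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The ratings are invented for illustration on a five-item, 0-to-2 scale and are not PAINAD data; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per person), each a list of item ratings."""
    if len(scores) < 2:
        raise ValueError("At least two rated individuals are required.")
    n_items = len(scores[0])
    if n_items < 2 or any(len(row) != n_items for row in scores):
        raise ValueError("Every row must contain the same number (>= 2) of item ratings.")
    # Sample variance of each item across individuals.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the summed (total) scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("Total scores do not vary; alpha is undefined.")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented ratings: five individuals, five items, each rated 0 to 2.
ratings = [
    [0, 1, 0, 1, 0],
    [1, 1, 2, 1, 1],
    [2, 2, 2, 1, 2],
    [0, 0, 1, 0, 0],
    [1, 2, 1, 2, 1],
]
alpha = cronbach_alpha(ratings)
```

Because the formula includes the factor k/(k - 1) and depends on how strongly the items covary, scales with few items tend to yield lower alpha values than longer scales with comparable item intercorrelations.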
The Positive Response Schedule (PRS) for Severe Dementia was developed as an observational measure aimed at documenting affective well-being in individuals with severe dementia (Perrin, 1997). It is based on engagement theory (Godlove, Richard, & Rodwell, 1982; McFadyen, 1984) and Ekman’s theory of emotional expression (Ekman, Levenson, & Friesen, 1983). The presence of behavior is assessed by documenting only the first occurrence in a given observation. The percentage of observations in which a behavior was present is calculated. Observations are conducted in vivo or are video recorded for a period of approximately 20 to 60 minutes, based on the length or hypothesized effect of the intervention to be evaluated. Each observation lasts 20 to 30 seconds and is followed by a 10-second to 30-second recording period. Interobserver reliability was initially poor but improved after modifications. Convergent validity was not addressed, as no known psychological indexes thought to measure engagement existed at the time. Content validity is understood in the context of engagement theory and nonverbal expression. The PRS was found to be sensitive to changes following a Snoezelen intervention, but this sensitivity was not statistically examined.
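The scoring approach described for the PRS, in which a behavior is counted at most once per observation and the result is expressed as a percentage of observations, can be sketched as follows. This is an illustrative assumption about how such data might be tabulated, not the published scoring procedure; the function name and behavior labels are hypothetical and are not PRS items.

```python
def percent_present(observations, behavior):
    """Percentage of observation intervals in which a behavior occurred at least once."""
    if not observations:
        raise ValueError("At least one observation interval is required.")
    # Each interval is a set, so repeated occurrences within an interval count once.
    present = sum(1 for interval in observations if behavior in interval)
    return 100.0 * present / len(observations)


# Hypothetical record of four observation intervals for one individual.
observations = [
    {"smiling"},
    set(),
    {"smiling", "eye contact"},
    {"eye contact"},
]

smiling_pct = percent_present(observations, "smiling")
eye_contact_pct = percent_present(observations, "eye contact")
```

Recording only presence or absence per interval keeps the coding burden low, but it means the resulting percentage reflects how widely a behavior was distributed across the session rather than its true frequency or duration.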
The Pain Assessment for the Dementing Elderly (PADE) is an observation measure of pain in individuals with advanced dementia (Villanueva, Smith, Erickson, Lee, & Singer, 2003). It is based on the assumption that nonverbal behavior can indicate the presence of pain that interferes with daily activities and can be reliably observed. The measure consists of 24 items rated on a Likert scale to assess physical signs, functional abilities, and caregiver judgment. Care providers require minimal training. Observations are made for 5 to 10 minutes during unspecified times or events over 10 nonconsecutive days. The tool demonstrated good internal consistency and interrater reliability and acceptable test-retest reliability. Content validity was established. Convergent validity was established for two of the three subscales of the CMAI, and construct validity was demonstrated through the measure’s ability to discriminate between individuals taking psychoactive medications and those not.
The purpose of the UAB Pain Behavior Scale is to offer a reliable observational pain scale designed for professionals to use on rounds and to be sensitive to small change (Richards et al., 1982). It was not developed for use with older adults but has been identified as relevant and potentially useful. According to the authors, “minimal” training is needed for accurate results. Raters record the frequency and/or intensity of pain behaviors. Observations are conducted for 5 minutes during periods of activity. The measure has good interrater and test-retest reliability. Content validity was drawn from clinical descriptions shared among pain centers, but the tool is based on the most salient items determined by the authors. The tool demonstrated good convergent validity with other pain measures at discharge from a pain treatment program but not before. Divergence from measures of well behavior was established.
ADLs are often measured using either self-report or caregiver report. Directly observing an individual’s ability to perform ADLs is beneficial to determine the length of time spent on tasks and to limit caregiver bias by directly assessing proficiency. Occupational therapists have a number of assessment measures that systematically assess ADLs by observing functioning using standardized instructions and tasks. This review will not include these measures but will focus on observation measures used to document functioning observed in the natural environment (Table 3).
Table 3: Characteristics and Psychometric Properties of Direct Observation Measures of Individuals’ Activities of Daily Living (ADLs)
The Beck Dressing Performance Scale (BDPS) is used in Strategies to Promote Independence in Dressing (SPID), which focuses on eliminating excess disability by supporting cognitively impaired individuals’ retained ability to dress (Beck, 1988; Beck, Heacock, et al., 1997; Vogelpohl, Beck, Heacock, & Mercer, 1996). The BDPS was designed to address the limitations caused by cognitive deficits related to dressing and the assistance most often needed by cognitively impaired individuals. The BDPS aims to rate the amount of support and time needed for each dressing component. Observations of dressing are videotaped for 4 to 5 minutes twice per week, for a total of 4 to 12 observations. Raters were trained until interobserver agreement was at or above r = 0.80. Content validity of the multiple steps involved and the hierarchy of kinds of assistance needed was determined through task analysis. No significant relationship was found with overall cognitive functioning (MMSE; Folstein et al., 1975; or Neurobehavioral Cognitive Status Examination [NCSE]; Kiernan, Mueller, Langston, & Van Dyke, 1987), but a significant negative relationship to language functioning (NCSE language) was found. The BDPS documented improvement in independence with the SPID intervention and a slight but significant increase in dressing time (of 1 minute), validating the construct underlying the measure.
The InfoAid system (Holmes, Teresi, Lindeman, & Glandon, 1997; Holmes, Teresi, Ramirez, & Goldman, 1997) was developed to monitor the amount of staff time spent on ADLs with residents with dementia. The system is designed to continuously record direct and indirect care services provided, by whom, for whom, and the duration by scanning bar-coded sheets with a bar code reader or data wand. Caring for multiple residents simultaneously and multiple-person assists are accounted for. Bar code sweeping becomes part of the service-providing act and is thought to be less prone to the sampling biases associated with other techniques. Traditional assessment of reliability is difficult because the behaviors monitored are not expected to be consistent or stable. Assessment of interobserver reliability is needed but is complicated by considerations of invasion of privacy. Observation of personal care was significantly related to staff report of ADLs, but observation of mobility assistance, psychological support, medication administration, training, and assessment/planning was unrelated to ADL staff report. Duration of personal care was not related to age, gender, continence, speech, behavior, or mood. InfoAid was able to detect more provision of personal care in special care units than in general long-term care units.
The Observational List for early signs of Dementia (OLD) is used by general practitioners to assess early signs of dementia in the areas of forgetting, repeating, language, understanding, orientation to time and place, daily living, social withdrawing, confabulation, and dependence (Hopman-Rock, Tak, & Staats, 2001). The measure is completed by a general practitioner after observation of patients. Notably, interrater reliability was not assessed. Internal consistency was demonstrated. An expert panel created the list of behaviors observed, ensuring content validity. Convergent validity was established with a cognitive screen and informant rating scale and a verbal memory scale but was not found with informant ratings of ADLs and instrumental ADLs. Divergent validity was established through no relationship with Geriatric Depression Scale (GDS) (Yesavage et al., 1982–1983) scores, as predicted, because individuals with depression were excluded from the study. Construct validity was established through a factor analysis, yielding two factors. Incremental validity was established as the OLD identified patients who exhibited signs that were missed and those with no signs that were falsely identified based on the general practitioner’s opinion.
The Self-Feeding Assessment Tool (SFAT) was developed to distinguish small differences in function during the feeding process and differentiate ability to initiate action from ability to respond to command (Osborn & Marshall, 1993). Kind of assistance is ranked on a 5-point scale, including unassisted, verbal prompt, nonverbal prompt, physical guiding, and full assistance. Individuals are observed for an undetermined length of time both at the beginning and the end of two meals because level of assistance often changed during the course of a meal. Interrater reliability of a videotaped simulation demonstrated perfect agreement. Content validity was supported and compared with more detailed, standard ADL scales. Convergent validity was demonstrated between cognitive impairment and both feeding ability and performance. Construct validity was demonstrated by the ability of the SFAT to discriminate between self-feeding capability and actual performance in drinking liquids and eating pureed food. In addition, those who had been in the nursing home the longest had the lowest mealtime performance.
Quality of Life
Most of those who are interested in assessing quality of life (QOL) in individuals with dementia choose to combine an observational measure with caregiver report measures, interviews, chart data, and self-report when possible. Few have attempted to standardize such attempts at assessing as broad a construct as QOL using observational measures (Table 4). A number of measures focus on the overall milieu in an attempt to assess QOL, and these measures will be included in a future review article on the topic.
The Observing Quality of Life in Dementia (OQOL-D) scale was designed as a useful and practical measure of resident QOL to be used in a partnership between researchers and service providers to inform dementia care practice by evaluating the effects of interventions, tracking progress, and evaluating care plans (Edelman, 2005; Edelman & Fulton, 2004). Nurses, CNAs, and activity staff act as staff participants in the environment and observe residents over 4 to 5 consecutive days during six to eight selected times or events and rate their experience on a 7-point scale, ranging from extremely pleasant to extremely unpleasant. The length of the observation is not standard and can be adjusted to fit the goal of the project. Training is provided to reach a minimum of 80% rater agreement, which is reassessed every 6 months. Convergent validity has been demonstrated with DCM (Bredin et al., 1995), staff and resident Quality of Life in Alzheimer’s Disease (QOL-AD; Logsdon, Gibbons, McCurry, & Terri, 2000, 2002) ratings, depression, cognitive functioning, and dependent ADLs. The staff provided ongoing input and feedback about the ease of use of the instrument and its relevance to dementia-related quality of life, indicating content validity. The ratings are understood by service provider staff in relation to their practical and personal knowledge of the residents, and they respond appropriately.
The Psychological Well-Being in Cognitively Impaired Persons (PWB-CIP) scale contains 11 items that measure positive and negative affect states and engagement behaviors (Burgener & Twigg, 2002; Burgener, Twigg, & Popovich, 2005). It is a relatively narrow measure of QOL, focusing only on affect and behavior. The primary caregiver is the observer and completes the scale for 5 to 10 minutes after observing the individual over 24 hours. Internal consistency was good at both baseline and follow-up. Stability in scores was demonstrated. Content validity was ensured through use of Lawton’s (1997) theoretical model of QOL, literature review, and national expert input, and was demonstrated with a content validity index. Convergent validity was demonstrated with measures of cognition, depression and personal distress, quality of relationship with caregiver, pleasant activities, productive behaviors, and personality at baseline and 18-month follow-up. The construct of well-being as a distinct concept was verified by its stability in contrast with increased depression and functional disability over time. Factor analysis revealed two constructs: positive and negative well-being. Internal consistency was demonstrated for both subscales.
The QUALIDEM is a dementia-specific observational assessment of QOL (Ettema, Droes, de Lange, Mellenbergh, & Ribbe, 2007a, 2007b) that was developed from the authors’ experience of 40% of nursing home residents being unable to answer self-report questions. The measure is based on an adaptation-coping model and evaluates nine subscales of QOL: care relationship, positive affect, negative affect, restless tense behavior, self-image, social relations, social isolation, feeling at home, and having something to do. CNAs score the QUALIDEM for 15 minutes after a 2-week observational period. Training includes lecture and written instructions. Good internal consistency, interrater reliability, and test-retest reliability were established. Content validity was achieved. Construct validity was demonstrated through Mokken scale analysis to demonstrate unidimensionality of subscales, and moderate relationships between subscales support the concept of different but related QOL domains, with the exception of a negative relationship between “feeling at home” and “having something to do.” Convergent and divergent validity were established with predicted significant negative relationships between selected QUALIDEM subscales and nonsignificant relationships between other selected subscales and a behavioral rating scale, behavioral and psychological symptom questionnaire, and a depression rating (Cornell Scale for Depression in Dementia; Alexopoulos, Abrams, Young, & Shamoian, 1988). Positive relationships were demonstrated with nurse-rated QOL for most subscales and with family-rated QOL for some subscales, but the QUALIDEM did not correlate well with self-report of QOL other than positive affect.