Research in Gerontological Nursing

Focus on Methods 

Biobehavioral Measures as Outcomes: A Cautionary Tale

Christine R. Kovach, PhD, RN, FAAN, FGSA; Diana Lynn Woods, PhD, RN, APRN-BC, FGSA; Elizabeth C. Devine, PhD, RN; Brent R. Logan, PhD; Hershel Raff, PhD


This article discusses the use of biobehavioral measures as outcomes for health care intervention studies. Effect size (ES) values for salivary cortisol and observation-based measures of pain and agitation were examined. Effects pre to post treatment were assessed separately for nursing home residents with and without acute psychotic symptoms. This study revealed large positive effects on both pain and agitation measures in the group with acute psychotic symptoms and small-to-medium positive effects on these same measures in the group without acute psychotic symptoms. In both of these groups, the ES values were not consistently positive on the cortisol measures. Prior to determining whether a measure can be used to estimate minimum clinically important differences, it is essential to consider whether the biomarker will be responsive to therapy in the populations and contexts being studied.

[Res Gerontol Nurs. 2014; 7(2):56–65.]

Dr. Kovach is Professor, College of Nursing, and Dr. Devine is Professor Emeritus, University of Wisconsin-Milwaukee; Dr. Logan is Professor, Medical College of Wisconsin; and Dr. Raff is Professor of Medicine, Surgery, and Physiology, Medical College of Wisconsin, Endocrine Research Laboratory, Aurora St. Luke’s Medical Center, Aurora Research Foundation, Milwaukee, Wisconsin. Dr. Woods is Associate Professor, Azusa Pacific University School of Nursing, Azusa, California.

The authors have disclosed no potential conflicts of interest, financial or otherwise. This study was funded by the U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Institute of Nursing Research grants 5R01NR07765 and 1P20NR010674.

Dr. Kovach was not involved in the peer review or decision-making process for this manuscript.

Address correspondence to Christine R. Kovach, PhD, RN, FAAN, FGSA, Professor, College of Nursing, University of Wisconsin-Milwaukee, 1921 E. Hartford Avenue, Milwaukee, WI 53201-0413; e-mail:

Received: May 15, 2013
Accepted: October 04, 2013
Posted Online: October 24, 2013


In recent years, there has been increased emphasis on including biological measures in health care research and in evaluating patient outcomes. The National Institute of Nursing Research (2011) has emphasized the need for more biobehavioral research, and the Institute of Medicine (IOM, 2010) released a report discussing the issues associated with using biological measures as surrogate endpoints for patient outcomes. Biobehavioral research is grounded in the premise that understanding and influencing health outcomes is enhanced through examining the links between biological, psychosocial, and behavioral factors and health status/outcomes (Pellmar, Brandt, & Baird, 2002). This is a worthy goal; however, the success of biobehavioral science in making new inroads that influence health outcomes is dependent on understanding and attending to the complexities of a host of specialized conceptual and methodological challenges associated with using biological measures in research.

As noted in the IOM report cited above, consistent scientific processes and frameworks are needed to ensure the rigorous and transparent use of biological measures. Biological measures may be invasive and costly, and, given the current state of the science, they must be selected and interpreted with caution. For some biological measures, strong evidence is lacking on the natural history of variability over time and on predictors of variability in the measure. There is a need to build a strong research base on the ability of biological measures to capture meaningful change and to determine the minimum clinically important difference (MCID). The purpose of this article is to explore some issues regarding the use of biological measures as outcomes in nursing studies. Data are presented from a study testing the effects of a nurse assessment and treatment protocol used to treat nursing home (NH) residents with dementia (Kovach, Logan, et al., 2006). These results are presented not to test for treatment effects but to highlight a specific methodological issue: the responsiveness of a measure to therapy. Effect sizes (ES) for salivary cortisol and observer-rated pain and agitation are presented for two groups of NH residents who were being assessed and treated for new problems over 6 weeks. For this secondary analysis, we compared NH residents who developed new acute psychotic symptoms with NH residents who developed other new problems (e.g., arthritic pain, constipation, infection). Because acute psychosis is associated with high stress, these two groups provided an opportunity to examine outcomes following treatment in two different populations.

Minimum Clinically Important Difference and Responsiveness to Change

The determination of MCID is needed for evaluating the clinical meaningfulness of statistically significant research findings. MCID is defined as an estimation of the smallest difference in a measurable clinical parameter that indicates a meaningful change in a health care outcome (Kiley, Sri Ram, Croxton, & Weinmann, 2005). The development and use of standardized MCIDs will increase the evidence base for nursing practice, assist in making decisions regarding resource allocation, and allow clearer interpretation of research findings.

While MCIDs have value, there are notable challenges to developing metrics for them (Gilbert, Brown, Cappelleri, Carlsson, & McKenna, 2009; Jordon, Dunn, Lewis, & Croft, 2006). Even with fairly well-accepted biological measures, it is possible that as science develops we will learn that a measure may be able to detect change under some conditions but not others. Therefore, it is important to consider theory, previous research, and the expected change from treatment when deciding if it is reasonable to select a particular biological measure for a specific study.

An essential quality of the measure used to develop the MCID is the ability to accurately detect health change in response to a specific treatment or stimulus. The measure should validly represent the intended construct. Determining this is never trivial, but it can be particularly challenging for biological measures, which may have well-established measurement validity for the biological phenomenon being measured (e.g., the assay validly measures the serum level of a hormone) but may or may not have well-established measurement validity for the intended outcome construct of the study (e.g., stress). In addition, the outcome measure used should remain relatively unchanged if the treatment is not administered (i.e., it must have test-retest reliability), there must be room for change on the measure (e.g., it should not have a floor or ceiling effect), and the timing of measurement must be appropriate for the onset and duration of the intervention’s effect.

Once a measure is deemed reasonable, pilot work can estimate the direction and magnitude of changes on the measure following the proposed treatment or stimulus condition. This estimate can be used in an a priori power analysis and to determine whether the estimated amount of change is clinically significant.
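A pilot ES can feed a sample-size calculation before the main study. The sketch below is a hedged illustration using the normal approximation for a two-sided comparison of two independent groups; the function name and the default alpha and power are assumptions, not values from this study. An exact t-based calculation, or one for paired pre-post data (which depends on the SD of the difference scores), would give somewhat different numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    It slightly underestimates the t-test answer when samples are small.
    """
    if effect_size == 0:
        raise ValueError("effect size must be nonzero")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(effect_size)) ** 2)


# Hypothetical pilot estimate of a medium effect (d = 0.50).
print(n_per_group(0.50))  # 63 per group
```

Smaller pilot effects inflate the required sample quickly, which is one reason a measure that responds weakly to treatment (as cortisol did here in some groups) is costly to use as a primary outcome.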

A statistically significant change in group scores on a biomarker following treatment does not necessarily indicate an important change in health or quality of life. For a change in a biomarker to qualify as an MCID, the change in score needs to relate to other measures (Jordon et al., 2006). For example, in people with dementia, these might include family report of improvement or changes in psychotropic drug use. The change in the biomarker should also not have a negative consequence on the overall health status of the individual. For example, ezetimibe (Zetia®) significantly lowers cholesterol but has been associated with severe hepatic side effects (Stolk, Becx, Kuypers, & Seldenrijk, 2006).

Background to Study of People with Advanced Dementia

The Serial Trial Intervention (STI) is a decision-support tool to address the problem of underassessment and undertreatment of newly emerging problems in NH residents with advanced dementia who are unable to clearly or consistently report symptoms verbally (Kovach, Noonan, Schlidt, Reynolds, & Wells, 2006). The estimates of treatment effect used in this study are based on data from a parent study that compared the effects of two versions of a nursing assessment and treatment protocol (5-step and 9-step STI) on stress, pain, and agitation among NH residents with advanced dementia (Kovach et al., 2012).

The 5-step version of the STI was tested for the outcomes of pain and agitation and found to be effective when compared with standard care (Kovach, Kelber, Simpson, & Wells, 2006). However, thoughtful follow-through was often lacking, as evidenced by failure to get effective treatments scheduled for regular use and failure to add adjunctive and preventive therapies to the therapeutic regimen (Kovach, Cashin, & Sauer, 2006). The 9-step version attempted to address these deficits by adding steps that directed nurses to provide longer-term and more comprehensive treatments and to perform more thorough evaluations (Kovach et al., 2012).

The physical condition and functional limitations of people with advanced dementia pose challenges to measurement of variables of interest in nursing research. People with advanced dementia have communication and cognitive deficits that preclude the use of most self-report measures. Proxy reports from family and professional caregivers on measures such as pain are highly biased. For example, multiple studies have found that family and professional caregivers grossly underestimate pain and depression in people with dementia (Cohen-Mansfield & Lipson, 2002; Horgas & Dunn, 2001). Also, invasive measures or those that involve application of sensors to the body may induce a response that confounds the measurement (Aung et al., 2008).

The use of physiological variables to measure indices of stress and depression has been driven, in part, by technological advances that enable the noninvasive measurement of some biological measures. Cortisol can be easily accessed by saliva collection that can be accomplished in everyday contexts with minimal interruption to the natural flow of activities. This is an important consideration for debilitated NH residents. Several studies have demonstrated a strong correlation (r = 0.805) between time-matched salivary and plasma cortisol (Hellhammer, Wüst, & Kudielka, 2009; Kirschbaum & Hellhammer, 1994; Pruessner et al., 1997). Saliva collection has been shown to be well tolerated by NH residents (Woods et al., 2008). The acceptability of saliva collection is important so cortisol levels are not confounded by an induced stress response from sample collection. These samples are also stable for up to 2 weeks at room temperature (Malamud, 1992), thus eliminating the need for immediate freezing.

Key conceptual issues in attempting to incorporate cortisol as an outcome measure in intervention studies include justifications for (a) cortisol as a marker of physical and emotional stress; (b) the ability of people with dementia to experience stress from a threat or challenge; and (c) the relationship of treatment of the threat or challenge to lower stress. While salivary cortisol has been used as a measure in psychophysiological research for the past 20 years (Kirschbaum & Hellhammer, 1989), it has only more recently begun to be used in nursing and dementia care research (Hanrahan, McCarthy, Kleiber, Lutgendorf, & Tsalikian, 2006; Herrington, Olomu, & Geller, 2004; Williams, Hagerty, & Brooks, 2004; Woods & Dimond, 2002; Woods & Martin, 2007). Salivary cortisol is frequently used as a biomarker of psychological stress (Piazza, Almeida, Dmitrieva, & Klein, 2010), and measurement of cortisol in saliva provides a reliable marker of stress resulting from both physical discomfort and emotional stress (Tsigos & Chrousos, 2011; Weibel, 2003; Young, Abelson, & Lightman, 2004).

Stress is defined as a physiological response of the body to situations or stimuli that are perceived as dangerous or threatening. People with dementia have a decreased threshold for tolerating such stressors (Hall & Buckwalter, 1987; Lawton, 1986). Multiple studies have shown an association between stress, emotional arousal, and salivary cortisol levels (Blair et al., 2008; El-Sheikh, Erath, Buckhalt, Granger, & Mize, 2008; Fortunato, Dribin, Granger, & Buss, 2008). Age can also influence cortisol values. The most consistent finding with advanced age is an increase in the nighttime nadir, which flattens the slope (Fiocco, Wan, Weekes, Pim, & Lupien, 2006; Ice, 2005; Raff et al., 1999; Smyth et al., 1997). Treatment of physical and emotional stress is associated with lowering of cortisol level (Carlson, Speca, Patel, & Goodey, 2004; Gaab, Blättler, Menzi, Pabst, Stoyer, & Ehlert, 2003; Woods & Dimond, 2002).


Design and Sample

Study participants for this secondary analysis were from 12 NHs, all of which had a typical schedule of early awakening (5:00 a.m. to 8:00 a.m.) and early bedtime (5:40 p.m. to 8:00 p.m.). The inclusion criteria from the parent study were that the person had a diagnosis of a dementing illness, had no other chronic psychiatric diagnosis, had no known diseases of the hypothalamic-pituitary-adrenal (HPA) axis, was not taking a corticosteroid agent, had a pre-study length of stay in the NH of at least 4 weeks, and had no diagnosed acute illness at pretesting. Participants from the two study conditions were combined because the direction and magnitude of effects for the 2-week time period examined in the data used for this study were similar across the two versions of the protocol.

The study was approved by the Institutional Review Board. Written consent was obtained from the NH resident’s guardian, and verbal assent was obtained from each resident asked to participate. While all participants provided assent, five participants clearly refused or expressed displeasure regarding saliva collection. Samples were not collected when a participant refused or expressed any sign of displeasure regarding the data collection.

It is worth noting that as is typical for NH residents, many patients were being treated for common conditions that included infection (e.g., of the skin, urinary tract, respiratory tract, or gastrointestinal tract), pain (e.g., musculoskeletal or neuropathic), mobility or body alignment problems, skin breakdown, constipation, leg swelling from venous insufficiency, impacted cerumen, poor dentition, and dysphagia. For the current study, two samples were identified: participants being treated for new acute problems without acute psychosis (n = 95) and participants being treated for new acute psychotic symptoms (n = 16).

Thirty-one NH residents from the parent study were excluded from this study: 10 were in the dying process, 9 were hospitalized for acute illness (e.g., stroke, bradycardia, heart failure, unresponsive episode), 6 had saliva containing exogenous hydrocortisone documented by liquid chromatography/tandem mass spectrometry (Raff & Singh, 2012), and 6 had missing saliva samples due to collection refusal (n = 5) or quantity not sufficient (n = 1).

Measures and Procedures

Salivary cortisol was used to measure HPA axis response to stress (Raff, 2000). The Wisconsin Agitation Intensity (WAI) Scale (Kovach et al., 2004), used to measure agitated behavior, is an observational scale that uses number, duration, and intensity of behaviors as parameters for agitation severity. Possible scores range from 0 to 100. Interrater reliability using this tool is 0.95 to 0.98, and the scale demonstrated responsiveness to change following intervention (Kovach, Noonan, et al., 2006). Pain was assessed using the 9-item Discomfort-DAT scale (Hurley, Volicer, Hanrahan, Houde, & Volicer, 1992). Possible scores range from 0 to 75. This is an observational tool that assesses overall level of discomfort rather than pain in response to acute stimuli. Internal consistency alpha coefficients between 0.86 and 0.89 (Hurley et al., 1992) and interrater reliability of 0.9 (Kovach, Logan, et al., 2006) have been reported. In the latter study, which was a randomized experiment, the tool demonstrated responsiveness to change following intervention.

Saliva and observation-based measures of agitation and pain were collected on the same weekday, both before the nurses began using the assessment and treatment protocol and 2 weeks after treatment was initiated. Three nurse research assistants, not employed by the NHs, were trained by the principal investigator to screen for eligibility, to conduct unobtrusive observations of agitation and pain, and to collect the saliva samples. Data collectors were blinded to study conditions.

Measures of pain and agitation were collected at least 30 minutes after any potentially discomfort- or stress-producing event (e.g., bath, medical examination). Because agitation typically fluctuates during the day, multiple measurements at various time points are recommended to provide a reasonable measure of overall agitation (Kovach et al., 2004). Eight agitation measures were taken per day (two each during breakfast, at midmorning, before dinner, and after dinner). To capture possible differences in overall level of discomfort in the morning and afternoon, two Discomfort-DAT measures were taken per day. To derive an overall assessment of daily pain and agitation, scores were averaged to yield one pretest and one posttest score on each measure for each study participant. Evidence supports the use of a similar physiological stimulus, such as food intake during regular mealtimes, to stimulate the HPA axis and control for some intra-individual variation in diurnal rhythm (Gibson et al., 1999; Rosmond, Holm, & Björntorp, 2000). It is also recommended that saliva for cortisol assay be obtained close to awakening and bedtime to capture diurnal variation in cortisol levels (Stone et al., 2001). Because many NH residents retire to bed early in the evening, 45 minutes after dinner was a feasible data collection time close to the nadir. Four saliva samples were collected (±15 minutes) from under the tongue: 30 minutes after waking (Time 1, waking), 45 minutes after breakfast (Time 2, morning), 45 minutes before dinner (Time 3, afternoon), and 45 minutes after dinner (Time 4, evening). Following collection, samples were placed in a cryogenic vial or storage tube, centrifuged, batched by subject, and frozen at −20°C.
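The saliva sampling schedule can be expressed relative to each resident's own waking and meal times. In this sketch the meal times are treated as the reference points (the text does not say whether the 45 minutes is counted from the start or end of a meal), and the function names and example times are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # the +/-15 minute collection tolerance


def cortisol_targets(wake: datetime, breakfast: datetime, dinner: datetime) -> dict:
    """Target saliva collection times anchored to the resident's own schedule."""
    return {
        "Time 1 (waking)": wake + timedelta(minutes=30),
        "Time 2 (morning)": breakfast + timedelta(minutes=45),
        "Time 3 (afternoon)": dinner - timedelta(minutes=45),
        "Time 4 (evening)": dinner + timedelta(minutes=45),
    }


def within_window(actual: datetime, target: datetime) -> bool:
    """True if a sample was collected within +/-15 minutes of its target time."""
    return abs(actual - target) <= WINDOW


# Hypothetical resident: wakes at 6:00, breakfast at 7:30, dinner at 17:00.
day = datetime(2013, 5, 15)
targets = cortisol_targets(
    day.replace(hour=6), day.replace(hour=7, minute=30), day.replace(hour=17)
)
for label, when in targets.items():
    print(label, when.strftime("%H:%M"))
```

Anchoring targets to each resident's schedule, rather than to fixed clock times, reflects the text's reliance on waking and mealtimes as common physiological reference points.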

Saliva samples were analyzed using a cortisol enzyme-linked immunosorbent assay cleared by the U.S. Food and Drug Administration (Salimetrics, LLC, State College, PA; Raff, Homar, & Skoner, 2003). Intraassay imprecision is the analytical error for samples analyzed within a single run of the assay. Expressed as a coefficient of variation (CV), it was 5.2% at 3.1 (SD = 0.2) nmol/L (n = 10) and 2.6% at 10.4 (SD = 0.3) nmol/L (n = 10). It is typical of immunoassays to have higher CVs at the lowest concentrations of the analyte measured. An intraassay CV of <6% is excellent and provides highly reliable, objective results. Interassay (total) imprecision accounts for the run-to-run variability of the assay; its CV was 11% at 2.8 (SD = 0.3) nmol/L (n = 10), 11% at 10.1 (SD = 1.1) nmol/L (n = 10), and 6.9% at 25.0 (SD = 1.7) nmol/L (n = 10) (Raff et al., 2003). An interassay imprecision of ≤11% is also excellent for an immunoassay of this type. To minimize interassay imprecision, all samples from each NH resident were assayed in one batch (i.e., within a single run). The measurement of salivary cortisol was done objectively, as the assay technician was blinded to the sample identifier. Several pre-analytic and analytic factors can confound the measurement of salivary cortisol (Raff, 2000, 2004, 2012, 2013), including intersubject biological variability, incorrect sampling time, contamination of the saliva sample with topical hydrocortisone (as described elsewhere in this article), concomitant acute stress unrelated to the study design, and improper handling of the sample. These can all contribute to variability over and above the inherent analytic variability of the assay itself.
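The CV figures above follow the usual definition: the SD of replicate measurements divided by their mean, times 100. A minimal sketch, assuming replicate results for one control sample are available: pass results from a single run to estimate intraassay CV, or one result per run across runs to estimate interassay CV. The function name and replicate values are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Coefficient of variation (%) = sample SD / mean * 100."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean must be nonzero")
    return stdev(replicates) / m * 100


# Hypothetical replicate cortisol results (nmol/L) for one control sample.
within_run = [10.2, 10.6, 10.4, 10.1, 10.7]
print(f"{percent_cv(within_run):.1f}%")
```

Because the SD is divided by the mean, the same absolute error produces a larger CV at low concentrations, which is why immunoassay CVs tend to be highest near the bottom of the measured range.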

The effect size (ES) value used in this study is based on Cohen’s (1969) statistic delta (the standardized mean difference). For statistic delta, Cohen defines ES values as small (0.20), medium (0.50), and large (0.80). To facilitate the discussion of ES values, in this report, ES values will be referred to as negligible (≤0.19), small (0.20 to 0.30), small to medium (0.31 to 0.49), medium (0.50 to 0.79), and large (0.80 or larger).
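The ES labels used in this report can be written as a small lookup. This sketch assumes the labels apply to the absolute value of ES rounded to two decimals (the report describes ES = −0.14 as a negligible increase, which suggests magnitude-based labeling); the function name is hypothetical.

```python
def describe_es(es: float) -> str:
    """Label an ES with this report's categories, based on |ES| to two decimals."""
    magnitude = round(abs(es), 2)
    if magnitude <= 0.19:
        return "negligible"
    if magnitude <= 0.30:
        return "small"
    if magnitude <= 0.49:
        return "small to medium"
    if magnitude <= 0.79:
        return "medium"
    return "large"


for value in (1.58, 0.42, 0.21, -0.14):
    print(value, describe_es(value))
```

The sign still carries meaning (positive means less of the attribute after treatment), so in practice the label would be reported alongside the signed value.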

Data Analysis

Cortisol measures used for analysis included waking (Time 1), morning (Time 2), afternoon (Time 3), and evening (Time 4) levels. Slope was calculated by fitting a least-squares line to the four data points for each participant (Adam & Kumari, 2009), and area under the curve (AUC) was calculated with respect to ground using the trapezoidal rule and the four data points for each participant (Fekedulegn et al., 2007). Prior to calculating ES values, cortisol levels at Times 1 to 4 were log transformed to normalize the data for linear analysis. Pain and agitation scores were normally distributed. ES values were calculated for each variable by dividing the mean pre-post difference score by the pooled, within-group standard deviation. The order of subtraction was chosen to yield a positive value when, on average, there was less of the attribute after the intervention (e.g., less stress, anxiety, or pain) and a negative value when the reverse was true. Two-sample t tests were used to compare the mean difference scores between residents with and without acute psychotic symptoms.
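The slope, AUC, and ES computations described above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: sample times are hypothetical hours since waking, the cortisol values are invented, and the text does not state whether slope and AUC were computed on raw or log-transformed values, so the caller chooses. The ES follows the sign convention described above (pretest minus posttest, so a decrease is positive), with the pooled SD computed from the pretest and posttest samples.

```python
import math
from statistics import mean, variance


def ls_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def auc_ground(times, values):
    """Area under the curve with respect to ground, by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:])
    )


def pre_post_es(pre, post):
    """(mean pre - mean post) / pooled SD; positive means less after treatment."""
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    return (mean(pre) - mean(post)) / math.sqrt(pooled_var)


# Hypothetical resident: hours since waking for Times 1 to 4 and cortisol (nmol/L).
hours = [0.0, 2.0, 8.0, 13.0]
cortisol = [10.0, 6.0, 4.0, 3.0]
log_cortisol = [math.log(c) for c in cortisol]

print(ls_slope(hours, cortisol), auc_ground(hours, cortisol))
print(ls_slope(hours, log_cortisol), auc_ground(hours, log_cortisol))
```

Under this sign convention a diurnal slope that flattens (becomes less negative) after treatment yields a negative ES, and one that steepens yields a positive ES.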

Nine residents began to receive a benzodiazepine medication between the pre- and posttest measures. Benzodiazepines suppress the basal and stress-related activation of the HPA system (Grottoli et al., 2002). Because ES values for these nine residents (five in the group with acute psychotic symptoms and four in the group without acute psychotic symptoms) were substantially different from those for residents who did not receive a benzodiazepine, the nine residents were excluded from the analysis. Using mixed-model regression analysis (Davis, 2002), we examined the possible confounding effect of other factors (including Mini Mental State Examination [MMSE] score [Folstein, Folstein, & McHugh, 1975], wake time, use of narcotic agents, and use of the 5- or 9-step STI); no interactions with time were found, indicating that none of these variables significantly affected the pre-post ES.


A total of 102 NH residents were included in the final analysis, 11 of whom were being treated for acute psychotic symptoms. Residents in the groups with and without acute psychotic symptoms had typical NH demographics: most were women (78% and 83%, respectively), they were very old (mean age = 86 [SD = 7.7] years and 89 [SD = 5.0] years, respectively), and they had severe dementia (median MMSE scores of 5 and 10, respectively).

Agitation and pain were the two behavioral measures used in this study. For both of these behavioral measures, the effect of the treatment was large for the group with acute psychotic symptoms (ES = 1.58 for agitation and 1.89 for pain), whereas the effect was small to medium for the group without acute psychotic symptoms (ES = 0.33 for agitation and 0.42 for pain) (Table). The difference in the magnitude of the treatment effect between those with and without acute psychotic symptoms is noteworthy (e.g., 1.58 versus 0.33 on agitation), and the mean differences are statistically significant (p = 0.001 for agitation and p = 0.005 for pain). In contrast, for the six cortisol measures, the treatment effect was variable, with a high ES value of 0.95 and a low ES value of −0.19 for those with acute psychotic symptoms, and a high ES value of 0.31 and a low ES value of −0.14 for those without acute psychotic symptoms. Only the mean change in waking cortisol (Time 1) was statistically significantly different between those with and without acute psychotic symptoms (p = 0.022).

Table. Mean Differences, Standard Deviations, and Effect Sizes for Study Variables


For AUC, a global measure of the cortisol response (Pruessner, Kirschbaum, Meinlschmid, & Hellhammer, 2003; Stewart & Seeman, 2000), the effect of the treatment was large in those with acute psychotic symptoms (ES = 0.95) but negligible in those without acute psychotic symptoms (ES = 0.07). For the group with acute psychotic symptoms, there was a small-to-medium ES in waking cortisol, which decreased following treatment (ES = 0.46). This dampening of the waking cortisol flattened the slope (ES = −0.19). For those without acute psychotic symptoms, the treatment yielded a negligible increase in waking cortisol level (ES = −0.14) and a small decrease in evening cortisol level (ES = 0.21). These changes steepened the slope, yielding a small-to-medium ES for slope (ES = 0.31).


This study revealed large positive effects on both observational measures in the group with acute psychotic symptoms and small-to-medium positive effects on these same measures in the group without acute psychotic symptoms. In both groups, the ES values were not consistently positive on the cortisol measures. The mean pre-post changes in agitation, pain, and waking cortisol level differed significantly between those with and without acute psychosis. These findings underscore the complexity inherent in using hormonal measures as outcome variables for any intervention. Especially among older adults, between-individual heterogeneity is high, which reduces the ability to determine intervention ES with smaller samples (Smyth et al., 1997; Stone et al., 2001). The higher ES values for the group with acute psychotic symptoms could be explained, in part, by (a) greater room for change in a group likely to be more stressed due to their acute psychosis, (b) statistical regression toward the mean, or (c) error resulting from selection bias in a small sample. Statistical regression toward the mean refers to the tendency of high pretest scores to contain a more positive random error that pushes the score up; because this random error tends to be less extreme on later measures, posttest scores drift back toward the mean (Shadish, Cook, & Campbell, 2002). The difference also could be due to a floor effect: participants without acute psychotic symptoms could have habituated to the stress of their chronic conditions and thus have less room for change on a measure of physiological stress.
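Regression toward the mean can be illustrated with a small simulation. This is a hypothetical sketch, not the study's data: scores follow an assumed true-score-plus-error model with invented parameters, no treatment is applied between pretest and posttest, and the subgroup with the highest pretest scores is selected, loosely analogous to selecting a highly stressed group.

```python
import random
from statistics import mean

random.seed(2014)  # fixed seed so the sketch is reproducible

N = 10_000
TRUE_MEAN, TRUE_SD, ERROR_SD = 20.0, 5.0, 5.0  # hypothetical score scale

# Each person has a stable true score; pretest and posttest add independent
# random error. Nothing changes between the two occasions.
true_scores = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
pretest = [t + random.gauss(0, ERROR_SD) for t in true_scores]
posttest = [t + random.gauss(0, ERROR_SD) for t in true_scores]

# Select the top 10% of pretest scorers.
cutoff = sorted(pretest)[int(0.9 * N)]
selected = [i for i in range(N) if pretest[i] >= cutoff]

pre_mean = mean(pretest[i] for i in selected)
post_mean = mean(posttest[i] for i in selected)
print(f"selected pretest mean:  {pre_mean:.1f}")
print(f"selected posttest mean: {post_mean:.1f}")  # lower, with no treatment
```

The selected group's posttest mean falls toward the population mean even though nothing was done to it, so part of a large pre-post ES in a group chosen for extreme scores can reflect selection rather than treatment.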

Measurement error could have influenced ES values, although consistent procedures were used for observations, and saliva samples were collected in the same manner and from the same location in the mouth. It is noteworthy that measures were taken only during the daytime, so nighttime treatment responses (e.g., decreased arthritic pain or fewer hallucinations at night) may not have been fully captured by daytime cortisol measures. Cortisol levels can also be affected by multiple non-HPA axis factors. In this study, we identified and controlled for the confounding effect of benzodiazepine use. We tested cognitive examination score, wake time, use of narcotic agents, and use of the 5- or 9-step STI as potential confounders and found no evidence that they were confounding influences. It is possible that other factors in the NH or within the individual, such as environmental stressors, caregiver demeanor, or other medications, could have acted as confounding variables and influenced the ES values obtained.

The current study found a large ES for AUC in those with acute psychotic symptoms but a negligible ES in those without acute psychotic symptoms. Interpreting these results is challenging, given that AUC is a measure of total cortisol secreted throughout the day.

The clinical significance of the ES values for the differences in slopes must be interpreted with caution. While it is noteworthy that the slopes for the groups with and without acute psychotic symptoms differ in direction, it is essential to note that the mean differences in slopes are minimal and the standard deviations are extremely small (Table). When standard deviations are very small, relatively modest mean differences can yield large ES values. Cortisol slope has been used most extensively to ascertain the effect of an intervention on stress (Sephton, Sapolsky, Kraemer, & Spiegel, 2000); however, the calculation of slope may not provide the best measure when few points are used. Ice (2005) examined factors associated with cortisol levels measured eight times per day starting at 8:00 a.m. and every 2 hours until 10:00 p.m. Factors such as positive affect, positive mood state, age, and time of day were all significantly associated with cortisol level. These factors can have a major influence on slope calculations, given that the slope is frequently determined using only two or three daily cortisol levels. Increased age is frequently associated with a flatter slope and an overall higher mean cortisol, thought to be associated with a higher evening cortisol level (Magri et al., 2006; Wilkinson et al., 2001), although Giubilei et al. (2001) demonstrated elevated levels of both morning and evening cortisol in people with Alzheimer’s disease compared with controls. The small difference in slope exhibited in the current study may result from the advanced age of participants, decreased awakening cortisol levels, or increased evening (Time 4) cortisol levels.
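A worked illustration with hypothetical numbers (not the study's values) shows how the standardized mean difference used throughout this report behaves when its denominator is very small. The same mean change in slope, 0.010, is divided by two different pooled SDs:

```latex
d = \frac{\bar{x}_{\mathrm{pre}} - \bar{x}_{\mathrm{post}}}{s_{\mathrm{pooled}}}, \qquad
\frac{0.010}{0.0125} = 0.80 \ (\text{large}), \qquad
\frac{0.010}{0.050} = 0.20 \ (\text{small})
```

The absolute change is identical in both cases; the ES moves from small to large solely because the pooled SD shrinks.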

As one considers the similarities and differences in ES values for NH residents with and without symptoms of acute psychosis, another factor is worth noting. Contrary to conventional wisdom, ES estimates from extremely small studies tend to be larger than those from studies with larger samples. This is relevant here because the group with acute psychotic symptoms included only 11 residents, compared with 91 in the group without. The bias has been demonstrated by theoretical analysis (Hedges, 1981) and is described as small-sample bias by Hedges and Olkin (1985), who advocated correcting for it when testing the homogeneity of ES values across a series of studies.
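Hedges (1981) showed that a standardized mean difference overestimates the population effect by a factor that depends only on its degrees of freedom, and that multiplying the estimate by a correction factor J removes this bias. The sketch below computes the exact factor and its common approximation and applies them to the agitation ES values in the Table. The helper names are hypothetical, using df = n − 1 for a single-group pre-to-post ES is an assumption, and the study does not report applying any such correction.

```python
import math


def hedges_j(df):
    """Exact small-sample bias correction factor from Hedges (1981).

    J(m) = Gamma(m/2) / (sqrt(m/2) * Gamma((m - 1)/2)), evaluated on the
    log scale with math.lgamma to avoid overflow for large m.
    """
    if df <= 1:
        raise ValueError("degrees of freedom must exceed 1")
    log_j = (math.lgamma(df / 2) - 0.5 * math.log(df / 2)
             - math.lgamma((df - 1) / 2))
    return math.exp(log_j)


def hedges_j_approx(df):
    """Widely used approximation: J(m) ~ 1 - 3 / (4m - 1)."""
    return 1 - 3 / (4 * df - 1)


# Agitation ES values from the Table.
# ASSUMPTION: df = n - 1 for a single-group pre-to-post ES.
for label, es, n in [("acute psychotic symptoms", 1.58, 11),
                     ("no acute psychotic symptoms", 0.33, 91)]:
    j = hedges_j(n - 1)
    print(f"{label}: ES {es:.2f}, J = {j:.3f}, corrected ES {es * j:.2f}")
```

Under these assumptions, the correction shrinks the agitation ES in the group of 11 from 1.58 to about 1.46, while the ES in the group of 91 is essentially unchanged (0.33). This illustrates why ES values from very small groups deserve extra caution.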

Currently, it is unclear which cortisol variables are best used under which circumstances, or which specific cortisol variables should be targeted for a given treatment. Lupien, McEwen, Gunnar, and Heim (2009) suggested that future studies must attend to altered cortisol patterns and pattern consistency. More recently, methods such as group-based trajectory modeling, a form of mixture modeling, have been used to identify and describe the distinct trajectories, or patterns of change, that exist within a population. Each individual trajectory is assumed to belong to a group, and the members of each group follow a common response pattern. Although these procedures were originally used to examine patterns of change over many years (Lacourse, Nagin, Tremblay, Vitaro, & Claes, 2003), they can also be applied in cortisol analysis (Van Ryzin, Chatham, Kryzer, Kertes, & Gunnar, 2009), in which change is examined over the course of a day. Because the method is designed for heterogeneous populations, it can identify atypical patterns over time and thus may provide insight into characteristics that could be modified by a specific treatment.
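To make the idea concrete, the following is a deliberately simplified sketch of a group-based trajectory model: an expectation-maximization (EM) fit of a finite mixture of straight-line daily cortisol trajectories with one shared residual variance. It is not the software used by Van Ryzin et al. (2009) or Lacourse et al. (2003). Published group-based trajectory models typically allow higher-order polynomial trajectories and other error distributions, and they compare solutions with different numbers of groups (for example, by BIC). The helper name `fit_trajectory_groups`, the sampling times, the group shapes, and the noise level are all hypothetical.

```python
import numpy as np


def fit_trajectory_groups(y, times, n_groups=2, n_iter=200):
    """EM fit of a finite mixture of straight-line trajectories.

    Simplified group-based trajectory model (hypothetical helper): each latent
    group k has its own intercept and slope, residuals are normal with one
    shared variance, and samples are independent given group membership.
    y: (n_subjects, n_times) array; times: (n_times,) array.
    Returns mixing proportions, per-group (intercept, slope), and posterior
    group-membership probabilities for each subject.
    """
    n, t = y.shape
    X = np.column_stack([np.ones(t), times])  # intercept + linear time
    # Deterministic start: rank subjects by their own OLS slope and split
    # the ranking into n_groups equal-sized blocks.
    own_slopes = np.linalg.lstsq(X, y.T, rcond=None)[0][1]
    start = np.argsort(np.argsort(own_slopes)) * n_groups // n
    resp = np.eye(n_groups)[start]
    for _ in range(n_iter):
        # M-step: with a shared design matrix, each group's weighted
        # least-squares fit equals an OLS fit to its weighted mean profile.
        weights = resp.sum(axis=0)
        mix = weights / n
        mean_profiles = (resp.T @ y) / weights[:, None]
        betas = np.linalg.lstsq(X, mean_profiles.T, rcond=None)[0].T
        fitted = betas @ X.T
        sq_err = ((y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (resp * sq_err).sum() / (n * t)
        # E-step: posterior membership probabilities, normalized on the log
        # scale (terms identical across groups cancel and are omitted).
        log_post = np.log(mix) - 0.5 * sq_err / sigma2
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return mix, betas, resp


# Simulated data (hypothetical): 60 residents with a typical declining rhythm
# and 40 with a flattened rhythm, sampled at 4 assumed hours after waking.
rng = np.random.default_rng(42)
times = np.array([0.0, 3.0, 8.0, 13.0])
typical = 0.60 - 0.035 * times
flattened = 0.35 - 0.005 * times
y = np.vstack([
    typical + rng.normal(0.0, 0.05, size=(60, times.size)),
    flattened + rng.normal(0.0, 0.05, size=(40, times.size)),
])
true_group = np.repeat([0, 1], [60, 40])

mix, betas, resp = fit_trajectory_groups(y, times)
print("mixing proportions:", np.round(mix, 2))
print("intercepts and slopes:\n", np.round(betas, 3))
```

In this simulated example the model recovers a steeply declining group and a flattened group, along with each resident's probability of belonging to each group. Those probabilities are what would let a researcher ask whether residents with an atypical (flattened) pattern respond differently to a treatment.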

In this study, cortisol was measured only in individuals with dementia residing in a NH, before and after a nursing assessment and treatment protocol. To build on this work, research is needed in this and other populations to examine cortisol variables, and the meaning attributed to findings, in at least five situations: (a) stable description of groups or subgroups, (b) description of response to acute stressors, (c) description of altered diurnal cortisol rhythms as indicators of HPA axis integrity and chronic stress, (d) response to treatment for an acute stressor, and (e) response to treatment for stressors that are more chronic or habituated. Data are also needed on whether changes in cortisol and other biomarkers affect actual health outcomes.

Conclusion and Recommendations

Incorporating biobehavioral measures into nursing intervention studies offers important opportunities for understanding the complex mechanisms that interact to influence health behavior, health outcomes, and the effectiveness of health care interventions. Enthusiasm for these measures must be tempered by appropriate caution: researchers should maximize the match between the biomarker, the target concept, and the study design, and minimize potential sources of error variance. Given how little is currently known about many issues surrounding the use of biobehavioral measures in nursing intervention studies, we recommend pilot studies that provide needed data on procedural issues, possible confounding variables, responsiveness to change, and within- and between-subject variance when the intervention is not occurring. Future studies should also record and report measurement procedures comprehensively, assess how well the measurement tool captures responsiveness to change under specific conditions, and analyze the influence of multiple potential confounding variables. This article adds to an emerging literature documenting an array of issues that must be carefully considered when using biobehavioral measures. Using biobehavioral measures without attention to whether the biomarker will be responsive to therapy in the populations and contexts being studied will compromise the accumulation of knowledge about the effectiveness of health care interventions on health outcomes.
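As one illustration of the pilot-study analysis recommended above, within- and between-subject variance can be estimated from repeated measurements taken while no intervention is occurring, using a one-way random-effects analysis of variance. Within-subject variance indicates how much a measure fluctuates from day to day without treatment. A pre-to-post change that is small relative to this fluctuation is hard to attribute to an intervention. The sketch below is hypothetical: the function name, the simulated values, and the choice of ICC(1) as a summary are assumptions, not procedures reported in this study.

```python
import numpy as np


def variance_components(y):
    """One-way random-effects variance decomposition (hypothetical helper).

    y: (n_subjects, k_repeats) array of measurements taken while no
    intervention is occurring (e.g., waking cortisol on k baseline days).
    Returns (between-subject variance, within-subject variance, ICC(1)).
    """
    n, k = y.shape
    subject_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((y - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)  # truncated at zero
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return var_between, ms_within, icc


# Simulated pilot data: 20 residents, waking cortisol on 3 baseline days.
rng = np.random.default_rng(7)
resident_level = rng.normal(0.5, 0.15, size=(20, 1))      # stable differences
y = resident_level + rng.normal(0.0, 0.10, size=(20, 3))  # day-to-day noise
between, within, icc = variance_components(y)
print(f"between = {between:.4f}, within = {within:.4f}, ICC(1) = {icc:.2f}")
```

A high ICC means that most of the variance lies between residents rather than from day to day, so a single baseline sample characterizes each resident reasonably well. A low ICC suggests that several baseline days would be needed before a pre-to-post change can be interpreted.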


  • Adam, E.K. & Kumari, M. (2009). Assessing salivary cortisol in large-scale, epidemiological research. Psychoneuroendocrinology, 34, 1423–1436. doi:10.1016/j.psyneuen.2009.06.011
  • Aung, A.P.W., Fook, F.S., Jayachandran, M., Song, Z., Biswas, J., Nugent, C. & Yap, L. (2008, July). Smart wireless continence management system for elderly with dementia. In Proceedings of the IEEE 10th International Conference on e-Health Networking, Application & Services, HealthCom, pp. 33–34. doi:10.1109/HEALTH.2008.4600105
  • Blair, C., Granger, D.A., Kivlighan, K.T., Mills-Koonce, R., Willoughby, M., Greenberg, M.T. & Fortunato, C.K. (2008). Maternal and child contributions to cortisol response to emotional arousal in young children from low-income, rural communities. Developmental Psychology, 44, 1095–1109. doi:10.1037/0012-1649.44.4.1095
  • Carlson, L.E., Speca, M., Patel, K.D. & Goodey, E. (2004). Mindfulness-based stress reduction in relation to quality of life, mood, symptoms of stress and levels of cortisol, dehydroepiandrosterone sulfate (DHEAS) and melatonin in breast and prostate cancer outpatients. Psychoneuroendocrinology, 29, 448–474. doi:10.1016/S0306-4530(03)00054-4
  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
  • Cohen-Mansfield, J. & Lipson, S. (2002). Pain in cognitively impaired NH residents: How well are physicians diagnosing it? Journal of the American Geriatrics Society, 50, 1039–1044. doi:10.1046/j.1532-5415.2002.50258.x
  • Davis, C.S. (2002). Statistical methods for the analysis of repeated measurements. New York: Springer-Verlag.
  • El-Sheikh, M., Erath, S.A., Buckhalt, J.A., Granger, D.A. & Mize, J. (2008). Cortisol and children’s adjustment: The moderating role of sympathetic nervous system activity. Journal of Abnormal Child Psychology, 36, 601–611. doi:10.1007/s10802-007-9204-6
  • Fekedulegn, D.B., Andrew, M.E., Burchfiel, C.M., Violanti, J.M., Hartley, T.A., Charles, L.E. & Miller, D.B. (2007). Area under the curve and other summary indicators of repeated waking cortisol measurements. Psychosomatic Medicine, 69, 651–659. doi:10.1097/PSY.0b013e31814c405c
  • Fiocco, A.J., Wan, N., Weekes, N., Pim, H. & Lupien, S.J. (2006). Diurnal cycle of salivary cortisol in older adult men and women with subjective complaints of memory deficits and/or depressive symptoms: Relation to cognitive functioning. Stress, 9, 143–152. doi:10.1080/10253890600965674
  • Folstein, M.F., Folstein, S.E. & McHugh, P.R. (1975). “Mini-mental state.” A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198. doi:10.1016/0022-3956(75)90026-6
  • Fortunato, C.K., Dribin, A.E., Granger, D.A. & Buss, K.A. (2008). Salivary alpha-amylase and cortisol in toddlers: Differential relations to affective behavior. Developmental Psychobiology, 50, 807–818. doi:10.1002/dev.20326
  • Gaab, J., Blättler, N., Menzi, T., Pabst, B., Stoyer, S. & Ehlert, U. (2003). Randomized controlled evaluation of the effects of cognitive-behavioral stress management on cortisol responses to acute stress in healthy subjects. Psychoneuroendocrinology, 28, 767–779. doi:10.1016/S0306-4530(02)00069-0
  • Gibson, E.L., Checkley, S., Papadopoulos, A., Poon, L., Daley, S. & Wardle, J. (1999). Increased salivary cortisol reliably induced by a protein-rich midday meal. Psychosomatic Medicine, 61, 214–224.
  • Gilbert, C., Brown, M.C.J., Cappelleri, J.C., Carlsson, M. & McKenna, S.P. (2009). Estimating a minimally important difference in pulmonary arterial hypertension following treatment with sildenafil. Chest, 135, 137–142. doi:10.1378/chest.07-0275
  • Giubilei, F., Patacchioli, F.R., Antonini, G., Sepe Monti, M., Tisei, P., Bastianello, S. & Angelucci, L. (2001). Altered circadian cortisol secretion in Alzheimer’s disease: Clinical and neuroradiological aspects. Journal of Neuroscience Research, 66, 262–265. doi:10.1002/jnr.1219
  • Grottoli, S., Giordano, R., Maccagno, B., Pellegrino, M., Ghigo, E. & Arvat, E. (2002). The stimulatory effect of canrenoate, a mineralocorticoid antagonist, on the activity of the hypothalamus-pituitary-adrenal axis is abolished by alprazolam, a benzodiazepine, in humans. The Journal of Clinical Endocrinology & Metabolism, 87, 4616–4620. doi:10.1210/jc.2002-020331
  • Hall, G.R. & Buckwalter, K.C. (1987). Progressively lowered stress threshold: A conceptual model for care of adults with Alzheimer’s disease. Archives of Psychiatric Nursing, 1, 399–406.
  • Hanrahan, K., McCarthy, A.M., Kleiber, C., Lutgendorf, S. & Tsalikian, E. (2006). Strategies for salivary cortisol collection and analysis in research with children. Applied Nursing Research, 19, 95–101. doi:10.1016/j.apnr.2006.02.001
  • Hedges, L.V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128. doi:10.2307/1164588
  • Hedges, L.V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
  • Hellhammer, D.H., Wüst, S. & Kudielka, B.M. (2009). Salivary cortisol as a biomarker in stress research. Psychoneuroendocrinology, 34, 163–171. doi:10.1016/j.psyneuen.2008.10.026
  • Herrington, C.J., Olomu, I.N. & Geller, S.M. (2004). Salivary cortisol as indicators of pain in preterm infants: A pilot study. Clinical Nursing Research, 13, 53–68. doi:10.1177/1054773803259665
  • Horgas, A.L. & Dunn, K. (2001). Pain in nursing home residents. Comparison of residents’ self-report and nursing assistants’ perceptions. Journal of Gerontological Nursing, 27(3), 44–53.
  • Hurley, A.C., Volicer, B.J., Hanrahan, P.A., Houde, S. & Volicer, L. (1992). Assessment of discomfort in advanced Alzheimer patients. Research in Nursing & Health, 15, 369–377. doi:10.1002/nur.4770150506
  • Ice, G.H. (2005). Factors influencing cortisol level and slope among community dwelling older adults in Minnesota. Journal of Cross-Cultural Gerontology, 20, 91–108. doi:10.1007/s10823-005-9085-5
  • Institute of Medicine. (2010). Evaluation of biomarkers and surrogate endpoints in chronic disease. Retrieved from the National Academies Press website:
  • Jordan, K., Dunn, K.M., Lewis, M. & Croft, P. (2006). A minimal clinically important difference was derived for the Roland-Morris Disability Questionnaire for low back pain. Journal of Clinical Epidemiology, 59, 45–52. doi:10.1016/j.jclinepi.2005.03.018
  • Kiley, J.P., Sri Ram, J., Croxton, T.L. & Weinmann, G.G. (2005). Challenges associated with estimating minimal clinically important differences in COPD: The NHLBI perspective. COPD, 2, 43–46. doi:10.1081/COPD-200050649
  • Kirschbaum, C. & Hellhammer, D.H. (1989). Salivary cortisol in psychobiological research: An overview. Neuropsychobiology, 22, 150–169. doi:10.1159/000118611
  • Kirschbaum, C. & Hellhammer, D.H. (1994). Salivary cortisol in psychoneuroendocrine research: Recent developments and applications. Psychoneuroendocrinology, 19, 313–333. doi:10.1016/0306-4530(94)90013-2
  • Kovach, C.R., Cashin, J.R. & Sauer, L. (2006). Deconstruction of a complex tailored intervention to assess and treat discomfort of people with advanced dementia. Journal of Advanced Nursing, 55, 678–688. doi:10.1111/j.1365-2648.2006.03968.x
  • Kovach, C.R., Kelber, S.T., Simpson, M. & Wells, T. (2006). Behaviors of nursing home residents with dementia: Examining nurse responses. Journal of Gerontological Nursing, 32(6), 13–21.
  • Kovach, C.R., Logan, B.R., Noonan, P.E., Schlidt, A.M., Smerz, J., Simpson, M. & Wells, T. (2006). Effects of the serial trial intervention on discomfort and behavior of nursing home residents with dementia. American Journal of Alzheimer’s Disease and Other Dementias, 21, 147–155. doi:10.1177/1533317506288949
  • Kovach, C.R., Noonan, P.E., Schlidt, A.M., Reynolds, S. & Wells, T. (2006). The serial trial intervention: An innovative approach to meeting needs of people with dementia. Journal of Gerontological Nursing, 32(4), 18–27.
  • Kovach, C.R., Simpson, M.R., Joosse, L., Logan, B.R., Noonan, P.E., Reynolds, S.A. & Raff, H. (2012). Comparison of the effectiveness of two protocols for treating nursing home residents with advanced dementia. Research in Gerontological Nursing, 5, 251–263. doi:10.3928/19404921-20120906-01
  • Kovach, C.R., Taneli, Y., Dohearty, P., Schlidt, A.M., Cashin, S. & Silva-Smith, A.L. (2004). Effect of the BACE intervention on agitation of people with dementia. The Gerontologist, 44, 797–806. doi:10.1093/geront/44.6.797
  • Lacourse, E., Nagin, D., Tremblay, R.E., Vitaro, F. & Claes, M. (2003). Developmental trajectories of boys’ delinquent group membership and facilitation of violent behaviors during adolescence. Development and Psychopathology, 15, 183–197. doi:10.1017/S0954579403000105
  • Lawton, M.P. (1986). Environment and aging. Albany, NY: Center for the Study of Aging.
  • Lupien, S.J., McEwen, B.S., Gunnar, M.R. & Heim, C. (2009). Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nature Reviews Neuroscience, 10, 434–445. doi:10.1038/nrn2639
  • Magri, F., Cravello, L., Barili, L., Sarra, S., Cinchetti, W., Salmoiraghi, F. & Ferrari, E. (2006). Stress and dementia: The role of the hypothalamic-pituitary-adrenal axis. Aging Clinical and Experimental Research, 18, 167–170. doi:10.1007/BF03327435
  • Malamud, D. (1992). Saliva as a diagnostic fluid. BMJ, 305, 207–208. doi:10.1136/bmj.305.6847.207
  • National Institute of Nursing Research. (2011). Bringing science to life: NINR strategic plan. Retrieved from
  • Pellmar, T.C., Brandt, E.N. Jr. & Baird, M.A. (2002). Health and behavior: The interplay of biological, behavioral, and social influences: Summary of an Institute of Medicine report. American Journal of Health Promotion, 16, 206–219. doi:10.4278/0890-1171-16.4.206
  • Piazza, J.R., Almeida, D.M., Dmitrieva, N.O. & Klein, L.C. (2010). Frontiers in the use of biomarkers of health in research on stress and aging. Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 65, 513–525. doi:10.1093/geronb/gbq049
  • Pruessner, J.C., Kirschbaum, C., Meinlschmid, G. & Hellhammer, D.H. (2003). Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change. Psychoneuroendocrinology, 28, 916–931. doi:10.1016/S0306-4530(02)00108-7
  • Pruessner, J.C., Wolf, O.T., Hellhammer, D.H., Buske-Kirschbaum, A., von Auer, K., Jobst, S. & Kirschbaum, C. (1997). Free cortisol levels after awakening: A reliable biological marker for the assessment of adrenocortical activity. Life Sciences, 61, 2539–2549. doi:10.1016/S0024-3205(97)01008-4
  • Raff, H. (2000). Salivary cortisol: A useful measurement in the diagnosis of Cushing’s syndrome and the evaluation of the hypothalamic-pituitary-adrenal axis. The Endocrinologist, 10, 9–17. doi:10.1097/00019616-200010010-00004
  • Raff, H. (2004). The role of salivary cortisol determinations in the diagnosis of Cushing’s syndrome. Current Opinion in Endocrinology and Diabetes, 11, 271–275. doi:10.1097/
  • Raff, H. (2012). Cushing’s syndrome: Diagnosis and surveillance using salivary cortisol. Pituitary, 15, 64–70. doi:10.1007/s11102-011-0333-0
  • Raff, H. (2013). Update on late-night salivary cortisol for the diagnosis of Cushing’s syndrome: Methodological considerations. Endocrine, 44, 346–349. doi:10.1007/s12020-013-0013-0
  • Raff, H., Homar, P.J. & Skoner, D.P. (2003). New enzyme immunoassay for salivary cortisol. Clinical Chemistry, 49, 203–204. doi:10.1373/49.1.203
  • Raff, H., Raff, J.L., Duthie, E.H., Wilson, C.R., Sasse, E.A., Rudman, I. & Mattson, D. (1999). Elevated salivary cortisol in the evening in healthy elderly men and women: Correlation with bone mineral density. Journals of Gerontology. Series A, Biological Sciences and Medical Sciences, 54, M479–M483. doi:10.1093/gerona/54.9.M479
  • Raff, H. & Singh, R.J. (2012). Measurement of late night salivary cortisol and cortisone by LC-MS/MS to assess preanalytical sample contamination with topical hydrocortisone. Clinical Chemistry, 58, 947–948. doi:10.1373/clinchem.2012.182717
  • Rosmond, R., Holm, G. & Björntorp, P. (2000). Food-induced cortisol secretion in relation to anthropometric, metabolic and haemodynamic variables in men. International Journal of Obesity and Related Metabolic Disorders, 24, 416–422. doi:10.1038/sj.ijo.0801173
  • Sephton, S.E., Sapolsky, R.M., Kraemer, H.C. & Spiegel, D. (2000). Diurnal cortisol rhythm as a predictor of breast cancer survival. Journal of the National Cancer Institute, 92, 994–1000. doi:10.1093/jnci/92.12.994
  • Shadish, W.R., Cook, T.D. & Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
  • Smyth, J.M., Ockenfels, M.C., Gorin, A.A., Catley, D., Porter, L.S., Kirschbaum, C. & Stone, A.A. (1997). Individual differences in the diurnal cycle of cortisol. Psychoneuroendocrinology, 22, 89–105. doi:10.1016/S0306-4530(96)00039-X
  • Stewart, J. & Seeman, T.E. (2000). Salivary cortisol measurement. Retrieved from the MacArthur Research Network on SES & Health website:
  • Stolk, M.F., Becx, M.C., Kuypers, K.C. & Seldenrijk, C.A. (2006). Severe hepatic side effects of ezetimibe. Clinical Gastroenterology and Hepatology, 4, 908–911. doi:10.1016/j.cgh.2006.04.014
  • Stone, A.A., Schwartz, J.E., Smyth, J., Kirschbaum, C., Cohen, S., Hellhammer, D. & Grossman, S. (2001). Individual differences in the diurnal cycle of salivary free cortisol: A replication of flattened cycles for some individuals. Psychoneuroendocrinology, 26, 295–306. doi:10.1016/S0306-4530(00)00057-3
  • Tsigos, C. & Chrousos, G.P. (2011). Hypothalamic-pituitary-adrenal axis, neuroendocrine factors and stress. Journal of Psychosomatic Research, 53, 865–871. doi:10.1016/S0022-3999(02)00429-4
  • Van Ryzin, M.J., Chatham, M., Kryzer, E., Kertes, D.A. & Gunnar, M.R. (2009). Identifying atypical cortisol patterns in young children: The benefits of group-based trajectory modeling. Psychoneuroendocrinology, 34, 50–61. doi:10.1016/j.psyneuen.2008.08.014
  • Weibel, L. (2003). [Methodological guidelines for the use of salivary cortisol as biological marker of stress.] Presse Médicale, 32, 845–851.
  • Wilkinson, C.W., Petrie, E.C., Murray, S.R., Colasurdo, E.A., Raskind, M.A. & Peskind, E.R. (2001). Human glucocorticoid feedback inhibition is reduced in older individuals: Evening study. Journal of Clinical Endocrinology and Metabolism, 86, 545–550.
  • Williams, R.A., Hagerty, B.M. & Brooks, G. (2004). Trier social stress test: A method for use in nursing research. Nursing Research, 53, 277–280. doi:10.1097/00006199-200407000-00011
  • Woods, D.L. & Dimond, M. (2002). The effect of therapeutic touch on agitated behavior and cortisol in persons with Alzheimer’s disease. Biological Research for Nursing, 4, 104–114. doi:10.1177/1099800402238331
  • Woods, D.L., Kovach, C.R., Raff, H., Joosse, L., Basmadjian, A. & Hegadoren, K.M. (2008). Using saliva to measure endogenous cortisol in nursing home residents with advanced dementia. Research in Nursing & Health, 31, 283–294. doi:10.1002/nur.20254
  • Woods, D.L. & Martin, J. (2007). Cortisol and wake time in nursing home residents with behavioral symptoms of dementia. Biological Research for Nursing, 9, 21–29. doi:10.1177/1099800407303982
  • Young, E.A., Abelson, J. & Lightman, S.L. (2004). Cortisol pulsatility and its role in stress regulation and health. Frontiers in Neuroendocrinology, 25, 69–76. doi:10.1016/j.yfrne.2004.07.001

Table. Mean Differences, Standard Deviations, and Effect Sizes for Study Variables

Variable | Acute Psychotic Symptoms (n = 11): Mean Difference | SD | ES | No Acute Psychotic Symptoms (n = 91): Mean Difference | SD | ES | t | p Value
Time 1 waking cortisolᵃ | 0.42 | 0.91 | 0.46 | −0.09 | 0.61 | −0.14 | 2.33 | 0.022
Time 2 morning cortisolᵃ | 0.29 | 0.54 | 0.54 | 0.03 | 0.71 | 0.04 | 1.13 | 0.263
Time 3 afternoon cortisolᵃ | 0.29 | 0.58 | 0.51 | 0.04 | 0.84 | 0.05 | 0.90 | 0.372
Time 4 evening cortisolᵃ | 0.03 | 0.73 | 0.04 | 0.15 | 0.70 | 0.21 | 0.51 | 0.609
Slope for cortisolᵃ | −0.01 | 0.07 | | | | | |
AUC for cortisolᵃ | 3.52 | 3.70 | 0.95 | 0.40 | 5.92 | 0.07 | 1.60 | 0.115
Observed agitation | 31.82 | 20.19 | 1.58 | 7.54 | 22.86 | 0.33 | 3.36 | 0.001
Observed pain | 23.90 | 12.66 | 1.89 | 7.73 | 18.34 | 0.42 | 2.84 | 0.005
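The caution about small standard deviations can be checked against the Table. This section does not spell out the ES formula, but each reported ES matches the mean difference divided by the SD reported beside it, to within rounding. The short script below uses only the published Table values (the incomplete slope row is omitted) to confirm this relationship, then shows with hypothetical numbers how a very small SD inflates ES. The dictionary and helper names are illustrative only.

```python
# Values copied from the Table: (mean difference, SD, reported ES) for the
# group with acute psychotic symptoms (n = 11) and the group without (n = 91).
# The slope row is omitted because its values are incomplete in this copy.
TABLE = {
    "Time 1 waking cortisol": [(0.42, 0.91, 0.46), (-0.09, 0.61, -0.14)],
    "Time 2 morning cortisol": [(0.29, 0.54, 0.54), (0.03, 0.71, 0.04)],
    "Time 3 afternoon cortisol": [(0.29, 0.58, 0.51), (0.04, 0.84, 0.05)],
    "Time 4 evening cortisol": [(0.03, 0.73, 0.04), (0.15, 0.70, 0.21)],
    "AUC for cortisol": [(3.52, 3.70, 0.95), (0.40, 5.92, 0.07)],
    "Observed agitation": [(31.82, 20.19, 1.58), (7.54, 22.86, 0.33)],
    "Observed pain": [(23.90, 12.66, 1.89), (7.73, 18.34, 0.42)],
}


def standardized_change(mean_diff, sd):
    """ES as the mean pre-to-post difference divided by the SD beside it."""
    return mean_diff / sd


# Largest gap between recomputed and reported ES (inputs rounded to 2 dp).
max_gap = max(
    abs(standardized_change(m, s) - es)
    for groups in TABLE.values()
    for m, s, es in groups
)
print(f"largest discrepancy: {max_gap:.4f}")

# Hypothetical numbers, not study data: a tiny SD turns a tiny difference
# into an ES comparable to the agitation effect in the smaller group.
print(standardized_change(0.03, 0.02))
```

Every recomputed ES falls within about 0.01 of the reported value. The last line shows why the slope ES values need caution: a mean difference of only 0.03 divided by an SD of 0.02 gives an ES of about 1.5.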

