Journal of Nursing Education

Major Article 

Perspectives on Statistics Education: Observations From Statistical Consulting in an Academic Nursing Environment

Matthew J. Hayat, PhD; Sarah J. Schmiege, PhD; Paul F. Cook, PhD


Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided promote a call for data to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. [J Nurs Educ. 2014;53(4):185–191.]

Dr. Hayat is Assistant Professor, College of Nursing, Rutgers University, Newark, New Jersey; Dr. Schmiege is Assistant Professor, Department of Biostatistics and Informatics, and Dr. Cook is Associate Professor, College of Nursing, University of Colorado, Anschutz Medical Campus, Aurora, Colorado.

The authors have disclosed no potential conflicts of interest, financial or otherwise.

Address correspondence to Matthew J. Hayat, PhD, Assistant Professor, College of Nursing, Rutgers University, 180 University Avenue, Office 322, Newark, NJ 07102; e-mail:

Received: May 07, 2013
Accepted: October 16, 2013
Posted Online: March 21, 2014



Statisticians working in an academic nursing program often provide statistical consulting services to faculty and students. Within a nursing college or school, it is commonplace for the consulting role to include involvement on a variety of study topics and types of projects. For example, some nursing faculty may investigate different-aged populations, such as pediatrics, young adults, older adults, or geriatrics. Practice topics may focus on patients in an inpatient, outpatient, school, day care, workplace, or nursing home setting. Each faculty member’s area of interest may cover one or more domains and classes of nursing outcomes, such as functional, physiological, psychosocial, behavioral, knowledge, family, and community health (Moorhead, Johnson, Maas, & Swanson, 2008). Examples of study types are human subjects research, basic science research with animal models, quality improvement, and program evaluation. Because the discipline of nursing is broad and because statisticians are often not nurses themselves, the statistician usually has neither the content expertise nor adequate opportunity to develop extensive content knowledge about each faculty member’s investigative area. Instead, with such varied project topics, the consulting statistician in an academic nursing environment often collaborates with the nurse investigator, using principles of the scientific method as common ground between the investigator’s specific area of research and the statistical methods required for the research.

Statistical or methodological consultation may be requested at any stage in the life of a study. Most statisticians encourage early involvement with study design and planning and a primary focus on the research question of interest, which guides the study design and necessitates an appropriate analysis for the question to be adequately answered. This encouragement stems from the fact that although a data analysis can always be revisited, weaknesses in the study design cannot be changed or fixed after data collection has commenced. Common statistical consultation service requests include assistance with study design, instrument development, data collection system design, sample size determination, data entry and validation techniques, data management and data quality assurance, and data analysis and interpretation. Statistics education is one of many core requirements in most degree programs and may entail a one-semester course in an undergraduate, master’s, or Doctor of Nursing Practice (DNP) degree program, and two or more semesters in a PhD program. However, statistics courses may not be taken at the time the nursing student most needs the information (e.g., statistics courses may be taken several years before the student begins his or her dissertation analyses), and the authors have observed in many consultations and collaborations with nurse faculty and graduate students that many of the practical methodology and statistics questions addressed by consultants receive limited coverage in nurses’ statistics coursework.

As in any discipline, nursing students and faculty evaluate much new knowledge through the lens of “what does this have to do with nursing?” Nursing, as a discipline, is focused on the intersection of a person, his or her environment, and health outcomes, along with nursing practice (Fawcett, 1984). This means that nurses are concerned about people’s physiology, their behavior, the surroundings and systems that influence them, their health, and their interactions with nursing professionals. The addition of time to this paradigm has been proposed by Henly, Wyman, and Findorff (2011), as time is an important factor in considering changes to health status, an addition that even more strongly suggests the need for advanced statistical methods and models, such as multilevel modeling, to handle multiple observations within individuals. Many topics can be related back to some aspect of this nursing metaparadigm; it takes only some creativity on the instructor’s part to make the linkages.

The purpose of this work is to give the authors’ perspectives on gaps and challenges in statistics education for nursing students as identified through statistical consulting experiences in an academic nursing environment. The authors of this article are experienced statistics educators and consultants, with faculty appointments in the academic nursing field. Each is involved in a multitude of studies with nursing faculty and students. The consulting role varies with each project and may entail a collaborative role as an equal-contributing member of the research team, a leading role on the project, or a more limited helper role in performing data analysis (Hunter, 1981). From their numerous consulting experiences, the authors have observed several topics and trends that suggest gaps and challenges in statistics education for nurses. This work illuminates the challenges observed by the authors as a result of significant applied experience in statistical consulting with nurse investigators. Recommendations are provided for enhancing statistics education curricula to address some of these challenges.

Data Management

Data management encompasses all aspects of working with study data, including database construction, architecture, formatting, administration, security, quality, and integrity. Not surprisingly, statistical consultants field many data management questions as health care environments produce increasingly complex data and store them in relational-database electronic health record (EHR) systems.

Recommendation 1: Provide More Instruction About and Practice With Relational Databases and Database-Construction Methods

Introductory statistics courses usually present an overview of data types, levels of measurement, and types of variables. To best prepare nursing students for a possible career in nursing research, this basic terminology and basic operations content should be integrated with hands-on computer training in a database software system. However, the authors frequently hear from nurse investigators that their statistics education did not include enough discussion of how to work with relational databases. Data extracted from large databases are often difficult to work with. For example, one of the authors consulted with a nurse researcher at a children’s hospital who wanted to analyze effects of surgery and ventilation practices on the feeding behavior of newborns in a neonatal intensive care unit (Sables-Baus, Kaufman, Cook, & da Cruz, 2011). In this case, patient demographic data were stored in one table of the EHR, data related to hospital admission and discharge date in a second table, surgical procedures in a third, and ventilation data in a fourth. Furthermore, information about feeding was stored in a nursing “process-flow” table, where fields labeled “type of procedure” and “duration of procedure” contained data on nursing procedures of all varieties. This analysis then required compression of complex relational tables into single lines of data for each patient: patient age at the time of surgery required comparison of date fields across two tables; percent of time on room air required comparison of multiple date and time fields within the table on ventilation procedures; and time from surgery to first oral feeding required extraction of specific events from the nursing process-flow table, followed by a comparison of date and time fields across tables. Complex database queries of this type are absolutely essential for anyone wishing to analyze the wealth of data available in EHR systems. 
In this case, the final statistical analysis took only approximately 30 minutes, but the database queries consumed dozens of hours and required a series of meetings over many months to extract the right information from the database in a usable form.
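The kind of query described above, which compresses several relational tables into one line of data per patient, can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names, column names, and values are hypothetical and do not reflect the structure of the EHR used in the study.

```python
import sqlite3

# Hypothetical miniature of an EHR; all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographics (patient_id INTEGER PRIMARY KEY, birth_date TEXT);
CREATE TABLE surgery (patient_id INTEGER, surgery_time TEXT);
CREATE TABLE process_flow (patient_id INTEGER, procedure_type TEXT, start_time TEXT);

INSERT INTO demographics VALUES (1, '2010-03-01'), (2, '2010-03-05');
INSERT INTO surgery VALUES (1, '2010-03-04 09:00:00'), (2, '2010-03-06 14:00:00');
INSERT INTO process_flow VALUES
  (1, 'dressing change', '2010-03-05 08:00:00'),
  (1, 'oral feeding',    '2010-03-06 09:00:00'),
  (1, 'oral feeding',    '2010-03-07 10:00:00'),
  (2, 'oral feeding',    '2010-03-09 14:00:00');
""")

# One row per patient: age at surgery compares date fields across two tables;
# time to first oral feeding extracts specific events from the process-flow table.
query = """
SELECT d.patient_id,
       CAST(julianday(s.surgery_time) - julianday(d.birth_date) AS INTEGER)
           AS age_days_at_surgery,
       ROUND((julianday(MIN(p.start_time)) - julianday(s.surgery_time)) * 24, 1)
           AS hours_to_first_feed
FROM demographics AS d
JOIN surgery AS s ON s.patient_id = d.patient_id
LEFT JOIN process_flow AS p
       ON p.patient_id = d.patient_id
      AND p.procedure_type = 'oral feeding'
      AND p.start_time > s.surgery_time
GROUP BY d.patient_id, d.birth_date, s.surgery_time
ORDER BY d.patient_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps patients who have no recorded oral feeding after surgery (their time to first feeding is simply empty), which is usually preferable to silently dropping them from the analysis file.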

Properly setting up a database is critical for researchers collecting their own data, yet in the authors’ experience database construction is not usually covered adequately in nursing students’ introductory statistics coursework. Although elemental aspects of creating a database (e.g., setting the field type for correct measurement levels, designing validation rules to prevent out-of-range values, specifying logical field names) may be covered in introductory curricula, somewhat more advanced facets need to be added to implement complex data-collection strategies. One of these more advanced operations involves setting up appropriate linkages between multiple tables. Data collection is often complex, particularly for longitudinal studies, and it may not be feasible or desirable to create a single large data table that includes all variables and time points. Instead, there are often practical and analytical advantages to creating multiple smaller tables that can then be linked in logical ways. For example, in a sleep study, one of the authors recommended separate tables for participant demographics (collected once), paper-and-pencil questionnaire responses (collected at the beginning, middle, and end of the study), sleep log diary entries (collected daily but aggregated weekly for 6 weeks in the intervention phase of the study), and actigraphy data that recorded participants’ movement on an ongoing basis throughout the day (collected on a minute-by-minute basis for a period of several weeks, but aggregated daily and weekly). Such rich data allow for complex and interesting analyses (Matthews, Schmiege, Cook, Berger, & Aloia, 2012) but require careful thought about data formatting, storage, and aggregation. More curricular exposure to principles of database construction and linkages between multiple data tables will give nursing students facility in performing these complex analyses.
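The database-construction ideas in this section (field types, validation rules, linked tables, and aggregation) can be illustrated with a small sketch using Python's built-in sqlite3 module. The schema below is hypothetical and far simpler than the sleep study's actual database; the age range, the 42-day diary window, and all names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces table linkages only when enabled
conn.executescript("""
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    age_years      INTEGER NOT NULL CHECK (age_years BETWEEN 18 AND 100)
);
CREATE TABLE sleep_log (
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    study_day      INTEGER NOT NULL CHECK (study_day BETWEEN 1 AND 42),
    hours_slept    REAL    NOT NULL CHECK (hours_slept BETWEEN 0 AND 24),
    PRIMARY KEY (participant_id, study_day)
);
""")

def try_insert(sql, params):
    """Attempt one insert; report whether the validation rules accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

try_insert("INSERT INTO participant VALUES (?, ?)", (1, 54))
for day, hours in enumerate([6.0, 7.0, 6.0, 7.0, 8.0, 7.0, 8.0, 5.0], start=1):
    try_insert("INSERT INTO sleep_log VALUES (?, ?, ?)", (1, day, hours))

# An out-of-range value and a diary entry for an unknown participant are both refused.
bad_hours = try_insert("INSERT INTO sleep_log VALUES (?, ?, ?)", (1, 9, 30.0))
orphan = try_insert("INSERT INTO sleep_log VALUES (?, ?, ?)", (99, 1, 7.0))

# Daily entries aggregated to weekly means, as in the diary example.
weekly = conn.execute("""
    SELECT participant_id, (study_day - 1) / 7 + 1 AS week, AVG(hours_slept)
    FROM sleep_log
    GROUP BY participant_id, week
    ORDER BY participant_id, week
""").fetchall()
print(bad_hours, orphan, weekly)
```

Keeping the daily entries in their own table and deriving the weekly values by query means the aggregation rule is documented and can be changed later without re-entering data.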

Recommendation 2: Place Additional Emphasis on Data Integrity and Compliance

Although the significance of data integrity is likely familiar to everyone, the severe consequences of failing to maintain it must be stressed in any introductory statistics curriculum. The Health Insurance Portability and Accountability Act (HIPAA) should be discussed in detail within the context of education about data management, as consideration of data integrity is critical at the point of constructing and linking databases. Students and investigators should understand the importance of de-identifying data to ensure that protected health information (PHI) is kept secure, and they need to understand what specific data fields are considered PHI under U.S. law. For example, a patient’s birthdate is PHI whereas age usually is not, but when patients are older than 89 years, their exact age is also treated as identifying information because fewer patients are in that demographic group. A common misconception is that de-identified data must also be anonymous, which is not necessarily true. In fact, it often helps researchers to retain an arbitrary research identifier that could be used by an authorized person to link research datasets back to other clinical records. Data security is also an important topic and includes issues such as data access, confidentiality, and encryption. Each of the authors has experienced unfortunate incidents that constitute serious threats to data integrity, such as receiving identified data, witnessing identified data sent through unsecure e-mail, or discovering that inconsistent identifiers had been generated in the database creation steps and used across data tables.
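A brief sketch may help make the distinction between de-identified and anonymous data concrete. The example below replaces the medical record number with an arbitrary research identifier, converts birthdate to age, and reports ages above 89 only as a single "90 or older" category, following the HIPAA Safe Harbor provision on ages. The field names and records are hypothetical, and the sketch is a teaching illustration, not a complete de-identification procedure or a substitute for institutional review.

```python
import secrets
from datetime import date

AGE_CAP = 90  # ages above 89 are reported only as "90 or older"

def age_in_years(birth_date, as_of):
    """Completed years between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)

def deidentify(records, as_of):
    """Return (research rows, linking key).

    The linking key maps each medical record number to an arbitrary research
    identifier; it should be stored separately, with access limited to
    authorized personnel.
    """
    link_key = {}
    rows = []
    for rec in records:
        if rec["mrn"] not in link_key:
            link_key[rec["mrn"]] = "R" + secrets.token_hex(4)
        age = age_in_years(rec["birth_date"], as_of)
        rows.append({
            "research_id": link_key[rec["mrn"]],  # same patient, same identifier
            "age_years": min(age, AGE_CAP),       # birthdate itself is dropped
            "systolic_bp": rec["systolic_bp"],
        })
    return rows, link_key

# Hypothetical records; the first and third belong to the same patient.
records = [
    {"mrn": "000123", "birth_date": date(1920, 6, 1), "systolic_bp": 142},
    {"mrn": "000456", "birth_date": date(1980, 6, 1), "systolic_bp": 118},
    {"mrn": "000123", "birth_date": date(1920, 6, 1), "systolic_bp": 138},
]
rows, link_key = deidentify(records, as_of=date(2013, 1, 1))
```

Generating the identifier once per patient and reusing it everywhere is also what guards against the inconsistent identifiers across data tables described above.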

Data Screening and Manipulation

Prior to any statistical analysis, data screening and cleaning are required to eliminate out-of-range values. For example, one of the authors worked on a length-of-stay analysis for a health care system that believed it had a significant problem with psychiatric hospitalization. The mean length of stay (LOS) across hundreds of patients was found to be more than 150 days—clearly a quality problem if true, but this value did not fit at all with clinical staff members’ experience.

Recommendation 3: Teach Visualization Techniques and Provide State-of-the-Science Methodology for Handling Missing or Non-Normal Data

In the LOS study, a simple boxplot visualization of the data revealed a data entry error: one patient’s hospitalization date had been typed as 1/1/1000 instead of 1/1/1999. This single patient, with a record indicating hospitalization for nearly 1,000 years, was the source of the entire apparent quality problem. The organization called off its planned quality improvement effort after the error was corrected, and the true LOS was revealed to be 7 days. The simple error in this case was difficult to detect because the sheer size of the organization’s database made screening by hand impossible, but a basic visualization approach readily solved the problem.
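The screening step in this example can be sketched in a few lines. A boxplot flags points that lie more than 1.5 interquartile ranges beyond the quartiles, and the same rule can be applied numerically when a dataset is too large to inspect by hand. The dates below are invented to mimic the error described; they are not the organization's data.

```python
from datetime import date, timedelta
from statistics import mean, quantiles

def tukey_outliers(values):
    """Values beyond 1.5 interquartile ranges from the quartiles (the boxplot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]

def lengths_of_stay(stays):
    """Length of stay in days for each (admission, discharge) pair."""
    return [(discharge - admit).days for admit, discharge in stays]

# Hypothetical admissions: nine plausible stays plus one mistyped admission year.
stays = [(date(1999, 1, 1), date(1999, 1, 1) + timedelta(days=n))
         for n in (3, 5, 6, 7, 7, 7, 8, 9, 11)]
stays.append((date(1000, 1, 1), date(1999, 1, 8)))  # typed 1000 instead of 1999

los = lengths_of_stay(stays)
raw_mean = mean(los)          # grossly inflated by the single bad record
flagged = tukey_outliers(los)

# Correct the source record, then recompute.
stays[-1] = (date(1999, 1, 1), date(1999, 1, 8))
corrected_mean = mean(lengths_of_stay(stays))
print(raw_mean, flagged, corrected_mean)
```

Flagged values are candidates for review against the source record, not for automatic deletion; here the correction is made to the admission date rather than to the derived length of stay.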

Data cleaning and manipulation can be a process analogous to screening, diagnosis, and treatment in primary care (Van den Broeck, Cunningham, Eckels, & Herbst, 2005). For example, data visualization techniques, such as boxplots, crosstabs, and scatterplots, are often helpful first steps in screening data for potential problems. In the diagnostic phase, nurses can determine whether data problems are expected chance occurrences—as in the data entry error mentioned—or systematic problems in the dataset. For instance, in a recent analysis focused on patients’ efforts to cope with HIV disease, it was discovered that the coping questions had been asked only when patients answered “yes” to a question about whether they had successfully coped with a stressor. Therefore, the coping scale contained no information about coping efforts that had been unsuccessful, which was an important systematic omission that was not fixable and limited our interpretation of the results.

For the treatment phase of data cleaning, a good statistics curriculum teaches future nurses when and how to handle challenges, such as missing data and data non-normality. Too many health care investigators in all disciplines rely on outdated techniques for handling missing data, such as simple deletion of missing cases, imputation methods like mean substitution, or last observation carried forward. The use of these outmoded methods can lead to considerable bias. More sophisticated, modern missing data techniques, such as multiple imputation or full-information maximum likelihood, produce more valid conclusions and are consistent with intent-to-treat principles for randomized controlled trial research (Moher et al., 2010).
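One source of the bias mentioned above is easy to demonstrate: mean substitution leaves the sample mean unchanged but artificially shrinks the variability of the data, which in turn distorts standard errors and significance tests. The sketch below uses invented scores and shows only this one problem; it does not implement multiple imputation or full-information maximum likelihood, which require dedicated statistical software.

```python
from statistics import mean, stdev

# Hypothetical scores with three missing responses recorded as None.
pain_scores = [4, 6, None, 8, 10, None, 12, None]

observed = [v for v in pain_scores if v is not None]
fill = mean(observed)
mean_substituted = [fill if v is None else v for v in pain_scores]

sd_observed = stdev(observed)             # spread among the values actually collected
sd_substituted = stdev(mean_substituted)  # spread after filling gaps with the mean

print(mean(observed), mean(mean_substituted))
print(round(sd_observed, 2), round(sd_substituted, 2))
```

Every imputed case sits exactly at the mean, so the data appear more precise than they are; the larger the share of missing values, the greater the understatement.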

Non-normal data can also be transformed in many cases, although selection of a specific approach depends on the circumstances and can have drawbacks, such as making results noninterpretable on the original scale used in the study. For example, one of the authors worked on a study of early childhood development in which the dependent variable was highly skewed (Alhusen, Hayat, & Gross, 2013). Direct transformation of the raw dependent variable values did not adequately address the non-normality of the data. To get around this difficulty, another type of generalized linear model—gamma regression with a log link—was used to model the data. This corrected the non-normality problem but presented a challenge with interpretation, as the regression coefficients were directly interpreted as the change in the natural logarithm of the outcome variable for a 1-unit change in the independent variable. Transforming the model results into a meaningful unit necessitated a change in thinking about regression coefficients, from that of additive change in the raw-outcome measures to that of multiplicative change. Education on these topics is needed to maximize comprehension of the results sections of many publications in the nursing and health literature.
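The shift from additive to multiplicative interpretation can be shown with simple arithmetic. Under a log link, the model describes the logarithm of the expected outcome, so exponentiating a coefficient gives the ratio of expected outcomes for a 1-unit increase in the predictor. The intercept and coefficient below are made-up values for illustration and are not estimates from the cited study.

```python
import math

def expected_outcome(intercept, beta, x):
    """Expected outcome on the original scale for a model with a log link."""
    return math.exp(intercept + beta * x)

def percent_change_per_unit(beta):
    """Percent change in the expected outcome for a 1-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0

# Hypothetical fitted values (assumptions for this example only).
intercept, beta = 2.0, 0.10

ratio = expected_outcome(intercept, beta, 4) / expected_outcome(intercept, beta, 3)
print(round(ratio, 4))                          # equals exp(beta), whatever the starting x
print(round(percent_change_per_unit(beta), 2))  # about a 10.5% increase per unit
```

The ratio is the same at every value of the predictor, which is why results from such models are reported as percent or fold changes rather than as differences in raw units.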

Statistical Software

In the past, statistics curricula emphasized statistical methods, mathematical calculations, and reliance on tables of distributions displayed in the appendices of statistics textbooks. Statistical software and the Internet have shifted the focus of statistics education to tasks of a practical nature (Chance, 1997). Web-based statistical tools, e-books, statistical software, and applets are available and may be used to enhance teaching and learning by allowing for a hands-on, practical approach.

Recommendation 4: Teach the Principles Underlying the Operations

Although a goal-oriented approach may ultimately be most appropriate and useful for nurse researchers, it is still important to teach the “why” of a statistical application, in addition to the “how.” For instance, we have tried teaching the formula for a given statistical procedure (e.g., t test) prior to calculating the statistic using a software program; students can be given small datasets requiring them first to calculate the answer by hand before relying on the answer provided by the computer. Because formulas on their own might not be readily interpretable by students without a strong math background, any presentation of formulas can be accompanied by demonstrations of the underlying logic (e.g., for analysis of variance, an exercise that guides students through the logic behind sums of squares) or an Excel® spreadsheet that helps the student to perform some of the intermediate math steps, such as subtracting the mean from each score, without jumping ahead to the final result. As statistical software becomes more user friendly, it becomes even more critical for students to have an understanding of the statistics underlying program operation. In the absence of a fatal input error, software programs will nearly always give a result, but the interpretability of the result depends on whether the input data and syntax were appropriate for the given research question. As an example, in a consulting appointment, one of the authors was viewing output from an exploratory factor analysis that, on first glance, seemed difficult to interpret. Upon further inspection, it emerged that the patient’s unique identifying number had also been included as a possible indicator of the factor, influencing the eigenvalues and loadings for all other items and rendering the results meaningless. 
The ease with which investigators can run a software program and yet still obtain erroneous output underscores the importance of comprehensive statistical education around reviewing and interpreting statistical output. In another application, one of the authors noticed unusually large mean values for several continuous measurements for cardiovascular laboratory values (e.g., blood pressure, hemoglobin, LDL cholesterol) in statistical output provided by an investigator. It turned out that missing values were coded in the dataset as 999 but were included as observed data in the data analysis.
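The missing-value problem in the last example is straightforward to reproduce. In the sketch below, the code 999 and the blood pressure readings are invented; the point is only that software computes a mean from whatever numbers it is given unless the sentinel code is declared as missing first.

```python
from statistics import mean

MISSING_CODE = 999  # assumed sentinel; real datasets may use other codes

# Hypothetical systolic blood pressure readings with two missing values.
systolic = [118, 124, 999, 131, 999, 127]

naive_mean = mean(systolic)  # treats 999 as an observed reading

# Recode the sentinel to None, then analyze only the observed values.
recoded = [None if v == MISSING_CODE else v for v in systolic]
observed = [v for v in recoded if v is not None]
clean_mean = mean(observed)

print(round(naive_mean, 1), clean_mean, len(systolic) - len(observed))
```

Reporting the number of values recoded alongside the result gives a simple check that the missing-data code was handled as intended.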

Recommendation 5: Include in the Curriculum Some Basic Syntax and Code-Writing Instruction

This problem (the ease with which statistical software may produce output, even when given faulty input) may be further compounded by the popularity of menu-driven options, as opposed to syntax, in many software programs. Although the menu-driven option may be attractive in terms of initial ease of use, this approach can present challenges both for data management and for documentation of data manipulation and analysis. In the long run, syntax may be more efficient because (a) analyses can easily be rerun by reapplying the same syntax to a given dataset and (b) the saved syntax provides standardized documentation of all study data manipulations and analyses conducted. Training in writing code or syntax is strongly recommended, and this will become even more important if and when journals adopt guidelines around the reproducibility of research findings by requiring data or software code (or both) to accompany articles accepted for publication (Laine, Goodman, Griswold, & Sox, 2007; Peng, 2009). Although there certainly is a learning curve for using syntax, instruction on syntax may actually serve to complement and augment material in statistical theory and applications and help to improve understanding of statistical concepts. Syntax options are available in most major software programs. For example, SPSS® includes a comprehensive syntax-based interface, in addition to the menu-driven point-and-click interface.

Types of Scientific Investigation

The structure and needs of statistics education in academic nursing have changed dramatically in recent years. Most notably, the number of DNP degree programs has grown rapidly, roughly quadrupling in 5 years, from 53 programs with 1,874 enrollees in 2007 to 211 programs with 11,575 enrollees in 2012 (Kirschling, 2013). The statistics education needs in DNP education are vastly different from those of the research-focused doctorate (PhD).

Recommendation 6: Educate DNP Students on Quality Improvement and Evaluation, Educate PhD Students on Research, and Educate Both Groups on Statistical Literacy

Applications of statistics vary with different types of scientific investigations. DNP students are usually interested in program evaluation and quality improvement, whereas PhD students focus on research (Nelson, Cook, & Raterink, 2013). To assess statistics education and make appropriate curriculum decisions, it is essential to understand the many distinctions and critical differences between these different types of scientific investigation (Cook & Lowe, 2012).

On the basis of the DNP curriculum essentials, DNP students are not generally trained in research (American Association of Colleges of Nursing [AACN], 2006), so they would not typically need to know how to conduct predictive procedures such as multiple regression, logistic regression, survival analysis, or structural equation modeling. PhD students are trained to answer research questions that generate new knowledge (AACN, 2010). Therefore, we argue that quantitatively focused PhD nursing students do need to know these procedures and others, such as missing data imputation and multilevel modeling, as part of their scientific role. DNP students certainly need to know enough about these procedures to understand their use in the clinical literature, but our experience is that when DNP students stray into predictive questions in their own work, the faculty should gently guide them back to quality improvement topics (Nelson et al., 2013). For example, in the current hospital-accountability climate, many of the authors’ DNP students express an interest in “developing a valid fall-risk assessment tool,” but this question would be the basis for a research project requiring more advanced statistical procedures because many possible predictors might be screened and the question’s answer is unknown. Quality improvement questions such as “implementing a program to reduce falls” instead use or adapt other best practice fall-risk assessments already available in the literature.

The mainstay of traditional statistics education has been in the two general areas of descriptive statistics and statistical inference. Descriptive statistics are applicable to any type of quantitative data—small or large, random or not, quality improvement or research. They may be used with any type of scientific investigation as an approach to data reduction, exploration, and analysis. Statistical inference, which includes classical hypothesis testing (p values) and confidence intervals, is appropriate only when inferring from a sample to a larger unknown and unseen population of interest (Hayat, 2010). Nursing students at all levels need to be able to read and understand the nursing and health literature. Therefore, despite differences noted previously in the roles of PhD and DNP nursing students, statistics education at all levels needs to include instruction on how to interpret hypothesis tests and confidence intervals. The concepts of power and effect size are deeply entwined with any discussion of statistical significance testing, use of research for evidence-based practice, or meta-analysis, so both DNP and PhD students need to have an understanding of these concepts as well.
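To make the link between effect size and confidence intervals concrete, the sketch below computes a standardized mean difference (Cohen's d) and an approximate 95% confidence interval for a difference between two group means. The scores are invented, and the interval uses a normal approximation for simplicity; with samples this small, a t-based interval from statistical software would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(a) - mean(b)) / sqrt(pooled_var)

def mean_difference_ci(a, b, level=0.95):
    """Normal-approximation confidence interval for the difference in means."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Hypothetical outcome scores for two groups.
intervention = [5, 6, 7, 8, 9]
comparison = [3, 4, 5, 6, 7]

d = cohens_d(intervention, comparison)
lower, upper = mean_difference_ci(intervention, comparison)
print(round(d, 3), round(lower, 2), round(upper, 2))
```

The effect size describes how large the difference is in standard deviation units, whereas the interval describes how precisely the difference has been estimated; a reader of the literature needs both to judge clinical relevance.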

Advanced Topics

The time constraints and pressures to fit clinical and didactic courses into each nursing degree program have resulted in a need to pick and choose which topics are covered. It is our experience that many faculty and students do not receive sufficient training on more advanced statistical topics. The basic data management and manipulation topics described previously are foundational for anyone who wants to work effectively with large, complex datasets generated in health care settings. Both descriptive and inferential statistics are important for all doctorally prepared nurses, although their specific application may differ depending on the type of science (research or quality improvement) appropriate to the degree. Beyond these topics, the authors have found that quantitatively focused nurse investigators at the PhD level often need (a) psychometric analysis, (b) procedures for handling correlated observations, and (c) statistical modeling to fully carry out their programs of research.

Recommendation 7: Educate Quantitatively Focused PhD Students on Psychometric Analyses, Procedures for Handling Correlated Observations, and Statistical Modeling

Psychometric analyses are essential for any research involving surveys or questionnaires because the properties of the measurement tool must be reexamined in each new population (Sechrest, 2005). Researchers also commonly want to develop new instruments; this can be for the sake of having a good predictive tool in a research area where one does not exist, as in a study of an instrument to predict baccalaureate-level nurses’ success in their first year of practice after school (Casey et al., 2012). Alternatively, new instruments can be incidental to the process of answering a larger research question, as in a study of whether career ladder programs for practicing nurses increase job satisfaction and retention (Cook & Nelson, 2008). Yet, many nurse investigators, even at the PhD level, are lacking any formal coursework on psychometric procedures, such as exploratory factor analysis, intraclass correlations to establish test–retest reliability, Cronbach’s alpha reliability coefficients, confirmatory factor analysis, or item response theory.
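Of the procedures listed, Cronbach's alpha is simple enough to compute directly, which can help students see what the coefficient measures. The sketch below applies the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), to a small invented set of questionnaire responses.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: five respondents, three items on a 1-to-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because the coefficient depends on the variances observed in a particular sample, it has to be recomputed whenever an instrument is used in a new population, which is the point made at the start of this section.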

Correlated observations are the norm in many areas of nursing science, including both research on health care systems (where scores from individual patients or nurses are clustered within hospital units or unit-level scores are clustered within hospitals) and research on individual patients (where repeated measures from the same person over time produce a set of observations clustered within patients). Typical procedures for addressing correlated observations, namely averaging them at the higher level or ignoring the clustering, produce either dramatically reduced or overinflated power, respectively, and have a serious biasing effect on the obtained results (Hayat & Hedlin, 2012). As one example, one of the authors (M.J.H.) was consulted on a 1-year-old data analysis for a multisite study. The data were originally analyzed using logistic regression analysis, which assumes independent observations. The initial findings with this approach did not align with previous findings in other studies and were not clinically sensible. Thus, the principal investigator did not publish the study findings for 1 year. Fresh examination of the data analysis revealed substantial site differences that had not been accounted for in the original analysis. A new data analysis that considered the site differences, with the use of marginal models (generalized estimating equations), revealed sensible and clinically meaningful findings. Assumptions about covariance structure (e.g., compound symmetry, variance components, autoregressive), time intervals (e.g., fixed versus random frequency of data collection points), and handling of data within clusters (e.g., uncentered versus group-mean centered versus grand-mean centered) can also have serious implications for the obtained results. 
One example was seen in a recent analysis (Cook, Aagaard, & Schmiege, 2013) where two of the authors initially obtained very different results evaluating multilevel models in two different statistical programs (SAS® and HLM). It was later discovered that two different approaches were being used to center the variables. Modern statistical methods that account for correlated observations are accessible and readily available (Gueorguieva & Krystal, 2004), and many publications in the nursing and health literature make use of these methods. This topic was deemed important enough that a panel of statistics experts in a recent publication recommended that statistical modeling techniques to account for correlated data be included as a core required statistics educational component for PhD nursing students (Hayat, Eckardt, Higgins, Kim, & Schmiege, 2013).
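The centering issue is easier to appreciate with numbers. The sketch below applies grand-mean centering and group-mean centering to the same invented predictor values from two sites. It illustrates the two definitions only and makes no claim about the defaults of any particular statistical program.

```python
from statistics import mean

def grand_mean_center(clusters):
    """Subtract the overall mean from every observation."""
    grand = mean([v for values in clusters.values() for v in values])
    return {site: [v - grand for v in values] for site, values in clusters.items()}

def group_mean_center(clusters):
    """Subtract each cluster's own mean from its observations."""
    centered = {}
    for site, values in clusters.items():
        site_mean = mean(values)
        centered[site] = [v - site_mean for v in values]
    return centered

# Hypothetical predictor scores clustered within two sites.
scores = {"site_A": [2.0, 4.0, 6.0], "site_B": [10.0, 12.0, 14.0]}

grand_centered = grand_mean_center(scores)
group_centered = group_mean_center(scores)
print(grand_centered)
print(group_centered)
```

Grand-mean centering preserves the difference between sites, whereas group-mean centering removes it, so a coefficient for the centered predictor answers a different question under each approach. Two programs given the same data can therefore disagree for reasons unrelated to the estimation itself.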

Finally, statistical modeling is an essential topic for PhD-prepared nurse investigators, who often seek to answer complex questions about multiply determined phenomena influenced by many small effects (Lewontin, 1993). On the basis of the experience of the authors, most nursing doctoral students learn multiple regression and analysis of covariance as techniques for working with multiple predictor variables, although their level of mastery varies. Many learn logistic regression and survival analysis or Cox proportional hazards regression. Only some learn structural equation modeling, although its application for confirmatory factor analysis is a key component of psychometric analysis, as noted above. The concept of creating statistical models that plausibly account for observed data, and comparing those models to determine which fits the data better, is in line with the goals of nursing inquiry. However, relatively few nurse investigators with whom the authors have collaborated have had the needed statistical skills for this type of advanced modeling.
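As a rough illustration of what comparing the fit of competing models can look like, the sketch below fits two very simple models to the same data and compares them with the Akaike information criterion (AIC). The data are invented, the function names are hypothetical, and AIC is only one of several possible fit criteria chosen here for brevity; the structural equation and multilevel models discussed in this article would require specialized software.

```python
import math
from statistics import mean

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

def rss_intercept_only(ys):
    """Residual sum of squares for a model that predicts the mean of y."""
    y_bar = mean(ys)
    return sum((yi - y_bar) ** 2 for yi in ys)

def rss_simple_linear(xs, ys):
    """Residual sum of squares for a least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
             / sum((xi - x_bar) ** 2 for xi in xs))
    intercept = y_bar - slope * x_bar
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with normal errors, additive constants dropped.

    k counts every estimated parameter, including the error variance.
    """
    return n * math.log(rss / n) + 2 * k

n = len(y)
rss_null = rss_intercept_only(y)
rss_linear = rss_simple_linear(x, y)
aic_null = aic_least_squares(rss_null, n, k=2)      # mean + error variance
aic_linear = aic_least_squares(rss_linear, n, k=3)  # intercept + slope + error variance

print(f"Intercept-only model: RSS={rss_null:.3f}, AIC={aic_null:.2f}")
print(f"Linear model:         RSS={rss_linear:.3f}, AIC={aic_linear:.2f}")
print("Preferred by AIC:", "linear" if aic_linear < aic_null else "intercept-only")
```

The model with the lower AIC is preferred; the penalty term (2 × k) means a more complex model is favored only when it improves fit by more than its added parameters cost.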

Modifications To The Statistics Infrastructure Within Nursing Education

Because needed statistical skills are not always available in nursing schools, students are often encouraged to take statistics courses in different departments within their university. This is beneficial in the sense of cross-disciplinary dialogue, but it may leave something to be desired in terms of statistical education.

Recommendation 8: Create Partnerships and Joint Appointments

When taking a course in a mathematics or biostatistics department, many of our nursing students have reported that the course was too mathematically based or technical, not conceptual enough, or too focused on randomized controlled trial methodology for their needs. Statistical techniques, such as structural equation modeling, are more often utilized in the social sciences, but those disciplines are not always represented on a health sciences campus and may not offer students the needed conceptual applications to health care issues. In addition, although public health programs are often conceptually a good fit, Master of Public Health curricula may not offer statistics at the highest level required for nursing research. In the authors’ experience, a partnership or joint faculty appointment with another discipline may be needed for nursing programs to bring the necessary statistical expertise in house.

Recommendation 9: Establish Research Assistantships

All students can likely benefit from exposure and hands-on experience with applied research, including participation in the formulation of a research question, construction of an appropriate study design, and applied data analysis. One of many possible effective approaches the authors have observed for helping nursing students gain additional experience in statistics is through research assistantships, where PhD students serving as research assistants (RAs) take on the role of statistical consultant for other students and sometimes for faculty. Through their own experiences, the authors of the current article have found that one of the most effective ways to become proficient at a new statistical method is by working through the process of carrying out the analysis for the first time. Bringing nursing students into a statistical consultant role as RAs allows them to gain hands-on experience with messy, real-world data situations that are not typically found in standard class examples or homework problems and to gain in-depth experience with a wide variety of statistical techniques beyond what they can get from classroom instruction. The authors have found this approach to be most successful when the student RAs are mentored by a committed faculty member and have the opportunity to receive support and feedback in their role as a trainee statistical consultant. This avenue for hands-on practical research experience may be of interest only to selected students. One challenge unique to the nursing profession is hiring student RAs who are already licensed professionals based on their undergraduate degree; typical RA stipends are unlikely to attract these students on a purely financial basis given that they can receive a much higher salary in clinical practice.

Recommendation 10: Do Not Settle for Software Monoculture

The use of various statistical software tools, rather than a single point-and-click program, is another way to help students to feel comfortable with data and to develop a deeper understanding of statistical concepts. The authors have observed a discipline-specific tendency toward the use of certain statistical software packages. For example, SPSS is commonly used in the classroom and in practice in nursing, psychology, and other social sciences, whereas SAS and R statistical software seem to be more common in a public health or medical setting. Statistics education using only a single software package may be a barrier for students taking courses outside of nursing; for instance, some of our students have wanted additional coursework in a topic such as logistic regression but have struggled with advanced courses in another department using SAS when all of their previous statistical coursework used only SPSS. Additional exposure to specialized software is also needed, such as experience with hierarchical linear modeling or structural equation modeling software in PhD-level courses.


The authors, who work as statistical consultants to nursing faculty and students, identified several topics that suggest a need for changes in nursing education. To address these common issues that are identified based on our observations from statistical consulting, the following recommendations are offered: All nurse investigator training, whether at the PhD or DNP level, should include more training and practice in database manipulation, data screening, and procedures to improve data quality. We recommend that nurse investigators be exposed to and ideally practice using a variety of statistical software programs. We believe that both PhD- and DNP-prepared nurses should receive education in descriptive and inferential statistics. Further, we suggest that PhD-prepared nurses receive additional training in psychometric procedures, approaches to clustered data, and statistical modeling and prediction.

Although recommendations are offered based on the authors’ observations and experiences, well-planned and well-conducted studies are needed to obtain data to confirm the gaps and challenges identified by the authors. If the gaps and challenges are indeed as present as the authors have observed, meeting the statistical training needs of nursing programs may require innovative arrangements, such as hiring non-nurses with statistical expertise as nursing faculty or partnering with other disciplines to offer statistical training specifically designed for nursing students. Sending nurses for training in other disciplines’ statistics courses with no special preparation has not been a successful strategy, in the authors’ experience. Overall, based on the authors’ significant experience, it appears that many nurse investigators require a higher level of statistical training than that which they currently receive. Because nurses are interested in complicated questions about the interactions between people, environments, health, time, and nursing, and because they work with complex data about real-world patients, nurses have a great need for advanced statistical training. The complex questions addressed by nursing science require the highest level of statistical competence.


A great need exists to provide comprehensive statistics education to nursing students. The views of the authors expressed in this article result from years of experience providing statistical consulting to nurse investigators and may be informative to faculty who determine the guidelines of statistics education for nursing students. Effective development of statistics knowledge and a statistical mindset helps to increase the rigor of nursing research, enhance the application of scientific knowledge in practice, and lead to improved competitiveness for funding opportunities among nurse scientists. High-quality training in statistics is beneficial in developing well-rounded nurse researchers who can effectively collaborate in a multidisciplinary context and contribute to the development of a more effective health care system.


  • Alhusen, J.L., Hayat, M.J. & Gross, D. (2013). A longitudinal study of maternal attachment and infant developmental outcomes. Archives of Women’s Mental Health, 16, 521–529. doi:10.1007/s00737-013-0357-8
  • American Association of Colleges of Nursing. (2006). The essentials of doctoral education for advanced nursing practice. Retrieved from
  • American Association of Colleges of Nursing. (2010). The research-focused doctoral program in nursing: Pathways to excellence. Retrieved from
  • Casey, K., Fink, R., Jaynes, C., Wilson, V., Campbell, L. & Cook, P. (2012). Readiness for practice: The senior practicum experience. Journal of Nursing Education, 50, 646–652. doi:10.3928/01484834-20110817-03
  • Chance, B.L. (1997). Experiences with authentic assessment techniques in an introductory statistics course. Journal of Statistics Education, 5(3). Retrieved from
  • Cook, P.F., Aagaard, L. & Schmiege, S.J. (2013, April). Everyday stress and coping in HIV: Results from momentary ecological assessment. Presented at the Western Institute of Nursing (WIN) Conference, Anaheim, CA.
  • Cook, P.F. & Lowe, N.K. (2012). Differentiating the scientific endeavors of research, program evaluation, and quality improvement studies. Journal of Obstetric, Gynecologic, and Neonatal Nursing, 41(1), 1–3. doi:10.1111/j.1552-6909.2011.01319.x
  • Fawcett, J. (1984). The metaparadigm of nursing: Present status and future refinements. Journal of Nursing Scholarship, 16, 84–87. doi:10.1111/j.1547-5069.1984.tb01393.x
  • Gueorguieva, R. & Krystal, J.H. (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. Archives of General Psychiatry, 61, 310–317. doi:10.1001/archpsyc.61.3.310
  • Hayat, M.J. (2010). Understanding statistical significance. Nursing Research, 59, 219–223. doi:10.1097/NNR.0b013e3181dbb2cc
  • Hayat, M.J., Eckardt, P., Higgins, M., Kim, M. & Schmiege, S. (2013). Teaching statistics to nursing students: An expert panel consensus. Journal of Nursing Education, 52, 330–334. doi:10.3928/01484834-20130430-01
  • Hayat, M.J. & Hedlin, H. (2012). Modern statistical modeling approaches for analyzing repeated-measures data. Nursing Research, 61, 188–194. doi:10.1097/NNR.0b013e31824f5f58
  • Henly, S.J., Wyman, J.F. & Findorff, M.J. (2011). Health and illness over time: The trajectory perspective in nursing science. Nursing Research, 60(3, Suppl.), S5–S14. doi:10.1097/NNR.0b013e318216dfd3
  • Hunter, W.G. (1981). The practice of statistics: The real world is an idea whose time has come. American Statistician, 35, 72–76.
  • Kirschling, J.M. (2013). Designing DNP programs to meet required competencies—Context for the conversation. Retrieved from the American Association of Colleges of Nursing Web site:
  • Laine, C., Goodman, S.N., Griswold, M.E. & Sox, H.C. (2007). Reproducible research: Moving toward research the public can really trust. Annals of Internal Medicine, 146, 450–453. doi:10.7326/0003-4819-146-6-200703200-00154
  • Lewontin, R.C. (1993). Biology as ideology: The doctrine of DNA. New York, NY: Harper Perennial.
  • Matthews, E.E., Schmiege, S.J., Cook, P.F., Berger, A.M. & Aloia, M.S. (2012). Adherence to cognitive behavioral therapy for insomnia (CBTI) among women following primary breast cancer treatment: A pilot study. Behavioral Sleep Medicine, 10, 217–229. doi:10.1080/15402002.2012.666220
  • Moher, D., Hopewell, S., Schulz, K.F., Montori, V., Gøtzsche, P.C., Devereaux, P.J. & Altman, D.G. (2010). CONSORT 2010 explanation and elaboration: Updated guidelines for reporting parallel group randomised trials. BMJ, 340, c869. doi:10.1136/bmj.c869
  • Moorhead, S., Johnson, M., Maas, M. & Swanson, E. (Eds.) (2008). Nursing Outcomes Classification (NOC) (4th ed.). St. Louis, MO: Mosby/Elsevier.
  • Nelson, J.M. & Cook, P.F. (2008). Evaluation of a career ladder program in an ambulatory care environment. Nursing Economic$, 26, 353–360.
  • Nelson, J.M., Cook, P.F. & Raterink, G. (2013). The evolution of a doctor of nursing practice capstone process: Programmatic revisions to improve the quality of student projects. Journal of Professional Nursing, 29, 370–380. doi:10.1016/j.profnurs.2012.05.018
  • Peng, R.D. (2009). Reproducible research and biostatistics. Biostatistics, 10, 405–408. doi:10.1093/biostatistics/kxp014
  • Sables-Baus, S., Kaufman, J., Cook, P. & da Cruz, E.M. (2011). Oral feeding outcomes in neonates with congenital cardiac disease undergoing cardiac surgery. Cardiology in the Young, 22, 42–48. doi:10.1017/S1047951111000850
  • Sechrest, L. (2005). Validity of measures is no simple matter. Health Services Research, 40, 1584–1604. doi:10.1111/j.1475-6773.2005.00443.x
  • Van den Broeck, J., Cunningham, S.A., Eckels, R. & Herbst, K. (2005). Data cleaning: Detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2, e267. doi:10.1371/journal.pmed.0020267

