After addressing topics such as instrument validity and reliability, statistical significance, effect sizes, and power analysis in the past several Methodology Corner articles, I have received several questions about how to adequately address and evaluate these facets of research design within the context of a pilot study. These questions are timely given the highly variable way in which the term pilot study is used. In this month's Methodology Corner article, we review how the term pilot study is variously defined, the limitations of pilot studies, and recommendations regarding the role of pilot studies in nursing education research. The discussion and advice provided here amplify and expand on the work of Morin (2013), who offered a valuable discussion of the virtues and limitations of pilot studies for readers of the Journal of Nursing Education.
Decades after the concept of a pilot study was formalized and widely adopted by clinical and social scientists, there remains no universally agreed-upon definition. Early uses of the term pilot study in connection with nursing or nursing education, such as the early study of nursing role functions by Pollak, Westoff, and Bressler (1953), cannot be easily distinguished based on scope and methods from what we now would label simply as research. That is, although many of these early studies were labeled as pilot studies, how the studies differed from other research not labeled as such (and published during the same time period) is unclear. However, in the decades following introduction of the term, conceptions of what constitutes a pilot study have become narrower, focusing more centrally on study procedures, feasibility, and other process-focused questions than on null hypothesis statistical testing, which is typically used to assess for preliminary evidence of intervention effectiveness (Morin, 2013). In an early example highlighting the shifting role that pilot studies should play in the broader research context, Moore and White (1965) described challenges they faced in a nursing education intervention study that could have been avoided if a more thorough pilot study had been conducted. Moore and White noted, “the pilot study is needed to evaluate the practicality and feasibility of a complex research design, assess demands on the research team and participants, determine adequacy of equipment and instruments for collecting data, and identify hidden variables” (p. 43). That description remains as relevant today as it was at the time of publication more than 50 years ago.
In their widely cited work, Thabane et al. (2010) reported on how the term is defined by several influential entities such as the Centers for Disease Control and Prevention (CDC) and the National Science Foundation (NSF), among others. The NSF definition, described by Lancaster, Dodd, and Williamson (2004), acknowledges that although there is no formal methodological guidance on what constitutes a pilot study, there is agreement that pilot study aims and purposes should be clearly described and that pilot studies should focus primarily on study procedures rather than statistical hypothesis testing. Arain, Campbell, Cooper, and Lancaster (2010) likewise found that pilot studies continue to focus too much on null hypothesis statistical testing when the focus is more properly placed on questions of study feasibility and acceptability to target participants. Synthesizing a wide range of literature and practices addressing pilot studies, Thabane et al. (2010) concluded that pilot studies are best suited for determining the feasibility of larger studies and should be used cautiously to estimate treatment effect sizes or to derive estimates for use in power analysis or other related procedures.
One overarching consideration in the discussion of pilot study definitions is that pilot studies originated from and gained prominence in primarily experimental research contexts. This explains the strong focus by Thabane et al. (2010), Arain et al. (2010), and Leon, Davis, and Kraemer (2011) on the role of pilot studies in preparation for randomized controlled trials of clinical interventions. Although the role pilot studies should play in nonexperimental research is much less well discussed in the literature, the principles that undergird cautions about the role of pilot studies in experimental designs apply just as well to descriptive, correlational, and cross-sectional research, as discussed further in this article. As Thabane et al. (2010) noted, “the main goal of pilot studies is to assess feasibility so as to avoid potentially disastrous consequences of embarking on a large study” (p. 1) without adequate feasibility testing. Morin (2013) reiterated the important role that pilot studies play in identifying process-focused issues, which could derail researchers' ability to draw conclusions from larger, more highly powered studies.
The primary limitation of pilot studies designed as smaller versions of a planned larger study is that these studies are unlikely to adequately address important questions of feasibility while simultaneously placing too much focus on statistical findings. Describing their experiences as editors, Thabane et al. (2010) noted that the top reasons researchers label their studies as pilot studies include:
- The study was a small, single-site study with limited generalizability to other settings or subjects.
- The sample, although small, was larger than those used in other published reports but still smaller than most power analyses would suggest is required.
- The study was conducted by a student or other learner, rather than a more experienced researcher (who is presumably capable of conducting a much more rigorous study).
Thabane et al. (2010) were emphatic that none of these reasons is justification for calling a study a pilot study, based on the prevailing view that pilot studies should focus on feasibility rather than on producing generalizable findings. One erroneous assumption researchers may make about pilot studies is that any effect size estimate (e.g., Cohen's d or Pearson's r) or p value found in a pilot study will be just as extreme—or more extreme—in a larger, more rigorously designed study. This assumption is based on a misunderstanding of statistical significance (Greenland et al., 2016), where factors such as sample size, instrument reliability, and the fidelity of study procedures all influence the statistical point estimates and corresponding p values produced in any study.
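To make the point about sample size concrete, the following is a minimal simulation sketch rather than an analysis drawn from any of the cited studies. It assumes two normally distributed groups with equal variances and a hypothetical true standardized effect of d = 0.5, and it compares how widely Cohen's d estimates scatter when each group has 15 subjects versus 200 subjects. The function names, group sizes, true effect, and random seed are all illustrative assumptions.

```python
import random
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def simulate_d(n_per_group, true_d=0.5, replications=1000, seed=2018):
    """Run many simulated two-group studies and return each study's Cohen's d.

    Assumption: outcomes are normal with SD = 1, so the mean difference equals true_d.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimates.append(cohens_d(treatment, control))
    return estimates


def summarize(estimates):
    """Return the mean, SD, and the bounds of the middle 95% of the estimates."""
    ordered = sorted(estimates)
    lower = ordered[int(0.025 * len(ordered))]
    upper = ordered[int(0.975 * len(ordered)) - 1]
    return statistics.mean(ordered), statistics.stdev(ordered), lower, upper


# Hypothetical comparison: a pilot-sized study versus a much larger study.
for label, n in (("pilot-sized", 15), ("larger study", 200)):
    mean_d, sd_d, lower, upper = summarize(simulate_d(n))
    print(f"{label:>12} (n = {n:>3} per group): mean d = {mean_d:.2f}, "
          f"SD = {sd_d:.2f}, middle 95% of estimates = [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, the estimates from the small samples spread far more widely around the true value than those from the large samples, so a single pilot estimate can land well above or below the effect a larger study would find.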
Thus, the limitations of pilot studies as smaller versions of larger studies rest primarily on statistical considerations that apply to all uses of a particular statistical test, no matter the sample size or research design being used. That is, each parametric statistical test carries with it a set of assumptions and requirements that, if violated, could substantively change the conclusions that researchers make based on their data. To illustrate this point, if a nursing education researcher conducts a small “pilot study” to examine reliability and validity evidence for a scale revised for use by nursing students from its original use in social work students, all the requirements and statistical test assumptions for calculating common metrics such as Cronbach's alpha, correlation coefficients, and even descriptive statistics must be met. Among the most frequent of parametric statistical test assumptions is that of normality: that the data under investigation are distributed in a way that approximates a normal, “bell-shaped” distribution. With small sample sizes (e.g., fewer than 50 subjects), this assumption is difficult to meet. In addition, when using instruments or scales with low or low-normal reliability estimates (Cronbach's α = .50 to .70), the resulting statistical point estimates become even more unstable and thus subject to error. This fact is the primary reason authors such as Thabane et al. (2010), Griffiths and Norman (2016), and Morris and Rosenbloom (2017) strongly suggest the focus of pilot studies be more about process and feasibility than about reaching conclusions about intervention effects or relationships among variables.
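The instability of reliability estimates in small samples can be sketched in the same spirit. The example below computes Cronbach's alpha with the standard formula and attaches a simple percentile bootstrap interval to it. The data are simulated, not real: each hypothetical item score is a shared trait score plus independent noise, and the respondent counts (20 and 300), the number of items (8), the noise level, and the seed are assumptions chosen only for illustration.

```python
import random


def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [sample_variance([row[i] for row in rows]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def simulate_responses(n_respondents, n_items, rng, noise_sd=1.5):
    """Hypothetical data: every item equals a shared trait score plus random noise."""
    rows = []
    for _ in range(n_respondents):
        trait = rng.gauss(0.0, 1.0)
        rows.append([trait + rng.gauss(0.0, noise_sd) for _ in range(n_items)])
    return rows


def bootstrap_interval(rows, rng, resamples=500, level=0.95):
    """Percentile bootstrap interval for alpha, resampling respondents with replacement."""
    estimates = sorted(
        cronbach_alpha([rng.choice(rows) for _ in rows]) for _ in range(resamples)
    )
    tail = (1 - level) / 2
    return estimates[int(tail * resamples)], estimates[int((1 - tail) * resamples) - 1]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
for n in (20, 300):  # hypothetical pilot-sized and larger samples
    data = simulate_responses(n, n_items=8, rng=rng)
    lower, upper = bootstrap_interval(data, rng)
    print(f"n = {n:>3}: alpha = {cronbach_alpha(data):.2f}, "
          f"95% bootstrap interval = [{lower:.2f}, {upper:.2f}]")
```

With the pilot-sized sample, the interval around alpha is typically much wider than with the larger sample. This is one way to see why a reliability coefficient from a small pilot is weak evidence about how a scale will perform in a new population.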
Synthesizing the work of others including Thabane et al. (2010), Arain et al. (2010), and more recently, Morris and Rosenbloom (2017), a working definition of pilot study for nursing education researchers is: a pilot study is research designed to gain information about the feasibility and acceptability of study procedures proposed to be used in a larger study designed and adequately powered to detect intervention effectiveness (in experimental designs) or to produce reliable point estimates of the relationships among variables in a defined population (in descriptive or correlational designs).
That definition acknowledges the inherent, primarily statistical limitations of small, single-site pilot studies while also highlighting the truly useful information that pilot studies can provide to nursing education researchers. Just as the science of nursing education benefits from rigorously designed correlational and intervention studies, it similarly benefits from information about the extent to which interventions are acceptable to subjects, how much researcher effort is required to deliver various interventions, the amount of time required to recruit the desired sample size, and other important lessons learned from pilot work. This information is gleaned not from studies focusing on statistical hypothesis testing but from more holistic, theory- and research-informed pilot studies that focus on feasibility and practicality. Moore and White (1965) captured the central importance of pilot studies in noting, “The pilot study should not be slighted, as it may mean the difference between success and failure of a research project” (p. 43).
Based on the literature reviewed and points discussed in this article, the following recommendations are provided to nursing education researchers:
- Pilot studies should focus primarily on questions related to feasibility and acceptability by study subjects.
- Relevant descriptive statistics may be produced and should be reported, ideally alongside interval estimates (e.g., confidence intervals).
- The use of inferential statistical tests to estimate effect sizes in either intervention or correlational studies should be minimized or eliminated.
- Pilot studies are a valuable form of scholarship that should be published. Pilot studies are most appropriately disseminated as research briefs that contribute to the science of nursing education through their focus on improving the processes involved in conducting research in nursing education.
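The recommendation to report descriptive statistics alongside interval estimates can be illustrated with a feasibility outcome such as retention. The sketch below computes a Wilson score interval for a proportion, which is one reasonable choice for small samples, although other interval methods exist. The enrollment, completion, and recruitment figures are hypothetical and do not come from any study cited here.

```python
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    p_hat = successes / n
    denominator = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denominator
    half_width = z * (p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) ** 0.5 / denominator
    return centre - half_width, centre + half_width


# Hypothetical pilot: 30 students enrolled over 12 weeks, 24 completed all sessions.
enrolled, completed, recruitment_weeks = 30, 24, 12

lower, upper = wilson_interval(completed, enrolled)
print(f"Recruitment rate: {enrolled / recruitment_weeks:.1f} students per week")
print(f"Retention: {completed}/{enrolled} = {completed / enrolled:.0%} "
      f"(95% CI {lower:.0%} to {upper:.0%})")
```

For these hypothetical figures, the interval runs from roughly 63% to 90%. Reporting that range alongside the 80% point estimate conveys how much uncertainty remains about retention in a larger study.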
Morin (2013) noted that a properly designed pilot study “enhances the possibilities of generating nursing education research that can accelerate the forward movement of the science of nursing education—an outcome to which we are all committed!” (p. 548). This sentiment remains just as true 5 years after it appeared in the Journal of Nursing Education and has never been more important given our increasing focus on producing research findings capable of shaping the practice of nurse educators now and into the future.
Please send feedback, comments, and suggestions for future Methodology Corner topics to Darrell Spurlock, Jr., PhD, RN, NEA-BC, ANEF, at
- Arain, M., Campbell, M.J., Cooper, C.L. & Lancaster, G.A. (2010). What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Medical Research Methodology, 10, 67. doi:10.1186/1471-2288-10-67
- Greenland, S., Senn, S.J., Rothman, K.J., Carlin, J.B., Poole, C., Goodman, S.N. & Altman, D.G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337–350. doi:10.1007/s10654-016-0149-3
- Griffiths, P. & Norman, I. (2016). Why was my paper rejected? Editors' reflections on common issues which influence decisions to reject papers submitted for publication in academic nursing journals. International Journal of Nursing Studies, 57, A1–A4. doi:10.1016/j.ijnurstu.2016.03.017
- Lancaster, G.A., Dodd, S. & Williamson, P.R. (2004). Design and analysis of pilot studies: Recommendations for good practice. Journal of Evaluation in Clinical Practice, 10, 307–312. doi:10.1111/j..2002.384.doc.x
- Leon, A.C., Davis, L.L. & Kraemer, H.C. (2011). The role and interpretation of pilot studies in clinical research. Journal of Psychiatric Research, 45, 626–629. doi:10.1016/j.jpsychires.2010.10.008
- Moore, L. & White, G.D. (1965). The pilot study—Its value to researchers. Nursing Outlook, 13(12), 43.
- Morin, K.H. (2013). Value of a pilot study. Journal of Nursing Education, 52, 547–548. doi:10.3928/01484834-20130920-10
- Morris, N.S. & Rosenbloom, D.A. (2017). Defining and understanding pilot and other feasibility studies. American Journal of Nursing, 117(3), 38–47. doi:10.1097/01.NAJ.0000513261.75366.37
- Pollak, O., Westoff, C. & Bressler, M. (1953). Pennsylvania pilot study of nursing functions. Nursing Research, 2(1), 15–22. doi:10.1097/00006199-195306000-00004
- Thabane, L., Ma, J., Chu, R., Cheng, J., Ismaila, A., Rios, L.P. & Goldsmith, C.H. (2010). A tutorial on pilot studies: The what, why and how. BMC Medical Research Methodology, 10, 1. doi:10.1186/1471-2288-10-1