Since the development of the personal computer and the inception of the Internet, the world has experienced a boom in both technological advancements and the dissemination of information, spurring the beginning of the “age of information.”1,2 Health care has been no exception, with its integration of electronic health records (EHRs) that have been shown to improve health care documentation, assist in clinical decision making, reduce errors, and improve patient safety, patient satisfaction, and the standard of care.3–9 An unforeseen benefit of EHRs has been their use as electronic data warehouses for longitudinal clinical data.10–12 This function in data reuse has dramatically expedited scientific research and quality improvement through retrospective analysis.
Between 2000 and 2009, the number of orthopedic journal articles published and registered on PubMed increased from 2889 to 6909 per year, for a compound annual growth rate of 10.2%.13 Despite this rapid growth, data collection continues to be performed by manual chart review, whereby data are abstracted and recorded in electronic or paper spreadsheets by hand.7,9,11 As a result, modern-day studies are prone to human error, increasing the potential for type I and II errors. The limitations of manual chart review are further evidenced by the increased administrative cost of research personnel and the relatively inefficient speed of human transcription compared with automated processes.10,11 These issues undermine the benefits of health care big data and the ability to rapidly and reproducibly identify deficiencies to enact system-wide changes.7,9,11
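The growth rate quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the rate was compounded over the nine yearly intervals between 2000 and 2009; the function name is illustrative and not taken from the cited source.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Article counts quoted in the text for 2000 and 2009.
# Assumption: nine yearly intervals separate the two counts.
cagr = compound_annual_growth_rate(2889, 6909, periods=2009 - 2000)
print(f"Compound annual growth rate: {cagr:.1%}")
```

Under that assumption, the result agrees with the reported figure to one decimal place.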
Recent advances in EHR technologies have led to the development of new tools to improve the ease of access to structured clinical data. At the authors' institution, the recent integration of Caboodle (Epic, Verona, Wisconsin), an innovative dimensional database, allows for the rapid query of institutional data across all ambulatory and inpatient hospitals while circumventing the technical challenges of a relational database (eg, Clarity, Epic). The goal of this study was to assess the accuracy of manual chart review compared with modern-day electronic data collection. The authors hypothesized that electronic data queries would result in significantly shorter data collection times and fewer transcription errors.
Materials and Methods
This retrospective cohort study was performed at a single tertiary urban academic orthopedic institution. Patients included in this study were recruited from a quality improvement initiative investigating the underlying risk factors for venous thromboembolic events after any orthopedic surgical procedure. Before the initiation of the study, standardized data collection sheets were developed, with specific attention paid to patient demographics (medical record number [MRN], first name, last name, age, sex, height, weight, body mass index, smoking status, race, ethnicity, preferred language, marital status, religion, zip code, insurance type) and inpatient surgical data (encounter case serial number, American Society of Anesthesiologists [ASA] physical status score, type of surgery, admission date, date of surgery, discharge date, incision time, closure time, length of stay, and discharge disposition).
Data formatting was reviewed and discussed among the authors. Manual chart review was subsequently performed by a resident physician (A.A.A.) and a medical student (L.A.) with the physician-facing EHR at the authors' institution, Hyperspace 2017 (Epic). Patient encounters for the 100 most recent venous thromboembolic events that occurred after a primary or revision total joint arthroplasty were identified with MRNs and dates of surgery, which were provided by the authors' institution as part of a quality improvement initiative. Patients who underwent all other surgical procedures were excluded. All data were entered by hand into an Excel 2017 (Microsoft, Redmond, Washington) spreadsheet.
After manual chart review was completed, an electronic data query was performed with the authors' EHR data warehouse, Caboodle, version 15 (Epic), using Structured Query Language (SQL) Server Management Studio 2017 (Microsoft). Patients and surgical encounters were identified using the same combination of MRN and date of surgery as originally provided by the quality improvement department. Patients who could not be identified because of incorrect MRNs or incorrect surgical dates were identified with a manual chart review with Hyperspace. The SQL query was then performed to collect the same data variables as the manual chart review and verify the venous thromboembolic events.
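The matching step described above, in which encounters are identified by the combination of MRN and date of surgery, can be illustrated with a small self-contained example. This is only a sketch: the table names, column names, and rows are hypothetical and do not reflect the actual Caboodle schema or the authors' script, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical schema and made-up rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surgical_encounter (
        mrn TEXT, surgery_date TEXT, procedure_name TEXT, length_of_stay_days INTEGER
    );
    CREATE TABLE cohort (mrn TEXT, surgery_date TEXT);
    INSERT INTO surgical_encounter VALUES
        ('000001', '2016-03-14', 'Total knee arthroplasty', 3),
        ('000002', '2017-08-02', 'Total hip arthroplasty', 2),
        ('000003', '2015-11-20', 'Total knee arthroplasty', 4);
    INSERT INTO cohort VALUES
        ('000001', '2016-03-14'),
        ('000003', '2015-11-21');  -- incorrect surgical date: will not match
""")

# Encounters found by the combination of MRN and date of surgery.
matched = conn.execute("""
    SELECT c.mrn, c.surgery_date, e.procedure_name, e.length_of_stay_days
    FROM cohort AS c
    JOIN surgical_encounter AS e
      ON e.mrn = c.mrn AND e.surgery_date = c.surgery_date
    ORDER BY c.mrn
""").fetchall()

# Cohort rows with no matching encounter, which would need manual review.
unmatched = conn.execute("""
    SELECT c.mrn, c.surgery_date
    FROM cohort AS c
    LEFT JOIN surgical_encounter AS e
      ON e.mrn = c.mrn AND e.surgery_date = c.surgery_date
    WHERE e.mrn IS NULL
""").fetchall()

print(matched)
print(unmatched)
```

Listing the unmatched rows separately mirrors the step in which patients with incorrect MRNs or surgical dates were resolved through Hyperspace.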
Data obtained through both manual chart review and electronic query were transformed to improve consistency between the two sheets. For data obtained through manual chart review, categorical variables were transformed to align with standard naming schema, and misspelled categorical or date variables were corrected. Numerical (eg, ASA score, height, weight) and string variables (eg, names) were left uncorrected (Figure 1).
Figure 1. Categorical data transformation for demographics. Abbreviations: HHS, home health services; LTKA, left total knee arthroplasty; RTKA, right total knee arthroplasty; TKA, total knee arthroplasty.
For data obtained through electronic query, categorical variables were normalized to simplify grouping, as typically would be performed for statistical analysis (Figure 2).
Figure 2. Categorical data transformation for race and type of surgery.
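The kind of categorical normalization described above can be sketched as a simple lookup. The mapping below is hypothetical and only illustrates the idea; the transformations actually applied in the study are those shown in its figures.

```python
# Hypothetical mapping for illustration; not the study's actual naming schema.
SURGERY_GROUPS = {
    "ltka": "TKA",
    "rtka": "TKA",
    "tka": "TKA",
    "left total knee arthroplasty": "TKA",
    "right total knee arthroplasty": "TKA",
}


def normalize_surgery(raw: str) -> str:
    """Map a free-text surgery label to its group; unknown labels are returned trimmed."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse whitespace
    return SURGERY_GROUPS.get(key, raw.strip())


print(normalize_surgery(" LTKA "))
print(normalize_surgery("Right  total knee arthroplasty"))
```

Returning unknown labels unchanged, rather than guessing, keeps unmapped values visible for later review.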
The two data sets were compared cell by cell in Excel with short formulas that checked for identical matches. For example, the formula used to evaluate transcription errors was: =IF('Hand Collection Sheet'!AK18='Electronic Collection Sheet'!Z18, "No error", "Error"). All discrepancies between data sets were flagged as transcription errors resulting from manual chart review.
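The same cell-by-cell comparison can be expressed outside a spreadsheet. The following is a minimal sketch, assuming both sheets have been loaded as lists of rows keyed by variable name and are in the same encounter order; the variable names and rows are made up for illustration.

```python
def compare_sheets(manual_rows, electronic_rows):
    """Flag each manually collected cell against the electronic value for the same row and variable."""
    if len(manual_rows) != len(electronic_rows):
        raise ValueError("both sheets must contain the same number of patient encounters")
    flags = []
    for manual, electronic in zip(manual_rows, electronic_rows):
        flags.append({
            variable: "No error" if manual.get(variable) == value else "Error"
            for variable, value in electronic.items()
        })
    return flags


# Made-up rows for illustration; the variable names are hypothetical.
manual_sheet = [
    {"sex": "Female", "length_of_stay": 3},
    {"sex": "Male", "length_of_stay": 5},
]
electronic_sheet = [
    {"sex": "Female", "length_of_stay": 3},
    {"sex": "Male", "length_of_stay": 4},
]

flags = compare_sheets(manual_sheet, electronic_sheet)
error_count = sum(cell == "Error" for row in flags for cell in row.values())
print(flags)
print(error_count)
```

As in the spreadsheet formula, the comparison requires an exact match, so values must be normalized before they are compared.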
Descriptive statistics were performed. Random error rates were calculated for patient encounters and data variables. For patient encounters, an error percentage was calculated for each unique patient encounter. The total number of errors for a single patient encounter row was counted and divided by the number of data points collected for that patient encounter row. These percentages were then averaged to obtain the random error rate for patient encounters.
A similar calculation was performed for the random error rate for data variables. For a single data variable, the number of errors within a single data variable column was counted and divided by the total number of data points collected for that data variable column. The error rate for each data variable column was reviewed to identify potential systematic errors in data collection. The random error was subsequently calculated by averaging the individual error rates of all data variable columns. A second random error rate for the data variable columns was calculated by excluding columns that were identified as systematically erroneous.
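The two error rate calculations described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: errors are stored as a grid of True or False flags with one row per encounter and one column per variable, None marks a data point that was unavailable from the electronic query, and the cutoff used to label a column as systematically erroneous is a hypothetical parameter, because the study identified such columns by review rather than by a fixed threshold.

```python
from statistics import mean


def _rate(cells):
    """Fraction of errors among available cells; None marks a missing data point."""
    available = [cell for cell in cells if cell is not None]
    return sum(available) / len(available) if available else None


def encounter_error_rate(flags):
    """Average of the per-row (patient encounter) error rates."""
    rates = [_rate(row) for row in flags]
    return mean([rate for rate in rates if rate is not None])


def variable_error_rates(flags):
    """Error rate of each column (data variable)."""
    return [_rate(column) for column in zip(*flags)]


def variable_error_rate(flags, systematic_threshold=None):
    """Average of the per-column error rates, optionally excluding systematic columns."""
    rates = [rate for rate in variable_error_rates(flags) if rate is not None]
    if systematic_threshold is not None:
        # Hypothetical rule: columns at or above the cutoff are treated as systematic.
        rates = [rate for rate in rates if rate < systematic_threshold]
    return mean(rates)


# Made-up flags: 3 encounters by 4 variables; True means a transcription error.
example = [
    [True, False, False, None],
    [True, False, True, False],
    [True, False, False, False],
]
print(encounter_error_rate(example))
print(variable_error_rates(example))
print(variable_error_rate(example, systematic_threshold=0.5))
```

In the example, the first column is wrong in every row and would be set aside as systematic before the second variable-level rate is averaged.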
Results
In total, 27 variables were retrieved from 100 unique patient encounters between January 2014 and December 2017 (Table 1). The ASA score was recovered for only 38 of 100 patient encounters because of a change in database architecture for patient encounters before November 2015. Therefore, the ASA score was evaluated only for data points present from the electronic data query. Total time and average time per patient for manual data collection were 915 minutes and 10.3±3.89 minutes, respectively. Development of an SQL script took approximately 1 workday, and data collection time for the query was 58 seconds for all 100 patients.
The average transcription error rate was 9.19%±5.74% per patient encounter and 11.04%±21.40% per data variable. Further evaluation found incorrect patient age and ethnicity in 98% and 64% of cases, respectively. The error in age arose mainly because reviewers recorded the patient's current age rather than the age at the time of surgery. These two variables were classified as systematic errors, which affected 7.41% (2 of 27) of the variables collected. When systematic errors were excluded, the random error rate was 5.79%±7.04% per patient encounter and 5.44%±5.63% per data variable.
Discussion
Enactment of the American Recovery and Reinvestment Act of 2009, which required hospitals to transition to the EHR form of medical records, set the framework for the age of large-scale data deployment.11,14 Over time, it became clear that the use of EHR data would lead to improvements in health care documentation, patient safety, satisfaction, outcome measurement, follow-up, and feedback, and lead to higher-quality research.7–10,15,16 Although EHRs provide electronic data storage, orthopedic research depends on chart review and manual transcription of data from EHRs to clinical research forms. Further, although they apply abstract thinking skills, trained staff members who collect data manually remain prone to human error.7,9,10,15 This study investigated the incidence of human error in manual data collection compared with modern-day electronic data warehouse queries.
This study had several limitations. First, random errors during the EHR documentation process are unavoidable. Chart reviewers interpret what they are reviewing and can thereby reduce the incidence of clinical documentation errors (eg, incorrect diagnosis), which is not possible with current electronic data queries. However, electronic data queries enable users to perform big data extractions, mitigating the effects of low-incidence random errors with substantially larger sample sizes.7,9,15,16 Second, the current authors assumed that all errors were the result of improper manual chart review, although the skill set of researchers is highly variable and subject to interpretation. In the current study, data collection was performed by medically trained researchers; therefore, an unusually high error rate was unlikely. In addition, a similar error rate is reported in the literature, and the authors' random data check found no incorrectly categorized errors, so misattribution is unlikely to have played a significant role. Third, the authors assumed that the EHR is the gold standard. This is a limitation, but arguably data entered into Epic are subject to more checks, whereas data collected by research fellows are not checked or verified. Although 100% accuracy may not be possible, electronic data entry involves fewer steps than manual registration, which substantially reduces the opportunity for error.
The authors found transcription error rates of 9.19%±5.74% per patient encounter and 11.04%±21.40% per data variable. This finding is consistent with previous studies that showed a transcription error rate of approximately 9% for demographic data only.11 Systematic errors also occurred in 7.41% (2 of 27) of variables. Systematic errors were identified on the basis of their substantially higher error rates: 64.00% for ethnicity and 98.00% for age. Review of these data points indicated that age and ethnicity were recorded incorrectly by the two manual chart reviewers. Patient age was incorrectly recorded as the patient's age at the time of data retrieval rather than the age at the time of surgery. Patient ethnicity was recorded erroneously as “Not Hispanic or of Spanish Descent” for subjects who did not divulge this information. When systematic errors were excluded, the random error rate was 5.79%±7.04% per patient encounter and 5.44%±5.63% per data variable. The intrinsic error identified in this study increases the likelihood of type I and II errors, decreases the power of studies that rely on manual chart review, and places scientific research at risk for misinterpretation.
In this study, the authors queried their institution's Caboodle dimensional database. The queries were written in SQL and can be saved and shared, allowing users to reproduce and verify the programming logic of the study within seconds. In contrast, chart review for this study required an average of 9 minutes per row of 26 variables and a total time of more than 15 hours. The time-intensive nature of manual chart review makes it difficult to verify the integrity of all data points and limits the scalability of these studies. Development of the SQL script took approximately 1 workday by a relatively inexperienced user. Further, much of the code used for this study was derived from the authors' established scripting library, allowing retrieval of specific data points to be modularized and assembled rapidly. More recent iterations of similar studies have been developed within minutes to hours, demonstrating the scalability of electronic data retrieval. For example, the SQL query can be modified with a few lines of code to identify all future patients with inpatient venous thromboembolic events as well as other quality-metric conditions, such as pneumonia.
In other disciplines and competitive industries, big data is used to drive business operations and shape business decisions and has led to improved profit, quality, and efficiency.6,10,17 In the 1990s, industries such as telecommunications, securities trading, and general merchandising invested heavily in electronic record-keeping infrastructure.6 This investment resulted in annual productivity growth of 6% to 8%, with at least one-fourth to one-third of this growth attributed to electronic data warehousing.6 In contrast, the hotel industry underused its information technology investment and therefore did not see sizable increases in productivity.6 In the biopharmaceutical sector, big data has revolutionized the production of vaccines, hormones, and blood components by significantly increasing product yield.18,19 One study in particular showed that a biopharmaceutical manufacturer increased vaccine yield by more than 50%, with an estimated $5 to $10 million in annual savings through the application of big data analytics.18 In addition, banks such as Citibank, J. P. Morgan, and Wells Fargo employ big data analytics to boost productivity, reduce costs, cut inventories, and facilitate electronic commerce.19–21 In a study of data from 12 banks from 1989 to 1997, information technology had the highest marginal product of all input factors, including labor costs, operating expenses, and interest expenses.21 In economics, big data has led to a transformative shift in research from small sample government surveys to near-universal population coverage, allowing researchers to examine variations in wages, health, productivity, education, and other measures across different subpopulations.17
As private data-driven industries continue to develop, the focus has shifted toward predictive analytics, including artificial intelligence and deep learning. Although these tools may be highly advantageous in medicine, the use of health care informatics remains rudimentary. Currently, at most institutions, data requests are made by physicians who are unaware of the structure of the data available. These requests are processed by data specialists who have limited familiarity with medical terminology and limited insight into hospital operations and activities that affect data structure and integrity. This disconnect between medicine and technology significantly restricts the potential of health care big data and places it into data silos, which are large databases with limited access. Therefore, the value of physician informaticists to act as liaisons and leaders in data-driven decision making cannot be overstated.
Conclusion
Manual chart review is the most prevalent method of data collection from modern-day EHRs. However, with an estimated error rate of 10%, manual chart review-based studies are prone to a higher incidence of type I and II errors. Computer-based tools for data query can improve the speed, reliability, reproducibility, and scalability of data retrieval, allowing hospitals to make more data-driven decisions.
References
1. Humbert M. Technology and workforce: comparison between the information revolution and the industrial revolution. Dissertation. University of California; 2007.
2. Castells M. The information age: economy, society and culture. Volume I. The rise of the network society. Journal of Marketing. 1997;61(4):96–97. doi:10.2307/1252090
3. Garg AX, Adhikari NKJ, McDonald H, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005;293(10):1223–1238. doi:10.1001/jama.293.10.1223 PMID:15755945
4. Adler-Milstein J, Everson J, Lee S-YD. EHR adoption and hospital performance: time-related effects. Health Serv Res. 2015;50(6):1751–1771. doi:10.1111/1475-6773.12406 PMID:26473506
5. Kazley AS, Diana ML, Ford EW, Menachemi N. Is electronic health record use associated with patient satisfaction in hospitals? Health Care Manage Rev. 2012;37(1):23–30. doi:10.1097/HMR.0b013e3182307bd3 PMID:21918464
6. Hillestad R, Bigelow J, Bower A, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff (Millwood). 2005;24(5):1103–1117. doi:10.1377/hlthaff.24.5.1103 PMID:16162551
7. Katzan I, Speck M, Dopler C, et al. The Knowledge Program: an innovative, comprehensive electronic data capture system and warehouse. AMIA Annu Symp Proc. 2011;2011:683–692. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3243190&tool=pmcentrez&rendertype=abstract PMID:22195124
8. Minassian VA, Yan X, Lichtenfeld MJ, Sun H, Stewart WF. The iceberg of health care utilization in women with urinary incontinence. Int Urogynecol J Pelvic Floor Dysfunct. 2012;23(8):1087–1093. doi:10.1007/s00192-012-1743-x PMID:22527544
9. Hersh WR, Weiner MG, Embi PJ, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51(8) (suppl 3):S30–S37. doi:10.1097/MLR.0b013e31829b1dbd PMID:23774517
10. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. AMIA Summits Transl Sci Proc. 2010:1–5.
11. Nordo AH, Eisenstein EL, Hawley J, et al. A comparative effectiveness study of eSource used for data capture for a clinical research registry. Int J Med Inform. 2017;103:89–94. doi:10.1016/j.ijmedinf.2017.04.015 PMID:28551007
12. Prokosch HU, Ganslandt T. Perspectives for medical informatics: reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48(1):38–44. doi:10.3414/ME9132 PMID:19151882
13. Lee KM, Ryu MS, Chung CY, et al. Characteristics and trends of orthopedic publications between 2000 and 2009. Clin Orthop Surg. 2011;3(3):225–229. doi:10.4055/cios.2011.3.3.225 PMID:21909470
14. Rosenbaum S. Law and the public's health. Public Health Reports. 2010;125(October):759–762.
15. El Fadly A, Rance B, Lucas N, et al. Integrating clinical research with the Healthcare Enterprise: from the RE-USE project to the EHR4CR platform. J Biomed Inform. 2011;44(suppl 1):S94–S102. doi:10.1016/j.jbi.2011.07.007 PMID:21888989
16. Center for Drug Evaluation and Research; Center for Biologics Evaluation and Research; Center for Devices and Radiological Health. Use of electronic health record data in clinical investigations: guidance for industry. 2018. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/use-electronic-health-record-data-clinical-investigations-guidance-industry
17. Einav L, Levin J. Economics in the age of big data. Science. 2014;346(6210). doi:10.1126/science.1243089
18. Auschitzky E, Hammer M, Rajagopaul A. How big data can improve manufacturing. https://digitalstrategy.nl/wp-content/uploads/2014.01-A-How-big-data-can-improve-manufacturing-_-McKinsey-Company.pdf
19. Marr B. The big data transformation. http://www.oreilly.com/data/free/files/the-big-data-transformation.pdf
20. Srivastava U, Gopalkrishnan S. Impact of big data analytics on banking sector: learning for Indian Banks. Procedia Comput Sci. 2015;50:643–652. doi:10.1016/j.procs.2015.04.098
21. Shu W, Strassmann PA. Does information technology provide banks with profit? Inf Manage. 2005;42(5):781–787. doi:10.1016/j.im.2003.06.007
Table 1. Errors by Variable
| Variable | Data Points Available by Electronic Query | Manual Chart Review Errors | Error Rate |
|---|---|---|---|
| Encounter case serial number | 100 | 7 | 7.00% |
| Patient medical record number | 100 | 2 | 2.00% |
| Body mass index | 100 | 16 | 16.00% |
| American Society of Anesthesiologists score | 38 | 0 | 0.00% |
| Incision time of day | 100 | 0 | 0.00% |
| Closure time of day | 100 | 2 | 2.00% |
| Length of stay | 100 | 4 | 4.00% |
| Verification of pulmonary embolism | 100 | 3 | 3.00% |