Corneal surgical correction of refractive errors is a widely performed procedure, with more than 200 million surgeries performed worldwide.1 Advances in technology such as corneal reshaping laser systems and eye tracking software have led to unprecedented successful results and a high degree of patient satisfaction.2 However, in some cases, undesired outcomes may lead to additional interventions.3
Identifying patients in whom achieving good surgical outcomes is less likely may be beneficial for both the patient and the surgeon. Indeed, several clinical factors affecting surgical outcome have been identified, such as high myopic refractive corrections, preoperative corneal curvature, and age.1,4
Decision trees are built by data analysis algorithms that transform heterogeneous data into a simple, practical conclusion.5 Trained on part of the data, the algorithm learns patterns and then classifies subsequent data according to the patterns it has learned. Decision trees have been used in age-related macular degeneration (to classify and predict retinal drusen type),6 glaucoma (to diagnose, classify, and identify progression based on clinical and imaging data),7,8 and detection of keratoconus.9 Decision tree analysis may also uncover patterns that are not obvious with traditional statistical methods.
Our aim was to build a decision forest, which is an ensemble of decision trees that may serve as a decision support tool with the hope that such a model may be used to estimate the risk of unsuccessful outcome based on the patient's preoperative demographic and clinical data. Therefore, the purpose of this study was to employ predictive machine learning models and large data sets for training to identify high-risk patients undergoing corneal laser refractive surgery.
Patients and Methods
The design of the study followed the tenets of the Declaration of Helsinki and the study protocol was approved by the local institutional review board committee.
Data were obtained through the computerized database registry of Care-Vision Laser Centers, Tel Aviv, Israel. The database registry includes patient demographic and clinical data variables, archived by an advanced computerized electronic record-keeping software system. Records are updated prospectively by the center's staff during each patient's visit. For the current study, we used a computerized query to select patients with the following inclusion criteria: 17 to 40 years old, underwent LASIK or photorefractive keratectomy (PRK) treatment, and underwent at least 3 weeks of follow-up for LASIK and 3 months for PRK. Patients who were treated for monovision or hyperopia were excluded, as were patients with a preoperative corrected distance visual acuity (CDVA) of 20/30 or worse. Efficacy was calculated as the ratio of postoperative uncorrected distance visual acuity (UDVA) to preoperative CDVA. Low efficacy was defined as less than 0.4 and high efficacy as 0.8 or greater.
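As a minimal sketch, the efficacy index and group thresholds described above can be expressed as two helper functions (assuming the index is postoperative UDVA divided by preoperative CDVA, both in decimal notation; the names are illustrative, not the study's code):

```python
# Illustrative sketch, assuming efficacy = postoperative UDVA / preoperative CDVA
# (decimal notation). Function names are hypothetical.
def efficacy(postop_udva, preop_cdva):
    """Efficacy index: postoperative UDVA divided by preoperative CDVA."""
    return postop_udva / preop_cdva

def efficacy_group(index):
    """Label a case using the study's thresholds (< 0.4 low, >= 0.8 high)."""
    if index < 0.4:
        return "low"
    if index >= 0.8:
        return "high"
    return "intermediate"
```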
Prediction of Efficacy
To predict the outcome of the procedure given the values of different variables, we used a regression tree. A regression tree recursively partitions the independent variables' space into subspaces such that each subspace constitutes a basis for a different prediction.10 A single regression tree usually has limited predictive performance.11 One way to improve the prediction performance is to build a regression forest, which combines the predictions of several regression trees into a final prediction.12 In this study, we used the Gradient Boosted Machine to build the forest in a stage-wise fashion.13 Specifically, the algorithm trains a sequence of regression trees, where each successive tree aims to predict the pseudo-residuals of the preceding trees assuming that the loss function is mean squared error. This method allows combining a huge number of regression trees with a small learning rate.13 For training the gradient boosted trees we employed the XGBoost (Extreme Gradient Boosting) algorithm, which is considered the state-of-the-art algorithm for training a decision forest.14 The hyperparameters of XGBoost were set to build a decision forest consisting of 6,000 trees. Each tree was trained on an independently drawn random sample of 50% taken from the training set and had a maximum tree depth of nine. To prevent over-fitting, we set the step size shrinkage (eta) to 0.01 and used a dropout algorithm to thin the forest.15 XGBoost provides each feature with gain, cover, and frequency indexes implying the relative contribution of the corresponding feature to the model.16 A higher gain value, compared with that of another feature, implies that the feature is more important for generating a prediction.13 Cover is the relative number of observations related to a feature,13 and frequency is the percentage of times a feature occurs in the trees of the model.13
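The stage-wise mechanism described above, in which each successive tree fits the pseudo-residuals of the current ensemble under a squared-error loss, can be illustrated with a minimal pure-Python sketch using depth-1 trees ("stumps") on a single feature. This is a toy illustration of gradient boosting, not the XGBoost implementation used in the study; all names and hyperparameter values are illustrative:

```python
# Toy gradient boosting sketch (not the study's XGBoost code).
def fit_stump(x, residuals):
    """Find the single-feature split that minimizes squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda xi: lmean if xi <= threshold else rmean

def fit_boosted(x, y, n_trees=50, learning_rate=0.1):
    """Stage-wise boosting: each stump targets the current pseudo-residuals,
    which for mean squared error are the plain residuals y - prediction."""
    base = sum(y) / len(y)          # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Toy data: a step function the ensemble gradually approximates.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = fit_boosted(x, y)
```

With a small learning rate, each stage removes only a fraction of the remaining error, which is why many weak trees can be stacked safely; the study's configuration (6,000 trees, eta = 0.01) follows the same logic at scale.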
To evaluate the generalized predictive performance of the trained model, a 10-fold cross-validation procedure was repeated five times,17 a standard procedure in machine learning. In each round, the training set was randomly partitioned into 10 disjoint subsets. Nine of the subsets were used to train the model and the remaining subset was used to evaluate performance. Each subset was used exactly once for testing and nine times for training. The purpose of this cross-validation procedure was to maximize the amount of training data available while obtaining an unbiased estimate of the expected classification accuracy on new data.17
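The repeated 10-fold partitioning described above can be sketched as follows (a hypothetical helper, not the study's code); within each repeat, every sample index appears in exactly one test fold:

```python
import random

def k_folds(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_splits(n_samples, k=10, repeats=5):
    """Yield (train, test) index lists; each fold serves as the
    test set exactly once per repeat."""
    for r in range(repeats):
        folds = k_folds(n_samples, k, seed=r)
        for i, test in enumerate(folds):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test
```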
Receiver Operating Characteristic (ROC) Curve Analysis
The regression task for predicting efficacy can be converted into a binary classification task, with the goal of determining whether the procedure was successful according to an acceptable threshold. For example, a patient with efficacy greater than 0.5 was considered a positive case and, correspondingly, a patient with efficacy lower than 0.5 was considered a negative case. One common approach for estimating the predictive performance of a classifier is the ROC curve, which shows performance over a range of trade-offs between true-positive and false-positive error rates.18 Each point on the curve corresponds to a particular cut-off, with the false-positive rate (1 − specificity) on the x-axis and the sensitivity on the y-axis. Points closer to the upper right corner correspond to lower cut-offs; points closer to the lower left corner correspond to higher cut-offs. The choice of cut-off thus represents a trade-off between sensitivity and specificity: a low cut-off usually yields higher sensitivity, whereas a high cut-off yields a lower false-positive rate at the price of lower sensitivity. In terms of classifier comparison, the best curve is the one closest to the upper left corner, the ideal curve coinciding with the left and top axes. The area under the curve (AUC) is therefore an accepted performance metric for an ROC curve.19 The AUC ranges from 0 to 1; the area under the diagonal is 0.5, representing a random classifier, whereas a value of 1 represents an optimal classifier. Figure 1 presents the ROC curves obtained for the proposed model; the diagonal line represents a random (null) model with an AUC of 0.5.
Receiver operating characteristic (ROC) curve for predicting efficacy of (A) 0.3, (B) 0.5, and (C) 0.8. AUC = area under the curve
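The ROC construction described above can be sketched in a few lines: sweep the cut-off from the highest score downward, accumulate true- and false-positive rates, and integrate with the trapezoidal rule. This is a simplified illustration (ties between scores are not handled specially), not the study's analysis code:

```python
def roc_points(scores, labels):
    """Sweep cut-offs from the highest score down; return (FPR, TPR) points."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A classifier that ranks every positive above every negative traces the left and top axes and scores an AUC of 1.0, matching the "optimal classifier" described above.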
All patients underwent either LASIK or PRK in a similar manner. The decision to perform LASIK or PRK was left to the discretion of the surgeon. The common practice in our institution is not to perform LASIK when the central corneal thickness is less than 500 μm. One drop of a topical anesthetic (benoxinate hydrochloride 0.4%) was instilled in the conjunctival fornix of the eye prior to surgery, after which an eyelid speculum was inserted. A Moria microkeratome (Moria SA, Antony, France) with a thickness plate of 90 μm was used to create the flap in LASIK cases and epithelial removal was performed via alcohol-assisted PRK (20% ethyl alcohol placed on the cornea for 15 seconds) in PRK cases. Following flap creation (LASIK) or epithelial removal (PRK), the WaveLight Allegretto Wave (EX200) excimer laser system (Alcon Laboratories, Inc., Hünenberg, Switzerland) was used. In all PRK cases, a sponge soaked with 0.02% mitomycin C was placed on the stroma for 20 to 60 seconds (depending on the amount of ablation) following excimer ablation and a contact lens was placed after rinsing the mitomycin C. Following surgery, patients were prescribed moxifloxacin 0.5% (four times a day), dexamethasone 0.1% (two or four times a day), and artificial tears (four times a day). Patients were examined at 1 day, 1 week, and 1, 3, and 6 months postoperatively, and more frequently if necessary.
Statistical analysis was performed using SPSS software (version 23; IBM Corporation, Armonk, NY). The distributions of clinical parameters were tested for normality with the Shapiro–Wilk test. Because none of the continuous parameters was normally distributed, we used the Wilcoxon signed-rank test and the Mann–Whitney U test for comparison of related and unrelated variables, respectively. We conducted the chi-square test for categorical variables using MedCalc software (version 12.5; MedCalc Software, Inc., Mariakerke, Belgium). Spearman correlation was calculated for continuous variables. A P value of less than .05 on a two-sided test was considered statistically significant.
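As an aside, the Spearman coefficient used above is simply the Pearson correlation computed on rank-transformed variables; a minimal sketch (which, unlike SPSS, does not average tied ranks) looks like this:

```python
# Minimal Spearman correlation sketch; ties are NOT averaged here,
# unlike standard statistical packages.
def _ranks(values):
    """Rank transform (1 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = float(rank)
    return ranks

def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```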
The computerized query yielded 43,257 cases, of which 17,592 met our criteria for inclusion and exclusion. The patients' mean age was 26.5 ± 5.3 years (range: 20 to 40 years), 9,381 (53.3%) were men and 8,211 (46.7%) were women, and 8,879 (50.5%) of cases included the right eye and 8,713 (49.5%) the left eye. PRK was conducted in 7,829 (44.5%) of cases and LASIK in the remaining 9,763 (55.5%). Preoperative CDVA was 1.03 ± 0.18 decimal (Snellen 20/20) and UDVA was 0.0 ± 0.08 decimal (Snellen 20/200). The UDVA following surgery improved to 0.97 ± 0.21 decimal (Snellen 20/20.6) (P < .0001). Efficacy of 0.7 or greater and 0.8 or greater was achieved in 16,198 (92.0%) and 14,945 (84.9%) eyes, respectively. Efficacy of less than 0.4 and less than 0.5 occurred in 322 (1.8%) and 506 (2.9%) eyes, respectively. Analysis by surgery type revealed that both LASIK and PRK achieved very high efficacy (PRK: 0.97 ± 0.19, LASIK: 0.95 ± 0.22, P < .001).
Table 1 depicts the differences between the low efficacy and high efficacy groups. The groups had statistically significant differences but were clinically similar. For example, patients in the low efficacy group were somewhat older (27.2 ± 5.8 vs 26.5 ± 5.7 years, P = .01), with smaller scotopic pupil size (5.29 ± 1.30 vs 5.74 ± 1.10 mm, P = .001) and lower treatment parameters for both sphere (−2.36 ± 2.19 vs −3.38 ± 2.04 D, P < .001) and cylinder (−0.11 ± 0.78 vs −0.48 ± 0.77 D, P < .0001).
Biometrics, Refractive, and Treated Parameters for All Cases and by Efficacy
Correlations analysis revealed significantly decreased efficacy with increased age (r = −0.67, P < .001), central corneal thickness (r = −0.40, P < .001), mean keratometry (r = −0.33, P < .001), and preoperative CDVA (r = −0.47, P < .001). Efficacy increased with pupil size (r = 0.20, P < .001).
Based on the clinical parameters available to us (demographic, biometric, and preoperative data), we predicted the postoperative efficacy for each patient. The predicted efficacy correlated well with the actual postoperative efficacy (r = 0.72, P < .001). In only 173 cases (0.98% of our sample) was the predicted efficacy higher than 0.7 while the actual efficacy was lower than 0.5. Moreover, frequency analysis of the prediction error (actual efficacy − predicted efficacy) showed that 11,122 (63.2%) of all cases were within ±10%, 15,019 (85.37%) were within ±20%, and 16,297 (92.63%) were within ±30% (Figure 2). We could not find any clinically meaningful difference between patients with highly inaccurate predictions (> ±30%) and those with more accurate predictions.
Frequency histogram of the differences between actual and predictive efficacy.
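The frequency analysis described above, the share of cases whose prediction error falls within a given band, can be sketched with a hypothetical helper:

```python
def within_band(actual, predicted, band):
    """Fraction of cases whose prediction error (actual - predicted)
    falls within +/- band."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(1 for e in errors if abs(e) <= band) / len(errors)
```

Calling it with bands of 0.1, 0.2, and 0.3 on the study's predictions would reproduce the ±10%, ±20%, and ±30% figures reported above.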
Figure 1 presents the ROC curve obtained for the proposed model for efficacy of 0.3 (AUC = 0.943), 0.5 (AUC = 0.9113), and 0.8 (AUC = 0.887). We extracted the specificity and sensitivity from each graph at the point that maximizes their average (ie, [specificity + sensitivity] / 2). The results of each model were high (Table 2).
Area Under the Curve (AUC), Sensitivity, and Specificity at Various Prediction Efficacy Cut-off Points
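The operating-point rule described above, choosing the cut-off that maximizes (specificity + sensitivity) / 2, can be sketched as follows (an illustrative implementation, not the study's code):

```python
def best_cutoff(scores, labels):
    """Return the score cut-off maximizing (sensitivity + specificity) / 2."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_avg, best_c = -1.0, None
    for c in sorted(set(scores)):
        sens = sum(1 for s, l in zip(scores, labels) if l and s >= c) / pos
        spec = sum(1 for s, l in zip(scores, labels) if not l and s < c) / neg
        avg = (sens + spec) / 2
        if avg > best_avg:
            best_avg, best_c = avg, c
    return best_c
```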
Analysis of features that most affected the model for binary classification (achieved efficacy 0.5 or not) revealed that the preoperative subjective CDVA had the highest gain (ie, more important for generating the prediction) and was followed by surgical treatment parameters (the amount of sphere and cylinder treatment [D]).
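For intuition, the three importance indexes can be illustrated on a toy forest representation in which each split records its feature, gain, and number of covered observations. The structure and feature names below are hypothetical, not XGBoost's internals:

```python
# Hypothetical forest summary: each split as (feature, gain, n_observations).
splits = [
    ("cdva", 12.0, 1000),
    ("sphere", 6.0, 400),
    ("cdva", 3.0, 300),
    ("cylinder", 2.0, 200),
]

def importance(splits):
    """Aggregate per-feature indexes: summed gain, mean cover, and the
    fraction of all splits using the feature (frequency)."""
    totals = {}
    for feat, gain, cover in splits:
        g, c, f = totals.get(feat, (0.0, 0, 0))
        totals[feat] = (g + gain, c + cover, f + 1)
    n_splits = len(splits)
    return {feat: {"gain": g, "cover": c / f, "frequency": f / n_splits}
            for feat, (g, c, f) in totals.items()}

imp = importance(splits)
```

In this toy example the "cdva" feature dominates the gain index, mirroring the finding above that preoperative subjective CDVA contributed most to the model.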
In this study, a statistical classifier algorithm was used to maximize the use of big data. This study demonstrates several interesting clinical factors that may contribute to unsatisfactory results. Patients in the low efficacy group had statistically significant differences when compared with the high efficacy group, but the groups were similar clinically.
The top feature that most affected the model was the preoperative subjective CDVA. Aaron et al. suggested that retinal magnification caused by moving the refractive correction from the spectacle plane to the cornea, together with regression to the mean, may explain why preoperative acuity predicts postoperative acuity in wavefront-guided LASIK.20
By using the ROC curve, we were able to demonstrate how well our model located cases that reached an efficacy of 0.3, 0.5, or 0.8. The high AUC values suggest a high prediction rate to assess a dichotomous outcome (a random mode will achieve a value of 0.5 with a maximum value possible of 1.0).21
It may be that our approach of big data analysis is most suitable to detect rare outcomes, as in modern refractive surgery. A single clinical parameter, by itself, may contribute to only a small risk for failure. However, using a comprehensive approach and modeling increment of risk induced by the various clinical parameters, this study was able to detect patients who may have a worse outcome. It should be noted that the data were generated from a single center only. Care should be taken when transferring our results to other institutions. Therefore, we advise surgeons to conduct individual statistical analysis to improve quality management and outcomes of their future surgeries. We believe that the high accuracy achieved by the automated classification resulted from evaluation of multiple clinical parameters. The automated classification may assist in the detection of high-risk patients.
- Vestergaard AH. Past and present of corneal refractive surgery: a retrospective study of long-term results after photorefractive keratectomy and a prospective study of refractive lenticule extraction. Acta Ophthalmol. 2014;92:1–21. doi:10.1111/aos.12385 [CrossRef]
- Moreno-Barriuso E, Lloves JM, Marcos S, Navarro R, Llorente L, Barbero S. Ocular aberrations before and after myopic corneal refractive surgery: LASIK-induced changes measured with laser ray tracing. Invest Ophthalmol Vis Sci. 2001;42:1396–1403.
- Moshirfar M, Simpson RG, Dave SB, et al. Sources of medical error in refractive surgery. J Refract Surg. 2013;29:303–310. doi:10.3928/1081597X-20130415-01 [CrossRef]
- Mimouni M, Vainer I, Shapira Y, et al. Factors predicting the need for retreatment after laser refractive surgery. Cornea. 2016;35:607–612. doi:10.1097/ICO.0000000000000795 [CrossRef]
- Larranaga P, Calvo B, Santana R, et al. Machine learning in bio-informatics. Brief Bioinform. 2006;7:86–112. doi:10.1093/bib/bbk007 [CrossRef]
- Thomas G, Grassi MA, Lee JR, et al. IDOCS: intelligent distributed ontology consensus system—the use of machine learning in retina drusen phenotyping. Invest Ophthalmol Vis Sci. 2007;48:2278–2284. doi:10.1167/iovs.06-1022 [CrossRef]
- Barella KA, Costa VP, Goncalves Vidotti V, Silva FR, Dias M, Gomi ES. Glaucoma diagnostic accuracy of machine learning classifiers using retinal nerve fiber layer and optic nerve data from SD-OCT. J Ophthalmol. 2013;2013:789129. doi:10.1155/2013/789129 [CrossRef]
- Sugimoto K, Murata H, Hirasawa H, Aihara M, Mayama C, Asaoka R. Cross-sectional study: does combining optical coherence tomography measurements using the ‘Random Forest’ decision tree classifier improve the prediction of the presence of perimetric deterioration in glaucoma suspects? BMJ Open. 2013;3:e003114. doi:10.1136/bmjopen-2013-003114 [CrossRef]
- Souza MB, Medeiros FW, Souza DB, Garcia R, Alves MR. Evaluation of machine learning classifiers in keratoconus detection from orbscan II examinations. Clinics (Sao Paulo). 2010;65:1223–1228. doi:10.1590/S1807-59322010001200002 [CrossRef]
- Rokach L. Decision forest: twenty years of research. Information Fusion. 2016;27:111–125. doi:10.1016/j.inffus.2015.06.005 [CrossRef]
- Kuhn L, Page K, Ward J, Worrall-Carter L. The process and utility of classification and regression tree methodology in nursing research. J Adv Nursing. 2014;70:1276–1286. doi:10.1111/jan.12288 [CrossRef]
- Deng H, Runger G, Tuv E, Vladimir M. A time series forest for classification and feature extraction. Information Sciences. 2013;239:142–153. doi:10.1016/j.ins.2013.02.030 [CrossRef]
- Natekin A, Knoll A. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics. 2013;7:21. doi:10.3389/fnbot.2013.00021 [CrossRef]
- Chen T, Guestrin C. XGBoost: a scalable tree boosting system. arXiv preprint arXiv:1603.02754, 2016.
- Rashmi K, Gilad-Bachrach R. DART: dropouts meet multiple additive regression trees. arXiv preprint arXiv:1505.01866, 2015.
- Babajide Mustapha I, Saeed F. Bioactive molecule prediction using extreme gradient boosting. Molecules. 2016;21:E983. doi:10.3390/molecules21080983 [CrossRef]
- Munk MR, Jampol LM, Simader C, et al. Differentiation of diabetic macular edema from pseudophakic cystoid macular edema by spectral-domain optical coherence tomography differentiation of DME and pseudophakic CME. Invest Ophthalmol Vis Sci. 2015;56:6724–6733. doi:10.1167/iovs.15-17042 [CrossRef]
- Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian Journal of Internal Medicine. 2013;4:627–635.
- Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition. 1997;30:1145–1159. doi:10.1016/S0031-3203(96)00142-2 [CrossRef]
- Aaron MT, Applegate RA, Porter J, et al. Why preoperative acuity predicts postoperative acuity in wavefront-guided LASIK. Optom Vis Sci. 2010;87:861–866. doi:10.1097/OPX.0b013e3181f6fb49 [CrossRef]
- Florkowski CM. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev. 2008;29(suppl 1):S83–S87.
Biometrics, Refractive, and Treated Parameters for All Cases and by Efficacy
| Parameter | All Cases | Low Efficacy | High Efficacy | P |
| --- | --- | --- | --- | --- |
| % (no.) | 100 (17,592) | 1.83 (322) | 84.95 (14,945) | |
| Age (y) | 26.52 ± 5.29 | 27.25 ± 5.86 | 26.47 ± 5.26 | .02 |
| Gender, % female | 46.7 | 50.0 | 46.5 | .23 |
| Eye, % left | 49.5 | 55.0 | 49.4 | .05 |
| Surgery, % LASIK | 55.5 | 61.2 | 54.59 | .02 |
| Preoperative CDVA (decimal) | 1.02 ± 0.18 | 1.23 ± 0.67 | 1.00 ± 0.13 | < .0001 |
| Preoperative UDVA (decimal) | 0.00 ± 0.08 | 0.02 ± 0.12 | 0.00 ± 0.07 | .10 |
| Postoperative UDVA (decimal) | 0.97 ± 0.21 | 0.84 ± 0.19 | 0.98 ± 0.21 | < .0001 |
| Efficacy | 0.96 ± 0.21 | 0.23 ± 0.09 | 1.02 ± 0.14 | < .0001 |
| Preoperative mean K (D) | 43.51 ± 1.52 | 43.59 ± 1.65 | 43.50 ± 1.51 | .43 |
| Preoperative subjective cylinder (D) | −0.45 ± 0.77 | −0.72 ± 0.85 | −0.70 ± 0.82 | .33 |
| Preoperative subjective cylinder axis (°) | 21.90 ± 26.67 | 26.08 ± 27.73 | 24.44 ± 26.50 | .006 |
| Preoperative subjective sphere (D) | −2.94 ± 1.89 | −2.89 ± 1.79 | −2.93 ± 1.88 | .68 |
| Preoperative spherical equivalent (D) | −3.71 ± 1.95 | −3.72 ± 2.04 | −3.69 ± 1.93 | .80 |
| Preoperative pachymetry (μm) | 537.07 ± 35.48 | 529.18 ± 35.93 | 532.25 ± 35.16 | .50 |
| Scotopic pupil size (mm) | 5.68 ± 1.12 | 5.29 ± 1.30 | 5.74 ± 1.10 | .001 |
| Optical zone (mm) | 6.12 ± 0.35 | 6.03 ± 0.32 | 6.13 ± 0.35 | < .0001 |
| Treatment zone (mm) | 8.24 ± 0.98 | 8.04 ± 1.03 | 8.23 ± 0.98 | .02 |
| Room humidity (%) | 37.74 ± 1.46 | 38.29 ± 1.72 | 37.67 ± 1.44 | < .0001 |
| Room temperature (°C) | 22.86 ± 1.27 | 22.28 ± 1.71 | 22.90 ± 1.24 | < .0001 |
| Actual treatment sphere (D) | −3.35 ± 2.07 | −2.36 ± 2.19 | −3.38 ± 2.04 | < .0001 |
| Actual treatment cylinder (D) | −0.45 ± 0.77 | −0.11 ± 0.78 | −0.48 ± 0.77 | < .0001 |
| Actual treatment cylinder axis (°) | 28.74 ± 27.21 | 31.33 ± 28.14 | 28.61 ± 27.15 | .14 |