From Wilmer Eye Institute (MM, FPK, IZ-G, HJ, RZ), Johns Hopkins School of Medicine, Baltimore, Maryland; and the Department of Physics (MM), University of Siena, Siena, Italy.
Supported in part by grants R01 EY016133 (RZ), R01 EY017053 (IZ-G), and core grant P30 EY01765 from the National Eye Institute, Bethesda, Maryland, and the Wilmer Eye Institute Telemedicine Fund.
The authors have no financial or proprietary interest in the materials presented herein.
Address correspondence to Ran Zeimer, PhD, Wilmer Eye Institute, Wilmer/Woods 355, Johns Hopkins University, 600 N. Wolfe Street, Baltimore, MD 21287-9131. E-mail: firstname.lastname@example.org
A fundus camera or retinal camera is a specialized low-power microscope with an attached camera designed to photograph the interior of the eye, including the retina, optic disc, macula, and posterior pole (ie, the fundus). Fundus cameras are used by optometrists, ophthalmologists, and trained medical professionals for diagnosis, disease management, and screening programs. Despite their well-established utility, fundus cameras are not widely used in the offices of eye care providers because of their cost, need for pharmacological pupil dilation, and operation by an ophthalmic photographer or well-trained staff member. In this article, we will address the last obstacle and propose a method to simplify the operation in an effort to allow use by staff typically available in offices of eye care providers.
Fundus cameras differ from other imaging systems in that the illumination and image paths must pass through the pupil of the eye. This implies that, to acquire an image of the fundus, numerous conditions need to be met simultaneously: (1) alignment of the pupil center with the optical axis of the camera; (2) placement of the pupil at a well-defined working distance; (3) focusing of the fundus; and (4) imaging of the desired region of the fundus. The last requirement can be achieved in most cases by presenting the subject with a fixation target. Methods to simplify focusing of the fundus are in use in systems such as auto-refractors and non-mydriatic fundus cameras. We have described a machine vision method that can be implemented in existing fundus cameras without modification of the optics.1 The goal of this report is to address the first two requirements by reporting on a method to track the pupil simultaneously in the horizontal, vertical, and longitudinal axes.
Eye-tracking has been developed for applications such as pupil monitoring, gaze assessment, and accurate pupil center determination.2–4 Several techniques have achieved satisfactory results in their goal of assessing the location of the pupil center along the vertical and horizontal axes, but they do not address the need of determining the longitudinal coordinate, which is necessary in fundus photography to position the eye at the proper working distance relative to the objective lens. Moreover, most algorithms have not been designed to provide the fast tracking necessary to image the fundus within the time delay for a voluntary eye movement (approximately 250 ms). Therefore, we report on an algorithm specifically geared to meet the positioning and time specifications of fundus imaging.
Patients and Methods
Parallax-Based Optical Alignment
As mentioned, proper alignment between the optics of the eye and fundus imaging instruments requires achieving, simultaneously, centering on the pupil and adjustment to a preset working distance (distance between the objective and the pupil).
The system is based on two cameras viewing the same field from two distinct directions (an arrangement described by epipolar geometry5), as shown in the schematic drawing of Figure 1. The two cameras are coupled to the objective lens so that the three optical axes intersect at point F, located at the nominal working distance. When the pupil is located at F, its images acquired by cameras L and R are centered. A horizontal shift of the pupil (up or down from point F in Fig. 1) is accompanied by a shift of the pupil image in the same direction for both cameras. Similarly, a vertical shift of the pupil (out of the plane of the paper at point F) yields an equal vertical shift in both images. In contrast, a shift of the pupil beyond the working distance, away from the objective, yields opposite horizontal shifts in the images of the two cameras, right and left for cameras L and R, respectively. The directions are reversed for a pupil position closer to the objective. Thus, positions of the pupil away from nominal point F can be derived from the deviations of the two pupil images from the center of their respective camera field of view.
Figure 1. The Schematic Drawing of the Tracking System. The Two Cameras (L and R) Are Coupled to the Objective Lens (O). The Three Optical Axes Intercept at Point F Located at the Nominal Working Distance.
In practice, there is a small offset from the camera center. The offset is measured during original assembly by focusing the illumination ring on a screen placed at the nominal working distance of the fundus camera and registering the coordinates of its center on the two webcams. The values are then entered in the calculations performed by the tracking routine.
The cameras can be low-cost digital cameras such as those used as webcams. In this report, webcams with a 640 × 480 pixel VGA CMOS sensor (WebCam Live! Creative Labs, Inc., Singapore) were used. To obtain the desired field of view, the original lenses were replaced with off-the-shelf 12-mm focal length lenses (Finite Conjugate Micro Video lens TECHSPEC; Edmund Optics Inc., Barrington, NJ). The infrared filter normally placed on the sensor was removed to extend the wavelength response curve toward the infrared.
The illumination was designed to avoid interference with the fundus imaging camera, optimize the contrast of the pupil, reduce image variability among subjects, and avoid reflections on the cornea. This was achieved by using infrared diode lasers (LTE-4228U; Liteon, Taiwan) delivering a beam parallel to the axis of the camera. The reflections typically seen on the cornea were cancelled by placing a polarizer (vis 700 BC3 C633; CODIXX, Barleben, Germany) in front of the diode lasers and crossed polarizers in front of the cameras.
Data Processing and Motor Control
The process, described by flowcharts in Figures 2 and 3, consists of two functions: detection of the pupil center (Fig. 2) and tracking (Fig. 3). As shown in Figure 2, images of 320 × 240 pixels are continuously acquired by the two webcams (A and B). The region of interest (ROI) of 107 × 107 pixels on which the pupil center algorithm operates (shown by the gray squares in A and B) is extracted from the images (D). The autonomous pupil center algorithm, described in Figure 4, operates on the ROIs, derives the horizontal (xa and xb) and vertical (ya and yb) coordinates of the pupil center for images A and B, respectively, and displays the results as a mark on the pupil images (Fig. 4, R). This algorithm is based on thresholding intensity profiles. In (O), the rows are summed to generate an intensity profile that is thresholded at a preset value (30%). The coordinates of all of the points above threshold are collected in an array and their mean is taken as the x coordinate of the pupil center (xa or xb). The intensity of the pixels along a vertical line passing through xa or xb is then plotted, the coordinates of all of the points with a profile exceeding a second threshold (50%) are collected in an array, and their mean is taken as the y coordinate of the pupil center, ya or yb (P). The process ends with the pupil center (Q) being marked on the image (R).
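The profile-thresholding steps described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' Matlab code. It assumes that the pupil is brighter than its surroundings in the ROI (a darker pupil would first be inverted) and that each percentage threshold is taken as a fraction of the range between the profile minimum and maximum; the text specifies neither point. The function names and the synthetic disc image are hypothetical.

```python
def profile_center(profile, fraction):
    """Mean index of the samples that exceed a fractional threshold.

    Assumption: the threshold sits at `fraction` of the way from the
    profile minimum to its maximum (the source gives only percentages).
    """
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return None  # flat profile: nothing to detect
    level = lo + fraction * (hi - lo)
    above = [i for i, value in enumerate(profile) if value > level]
    return sum(above) / len(above)


def pupil_center(roi, x_fraction=0.30, y_fraction=0.50):
    """Estimate the pupil center (x, y) in an ROI given as a list of rows.

    Assumption: the pupil is brighter than the background.
    """
    n_cols = len(roi[0])
    # Summing the rows yields one value per column: the horizontal profile.
    column_sums = [sum(row[c] for row in roi) for c in range(n_cols)]
    x = profile_center(column_sums, x_fraction)
    if x is None:
        return None
    # Intensities along the vertical line through the x estimate.
    center_line = [row[int(round(x))] for row in roi]
    y = profile_center(center_line, y_fraction)
    if y is None:
        return None
    return x, y


def make_disc_roi(size, cx, cy, radius):
    """Synthetic test image: a bright disc on a dark background."""
    return [
        [255 if (c - cx) ** 2 + (r - cy) ** 2 <= radius ** 2 else 0
         for c in range(size)]
        for r in range(size)
    ]
```

For a 107 × 107 synthetic ROI containing a disc centered at column 60 and row 40, `pupil_center(make_disc_roi(107, 60, 40, 20))` recovers those coordinates, because the pixels above each threshold are distributed symmetrically about the disc center.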
Figure 2. Pupil Center Detection. An Automated Algorithm to Track the Pupil Was Developed on a Matlab (Mathworks, Natick, MA) Platform. Specifically, It Analyzes the Digital Images to Detect the Pupil Center and Moves the Optic System Through the Control of Stepper Motors. The Process Consists of the Following Steps: (C) Images Are Acquired by the Left (A) and Right (B) Webcams, Respectively, and (D) a Region of Interest (ROI) Is Defined. The Operator Confirms or Overwrites the Area to Compute and (E) the ROI Is Analyzed to Find the Pupil Center.
Figure 3. Pupil Tracking. (F) The Operator Accepts the Pupil Center Coordinates of Figure 2 (E). The Pupil Center Detection Algorithm Refreshes the Data (G). A Quality Control Algorithm Checks that the Coordinates Are Within a Tolerance Value from the True Center. If So, the Operator Is Given Permission to Activate the Fundus Image Acquisition Sequence (I). If Not, the Stage Is Moved According to the Deviation from the True Center (H).
Figure 4. Autonomous Pupil Center Algorithm. The Region of Interest (ROI) Provided by Figure 2 (D) Is Used to Calculate the Sum of Rows (O) and Columns (P) and Plot Them as a Profile. The Mean Coordinate of the Points that Exceed a Threshold Is Considered the Vertical and Horizontal Pupil Centers (Q) Marked on the ROI Image (R).
Returning to loop C to E, the autonomous processes can be overridden by the operator when the mark on the display deviates markedly from the pupil center. By placing the cursor inside the pupil, the operator relocates the ROI.
Once the operator is satisfied with the pupil center mark, he or she activates process F to I. The coordinates (xa, ya or xb, yb) are compared to the values corresponding to perfect pupil alignment. If the values are within a preset tolerance (5 pixels corresponding to 0.5 mm in the pupil), the imaging process is activated (I). Otherwise, the coordinates are passed to subroutine H, which converts them into motion commands for the three motors of the xyz table. To avoid situations in which the process is not convergent and thereby remains in a loop, the operator can override and initiate image acquisition (I).
The flowcharts indicate that, except for operator approval in step F, the process is autonomous but can be overridden by the operator.
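The tracking loop (steps G to I) can be sketched in Python as follows, under stated assumptions. The 5-pixel tolerance comes from the text; the iteration cap, the function names, and the `SimulatedEye` class (a stand-in for the webcams and stepper motors that corrects a fixed fraction of the deviation per move) are hypothetical. The operator override is not modelled.

```python
TOLERANCE_PX = 5  # from the text: 5 pixels, about 0.5 mm at the pupil


def within_tolerance(measured, nominal, tol=TOLERANCE_PX):
    """True when every coordinate deviates from nominal by less than tol."""
    return all(abs(m - n) < tol for m, n in zip(measured, nominal))


def track(measure, move_stage, nominal, max_iterations=10):
    """Repeat measure -> check -> move until aligned or the cap is reached.

    measure() returns (xa, ya, xb, yb) in pixels; move_stage(deviation)
    commands the stage. Returns (aligned, iterations_used). The cap is an
    assumption standing in for the operator override described in the text.
    """
    for iteration in range(1, max_iterations + 1):
        coords = measure()
        if within_tolerance(coords, nominal):
            return True, iteration  # step I: acquisition may proceed
        deviation = tuple(c - n for c, n in zip(coords, nominal))
        move_stage(deviation)  # step H: convert deviation into motion
    return False, max_iterations


class SimulatedEye:
    """Hypothetical stand-in for cameras and motors, for illustration only."""

    def __init__(self, coords, gain=0.7):
        self.coords = list(coords)
        self.gain = gain  # fraction of the deviation removed per move

    def measure(self):
        return tuple(self.coords)

    def move_stage(self, deviation):
        self.coords = [c - self.gain * d
                       for c, d in zip(self.coords, deviation)]
```

With a nominal position of (160, 120, 160, 120) pixels and a simulated starting position of (200, 100, 170, 100), the loop reports alignment on the third measurement in this simulation.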
Accuracy of Pupil Center Detection
The accuracy was defined as the agreement between the coordinates derived manually and by the algorithm. Following the method of Bland and Altman,6 the relative difference of the two measures was plotted over the range of values. If the difference is found to be constant, the agreement can be gauged by the “95% limits of agreement,” namely, the range within which 95% of the differences will fall. The middle of this range is defined as the “bias.”
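The bias and 95% limits of agreement can be computed as in the following Python sketch. It assumes the conventional Bland–Altman form, bias ± 1.96 times the sample standard deviation of the paired differences; the function name and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(manual, automatic):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    The differences are taken as automatic minus manual; the limits are
    bias +/- 1.96 sample standard deviations of those differences.
    """
    diffs = [a - m for m, a in zip(manual, automatic)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up coordinates in pixels, for illustration only.
manual_px = [10, 20, 30, 40]
automatic_px = [11, 19, 31, 41]
bias, lower, upper = bland_altman(manual_px, automatic_px)
```

In this made-up example, the differences are 1, −1, 1, and 1, so the bias is 0.5 pixels and the standard deviation of the differences is 1 pixel.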
Derivation of Optical Alignment
The coordinates of the pupil center on the images of the two webcams (xa, ya, xb, and yb), expressed in pixels, can be translated into misalignment of the optics with the use of the following equations:
where M is the magnification and θ is the angle between the optical axes of the two webcams viewed from the working distance (Fig. 1).
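One way to relate the pixel coordinates to the misalignment is sketched below in Python. It is a small-displacement illustration under an assumed geometry, not necessarily the authors' equations: the two webcams are taken to lie symmetrically in the horizontal plane, each at θ/2 from the objective axis, and the coordinates are taken as deviations from their nominal (aligned) values. The scale of 10 pixels per mm follows from the stated correspondence of 5 pixels to 0.5 mm; the 30° angle, the sign conventions, and the function names are hypothetical.

```python
import math


def image_shifts(dx_mm, dy_mm, dz_mm, scale_px_per_mm, theta_deg):
    """Forward model: pixel deviations (xa, ya, xb, yb) of the pupil images.

    A lateral shift moves both images the same way; an axial shift moves
    them in opposite horizontal directions (the parallax effect).
    """
    half = math.radians(theta_deg) / 2.0
    lateral = dx_mm * math.cos(half)
    axial = dz_mm * math.sin(half)
    xa = scale_px_per_mm * (lateral + axial)
    xb = scale_px_per_mm * (lateral - axial)
    ya = yb = scale_px_per_mm * dy_mm
    return xa, ya, xb, yb


def misalignment_mm(xa, ya, xb, yb, scale_px_per_mm, theta_deg):
    """Inverse model: pupil displacement (dx, dy, dz) in mm.

    The sum of the horizontal deviations isolates the lateral shift and
    their difference isolates the axial shift.
    """
    half = math.radians(theta_deg) / 2.0
    dx = (xa + xb) / (2.0 * scale_px_per_mm * math.cos(half))
    dy = (ya + yb) / (2.0 * scale_px_per_mm)
    dz = (xa - xb) / (2.0 * scale_px_per_mm * math.sin(half))
    return dx, dy, dz


SCALE = 10.0   # pixels per mm, from 5 pixels corresponding to 0.5 mm
THETA = 30.0   # degrees; hypothetical, the angle is not given in the text

shifts = image_shifts(1.0, -0.5, 2.0, SCALE, THETA)
recovered = misalignment_mm(*shifts, SCALE, THETA)
```

In this sketch, a purely axial displacement produces horizontal deviations of equal magnitude and opposite sign in the two images, which is the behavior described for Figure 1.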
The system was pilot tested in a glaucoma clinic in accordance with the Declaration of Helsinki for research involving human subjects and after approval by the Institutional Review Board of the Johns Hopkins University School of Medicine. An effort was made to recruit consecutive subjects without any entry criteria other than signature of an informed consent.
The cohort consisted of 45 patients with a mean age of 61 ± 15 years. Twenty-six patients were considered to have glaucoma by the referring clinician (HJ) and 19 were diagnosed as normal (HJ). Twenty-two patients were male and 23 were female. Regarding ethnic distribution, 36 patients were white, 6 were African American, 1 was Asian, and 2 were other.
The study was performed on the right eyes in a non-mydriatic mode (without pharmacological pupil dilation). The subjects were presented the same fixation target corresponding, on the fundus, to the optic disc region.
Results
The processing time of the algorithm to find the pupil center was measured in Matlab (Mathworks, Natick, MA) and found to be 0.75 msec on an Intel Core 2 Duo T8100 CPU operating at 2.1 GHz.
The pupil images shown in Figure 4 illustrate that, typically, good contrast was achieved between the pupil and its vicinity. The eyelid appeared white under the infrared illumination regardless of its pigmentation in visible light. The main difference between eyes was due to eyelash make-up, which appeared dark on the image. Occasionally, the subject did not lean adequately on the nose pad, which then cast a shadow. This was easily remedied by instructing the subject to lean correctly on the nose pad.
The diameter of the pupil was derived from the images and found to be 45 ± 11 pixels (range: 20 to 70 pixels), corresponding to 4.5 ± 1.1 mm (range: 2 to 7 mm).
The tracking converged to within the preset tolerance in 45 of the 45 eyes. In other words, the difference between the pupil center derived by the algorithm and the nominal position on the image was reduced to less than 5 pixels, corresponding to 0.5 mm at the pupil on completion of the loop described in Figures 2 to 4. Typically, three iterations were sufficient once the pupil was within the gray zone (ROI) on the two webcams.
The rapidity of the tracking process was evaluated by the time elapsed between two consecutive alignment and tracking sequences at the same fixation point. This time included review by the operator of the result of the first event, tracking of the eye in case it had moved, acquisition of pupil images, and saving to file. The time between these two consecutive events was 33 ± 11 seconds.
Accuracy of Pupil Center Detection
As mentioned, the accuracy was taken as the difference between the coordinates of the pupil center derived manually and by the algorithm, respectively, for each of the two webcams. The Bland–Altman plots (Figs. 5 to 8) indicate that the difference does not depend on the coordinate value, and thus it is legitimate to use the limits of agreement as a descriptor. The results yielded values of 0.18 and 0.19 mm (mean = 0.19 mm) for the Y and 0.29 and 0.36 mm (mean = 0.33 mm) for the X standard deviations, respectively, for the two cameras. The bias was −0.12 and 0.02 mm, respectively, for the two cameras. The variability in the working distance can be estimated by the combination of the horizontal and vertical variabilities, which yields a value of 0.26 mm.
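As a check on the arithmetic, the reported means and the combined value of 0.26 mm are consistent with simple arithmetic averaging. This is an inference from the numbers, since the text does not state how the variabilities were combined. The limits of agreement follow from the bias ± 1.96σ, shown here with the values given in the caption of Figure 5:

```latex
\begin{align*}
\bar{\sigma}_Y &= \tfrac{1}{2}(0.18 + 0.19) = 0.185 \approx 0.19\ \text{mm} \\
\bar{\sigma}_X &= \tfrac{1}{2}(0.29 + 0.36) = 0.325 \approx 0.33\ \text{mm} \\
\tfrac{1}{2}\left(\bar{\sigma}_X + \bar{\sigma}_Y\right) &= \tfrac{1}{2}(0.33 + 0.19) = 0.26\ \text{mm} \\
\text{limits (Fig. 5)} &= 0.0 \pm 1.96 \times 0.36 \approx \pm 0.71\ \text{mm}
\end{align*}
```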
Figure 5. Webcam A Coordinate xa. bias = 0.0 and σ = 0.36, (inferior Limit = −0.71, Superior Limit = 0.71) mm.
Figure 8. Webcam B Coordinate yb. bias = −0.12 and σ = 0.19, (inferior Limit = −0.50, Superior Limit = 0.25) mm.
Discussion
Fundus photography has long been a major tool of eye care specialists and remains so despite the introduction of new technologies such as optical coherence tomography and scanning laser ophthalmoscopes. Its wide use in specialty clinics has justified the service of well-trained professionals such as ophthalmic photographers who are comfortable with the use of sophisticated instrumentation. There is a growing national drive for a more widespread delivery of care because most individuals with nonsymptomatic but sight-threatening disease do not visit specialty clinics, resulting in an increase in the risk of vision loss and cost of therapy. A low-cost fundus camera that could be easily operated by available office staff would allow acquisition of diagnostic data to be interpreted locally by the physician or by an expert reading center via telehealth.
The practicality of this approach has been demonstrated by the implementation of a system developed by us (the DigiScope), which has been used in screening for retinal disease in more than 100,000 patients with diabetes mellitus who visited their primary care physicians. A fundus camera that could image almost all subjects without pharmacological dilation of the pupil would widen the implementation and potentially extend the application to the three most common causes of treatable vision loss: diabetes mellitus, glaucoma, and age-related macular degeneration. To capture 95% of the population, imaging should be achieved through pupils as small as 3.5 mm. To achieve this goal, there is a need for a supervised automated system capable of tracking the pupil and simplifying the task of the operator.
The approach we describe in this report takes into account specific practical needs. The illumination is in infrared not only to make it invisible to the patient, but also to minimize the effect of skin pigmentation. The results indicate that the contrast between the pupil and the eyelids is similar in dark- and light-skinned subjects. The images (Fig. 4) show that cross polarization was effective in eliminating corneal reflections from the infrared diode lasers and thereby simplifying image processing. Parallax imaging of the pupil yields simultaneous information on all three axes of motion (lateral, vertical, and axial) necessary for alignment of the optical system. Although simple, our technique has not, to our knowledge, been reported previously. Finally, the acquisition of pupil images during the session could help the operator assess the reason for poor quality of fundus images and decide to retake the image.
Our approach is a compromise between full automation and manual operation. Experience with the DigiScope has shown that, with full automation, the operators consider themselves uninvolved and, on the few occasions that the algorithm fails, they do not know how to proceed and rapidly conclude that the instrument is not reliable. On the other hand, when the operators are given the role of supervising the automated operation, they develop a vested interest in the results and proceed correctly when the algorithm occasionally fails. Their most important role is to recognize failure and repeat the operation rather than to troubleshoot, as shown in the flowcharts (Figs. 2 to 4).
The tests were performed on a cohort that could be relevant to the intended use. The age range was large enough to include working-age individuals in whom diabetes mellitus is of concern and older subjects who are at risk of macular degeneration. Patients with glaucoma were included because they tend to have smaller pupils due to age and the effect of long-term therapy. Whites and African Americans were included to assess the influence of pigmentation on the image of the eyelids and iris. The 2- to 7-mm range of pupil sizes in this study cohort is consistent with the natural pupil size under room light.7
The results indicate that the processing time of less than a millisecond is negligible in comparison to the image acquisition rate and the time needed to move the stage.
The results are promising in that all eyes were successfully tracked despite a tight tolerance of 0.5 mm for the center and 1.25 mm for the working distance. The process yielded a variability of 0.26 mm in the determination of the pupil center, which is small compared to the pupil size (2 to 7 mm in the cohort). The adequacy of the working distance variability (estimated as ± 0.26 mm) can be judged by comparison to the anterior chamber depth of 3.5 mm.
The system tested in this report was limited by an image acquisition rate of 1 fps, which was caused by the slow image display rate of the Matlab tool. We expect this limitation to be overcome in upcoming updates.
We would like to conclude with a practical note. The proposed system could be integrated into existing ophthalmic instruments with a computer-controlled stage by coupling the optics to the objective, deriving power from USB ports, and compiling the Matlab code into a license-free stand-alone application. The system could also be retrofitted in instruments with manual joystick control by coupling stepper motors to the fundus camera joystick.
- Moscaritolo M, Jampel H, Knezevich F, Zeimer R. An image based auto-focusing algorithm for digital fundus photography. IEEE Transactions on Medical Imaging. 2009;28:1703–1707. doi:10.1109/TMI.2009.2019755
- Iskander DR, Collins MJ, Mioschek S, Trunk M. Automatic pupillometry from digital images. IEEE Transactions on Biomedical Engineering. 2004;51:1619–1627. doi:10.1109/TBME.2004.827546
- Ji Q, Yang X. Real-time eye, gaze, and face pose tracking for monitoring driver vigilance. Real-Time Imaging. 2002;8:357–377. doi:10.1006/rtim.2002.0279
- He X, Shi P. A novel iris segmentation method for hand-held capture device. In: Zhang D, Jain AK, eds. Advances in Biometrics. Berlin: Springer-Verlag; 2005:479–485.
- Zhang ZY, Deriche R, Faugeras O, Luong QT. A robust technique for matching 2 uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence. 1995;78:87–119. doi:10.1016/0004-3702(95)00022-4
- Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310.
- Loewenfeld IE. Pupillary changes related to age. In: Thompson HS, ed. Topics in Neuro-ophthalmology. Baltimore: Williams & Wilkins; 1979:124–150.
Figure 6. Webcam A Coordinate ya. bias = −0.02 and σ = 0.18, (inferior Limit = −0.39, Superior Limit = 0.34) mm.
Figure 7. Webcam B Coordinate xb. bias = −0.02 and σ = 0.29, (inferior Limit = −0.55, Superior Limit = 0.61) mm.