Future of thyroid nodule classification driven by AI, machine learning as support tools
The detection and diagnosis of thyroid nodules may be on the cusp of a technological overhaul as a growing body of research on artificial intelligence and machine learning tools aims to improve efficiency, accuracy and ease of use while decreasing cost.
As an example, in a 2019 study published in Thyroid, researchers showed that a machine learning algorithm was able to diagnose thyroid nodules as well as, if not better than, radiologists. Despite such promising results, these types of tools are still being studied, and it may take some time before they become more widely available.
“[Computer analysis of thyroid images] is not gaining much traction in the U.S. at present,” Jennifer Sipos, MD, professor of medicine and director of the benign thyroid disorders program in the division of endocrinology at Ohio State University Wexner Medical Center, told Healio. “That’s not to say that it couldn’t change.”
If that change is to come, there is still work to be done, especially when it comes to determining where exactly these tools would fit in the treatment process and how physicians would utilize them.
“Ultimately, I think the human eye just cannot be replaced,” Sipos said. “As a sonographer, I am admittedly biased, but I do believe there is a great deal of importance in physician experience and this cannot be replaced by a machine.”
AI support system
But these tools can be developed not to replace physician diagnosis, but to enhance it, according to Johnson Thomas, MD, FACE, section chair in the department of endocrinology at Mercy in Springfield, Missouri. Thomas and colleagues have been working on such a system for nearly half a decade, with promising results recently presented at the 2019 American Thyroid Association annual meeting.
Sparked by his own interest and background in computer coding, Thomas began working on a machine learning tool in 2015. The original tool was what Thomas calls an image classification algorithm. The system estimated the likelihood that a particular imaged thyroid nodule would display cancerous characteristics. Although this tool is available only for research purposes, it can be found at TUMscore.com.
The machine learning tool was just the first step of a larger journey for Thomas, who still sees some shortcomings of the system.
“That didn’t really solve the problem. It’s still subjective,” Thomas told Healio. “Someone has to look at that picture and make this determination.”
In the next step, Thomas attempted to decipher just what a system was looking at when making a diagnosis by using heat maps. Once again, this system proved to be a progressive step but one that still came up short.
“It’s kind of explainable, but what we have found out is that in many confusing cases the heat map doesn’t help us, so then again it just becomes a black box algorithm,” Thomas said. “It might be explainable to the AI model, it might be looking at the right place, but when a physician looks at the heat map, he or she cannot make any sense out of it.”
Making a system that is explainable was a prime motivating factor for Thomas’ most recent work with what he calls an image similarity model. This AI-powered system — called AIBX — allows a physician to take an ultrasound picture of a thyroid nodule, process it through the algorithm and receive images with similar features as well as their ultimate diagnoses.
“If you think about it, this is exactly what any AI algorithm does. It cannot really say [whether a nodule is] cancer or benign,” Thomas said. “It’s actually comparing it to other images and making an assumption based on the available data.”
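The comparison Thomas describes can be illustrated with a minimal sketch of similarity-based retrieval. This is not the AIBX implementation, whose internals are not described here. It assumes each ultrasound image has already been reduced to a numeric feature vector, and it ranks a small reference collection by cosine similarity to a submitted image. The image IDs, vectors and diagnoses below are hypothetical values made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, database, k=3):
    """Return the k entries most similar to the query, best match first.

    `database` is a list of (image_id, feature_vector, diagnosis) tuples.
    Each result is a (similarity, image_id, diagnosis) tuple.
    """
    scored = [
        (cosine_similarity(query, vector), image_id, diagnosis)
        for image_id, vector, diagnosis in database
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical reference collection: the IDs, vectors and diagnoses are
# invented for this sketch and do not come from any real data set.
REFERENCE_IMAGES = [
    ("ref-001", [0.9, 0.1, 0.0], "benign"),
    ("ref-002", [0.8, 0.2, 0.1], "benign"),
    ("ref-003", [0.1, 0.9, 0.7], "malignant"),
    ("ref-004", [0.2, 0.8, 0.9], "malignant"),
]

if __name__ == "__main__":
    query_vector = [0.15, 0.85, 0.8]  # hypothetical features of a new image
    for score, image_id, diagnosis in most_similar(query_vector, REFERENCE_IMAGES, k=2):
        print(f"{image_id}: similarity={score:.3f}, known diagnosis={diagnosis}")
```

In a sketch like this, the physician sees the retrieved images and their known diagnoses rather than a bare verdict, which is the property Thomas emphasizes.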
The algorithm was originally tested on 103 thyroid nodules against a reference collection of images from 482 nodules. In the results Thomas presented at the 2019 ATA annual meeting, the model had a 93.2% negative predictive value, 87.8% sensitivity, 78.5% specificity, 81.5% accuracy and a 65.9% positive predictive value. That final metric is the one to be excited about, according to Thomas, because it exceeded the positive predictive value of most other classification systems.
“In our case, we also had good positive predictive values. Conventional classification systems have high negative predictive values, but then they had low positive predictive value that still ended up doing more, making us do more biopsies,” Thomas said. “A system with a good negative predictive value and [a] good positive predictive value will hopefully decrease the amount of biopsies that we have to do.”
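For readers less familiar with these metrics, the sketch below shows how each one is calculated from the four cells of a confusion matrix. The counts are hypothetical. The underlying numbers were not reported here, and these were chosen only because they total 103 nodules and produce percentages close to the published figures.

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic test metrics from confusion-matrix counts.

    tp: malignant nodules flagged as malignant (true positives)
    fp: benign nodules flagged as malignant (false positives)
    tn: benign nodules flagged as benign (true negatives)
    fn: malignant nodules flagged as benign (false negatives)
    """
    return {
        "sensitivity": ratio(tp, tp + fn),  # share of cancers detected
        "specificity": ratio(tn, tn + fp),  # share of benign nodules cleared
        "ppv": ratio(tp, tp + fp),          # chance a positive call is cancer
        "npv": ratio(tn, tn + fn),          # chance a negative call is benign
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts (assumption, not reported data): 103 nodules in total.
EXAMPLE_COUNTS = {"tp": 29, "fp": 15, "tn": 55, "fn": 4}

if __name__ == "__main__":
    for name, value in diagnostic_metrics(**EXAMPLE_COUNTS).items():
        print(f"{name}: {value:.1%}")
```

Every false positive in a table like this is a benign nodule sent for biopsy, so raising the positive predictive value without lowering the negative predictive value means fewer unnecessary biopsies, which is the point Thomas makes.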
Show your work
Even with these positive signs, Thomas said he does not imagine that this system will be a replacement for physician review but rather a decision support tool. The system ostensibly shows its work and makes the conclusions more explainable.
“Instead of just giving that output, we are saying these images are similar to the image that you just submitted and it either looks benign or it looks malignant,” Thomas said. “We can only store so much information in our brain, and we cannot identify all these patterns and keep that in our brain. This algorithm pulls those things out from the database and kind of works with what a normal human will do.”
This potentially addresses the concern that Sipos and other physicians may have about these tools replacing human judgment. Instead, Thomas argues, they would lend support to experienced physicians and begin to yield benefits in efficiency that are sorely needed.
“Essentially, I see it as beneficial to take the low risk and clearly benign nodules out of the ‘system’ so that the limited resources can be focused on a nodule with more concerning ultrasound features,” Sipos said.
Beyond its clinical applications, Thomas’ system has the potential to be used as an educational tool.
“If a physician wants to teach students, he has to go with what image is currently available in front of him or the patient who is available,” Thomas said. “But with this, you can pull up the same characteristics in hundreds of nodules and then show that to the trainee, and that’s more beneficial than just going with one case.”
Educational utility is not unique to Thomas’ system, however, and is a potential benefit for other AI and machine learning tools.
“It would be interesting to see if this could be used to train younger providers as they gain experience and volume in terms of exams performed,” Sipos said.
There are still downsides to Thomas’ system and others like it. For Thomas’ system specifically, although there is very little overhead in terms of necessary equipment, cost is a concern.
“I’m not sure how that will pan out, but that’s a discussion that’s happening in medical AI all around the world,” Thomas said. “How is this going to be reimbursed?”
In addition, Thomas noted that his AI system takes slightly longer to process images than some others, although physicians would still wait only about 1 second or less. Plus, all testing has been done locally, and the results have yet to be verified at other institutions with additional images.
From the page to practice
The potential downsides of these types of technology will continue to be addressed in the years to come. For now, these tools appear in research articles rather than in clinical practice. Thomas predicts that there are still 1 or 2 years left before the system he presented could potentially be commercially available, especially as he and his team work to get FDA approval. Additionally, he noted that more research with more images must be conducted to further fine-tune the algorithm.
For Sipos, future research should compare these systems with the more accepted procedures and demonstrate flexibility to adapt to a given circumstance.
“It is still very important for a clinician to be involved in the ‘interpretation’ of these ultrasound exams read by AI, even if the technology were to supplant the need for a human eye to read it,” Sipos said. “The decision-making must still be defined within the context of the individual clinical scenario, accounting for the patient preferences, demographics and comorbid conditions.”
Thomas said he is even more optimistic about the future, indicating that ultrasound equipment could eventually incorporate these models out of the box.
“It’s still a developing field, but it is developing at a rapid pace,” Thomas said. “It will definitely have a place in the diagnosis and management of thyroid nodules in the future, that’s for sure, but when that’s going to happen, that’s the question.” – by Phil Neuffer
Thomas J. Oral 27: AIBX, A Deep Learning Model to Classify Thyroid Ultrasound Images. Presented at: 89th Annual Meeting of the American Thyroid Association; Oct. 30-Nov. 3, 2019; Chicago.
For more information:
Jennifer Sipos, MD, can be reached at Jennifer.Sipos@osumc.edu.
Johnson Thomas, MD, FACE, can be reached at 3231 S. National Ave, Suite 440, Springfield, MO 65807; email: firstname.lastname@example.org.
Disclosures: Sipos and Thomas report no relevant financial disclosures.