In the Journals

Algorithm-based apps show poor performance in detecting skin cancer


Smartphone applications that use algorithms to assess the risk for skin cancer appear to have poor and variable performance, according to a systematic review of diagnostic accuracy studies.

“Smartphone apps which claim to identify skin cancer are not yet reliable. They may miss many skin cancers,” Jon Deeks, PhD, professor of biostatistics at the University of Birmingham, United Kingdom, told Healio. “App users are not being informed about this risk and should always get medical assessment of any moles which they are concerned about, regardless of what the apps say.”

Deeks and colleagues searched the literature for studies published between August 2016 and April 2019 that explored the use of smartphone apps to examine suspicious skin lesions. For diagnostic accuracy, the studies prospectively recruited a representative sample of patients who used an app on their own device. Cancerous skin lesions were verified by histological assessment or follow-up. For verification against expert recommendations, lesions assessed by the app were reassessed in person by a dermatologist, and data were reported for all lesions.

Nine studies evaluating six apps met the eligibility requirements and were included in the analysis. Six of those studies evaluated diagnostic accuracy by comparing risk as graded by the app with a reference standard diagnosis; five focused on melanoma detection, and one attempted to differentiate malignant or premalignant lesions from benign lesions. The other three studies verified the app recommendations against a standard of expert recommendations. Only two of the six apps, SkinScan and SkinVision, are still available for use.


A single study of 15 lesions using the SkinScan app showed low sensitivity whether moderate risk was combined with the low- or high-risk category. The SkinVision app showed a sensitivity of 73% in a study of 144 pigmented lesions (26 melanomas) when only high-risk results were considered as test positive, but the sensitivity was 26% when 108 pigmented and nonpigmented lesions (35 malignant or premalignant) were included. Only one of three melanomas was correctly assessed by the app as being high risk.
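For readers less familiar with the term, sensitivity is the proportion of truly malignant lesions that a test flags as positive. The following is a minimal sketch in Python. The counts are hypothetical: the article reports the percentage and the number of melanomas but not how many were flagged, so 19 of 26 is used only because it rounds to the 73% figure.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly malignant lesions that the test flags as positive."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("sensitivity is undefined when there are no diseased cases")
    return true_positives / diseased


# Hypothetical counts, not taken from the review:
# 19 of 26 melanomas rated high risk, 7 missed.
example = sensitivity(true_positives=19, false_negatives=7)
print(f"{example:.0%}")
```

A low sensitivity means many cancers are rated as low risk, which is the "missed skin cancers" concern Deeks describes.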

The estimates for the SkinVision app showed the highest accuracy, but the apps' real-world performance is likely to be poorer than reported because the studies were small, had poor methodological quality and did not evaluate the apps as people would use them in practice.

The studies did not recruit samples that were representative of the general population, which has a relatively low prevalence of malignant lesions and a range of skin conditions. In addition, image quality was a concern because smartphone photos are often taken under suboptimal conditions. Users may also be uncertain of their next steps regarding lesions because app recommendations lack clarity.

The safety of these algorithm-based apps is questionable given the lack of evidence and limitations in these studies.

“There are three types of research we think are needed,” Deeks said. “First, if the apps could improve their ability to detect cancer, they would be a very useful tool, so research is needed to improve their predictive ability. Second, we need better evaluations of how these apps work in the hands of typical users. Finally, we need to find out whether the apps helpfully inform people about risk and how they behave. If they encourage people to get medical advice early, they are helpful, but if they make people complacent, that could be harmful.”


“Collectively as a society we must decide what amounts to good evidence when evaluating health apps; who is responsible for generating, validating and appraising this evidence; and how post-market monitoring of regularly updated software should be organized,” Jessica Morley, DataLab policy lead at the Nuffield Department of Primary Care at the University of Oxford, and colleagues wrote in an accompanying editorial.

“Reliable evaluations must find the truth, purchasers must require and use those truths, and regulators and other governing bodies must support and enhance these processes,” they wrote. “Without better information patients, clinicians and other stakeholders cannot be assured of an app’s efficacy and safety.” – by Erin T. Welsh

For more information:

Jon Deeks, PhD, can be reached at University of Birmingham, Birmingham B15 2TT, United Kingdom; email: j.deeks@bham.ac.uk.

Disclosures: Deeks reports he is employed by the University of Birmingham under an NIHR Cochrane Program grant and is supported by the NIHR Birmingham Biomedical Research Centre. Morley reports she is a recent employee of NHSX and has received a research grant from Digital Catapult. Please see the reports for all other authors’ relevant financial disclosures.
