May 23, 2019

Big data, big errors of interpretation


In this issue, we report on an interesting study presented at the Genitourinary Cancers Symposium by a strong multidisciplinary uro-oncology group, which used the National Cancer Database (NCDB) to study the impact of retroperitoneal lymph node dissection (RPLND) for stage I to stage II seminoma (see related article).

The researchers correctly noted that RPLND is conventionally not used for the management of early-stage seminoma, both because radiation and observation have been viewed as technically easier, each with excellent results, and because of the established morbidity of RPLND (although this has declined substantially in the past 2 decades in experienced hands). It should not be forgotten that the surgeon's training, quality and caseload experience heavily influence the morbidity and effectiveness of RPLND.

Now, truth in disclosure, I’m one of the old guard who were treating germ cell malignancy before cisplatin was available, the group who fought hard to secure the great results that make this cancer outcome one of the poster children of modern oncology.

Derek Raghavan, MD, PhD, FACP, FRACP, FASCO

If you look back nearly 40 years, you will see my early reports — crafted with teams in the U.K. and U.S. led by Sir Michael Peckham, FMedSci, FRCP, FRCS, FRCR, FRCPath, Elwin Fraley, MD, and Paul Lange, MD, FACS — of pathological predictors of good outcomes in germ cell malignancy, the genesis of active surveillance for clinical stage I germ cell tumors, the use of the Einhorn and other cisplatin-based regimens for testis cancer, and the late effects of these regimens.

So, I have a vested interest in not screwing up the hard-won results, and also in not doing pointless studies in germinal malignancy that might actually add risk. Further, I am proud of being an extreme data cynic with a reasonable understanding of applied biostatistics, and thus my following comments are perhaps a little harsh.

Study limitations

That said, and recognizing that the authors are from an excellent group and have been conservative in their presentation of data, there are some important limitations to this study, of the kind that so often characterize the use of these big data sets.

The quality of the data is variable, culled from hundreds of different centers with different quality and training of staff, notwithstanding the broad quality criteria set by NCDB. Starting from a national sample size of 63,727, the final study population is 412, and quite a heterogeneous group! The sample size is even smaller when one excludes the 47 post-chemotherapy cases.


It is noteworthy that the quality of pathology reporting and clinical staging must perforce be varied, given the broad range of practices contributing data, and it is very problematic for interpretation that the N stage is not defined, as all of these cases underwent RPLND.

Given the huge variability of likely surgical expertise, it is probably a merciful relief that these authors did not attempt to address surgical morbidity.

A close look at the survival curves shows that, although data were culled from 2004 to 2014 (by my math, representing a minimum follow-up of 4 years), only 261 patients were listed as at risk at 2 years, and far fewer at 4 years (n = 160). This means that the quality of close follow-up and attention to detail regarding risk for relapse is not adequate for this type of report.
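For readers who want to check the arithmetic behind these numbers, here is a minimal sketch that uses only the figures quoted in this editorial. The variable names are illustrative, and dividing the numbers at risk by the full study population of 412 is an assumption made here for illustration, not a calculation taken from the poster.

```python
# Figures quoted in this editorial; names are illustrative, not from the poster.
NATIONAL_SAMPLE = 63_727     # starting national sample
STUDY_POPULATION = 412       # final study population
POST_CHEMO_CASES = 47        # post-chemotherapy cases
AT_RISK = {2: 261, 4: 160}   # years of follow-up -> patients listed at risk


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Cohort size once the post-chemotherapy cases are excluded.
cases_excluding_post_chemo = STUDY_POPULATION - POST_CHEMO_CASES

summary = {
    "study_share_of_national_pct": share(STUDY_POPULATION, NATIONAL_SAMPLE),
    "cases_excluding_post_chemo": cases_excluding_post_chemo,
    # Assumption: the full study population is used as the denominator.
    "at_risk_pct_by_year": {
        year: share(n, STUDY_POPULATION) for year, n in AT_RISK.items()
    },
}
print(summary)
```

On these inputs, the study population is well under 1% of the national sample, and fewer than two in five of the 412 patients are still listed as at risk at 4 years.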

There is a significant risk for relapse of early-stage germ cell malignancy for the first 3 to 5 years, which is why our follow-up and active surveillance protocols have been designed in the current fashion.

Thus, the NCDB data, as framed in this poster, simply are not adequate to make strong relapse and survival statements — other than falling back on our generic knowledge that patients with early-stage seminoma do well whether treated with radiotherapy or active surveillance.

That, in turn, leads to the question, why undertake this study (and others) of post-orchiectomy RPLND? Surely not to find a replacement for the PSA screening industry that is increasingly falling into disfavor!

‘Meaningless numbers’

This leads me to the following observations.

This study is appropriately reported, and the authors correctly note that final validation will be achieved by completion of the SEMS and PRIMETEST clinical trials.

However, what isn’t clear is why this work is being done at all. Survival of early-stage seminoma managed in centers of excellence, with top tumor-specific diagnostic radiology, pathology and clinical protocols, is excellent with either post-orchiectomy observation or radiotherapy. Further, most of the reported complications of radiotherapy have arisen from earlier eras when much higher doses of treatment were employed.

A sad observation that is probably not confounded by the nature of the database is that, once again, there has been heavy selection against the impoverished and underserved: Two-thirds of the patients had incomes greater than $48,000 per year and 70% were privately insured. Paradoxically, this time, the impoverished may actually have come out ahead!

I should note that, in this instance, the paucity of African Americans is consistent with the fact that germ cell malignancy is much less common in African Americans than in Caucasians, and thus does not represent case selection bias. However, the broader issue of disparities of care and lack of equity and equality also has been discussed in detail in this edition of HemOnc Today in our cover story and remains a national disgrace ... and a much more expensive way of treating cancer, as I have discussed in previous editorials.


This brings me back to the question, why was RPLND assessed predominantly in a wealthy population with private insurance?

I wonder whether these patients were informed that this procedure is not the standard of care for stage I to stage II seminoma, and whether that discussion was framed differently for those who could not pay for the experiment.

With respect to young investigators embarking on big-data reviews, I suggest you read the ASCO paper on the topic by Visvanathan and colleagues before wasting your time on a study that uses big data inappropriately.

Using big data to assess demographic and health policy implications can often be valid. But these same data sets, so poorly constructed to assess true efficacy, morbidity and decision-making, are too often misapplied ... and editors and program committees would do well to pay much more attention to quality and content than to completely meaningless numbers like 63,727.


Tabakin A, et al. Abstract 534. Presented at: Genitourinary Cancers Symposium; Feb. 14-16, 2019; San Francisco.

Visvanathan K, et al. J Clin Oncol. 2017;doi:10.1200/JCO.2017.72.6414.

For more information:

Derek Raghavan, MD, PhD, FACP, FRACP, FASCO, is HemOnc Today’s Chief Medical Editor for Oncology. He also is president of Levine Cancer Institute at Atrium Health. He can be reached at

Disclosure: Raghavan reports no relevant financial disclosures.