April 11, 2017

Molecular data improves prediction of colorectal cancer recurrence


Microarray-trained models, which include molecular data, better predicted the recurrence of colorectal cancer than models that used clinical data, according to study findings presented at the Society of Surgical Oncology’s Annual Cancer Symposium.

Excluding skin cancer, colorectal cancer is the third most commonly diagnosed cancer in the United States, with an estimated 95,520 new cases of colon cancer and 39,910 new cases of rectal cancer expected in 2017, according to the American Cancer Society.


A key therapeutic dilemma in the treatment of colorectal cancer is whether patients with stage II and stage III disease require adjuvant chemotherapy after surgical resection. Patients at increased risk for recurrence have been identified with predictive models based on gene expression data. However, those models are not FDA approved and are not used in standard clinical practice.

“Treatment of stage II colon cancer is controversial as the majority of patients will be cured with surgery,” Jason Castellanos, MD, MS, resident in general surgery at Vanderbilt University Medical Center, told HemOnc Today. “However, 20% of patients recur, so theoretically chemotherapy would be helpful. The treatment of stage III colon cancer is a major success story as OS is significantly improved with chemotherapy after surgery. However, based on historical data, 50% of patients would not recur after surgery alone and may be exposed to chemotherapy without benefit.”

Castellanos and colleagues incorporated ideas from the winning entries in machine learning and bioinformatics competitions to improve prediction of which patients with colorectal cancer would experience recurrence within 3 years of diagnosis. These ideas included:

  • Ensemble learning, which combines several machine learning algorithms to arrive at a single prediction; and
  • Multiple views of the data, in which different representations of the genomic data are used to incorporate biologic information.
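The study abstract does not name the specific learners or views the team used, so the following is only a minimal sketch of the general "multiple view, multiple learner" idea, assuming two hypothetical views ("expression" and "pathway") and two stand-in learners (nearest centroid and k-nearest neighbors). Each learner scores the patient on each view, and the scores are averaged (soft voting) into one recurrence probability.

```python
import math
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def nearest_centroid_score(X, y, x):
    """Stand-in learner: score for class 1 (recurrence) from distances to class centroids."""
    c0 = centroid([r for r, lab in zip(X, y) if lab == 0])
    c1 = centroid([r for r, lab in zip(X, y) if lab == 1])
    d0, d1 = math.dist(x, c0), math.dist(x, c1)
    if d0 + d1 == 0:
        return 0.5
    return d0 / (d0 + d1)  # closer to the class-1 centroid -> higher score

def knn_score(X, y, x, k=3):
    """Stand-in learner: fraction of the k nearest training samples labelled 1."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
    return sum(lab for _, lab in nearest) / len(nearest)

LEARNERS = [nearest_centroid_score, knn_score]

def ensemble_predict(views_train, y, views_query, threshold=0.5):
    """Run every learner on every view, then average the scores (soft voting)."""
    scores = [learner(views_train[name], y, views_query[name])
              for name in views_train
              for learner in LEARNERS]
    p = mean(scores)
    return p, int(p >= threshold)

# Toy data: hypothetical views of the same four patients (0 = no recurrence, 1 = recurrence).
views_train = {
    "expression": [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]],
    "pathway": [[0.0], [0.1], [1.0], [0.9]],
}
y = [0, 0, 1, 1]

query_high = {"expression": [0.85, 0.85], "pathway": [0.95]}
query_low = {"expression": [0.15, 0.15], "pathway": [0.05]}

print(ensemble_predict(views_train, y, query_high))
print(ensemble_predict(views_train, y, query_low))
```

Averaging across views and learners means no single representation or algorithm dominates the prediction; with real microarray data, each view would be a far larger feature set and the learners would be trained and validated on held-out patients.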

The dataset in the study included information on genes expressed in nontumor tissue, gene set structure, protein-protein interaction network structure, previously curated molecular signatures and identified tumor suppressor and driver mutations.

The microarray-trained models performed significantly better than models trained on clinical data (P = 1.49 × 10⁻⁸).

Castellanos and colleagues also found that nonlinear classifiers often outperformed linear classifiers and that ensemble methods further enhanced performance.

“We found that the molecular data views enhanced predictive performance by integrating biologic information,” Castellanos said. “Our multiple view, multiple learner framework was able to predict 3-year disease recurrence significantly better than a modified Oncotype Dx score (Genomic Health). However, we still need larger datasets to fully leverage these approaches.” – by Chuck Gormley


Castellanos J, et al. Abstract 7. Presented at: Society of Surgical Oncology’s Annual Cancer Symposium; March 15-18, 2017; Seattle.

Disclosure: The researchers report no relevant financial disclosures.