TY - JOUR
T1 - Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling
AU - Bilal, Erhan
AU - Dutkowski, Janusz
AU - Guinney, Justin
AU - Jang, In Sock
AU - Logsdon, Benjamin A.
AU - Pandey, Gaurav
AU - Sauerwine, Benjamin A.
AU - Shimoni, Yishai
AU - Moen Vollan, Hans Kristian
AU - Mecham, Brigham H.
AU - Rueda, Oscar M.
AU - Tost, Jörg
AU - Curtis, Christina
AU - Alvarez, Mariano J.
AU - Kristensen, Vessela N.
AU - Aparicio, Samuel
AU - Børresen-Dale, Anne-Lise
AU - Caldas, Carlos
AU - Califano, Andrea
AU - Friend, Stephen H.
AU - Ideker, Trey
AU - Schadt, Eric E.
AU - Stolovitzky, Gustavo A.
AU - Margolin, Adam A.
PY - 2013
Y1 - 2013/05//
N2 - Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one avenue for improving treatment by matching the proper treatment with molecular subtypes of the disease. In this work, we employed a competition-based approach to modeling breast cancer prognosis using large datasets containing genomic and clinical information and an online real-time leaderboard program used to speed feedback to the modeling team and to encourage each modeler to work towards achieving a higher ranked submission. We find that machine learning methods combined with molecular features selected based on expert prior knowledge can improve survival predictions compared to current best-in-class methodologies and that ensemble models trained across multiple user submissions systematically outperform individual models within the ensemble. We also find that model scores are highly consistent across multiple independent evaluations. This study serves as the pilot phase of a much larger competition open to the whole research community, with the goal of understanding general strategies for model optimization using clinical and molecular profiling data and providing an objective, transparent system for assessing prognostic models.
UR - http://www.scopus.com/inward/record.url?scp=84877734926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877734926&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1003047
DO - 10.1371/journal.pcbi.1003047
M3 - Article
C2 - 23671412
AN - SCOPUS:84877734926
SN - 1553-734X
VL - 9
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 5
M1 - e1003047
ER -