TY - JOUR
T1 - Comprehensive assessment of computational algorithms in predicting cancer driver mutations
AU - Chen, Hu
AU - Li, Jun
AU - Wang, Yumeng
AU - Ng, Patrick Kwok Shing
AU - Tsang, Yiu Huen
AU - Shaw, Kenna R.
AU - Mills, Gordon B.
AU - Liang, Han
N1 - Publisher Copyright: © 2020 The Author(s).
PY - 2020/2/20
Y1 - 2020/2/20
AB - Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. Results: We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays that we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into best practices for computationally prioritizing cancer mutation candidates, both for end-users and for the future development of new algorithms.
KW - 3D clustering
KW - Cell viability assay
KW - Driver mutations
KW - Passenger mutations
KW - TP53 mutations
KW - The Cancer Genome Atlas
KW - Tumor transformation
UR - http://www.scopus.com/inward/record.url?scp=85079814510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079814510&partnerID=8YFLogxK
U2 - 10.1186/s13059-020-01954-z
DO - 10.1186/s13059-020-01954-z
M3 - Article
C2 - 32079540
AN - SCOPUS:85079814510
SN - 1474-7596
VL - 21
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 43
ER -