Comprehensive assessment of computational algorithms in predicting cancer driver mutations

Hu Chen; Jun Li; Yumeng Wang; Patrick Kwok Shing Ng; Yiu Huen Tsang; Kenna R. Shaw; Gordon B. Mills; Han Liang

doi:10.1186/s13059-020-01954-z

Cell, Developmental and Cancer Biology

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed.

Results: We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose.

Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.

Original language	English (US)
Article number	43
Journal	Genome biology
Volume	21
Issue number	1
DOIs	https://doi.org/10.1186/s13059-020-01954-z
State	Published - Feb 20 2020

Keywords

3D clustering
Cell viability assay
Driver mutations
Passenger mutations
TP53 mutations
The Cancer Genome Atlas
Tumor transformation

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Genetics
Cell Biology

Cite this

@article{e0e3ce050c2d43519aa9d34f08b94165,
title = "Comprehensive assessment of computational algorithms in predicting cancer driver mutations",
abstract = "Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. Results: We construct five complementary benchmark datasets: mutation clustering patterns in the protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed including a new dataset of ~ 200 mutations. We evaluate the performance of 33 algorithms and found that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.",
keywords = "3D clustering, Cell viability assay, Driver mutations, Passenger mutations, TP53 mutations, The Cancer Genome Atlas, Tumor transformation",
author = "Hu Chen and Jun Li and Yumeng Wang and Ng, {Patrick Kwok Shing} and Tsang, {Yiu Huen} and Shaw, {Kenna R.} and Mills, {Gordon B.} and Han Liang",
note = "Publisher Copyright: {\textcopyright} 2020 The Author(s).",
year = "2020",
month = feb,
day = "20",
doi = "10.1186/s13059-020-01954-z",
language = "English (US)",
volume = "21",
journal = "Genome biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR
T1 - Comprehensive assessment of computational algorithms in predicting cancer driver mutations
AU - Chen, Hu
AU - Li, Jun
AU - Wang, Yumeng
AU - Ng, Patrick Kwok Shing
AU - Tsang, Yiu Huen
AU - Shaw, Kenna R.
AU - Mills, Gordon B.
AU - Liang, Han
PY - 2020/2/20
Y1 - 2020/2/20
N2 - Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. Results: We construct five complementary benchmark datasets: mutation clustering patterns in the protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed including a new dataset of ~ 200 mutations. We evaluate the performance of 33 algorithms and found that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.
KW - 3D clustering
KW - Cell viability assay
KW - Driver mutations
KW - Passenger mutations
KW - TP53 mutations
KW - The Cancer Genome Atlas
KW - Tumor transformation
UR - http://www.scopus.com/inward/record.url?scp=85079814510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079814510&partnerID=8YFLogxK
U2 - 10.1186/s13059-020-01954-z
DO - 10.1186/s13059-020-01954-z
M3 - Article
C2 - 32079540
AN - SCOPUS:85079814510
SN - 1474-7596
VL - 21
JO - Genome biology
JF - Genome biology
IS - 1
M1 - 43
ER -
