TY - JOUR
T1 - What Markov State Models Can and Cannot Do
T2 - Correlation versus Path-Based Observables in Protein-Folding Models
AU - Suárez, Ernesto
AU - Wiewiora, Rafal P.
AU - Wehmeyer, Chris
AU - Noé, Frank
AU - Chodera, John D.
AU - Zuckerman, Daniel M.
N1 - Funding Information:
J.D.C. is a current member of the Scientific Advisory Board of OpenEye Scientific Software, Redesign Science, and Interline Therapeutics, and has equity interests in Redesign Science and Interline Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, Bayer, XtalPi, Foresite Laboratories, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding .
Funding Information:
J.D.C. acknowledges support from NIH grant P30 CA008748, NIH grant R01 GM121505, NIH grant R01 GM132386, and the Sloan Kettering Institute.
Funding Information:
We thank D. E. Shaw Research (DESRES) for providing a copy of the protein folding trajectory data set from ref . Simon Olsson helped to build early MSM models for this study. D.M.Z. acknowledges support from NIH Grant R01GM115805, as well as NSF Grant MCB-1119091. J.D.C. acknowledges support from NIH Grant R01GM121505 and National Cancer Institute Cancer Center Core Grant P30CA008748. R.P.W. acknowledges support from the Tri-Institutional PhD Program in Chemical Biology and the Department of Defense (Peer Reviewed Cancer Research Program, Award W81XWH-17-1-0412). This project has also been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN26120080001E. F.N. acknowledges support from the European Research Council (ERC CoG 772230), the Berlin mathematics center MATH+ and the BMBF (BIFOLD). We thank Josh Fass for helpful discussions.
Publisher Copyright:
© 2021 American Chemical Society.
PY - 2021/5/11
Y1 - 2021/5/11
N2 - Markov state models (MSMs) have been widely applied to study the kinetics and pathways of protein conformational dynamics based on statistical analysis of molecular dynamics (MD) simulations. These MSMs coarse-grain both configuration space and time in ways that limit what kinds of observables they can reproduce with high fidelity over different spatial and temporal resolutions. Despite their popularity, there is still limited understanding of which biophysical observables can be computed from these MSMs in a robust and unbiased manner, and which suffer from the space-time coarse-graining intrinsic in the MSM model. Most theoretical arguments and practical validity tests for MSMs rely on long-time equilibrium kinetics, such as the slowest relaxation time scales and experimentally observable time-correlation functions. Here, we perform an extensive assessment of the ability of well-validated protein folding MSMs to accurately reproduce path-based observables such as mean first-passage times (MFPTs) and transition path mechanisms compared to a direct trajectory analysis. We also assess a recently proposed class of history-augmented MSMs (haMSMs) that exploit additional information not accounted for in standard MSMs. We conclude with some practical guidance on the use of MSMs to study various problems in conformational dynamics of biomolecules. In brief, MSMs can accurately reproduce correlation functions slower than the lag time, but path-based observables can only be reliably reproduced if the lifetimes of states exceed the lag time, which is a much stricter requirement. Even in the presence of short-lived states, we find that haMSMs reproduce path-based observables more reliably.
AB - Markov state models (MSMs) have been widely applied to study the kinetics and pathways of protein conformational dynamics based on statistical analysis of molecular dynamics (MD) simulations. These MSMs coarse-grain both configuration space and time in ways that limit what kinds of observables they can reproduce with high fidelity over different spatial and temporal resolutions. Despite their popularity, there is still limited understanding of which biophysical observables can be computed from these MSMs in a robust and unbiased manner, and which suffer from the space-time coarse-graining intrinsic in the MSM model. Most theoretical arguments and practical validity tests for MSMs rely on long-time equilibrium kinetics, such as the slowest relaxation time scales and experimentally observable time-correlation functions. Here, we perform an extensive assessment of the ability of well-validated protein folding MSMs to accurately reproduce path-based observables such as mean first-passage times (MFPTs) and transition path mechanisms compared to a direct trajectory analysis. We also assess a recently proposed class of history-augmented MSMs (haMSMs) that exploit additional information not accounted for in standard MSMs. We conclude with some practical guidance on the use of MSMs to study various problems in conformational dynamics of biomolecules. In brief, MSMs can accurately reproduce correlation functions slower than the lag time, but path-based observables can only be reliably reproduced if the lifetimes of states exceed the lag time, which is a much stricter requirement. Even in the presence of short-lived states, we find that haMSMs reproduce path-based observables more reliably.
UR - http://www.scopus.com/inward/record.url?scp=85105904298&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105904298&partnerID=8YFLogxK
U2 - 10.1021/acs.jctc.0c01154
DO - 10.1021/acs.jctc.0c01154
M3 - Article
C2 - 33904312
AN - SCOPUS:85105904298
SN - 1549-9618
VL - 17
SP - 3119
EP - 3133
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
IS - 5
ER -