Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data

Steven J. Schrodi; Andrea DeBarber; Max He; Zhan Ye; Peggy Peissig; Jeffrey J. Van Wormer; Robert Haws; Murray H. Brilliant; Robert D. Steiner

doi:10.1007/s00439-015-1551-8


Chemical Physiology and Biochemistry

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Genetic methods can complement epidemiological surveys and clinical registries in determining the prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. Assuming Hardy–Weinberg equilibrium, the frequency of individuals in the general population who are homozygous for a particular pathogenic allele can be calculated directly from a sample of chromosomes in which some carry that allele. If the penetrance of the pathogenic allele(s) is also known, the prevalence of the recessive phenotype can then be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has not yet been applied to the problem of estimating disease prevalence from large population-based genetic data. We develop a Bayesian framework to derive the posterior probability density of the prevalence of monogenic autosomal recessive phenotypes, and we present explicit equations for the credible intervals of these prevalence estimates. A primary impediment to accurate prevalence calculations is determining which alleles are truly pathogenic. We discuss this issue, but in many instances it remains a significant barrier for investigations that rely solely on statistical interrogation; functional studies can provide important evidence of variant pathogenicity. We also discuss several other challenges, including population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate these methods, we use recently published genetic data collected on a large sample from the Schmiedeleut Hutterites, estimating prevalence and calculating 95% credible intervals for 13 autosomal recessive diseases. In addition, we apply the Bayesian estimation procedure to data from a central European study of hereditary fructose intolerance. The methods described here show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and the corresponding credible intervals from population-based genetic databases that have recently become available. As these databases grow in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods may help in recessive disease prevalence calculations, potentially informing public health management, health economic analyses, and the treatment of rare diseases.
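The estimation steps summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' published procedure: it assumes Hardy–Weinberg equilibrium, pools all pathogenic alleles of a gene into a single frequency q (so compound heterozygotes count as affected), places a hypothetical Beta(alpha, beta) prior on q (uniform by default), and takes prevalence as f·q² for a known penetrance f. The names `estimate_prevalence` and `PrevalenceEstimate` are illustrative, and the credible interval is approximated by Monte Carlo sampling rather than by the explicit equations the paper derives.

```python
import random
from dataclasses import dataclass


@dataclass
class PrevalenceEstimate:
    """Hypothetical container for the sketch's outputs."""
    plug_in: float         # f * (x / n)**2, the simple Hardy-Weinberg point estimate
    posterior_mean: float  # E[f * q**2 | data] under the Beta posterior (closed form)
    lower: float           # lower bound of the equal-tailed credible interval
    upper: float           # upper bound of the equal-tailed credible interval


def estimate_prevalence(pathogenic_counts, n_chromosomes, penetrance=1.0,
                        alpha=1.0, beta=1.0, level=0.95, draws=100_000, seed=0):
    """Bayesian sketch of recessive-disease prevalence from allele counts.

    pathogenic_counts: observed counts of each allele assumed to be pathogenic.
    n_chromosomes: number of chromosomes sampled (2 x number of individuals).
    alpha, beta: hypothetical Beta prior on q; Beta(1, 1) is uniform.
    """
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    # Allelic heterogeneity: pool every pathogenic allele of the gene into one
    # frequency q. This counts compound heterozygotes as affected and assumes
    # no chromosome carries more than one pathogenic allele.
    x = sum(pathogenic_counts)
    if not 0 <= x <= n_chromosomes:
        raise ValueError("pooled pathogenic count must lie in [0, n_chromosomes]")

    # Binomial likelihood with a Beta prior gives a Beta(a, b) posterior for q.
    a = alpha + x
    b = beta + n_chromosomes - x

    # For q ~ Beta(a, b): E[q^2] = Var[q] + E[q]^2 = a(a + 1) / ((a + b)(a + b + 1)).
    posterior_mean = penetrance * a * (a + 1) / ((a + b) * (a + b + 1))

    # Monte Carlo credible interval: draw q from the posterior, map each draw to
    # prevalence f * q^2 under Hardy-Weinberg equilibrium, and read off quantiles.
    rng = random.Random(seed)
    samples = sorted(penetrance * rng.betavariate(a, b) ** 2 for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * (draws - 1))]
    upper = samples[int((1.0 - tail) * (draws - 1))]

    plug_in = penetrance * (x / n_chromosomes) ** 2
    return PrevalenceEstimate(plug_in, posterior_mean, lower, upper)


if __name__ == "__main__":
    # Hypothetical counts: two pathogenic alleles seen 7 and 5 times in 2,000 chromosomes.
    est = estimate_prevalence([7, 5], n_chromosomes=2000, penetrance=1.0)
    print(f"plug-in {est.plug_in:.2e}, posterior mean {est.posterior_mean:.2e}, "
          f"95% CrI [{est.lower:.2e}, {est.upper:.2e}]")
```

Because f·q² is increasing in q on [0, 1], the same interval could instead be computed exactly as f times the squared Beta posterior quantiles (for instance with `scipy.stats.beta.ppf`), avoiding sampling error. Reduced penetrance enters only as the multiplier f, so it scales the point estimate and both interval bounds linearly.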

Original language	English (US)
Pages (from-to)	659-669
Number of pages	11
Journal	Human Genetics
Volume	134
Issue number	6
DOIs	https://doi.org/10.1007/s00439-015-1551-8
State	Published - Jun 1 2015

ASJC Scopus subject areas

Genetics
Genetics (clinical)


Cite this

@article{7e178734dd5c4bd682ff048696f5d844,
  title = "Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data",
  abstract = "Genetic methods can complement epidemiological surveys and clinical registries in determining prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. By assuming Hardy–Weinberg equilibrium, the frequency of individuals homozygous in the general population for a particular pathogenic allele can be directly calculated from a sample of chromosomes where some harbor the pathogenic allele. Further assuming that the penetrance of the pathogenic allele(s) is known, the prevalence of recessive phenotypes can be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has yet to be applied to the problem of estimating disease prevalence from large population-based genetic data. A Bayesian framework is developed to derive the posterior probability density of monogenic, autosomal recessive phenotypes. Explicit equations are presented for the credible intervals of these disease prevalence estimates. A primary impediment to performing accurate disease prevalence calculations is the determination of truly pathogenic alleles. This issue is discussed, but in many instances remains a significant barrier to investigations solely reliant on statistical interrogation; functional studies can provide important information for solidifying evidence of variant pathogenicity. We also discuss several challenges to these efforts, including the population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate the application of these methods, we utilized recently published genetic data collected on a large sample from the Schmiedeleut Hutterites. We estimate prevalence and calculate 95{\%} credible intervals for 13 autosomal recessive diseases using these data. In addition, the Bayesian estimation procedure is applied to data from a central European study of hereditary fructose intolerance. The methods described herein show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and corresponding credible intervals using population-based genetic databases that have recently become available. As these genetic databases increase in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods and approaches may be helpful in recessive disease prevalence calculations, potentially impacting public health management, health economic analyses, and treatment of rare diseases.",
  author = "Schrodi, {Steven J.} and Andrea DeBarber and Max He and Zhan Ye and Peggy Peissig and {Van Wormer}, {Jeffrey J.} and Robert Haws and Brilliant, {Murray H.} and Steiner, {Robert D.}",
  note = "Publisher Copyright: {\textcopyright} 2015, Springer-Verlag Berlin Heidelberg.",
  year = "2015",
  month = jun,
  day = "1",
  doi = "10.1007/s00439-015-1551-8",
  language = "English (US)",
  volume = "134",
  pages = "659--669",
  journal = "Human Genetics",
  issn = "0340-6717",
  publisher = "Springer Verlag",
  number = "6",
}

TY  - JOUR
T1  - Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data
AU  - Schrodi, Steven J.
AU  - DeBarber, Andrea
AU  - He, Max
AU  - Ye, Zhan
AU  - Peissig, Peggy
AU  - Van Wormer, Jeffrey J.
AU  - Haws, Robert
AU  - Brilliant, Murray H.
AU  - Steiner, Robert D.
PY  - 2015/6/1
Y1  - 2015/6/1
AB  - Genetic methods can complement epidemiological surveys and clinical registries in determining prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. By assuming Hardy–Weinberg equilibrium, the frequency of individuals homozygous in the general population for a particular pathogenic allele can be directly calculated from a sample of chromosomes where some harbor the pathogenic allele. Further assuming that the penetrance of the pathogenic allele(s) is known, the prevalence of recessive phenotypes can be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has yet to be applied to the problem of estimating disease prevalence from large population-based genetic data. A Bayesian framework is developed to derive the posterior probability density of monogenic, autosomal recessive phenotypes. Explicit equations are presented for the credible intervals of these disease prevalence estimates. A primary impediment to performing accurate disease prevalence calculations is the determination of truly pathogenic alleles. This issue is discussed, but in many instances remains a significant barrier to investigations solely reliant on statistical interrogation; functional studies can provide important information for solidifying evidence of variant pathogenicity. We also discuss several challenges to these efforts, including the population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate the application of these methods, we utilized recently published genetic data collected on a large sample from the Schmiedeleut Hutterites. We estimate prevalence and calculate 95% credible intervals for 13 autosomal recessive diseases using these data. In addition, the Bayesian estimation procedure is applied to data from a central European study of hereditary fructose intolerance. The methods described herein show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and corresponding credible intervals using population-based genetic databases that have recently become available. As these genetic databases increase in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods and approaches may be helpful in recessive disease prevalence calculations, potentially impacting public health management, health economic analyses, and treatment of rare diseases.
UR  - http://www.scopus.com/inward/record.url?scp=84936746793&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84936746793&partnerID=8YFLogxK
U2  - 10.1007/s00439-015-1551-8
DO  - 10.1007/s00439-015-1551-8
M3  - Article
C2  - 25893794
AN  - SCOPUS:84936746793
SN  - 0340-6717
VL  - 134
SP  - 659
EP  - 669
JO  - Human Genetics
JF  - Human Genetics
IS  - 6
ER  -
