Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains

Suzanne S. Fei; Phillip A. Wilmarth; Robert J. Hitzemann; Shannon K. McWeeney; John K. Belknap; Larry L. David

doi:10.1021/pr200133p

Research output: Contribution to journal › Article › peer-review

Abstract

Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.
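The abstract names the quantitative steps without detailing them. The Python sketch below illustrates two of them, the minimum-count filter and quantile normalization, on a proteins x samples spectral-count matrix. It is not the authors' pipeline. The function names, the `min_total` threshold, the tie handling and the toy data are all hypothetical, because this record does not state the paper's actual cutoffs. The batch adjustment (ComBat) and differential-expression testing (edgeR) steps are not reproduced.

```python
# Minimal sketch (not the authors' pipeline): filter a proteins x samples
# spectral-count matrix by a hypothetical minimum total count, then
# quantile-normalize the samples (columns). ComBat and edgeR are not reproduced.


def filter_min_counts(counts, min_total):
    """Keep rows (proteins/groups) whose summed count over all samples is at
    least min_total. `min_total` is a placeholder, not the paper's cutoff."""
    keep = [sum(row) >= min_total for row in counts]
    kept = [row for row, k in zip(counts, keep) if k]
    return kept, keep


def quantile_normalize(counts):
    """Give every sample (column) the same distribution. Each column's k-th
    smallest entry is replaced by the mean, across samples, of the k-th
    smallest values. Ties are broken by input order (a simplification)."""
    n_rows, n_cols = len(counts), len(counts[0])
    columns = [[row[j] for row in counts] for j in range(n_cols)]
    # Row indices of each column in ascending order; sorted() is stable.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest values across columns.
    reference = [
        sum(columns[j][orders[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, order in enumerate(orders):
        for k, i in enumerate(order):
            out[i][j] = reference[k]
    return out


if __name__ == "__main__":
    # Hypothetical toy counts: 5 proteins x 3 samples.
    toy = [[5, 4, 3], [2, 1, 4], [0, 1, 0], [3, 4, 6], [4, 2, 8]]
    kept, mask = filter_min_counts(toy, min_total=3)
    print(mask)  # [True, True, False, True, True]
    for row in quantile_normalize(kept):
        print([round(v, 2) for v in row])
```

Quantile normalization forces every sample onto an identical distribution. That is a reasonable assumption only when most proteins do not differ between the compared strains. In the workflow the abstract describes, batch adjustment with ComBat and count-based testing with edgeR (an R/Bioconductor package built on negative-binomial models) would follow this step.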

Original language	English (US)
Pages (from-to)	2905-2912
Number of pages	8
Journal	Journal of Proteome Research
Volume	10
Issue number	7
DOIs	https://doi.org/10.1021/pr200133p
State	Published - Jul 1 2011

ASJC Scopus subject areas

General Chemistry
Biochemistry

Cite this

@article{999cca6ea6ba482e96108b03cde41fba,
  title = "Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains",
  abstract = "Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.",
  author = "Fei, {Suzanne S.} and Wilmarth, {Phillip A.} and Hitzemann, {Robert J.} and McWeeney, {Shannon K.} and Belknap, {John K.} and David, {Larry L.}",
  year = "2011",
  month = jul,
  day = "1",
  doi = "10.1021/pr200133p",
  language = "English (US)",
  volume = "10",
  pages = "2905--2912",
  journal = "Journal of Proteome Research",
  issn = "1535-3893",
  publisher = "American Chemical Society",
  number = "7",
}

TY  - JOUR
T1  - Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains
AU  - Fei, Suzanne S.
AU  - Wilmarth, Phillip A.
AU  - Hitzemann, Robert J.
AU  - McWeeney, Shannon K.
AU  - Belknap, John K.
AU  - David, Larry L.
PY  - 2011/7/1
Y1  - 2011/7/1
N2  - Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.
AB  - Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.
UR  - http://www.scopus.com/inward/record.url?scp=79959988142&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79959988142&partnerID=8YFLogxK
U2  - 10.1021/pr200133p
DO  - 10.1021/pr200133p
M3  - Article
C2  - 21553863
AN  - SCOPUS:79959988142
SN  - 1535-3893
VL  - 10
SP  - 2905
EP  - 2912
JO  - Journal of Proteome Research
JF  - Journal of Proteome Research
IS  - 7
ER  - 