
Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data

Research output: Contribution to journal, Article, peer-review

Abstract

Single-cell RNA sequencing (scRNA-seq) distinguishes cell types, states and lineages within the context of heterogeneous tissues. However, current single-cell data cannot directly link cell clusters with specific phenotypes. Here we present Scissor, a method that identifies cell subpopulations from single-cell data that are associated with a given phenotype. Scissor integrates phenotype-associated bulk expression data and single-cell data by first quantifying the similarity between each single cell and each bulk sample. It then optimizes a regression model on the correlation matrix with the sample phenotype to identify relevant subpopulations. Applied to a lung cancer scRNA-seq dataset, Scissor identified subsets of cells associated with worse survival and with TP53 mutations. In melanoma, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expression associated with an immunotherapy response. Beyond cancer, Scissor was effective in interpreting facioscapulohumeral muscular dystrophy and Alzheimer’s disease datasets. Scissor identifies biologically and clinically relevant cell subpopulations from single-cell assays by leveraging phenotype and bulk-omics datasets.
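The two steps described in the abstract (a cell-by-sample similarity matrix, then a regression of the phenotype on that matrix) can be illustrated with a small sketch. This is not the Scissor implementation: the abstract does not specify the regression model, so a plain L1-penalized least-squares fit is used here as a stand-in, Pearson correlation is assumed as the similarity measure, the data are synthetic, and every name (`cell_bulk_correlation`, `lasso_ista`, `prog_a`, `prog_b`) is hypothetical.

```python
import numpy as np


def cell_bulk_correlation(bulk, cells):
    """Pearson correlation of every bulk sample with every single cell.

    bulk:  genes x samples expression matrix
    cells: genes x cells expression matrix (same gene order)
    Returns a samples x cells matrix.
    """
    b = (bulk - bulk.mean(axis=0)) / bulk.std(axis=0)
    c = (cells - cells.mean(axis=0)) / cells.std(axis=0)
    return b.T @ c / bulk.shape[0]


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1 by proximal gradient."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return beta


# Synthetic data (an assumption for illustration only): two expression
# programs, 15 cells drawn from each, and bulk samples that mix the programs.
rng = np.random.default_rng(0)
n_genes, n_samples, n_per_group = 200, 40, 15
prog_a = rng.normal(size=n_genes)
prog_b = rng.normal(size=n_genes)
cells = np.column_stack([prog_a] * n_per_group + [prog_b] * n_per_group)
cells = cells + 0.3 * rng.normal(size=cells.shape)

w = rng.uniform(size=n_samples)  # fraction of program A in each bulk sample
bulk = np.outer(prog_a, w) + np.outer(prog_b, 1.0 - w)
bulk = bulk + 0.3 * rng.normal(size=bulk.shape)
y = w - w.mean()  # toy continuous phenotype, centered

# Step 1: similarity between each bulk sample and each cell.
X = cell_bulk_correlation(bulk, cells)
X = X - X.mean(axis=0)  # center columns, so no intercept is needed

# Step 2: sparse regression of the phenotype on the correlation matrix.
lam = 0.5 * np.max(np.abs(X.T @ y)) / n_samples  # half the smallest all-zero penalty
beta = lasso_ista(X, y, lam)

positive_cells = np.flatnonzero(beta > 0)  # cells that track the phenotype
negative_cells = np.flatnonzero(beta < 0)  # cells that track its opposite
print("positive-coefficient cells:", positive_cells)
print("negative-coefficient cells:", negative_cells)
```

In this toy setup each cell receives one coefficient, and the sign of a nonzero coefficient indicates whether that cell's similarity profile rises or falls with the phenotype across bulk samples. The penalty level `lam` controls how many cells are selected and is an arbitrary choice here.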

Original language: English (US)
Pages (from-to): 527-538
Number of pages: 12
Journal: Nature Biotechnology
Volume: 40
Issue number: 4
State: Published - Apr 1 2022

Funding

This work was supported by the following funding: NIH 5K01LM012877 (to Z.X.); NIH 1R21HL145426 (to Z.X.); NIH 1R01CA207377 (to D.Z.Q.); NIH NIGMS MIRA R35GM124704 (to A.C.A.); the Medical Research Foundation of Oregon (to Z.X.); NCI R01 CA251245, P50 CA097186, P50 CA186786, P50 CA186786-07S1 and Department of Defense Impact Award W81XWH-16-1-0597 (to J.J.A.); and NCI R01CA244576 (to A.V.D.). We thank W. Anderson and A. Hill for editing the manuscript. The resources of the Exacloud high-performance computing environment, developed jointly by Oregon Health & Science University (OHSU) and Intel, and the technical support of the OHSU Advanced Computing Center are gratefully acknowledged. A.E.M. discloses receipt of a sponsored research agreement from AstraZeneca. A.V.D. reports consultancy from AbbVie, BeiGene, Celgene, Curis, Janssen, Karyopharm, Nurix, Seattle Genetics, Teva Oncology and TG Therapeutics; research funding from Aptose Biosciences, Bristol Myers Squibb, Gilead Sciences and Takeda Oncology; and consultancy and research funding from AstraZeneca, Bayer Oncology, Genentech and Verastem Oncology. J.A.A. has received consulting income from Janssen Biotech, Merck Sharp & Dohme and Dendreon and honoraria for speaker’s fees from Astellas. All other authors declare no competing interests.

Funders: Funder number
Aptose Biosciences
Bayer Oncology
Genentech and Verastem Oncology
Oregon Medical Research Foundation
Teva Oncology and TG Therapeutics
National Institutes of Health: 1R01CA207377, 1R21HL145426
The Bev Hartig Huntington's Disease Foundation
U.S. Department of Defense: R01CA244576, W81XWH-16-1-0597
National Institute of Health-National Cancer Institute: P50 CA186786-07S1, P50 CA097186, R01 CA251245
National Institute of General Medical Sciences: MIRA R35GM124704
U.S. National Library of Medicine: K01LM012877
Intel Corporation
AstraZeneca
AbbVie
Oregon State University/Oregon Health and Science University

    ASJC Scopus subject areas

    • Biotechnology
    • Bioengineering
    • Applied Microbiology and Biotechnology
    • Biomedical Engineering
    • Molecular Medicine
