A long-read RNA-seq approach to identify novel transcripts of very large genes

Prech Uapinyoying; Jeremy Goecks; Susan M. Knoblach; Karuna Panchapakesan; Carsten G. Bonnemann; Terence A. Partridge; Jyoti K. Jaiswal; Eric P. Hoffman

doi:10.1101/gr.259903.119

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.

Original language	English (US)
Pages (from-to)	885-897
Number of pages	13
Journal	Genome Research
Volume	30
Issue number	6
DOIs	https://doi.org/10.1101/gr.259903.119
State	Published - Jun 2020

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{797a01339ad748f09a186f05b9054927,

title = "A long-read RNA-seq approach to identify novel transcripts of very large genes",

abstract = "RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.",

author = "Prech Uapinyoying and Jeremy Goecks and Knoblach, {Susan M.} and Karuna Panchapakesan and Bonnemann, {Carsten G.} and Partridge, {Terence A.} and Jaiswal, {Jyoti K.} and Hoffman, {Eric P.}",

note = "Publisher Copyright: {\textcopyright} 2020 Uapinyoying et al.",

year = "2020",

month = jun,

doi = "10.1101/gr.259903.119",

language = "English (US)",

volume = "30",

pages = "885--897",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory Press",

number = "6",

}

TY - JOUR

T1 - A long-read RNA-seq approach to identify novel transcripts of very large genes

AU - Uapinyoying, Prech

AU - Goecks, Jeremy

AU - Knoblach, Susan M.

AU - Panchapakesan, Karuna

AU - Bonnemann, Carsten G.

AU - Partridge, Terence A.

AU - Jaiswal, Jyoti K.

AU - Hoffman, Eric P.

PY - 2020/6

Y1 - 2020/6

N2 - RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.

AB - RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that only span up to two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing improves this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap: 5 kb), large (Neb: 22 kb), and very large (Ttn: 106 kb) transcripts in cardiac muscle, and fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification shown by our approach removes previous impediments to studies aimed at quantitative differential expression of ultralong transcripts.

UR - http://www.scopus.com/inward/record.url?scp=85089162894&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85089162894&partnerID=8YFLogxK

U2 - 10.1101/gr.259903.119

DO - 10.1101/gr.259903.119

M3 - Article

C2 - 32660935

AN - SCOPUS:85089162894

SN - 1088-9051

VL - 30

SP - 885

EP - 897

JO - Genome Research

JF - Genome Research

IS - 6

ER -
