TY - JOUR
T1 - Structurally divergent and recurrently mutated regions of primate genomes
AU - Mao, Yafei
AU - Harvey, William T.
AU - Porubsky, David
AU - Munson, Katherine M.
AU - Hoekzema, Kendra
AU - Lewis, Alexandra P.
AU - Audano, Peter A.
AU - Rozanski, Allison
AU - Yang, Xiangyu
AU - Zhang, Shilong
AU - Yoo, Dong Ahn
AU - Gordon, David S.
AU - Fair, Tyler
AU - Wei, Xiaoxi
AU - Logsdon, Glennis A.
AU - Haukness, Marina
AU - Dishuck, Philip C.
AU - Jeong, Hyeonsoo
AU - del Rosario, Ricardo
AU - Bauer, Vanessa L.
AU - Fattor, Will T.
AU - Wilkerson, Gregory K.
AU - Mao, Yuxiang
AU - Shi, Yongyong
AU - Sun, Qiang
AU - Lu, Qing
AU - Paten, Benedict
AU - Bakken, Trygve E.
AU - Pollen, Alex A.
AU - Feng, Guoping
AU - Sawyer, Sara L.
AU - Warren, Wesley C.
AU - Carbone, Lucia
AU - Eichler, Evan E.
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2024/3/14
Y1 - 2024/3/14
AB - We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
KW - adaptive evolution
KW - comparative genomics
KW - duplicated genes
KW - evolutionary medicine
KW - human diseases
KW - long-read sequencing
KW - NPHP1 and Joubert syndrome
KW - primate evolution
KW - RGPD gene family
UR - http://www.scopus.com/inward/record.url?scp=85187211353&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187211353&partnerID=8YFLogxK
U2 - 10.1016/j.cell.2024.01.052
DO - 10.1016/j.cell.2024.01.052
M3 - Article
C2 - 38428424
AN - SCOPUS:85187211353
SN - 0092-8674
VL - 187
SP - 1547-1562.e13
JO - Cell
JF - Cell
IS - 6
ER -