TY - JOUR
T1 - Reactome graph database
T2 - Efficient access to complex pathway data
AU - Fabregat, Antonio
AU - Korninger, Florian
AU - Viteri, Guilherme
AU - Sidiropoulos, Konstantinos
AU - Marin-Garcia, Pablo
AU - Ping, Peipei
AU - Wu, Guanming
AU - Stein, Lincoln
AU - D’Eustachio, Peter
AU - Hermjakob, Henning
N1 - Publisher Copyright: © 2018 Fabregat et al.
PY - 2018/1
Y1 - 2018/1
N2 - Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high-quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object-oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high-performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
UR - http://www.scopus.com/inward/record.url?scp=85041381376&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041381376&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1005968
DO - 10.1371/journal.pcbi.1005968
M3 - Article
C2 - 29377902
AN - SCOPUS:85041381376
SN - 1553-734X
VL - 14
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 1
M1 - e1005968
ER -