TY - JOUR
T1 - Next Generation Indexing for Genomic Intervals
AU - Jalili, Vahid
AU - Matteucci, Matteo
AU - Goecks, Jeremy
AU - Deldjoo, Yashar
AU - Ceri, Stefano
N1 - Funding Information:
This research is funded by the European Research Council (ERC) (Advanced ERC Grant 693174) Project “Data-Driven Genomic Computing (GeCo)”.
Publisher Copyright:
© 1989-2012 IEEE.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - One-dimensional intervals incremental inverted index (Di4) is a multi-resolution, single-dimension indexing framework for efficient, scalable, and extensible computation of genomic interval expressions. The framework has a tri-layer architecture: the semantic layer provides orthogonal and generic means (including support for user-defined functions) of sense-making and higher-level reasoning from region-based datasets; the logical layer provides building blocks for region calculus and topological relations between intervals; the physical layer abstracts from persistence technology and makes the model adaptable to a variety of persistence technologies, spanning from small-scale (e.g., B+tree) to large-scale (e.g., LevelDB). The extensibility of Di4 to application scenarios is shown with an example of comparative evaluation of ChIP-seq and DNase-seq replicates. Performance of Di4 is benchmarked for small- and large-scale scenarios under common bioinformatics application scenarios. Di4 is freely available from https://genometric.github.io/Di4.
AB - One-dimensional intervals incremental inverted index (Di4) is a multi-resolution, single-dimension indexing framework for efficient, scalable, and extensible computation of genomic interval expressions. The framework has a tri-layer architecture: the semantic layer provides orthogonal and generic means (including support for user-defined functions) of sense-making and higher-level reasoning from region-based datasets; the logical layer provides building blocks for region calculus and topological relations between intervals; the physical layer abstracts from persistence technology and makes the model adaptable to a variety of persistence technologies, spanning from small-scale (e.g., B+tree) to large-scale (e.g., LevelDB). The extensibility of Di4 to application scenarios is shown with an example of comparative evaluation of ChIP-seq and DNase-seq replicates. Performance of Di4 is benchmarked for small- and large-scale scenarios under common bioinformatics application scenarios. Di4 is freely available from https://genometric.github.io/Di4.
KW - Index structures
KW - efficient query processing
KW - genomic data management
UR - http://www.scopus.com/inward/record.url?scp=85053611008&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053611008&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2018.2871031
DO - 10.1109/TKDE.2018.2871031
M3 - Article
AN - SCOPUS:85053611008
SN - 1041-4347
VL - 31
SP - 2008
EP - 2021
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
M1 - 8468044
ER -