TY - GEN
T1 - An invertible transform for efficient string matching in labeled digraphs
AU - Nellore, Abhinav
AU - Nguyen, Austin
AU - Thompson, Reid F.
N1 - Publisher Copyright:
© Abhinav Nellore, Austin Nguyen, and Reid F. Thompson.
PY - 2021/7/1
Y1 - 2021/7/1
AB - Let G = (V, E) be a digraph where each vertex is unlabeled, each edge is labeled by a character in some alphabet O, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of G into a weakly connected digraph G' = (V', E') that enables solving the decision problem of whether there is a walk in G matching an arbitrarily long query string q in time linear in |q| and independent of |E| and |V|. We show G is uniquely determined by G' when for every v_l ∈ V, there is some distinct string s_l on O such that v_l is the origin of a closed walk in G matching s_l, and no other walk in G matches s_l unless it starts and ends at v_l. We then exploit this invertibility condition to strategically alter any G so its transform G' enables retrieval of all t terminal vertices of walks in the unaltered G matching q in O(|q| + t log |V|) time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
KW - Burrows-Wheeler transform
KW - Labeled graphs
KW - Pattern matching
KW - String matching
UR - http://www.scopus.com/inward/record.url?scp=85113825665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113825665&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.CPM.2021.20
DO - 10.4230/LIPIcs.CPM.2021.20
M3 - Conference contribution
AN - SCOPUS:85113825665
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021
A2 - Gawrychowski, Paweł
A2 - Starikovskaya, Tatiana
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 32nd Annual Symposium on Combinatorial Pattern Matching, CPM 2021
Y2 - 5 July 2021 through 7 July 2021
ER -