Suffix-Prefix Queries on a Dictionary

Grigorios Loukides*, Solon P. Pissis*, Sharma V. Thankachan*, Wiktor Zuba*

*Corresponding author for this work

Research output: Contribution to journalConference paperpeer-review

4 Citations (Scopus)

Abstract

In the all-pairs suffix-prefix (APSP) problem, we are given a dictionary R of k strings, S1, . . ., Sk, of total length n, and we are asked to find the length SPLi,j of the longest string that is both a suffix of Si and a prefix of Sj, for all i, j ∈ [1, k]. APSP is a classic problem in string algorithms with many applications in bioinformatics. When all strings of the dictionary are over an integer alphabet of size σ ≤ nO(1), APSP can be solved in the optimal O(n + k2) time with the use of the generalized suffix tree of the dictionary [Gusfield et al., Inf. Process. Lett. 1992]. In many bioinformatics applications, such as in sequence assembly, the size k of dictionary R is very large. In particular, k2 usually dominates n, and thus the k2 factor is the bottleneck both in the time and in the space complexity of such applications. We thus initiate a holistic study on several data structure variants of APSP. In particular, we consider the following types of queries: One-to-One(i, j): output SPLi,j. One-to-All(i): output SPLi,j for every j ∈ [1, k]. Report(i, ℓ): output all distinct j ∈ [1, k] such that SPLi,j ≥ ℓ, where ℓ ≥ 0 is an integer. Count(i, ℓ): output the number of distinct j ∈ [1, k] such that SPLi,j ≥ ℓ, where ℓ ≥ 0 is an integer. Top(i, K): output K distinct j ∈ [1, k] with the highest values of SPLi,j breaking ties arbitrarily. We assume the standard word RAM model of computation with word size w = Ω(log n) and an integer alphabet of size σ ≤ nO(1). We show the following upper bounds: Query Space (words) Query time Note One-to-One(i, j) O(n) O(log log k) Theorem 11 One-to-All(i) O(n) O(k) Theorem 14 Report(i, ℓ) O(n) O(log n/log log n + output) Theorem 19(i) Count(i, ℓ) O(n) O(log n/log log n) Theorem 19(ii) Top(i, K) O(n) O(log2 n/log log n + K) Theorem 22 We also present efficient algorithms for constructing these data structures.

Original languageEnglish
JournalLeibniz International Proceedings in Informatics, LIPIcs
DOIs
Publication statusAccepted/In press - 24 Mar 2023
Event34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023 - Marne-la-Vallee, France
Duration: 26 Jun 202328 Jun 2023

Keywords

  • all-pairs suffix-prefix
  • internal pattern matching
  • suffix-prefix queries

Fingerprint

Dive into the research topics of 'Suffix-Prefix Queries on a Dictionary'. Together they form a unique fingerprint.

Cite this