Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.

Original languageEnglish
Article number3385898
JournalACM Journal of Experimental Algorithmics
Volume25
DOIs
Publication statusPublished - Nov 2020

Keywords

  • property suffix array
  • property suffix tree
  • suffix array
  • Suffix tree
  • weighted sequence

Fingerprint

Dive into the research topics of 'Property Suffix Array with Applications in Indexing Weighted Sequences'. Together they form a unique fingerprint.

Cite this