Property Suffix Array with Applications in Indexing Weighted Sequences

Panagiotis Charalampopoulos; Costas S. Iliopoulos; Chang Liu; Solon P. Pissis

doi:10.1145/3385898

Property Suffix Array with Applications in Indexing Weighted Sequences

Panagiotis Charalampopoulos, Costas S. Iliopoulos, Chang Liu, Solon P. Pissis

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.

Original language	English
Article number	3385898
Journal	ACM Journal of Experimental Algorithmics
Volume	25
DOIs	https://doi.org/10.1145/3385898
Publication status	Published - Nov 2020

Keywords

property suffix array
property suffix tree
suffix array
Suffix tree
weighted sequence

Access to Document

10.1145/3385898

Cite this

@article{093b1f424bcb4e229dcb5a59b3a8a289,

title = "Property Suffix Array with Applications in Indexing Weighted Sequences",

abstract = "The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.",

keywords = "property suffix array, property suffix tree, suffix array, Suffix tree, weighted sequence",

author = "Panagiotis Charalampopoulos and Iliopoulos, {Costas S.} and Chang Liu and Pissis, {Solon P.}",

year = "2020",

month = nov,

doi = "10.1145/3385898",

language = "English",

volume = "25",

journal = "ACM Journal of Experimental Algorithmics",

issn = "1084-6654",

publisher = "Association for Computing Machinery (ACM)",

}

TY - JOUR

T1 - Property Suffix Array with Applications in Indexing Weighted Sequences

AU - Charalampopoulos, Panagiotis

AU - Iliopoulos, Costas S.

AU - Liu, Chang

AU - Pissis, Solon P.

PY - 2020/11

Y1 - 2020/11

N2 - The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.

AB - The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree is space efficiency. In Property Indexing, we are given a string x of length n and a property Π, i.e., a set of Π-valid intervals over x so that a pattern p occurs in x if and only if x has an occurrence of p that lies entirely within an interval of Π. A suffix-tree-like index over the valid prefixes of suffixes of x can be built in time and space O(n). We show here how to directly build a suffix-array-like index, the Property Suffix Array (PSA), in time and space O(n). We mainly draw our motivation from weighted (probabilistic) sequences: sequences of probability distributions over a given alphabet. Given a probability threshold 1/z, we say that a string p of length m matches a weighted sequence x of length n at starting position i if the product of probabilities of the letters of p at positions i, ⋯, i+m - 1 in x is at least 1/z. Our algorithm for building the PSA can be directly applied to build an O(nz)-sized suffix-array-like index over x in time and space O(nz). Finally, we present extensive experimental results showing that our new indexing data structure is well suited for real-world applications.

KW - property suffix array

KW - property suffix tree

KW - suffix array

KW - Suffix tree

KW - weighted sequence

UR - http://www.scopus.com/inward/record.url?scp=85092095146&partnerID=8YFLogxK

U2 - 10.1145/3385898

DO - 10.1145/3385898

M3 - Article

AN - SCOPUS:85092095146

SN - 1084-6654

VL - 25

JO - ACM Journal of Experimental Algorithmics

JF - ACM Journal of Experimental Algorithmics

M1 - 3385898

ER -

Property Suffix Array with Applications in Indexing Weighted Sequences

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this