Crochemore’s Partitioning on Weighted Strings and Applications

Carl Barton*, Solon Pissis

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

5 Citations (Scopus)
206 Downloads (Pure)

Abstract

Given a string on alphabet (Formula presented.) the partitioning problem is to compute classes of equivalences on the set of positions of the input string. These classes implicitly memorise identical factors of the string and, hence, their efficient computation is essential for a wide range of string processing applications. We study this problem for a weighted string: for every position of the weighted string and every letter of the alphabet a probability of occurrence of this letter at this position is given. Thus a weighted string may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. In this article, we present a non-trivial generalisation of Crochemore’s partitioning algorithm (IPL, 1981) that works on weighted strings requiring time (Formula presented.), where n is the length of the string, (Formula presented.), (Formula presented.) is the size of (Formula presented.), and 1 / z is a cumulative weight threshold, defined as the minimal probability of occurrence of factors in the string. Our contributions can be summarised as follows: (a) we design the first algorithm to solve the partitioning problem on weighted strings for arbitrary z and (Formula presented.) in time (Formula presented.) and space (Formula presented.) improving the state of the art for (Formula presented.); (b) we improve the state of the art for numerous other string processing problems; and (c) we show further combinatorial insight into the relation between weighted and indeterminate strings, that is, sequences of alphabet subsets without associated occurrence probabilities.

Original languageEnglish
Pages (from-to)496–514
Number of pages19
JournalALGORITHMICA
Volume80
Early online date27 Jan 2017
DOIs
Publication statusPublished - Feb 2018

Keywords

  • Covers
  • Crochemore partitioning
  • Position weight matrix
  • Repetitions
  • Seeds
  • Uncertain sequences
  • Weighted strings

Fingerprint

Dive into the research topics of 'Crochemore’s Partitioning on Weighted Strings and Applications'. Together they form a unique fingerprint.

Cite this