Linear-time computation of prefix table for weighted strings & applications

Carl Barton; Chang Liu; Solon Pissis

doi:10.1016/j.tcs.2016.04.029

Informatics

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

150 Downloads (Pure)

Abstract

The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.
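As a rough illustration of the two notions the abstract relies on, the sketch below computes the prefix table of an ordinary (non-weighted) string in linear time and evaluates the occurrence probability of a factor in a weighted string. It is not the article's algorithm for weighted strings. The function names `prefix_table` and `occurrence_probability`, the list-of-dictionaries representation of a weighted string, and the reading of occurrence probability as the product of per-position letter probabilities are all assumptions made for this example.

```python
def prefix_table(s):
    """Return pi, where pi[i] is the length of the longest factor of s
    starting at position i that is also a prefix of s (pi[0] = len(s))."""
    n = len(s)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    # s[left:right] is the rightmost factor found so far that matches a prefix.
    left, right = 0, 0
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position i - left.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by direct letter comparisons.
        while i + pi[i] < n and s[pi[i]] == s[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def occurrence_probability(weighted, factor, start):
    """Probability that `factor` occurs at `start` in a weighted string.

    Assumption: `weighted` is a list of dicts mapping letters to
    probabilities, and the occurrence probability is the product of the
    probabilities of the factor's letters at their respective positions.
    """
    p = 1.0
    for offset, letter in enumerate(factor):
        p *= weighted[start + offset].get(letter, 0.0)
    return p


if __name__ == "__main__":
    print(prefix_table("aabaab"))
    x = [{"a": 0.5, "c": 0.5}, {"a": 1.0}, {"a": 0.75, "b": 0.25}]
    z = 8  # hypothetical threshold parameter: factors need probability >= 1/z
    print(occurrence_probability(x, "aab", 0) >= 1 / z)
```

In this reading, a factor counts as occurring in the weighted string only when its occurrence probability is at least 1/z, which is the role the cumulative weight threshold plays in the abstract.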

Original language	English
Pages (from-to)	160 - 172
Journal	Theoretical Computer Science
Volume	656, Part B
Early online date	30 Apr 2016
DOIs	https://doi.org/10.1016/j.tcs.2016.04.029
Publication status	Published - 2016

Keywords

algorithms on strings
weighted strings
uncertain sequences

Access to Document

10.1016/j.tcs.2016.04.029 (Licence: CC BY)

Linear-time computation of prefix table for weighted strings & applications
©2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Accepted author manuscript, 366 KB (Licence: CC BY-NC-ND)

Cite this

@article{0ff61f4782494ff9b037f21a36cd8bf7,

title = "Linear-time computation of prefix table for weighted strings \& applications",

abstract = "The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.",

keywords = "algorithms on strings, weighted strings, uncertain sequences",

author = "Carl Barton and Chang Liu and Solon Pissis",

note = "Stringology: In Celebration of Bill Smyth{\textquoteright}s 80th Birthday",

year = "2016",

doi = "10.1016/j.tcs.2016.04.029",

language = "English",

volume = "656, Part B",

pages = "160--172",

journal = "Theoretical Computer Science",

issn = "0304-3975",

publisher = "Elsevier",

}

TY - JOUR

T1 - Linear-time computation of prefix table for weighted strings & applications

AU - Barton, Carl

AU - Liu, Chang

AU - Pissis, Solon

N1 - Stringology: In Celebration of Bill Smyth’s 80th Birthday

PY - 2016

Y1 - 2016

N2 - The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.

AB - The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.

KW - algorithms on strings

KW - weighted strings

KW - uncertain sequences

U2 - 10.1016/j.tcs.2016.04.029

DO - 10.1016/j.tcs.2016.04.029

M3 - Article

SN - 0304-3975

VL - 656, Part B

SP - 160

EP - 172

JO - Theoretical Computer Science

JF - Theoretical Computer Science

ER -
