Optimal computation of all tandem repeats in a weighted sequence

Carl Barton; Costas S. Iliopoulos; Solon P. Pissis

doi:10.1186/s13015-014-0021-5

Carl Barton^*, Costas S. Iliopoulos, Solon P. Pissis

^*Corresponding author for this work

Informatics

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment.

Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.
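
The definitions in the abstract can be illustrated with a small sketch. It is not the O(n log n) partitioning algorithm of the article, only a naive enumeration meant to make the notions of "weighted sequence" and "tandem repeat" concrete. Several choices here are assumptions not stated in the abstract: a weighted sequence is modelled as a list of letter-to-probability dictionaries, an occurrence of a factor counts as valid when the product of its letter probabilities reaches a cutoff `threshold`, and a tandem repeat is reported when the same factor is valid at two adjacent positions. The function names are hypothetical.

```python
def occurrence_probability(ws, start, factor):
    """Product of the probabilities of the letters of `factor` read from `start`."""
    p = 1.0
    for offset, letter in enumerate(factor):
        p *= ws[start + offset].get(letter, 0.0)
    return p


def valid_factors(ws, start, length, threshold):
    """All strings of the given length starting at `start` whose
    occurrence probability is at least `threshold` (assumed cutoff)."""
    candidates = [("", 1.0)]
    for pos in range(start, start + length):
        candidates = [
            (prefix + letter, p * q)
            for prefix, p in candidates
            for letter, q in ws[pos].items()
            if p * q >= threshold  # prune prefixes that are already too unlikely
        ]
    return candidates


def naive_tandem_repeats(ws, threshold):
    """Return (start, period, factor) triples where `factor` is valid at
    `start` and again at `start + period`. Naive, far from optimal."""
    n = len(ws)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            for factor, _ in valid_factors(ws, start, period, threshold):
                if occurrence_probability(ws, start + period, factor) >= threshold:
                    found.append((start, period, factor))
    return found


# Toy weighted sequence: positions 1 and 3 are uncertain.
example = [
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 1.0},
    {"C": 0.5, "T": 0.5},
]
repeats = naive_tandem_repeats(example, threshold=0.25)
```

With this toy input the factor AC is valid at positions 0 and 2, so the sketch reports a single tandem repeat of period 2; raising the cutoff to 0.75 removes it, since each copy of AC has probability 0.5.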

Original language	English
Article number	21
Number of pages	8
Journal	Algorithms for Molecular Biology
Volume	9
DOIs	https://doi.org/10.1186/s13015-014-0021-5
Publication status	Published - 16 Aug 2014

Keywords

Tandem repeats
Weighted sequences
IUPAC notation
REPETITIONS
DNA
EFFICIENT
PROTEINS

Cite this

@article{822768b164004af7bc71114bd7f0ccaa,

title = "Optimal computation of all tandem repeats in a weighted sequence",

abstract = "Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.",

keywords = "Tandem repeats, Weighted sequences, IUPAC notation, REPETITIONS, DNA, EFFICIENT, PROTEINS",

author = "Barton, Carl and Iliopoulos, {Costas S.} and Pissis, {Solon P.}",

year = "2014",

month = aug,

day = "16",

doi = "10.1186/s13015-014-0021-5",

language = "English",

volume = "9",

journal = "Algorithms for Molecular Biology",

issn = "1748-7188",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Optimal computation of all tandem repeats in a weighted sequence

AU - Barton, Carl

AU - Iliopoulos, Costas S.

AU - Pissis, Solon P.

PY - 2014/8/16

Y1 - 2014/8/16

N2 - Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment. Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O(n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O(n^2)-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.

KW - Tandem repeats

KW - Weighted sequences

KW - IUPAC notation

KW - REPETITIONS

KW - DNA

KW - EFFICIENT

KW - PROTEINS

U2 - 10.1186/s13015-014-0021-5

DO - 10.1186/s13015-014-0021-5

M3 - Article

SN - 1748-7188

VL - 9

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

M1 - 21

ER -
