Optimal computation of all tandem repeats in a weighted sequence

Carl Barton*, Costas S. Iliopoulos, Solon P. Pissis

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

12 Citations (Scopus)

Abstract

Background: Tandem duplication, in the context of molecular biology, occurs as a result of mutational events in which an original segment of DNA is converted into a sequence of individual copies. More formally, a repetition or tandem repeat in a string of letters consists of exact concatenations of identical factors of the string. Biologists are interested in approximate tandem repeats and not necessarily only in exact tandem repeats. A weighted sequence is a string in which a set of letters may occur at each position with respective probabilities of occurrence. It naturally arises in many biological contexts and provides a method to realise the approximation among distinct adjacent occurrences of the same DNA segment.

Results: Crochemore's repetitions algorithm, also referred to as Crochemore's partitioning algorithm, was introduced in 1981, and was the first optimal O (n log n)-time algorithm to compute all repetitions in a string of length n. In this article, we present a novel variant of Crochemore's partitioning algorithm for weighted sequences, which requires optimal O(n log n) time, thus improving on the best known O (n(2))-time algorithm (Zhang et al., 2013) for computing all repetitions in a weighted sequence of length n.

Original languageEnglish
Article number21
Number of pages8
JournalAlgorithms for Molecular Biology
Volume9
DOIs
Publication statusPublished - 16 Aug 2014

Keywords

  • Tandem repeats
  • Weighted sequences
  • IUPAC notation
  • REPETITIONS
  • DNA
  • EFFICIENT
  • PROTEINS

Fingerprint

Dive into the research topics of 'Optimal computation of all tandem repeats in a weighted sequence'. Together they form a unique fingerprint.

Cite this