Querying highly similar sequences

Carl Barton; Mathieu Giraud; Costas S. Iliopoulos; Thierry Lecroq; Laurent Mouchard; Solon P. Pissis

doi:10.1504/IJCBDD.2013.052206

Querying highly similar sequences

Carl Barton, Mathieu Giraud, Costas S. Iliopoulos, Thierry Lecroq, Laurent Mouchard, Solon P. Pissis

Informatics

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern p in a set S(0), S(1), …, S(k), of sequences of equal length, where S(i), for all 1≤i≤k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ logocc) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k - the number of sequences.

Original language	English
Pages (from-to)	119 - 130
Number of pages	12
Journal	International Journal of Computational Biology and Drug Design
Volume	6
Issue number	1
DOIs	https://doi.org/10.1504/IJCBDD.2013.052206
Publication status	Published - 2013

Access to Document

10.1504/IJCBDD.2013.052206

Cite this

@article{d68f9c567529445991a86d7c741652c1,

title = "Querying highly similar sequences",

abstract = "In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern p in a set S(0), S(1), …, S(k), of sequences of equal length, where S(i), for all 1≤i≤k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ logocc) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k - the number of sequences.",

author = "Carl Barton and Mathieu Giraud and Iliopoulos, {Costas S.} and Thierry Lecroq and Laurent Mouchard and Pissis, {Solon P.}",

year = "2013",

doi = "10.1504/IJCBDD.2013.052206",

language = "English",

volume = "6",

pages = "119 -- 130",

journal = "International Journal of Computational Biology and Drug Design",

number = "1",

}

TY - JOUR

T1 - Querying highly similar sequences

AU - Barton, Carl

AU - Giraud, Mathieu

AU - Iliopoulos, Costas S.

AU - Lecroq, Thierry

AU - Mouchard, Laurent

AU - Pissis, Solon P.

PY - 2013

Y1 - 2013

N2 - In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern p in a set S(0), S(1), …, S(k), of sequences of equal length, where S(i), for all 1≤i≤k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ logocc) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k - the number of sequences.

AB - In this paper, we present a solution to the extreme similarity sequencing problem. The extreme similarity sequencing problem consists of finding occurrences of a pattern p in a set S(0), S(1), …, S(k), of sequences of equal length, where S(i), for all 1≤i≤k, differs from S(0) by a constant number of errors - around 10 in practice. We present an asymptotically fast O(n + occ logocc) time algorithm, as well as a practical O(nk/w) time algorithm for solving this problem, where n is the length of a sequence, occ is the number of candidate occurrences reported by our technique, w is the size of the machine word, and the total number of errors is bounded by k - the number of sequences.

U2 - 10.1504/IJCBDD.2013.052206

DO - 10.1504/IJCBDD.2013.052206

M3 - Article

VL - 6

SP - 119

EP - 130

JO - International Journal of Computational Biology and Drug Design

JF - International Journal of Computational Biology and Drug Design

IS - 1

ER -

Querying highly similar sequences

Abstract

Access to Document

Fingerprint

Cite this