TY - JOUR
T1 - On avoided words, absent words, and their application to biological sequence analysis
AU - Almirantis, Yannis
AU - Charalampopoulos, Panagiotis
AU - Gao, Jia
AU - Iliopoulos, Costas S.
AU - Mohamed, Manal
AU - Pissis, Solon P.
AU - Polychronopoulos, Dimitris
PY - 2017/3/14
Y1 - 2017/3/14
N2 - The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by dev(w), effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length k > 2 is a ρ-avoided word in x if dev(w) ≤ ρ, for a given threshold ρ < 0. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k.
AB - The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by dev(w), effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length k > 2 is a ρ-avoided word in x if dev(w) ≤ ρ, for a given threshold ρ < 0. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k.
U2 - 10.1186/s13015-017-0094-z
DO - 10.1186/s13015-017-0094-z
M3 - Article
SN - 1748-7188
VL - 12
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
ER -