On avoided words, absent words, and their application to biological sequence analysis

Research output: Contribution to journal, Article, peer-review

Abstract

The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by dev(w), effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length k > 2 is a ρ-avoided word in x if dev(w) ≤ ρ, for a given threshold ρ < 0. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k.
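To make the definition concrete, here is a minimal brute-force sketch in Python. It is an illustrative baseline, not the algorithm proposed in the article. The abstract does not spell out how the expected frequency is computed, so the sketch assumes the standard estimate from the DNA-linguistics literature (Brendel et al.): E(w) = f(w[1..k-1]) · f(w[2..k]) / f(w[2..k-1]) and dev(w) = (f(w) - E(w)) / max(√E(w), 1), where f counts (possibly overlapping) occurrences in x. The helper names `count_occurrences`, `dev` and `naive_avoided_words` are hypothetical. Enumerating all σ^k candidate words, absent ones included, shows why the naïve approach becomes expensive as k grows.

```python
import itertools
import math


def count_occurrences(x: str, w: str) -> int:
    """Number of (possibly overlapping) occurrences of w in x."""
    m = len(w)
    return sum(1 for i in range(len(x) - m + 1) if x[i:i + m] == w)


def dev(x: str, w: str) -> float:
    """Deviation of w in x under an ASSUMED expected-frequency model:
    E(w) = f(prefix) * f(suffix) / f(infix), where prefix = w minus its last
    letter, suffix = w minus its first letter, infix = w minus both ends.
    dev(w) = (f(w) - E(w)) / max(sqrt(E(w)), 1).
    """
    f_w = count_occurrences(x, w)
    f_prefix = count_occurrences(x, w[:-1])
    f_suffix = count_occurrences(x, w[1:])
    f_infix = count_occurrences(x, w[1:-1])
    # Assumed convention: if the infix never occurs, neither do the prefix
    # and suffix, so we take E(w) = 0 instead of dividing by zero.
    expected = f_prefix * f_suffix / f_infix if f_infix else 0.0
    return (f_w - expected) / max(math.sqrt(expected), 1.0)


def naive_avoided_words(x: str, k: int, rho: float, alphabet: str = "ACGT"):
    """Brute force: test every word of length k over the alphabet, including
    words absent from x, and keep those with dev(w) <= rho."""
    if k <= 2:
        raise ValueError("k must be greater than 2")
    if rho >= 0:
        raise ValueError("rho must be negative")
    hits = []
    for letters in itertools.product(alphabet, repeat=k):
        w = "".join(letters)
        d = dev(x, w)
        if d <= rho:
            hits.append((w, d))
    return hits


hits = naive_avoided_words("ABBBA", k=3, rho=-0.3, alphabet="AB")
print([w for w, _ in hits])  # ['ABA', 'BAB']
```

In this toy run both reported words are absent from "ABBBA" (f(w) = 0), while "BBB" occurs once and scores about -0.29, just above the threshold. The loop visits σ^k words and scans x for each one, which is the cost the abstract warns about for large k.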

Original language: English
Number of pages: 12
Journal: Algorithms for Molecular Biology
Volume: 12
Issue number: 1
Publication status: Published, 14 Mar 2017