IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

Hayam Alamro; Mai Alzamel; Costas S. Iliopoulos; Solon P. Pissis; Steven Watts

doi:10.1186/s12859-021-03983-2

Hayam Alamro, Mai Alzamel, Costas S. Iliopoulos, Solon P. Pissis*, Steven Watts

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

112 Downloads (Pure)

Abstract

Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets.

Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats.

Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
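To make the definition in the abstract concrete, the following is a minimal Python sketch of what it means for an IUPAC-encoded string to form an inverted repeat with a central gap and a bounded number of mismatches. It is an illustration only and not IUPACpal's algorithm or interface. The names `can_pair` and `is_inverted_repeat`, the parameters `arm`, `gap` and `max_mismatches`, and the matching rule (two IUPAC symbols pair if any base denoted by one is the complement of any base denoted by the other) are assumptions made for this example.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """Assumed rule: x and y pair if some base of x complements some base of y."""
    return any(COMPLEMENT[base] in IUPAC[y] for base in IUPAC[x])


def is_inverted_repeat(seq, arm, gap=0, max_mismatches=0):
    """Check whether seq is exactly: left arm + gap + right arm,
    where the right arm is the reverse complement of the left arm,
    allowing up to max_mismatches non-pairing positions."""
    if arm < 1 or gap < 0 or len(seq) != 2 * arm + gap:
        return False
    mismatches = 0
    for i in range(arm):
        # Position i of the left arm faces position i from the end of seq.
        if not can_pair(seq[i], seq[len(seq) - 1 - i]):
            mismatches += 1
    return mismatches <= max_mismatches


if __name__ == "__main__":
    print(is_inverted_repeat("GAATTC", arm=3))             # perfect, no gap
    print(is_inverted_repeat("ACGTTCGT", arm=3, gap=2))    # gap of 2 in the centre
    print(is_inverted_repeat("RAATTY", arm=3))             # ambiguous IUPAC symbols
    print(is_inverted_repeat("GAATTA", arm=3, max_mismatches=1))
```

This brute-force check only tests a single candidate window. Scanning a whole sequence this way would be slow, which is the kind of cost an efficient tool is designed to avoid.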

Original language	English
Article number	51
Journal	BMC Bioinformatics
Volume	22
Issue number	1
Early online date	Feb 2021
DOIs	https://doi.org/10.1186/s12859-021-03983-2
Publication status	Published - 6 Feb 2021

Keywords

Gaps
Inverted repeat
IUPAC
Mismatches
Palindrome
Software

Access to Document

10.1186/s12859-021-03983-2 (Licence: CC BY)

IUPACpal efcient identifcation_ALAMRO_Acc27Jan2021Epub6Feb_GOLD VoR(CC BY): Final published version, 2.15 MB (Licence: CC BY)

Cite this

@article{54c4249bd62641cda9ed17c702661db3,

title = "IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences",

abstract = "Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.",

keywords = "Gaps, Inverted repeat, IUPAC, Mismatches, Palindrome, Software",

author = "Alamro, Hayam and Alzamel, Mai and Iliopoulos, {Costas S.} and Pissis, {Solon P.} and Watts, Steven",

note = "Funding Information: This project was supported by EPSRC DTA grant EP/M50788X-1. The funding body did not influence the study, collection, analysis or interpretation of any data. This project has received funding from the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie grant agreement No 872539. Publisher Copyright: {\textcopyright} 2021, The Author(s). Copyright: Copyright 2021 Elsevier B.V., All rights reserved.",

year = "2021",

month = feb,

day = "6",

doi = "10.1186/s12859-021-03983-2",

language = "English",

volume = "22",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - IUPACpal

T2 - efficient identification of inverted repeats in IUPAC-encoded DNA sequences

AU - Alamro, Hayam

AU - Alzamel, Mai

AU - Iliopoulos, Costas S.

AU - Pissis, Solon P.

AU - Watts, Steven

N1 - Funding Information: This project was supported by EPSRC DTA grant EP/M50788X-1. The funding body did not influence the study, collection, analysis or interpretation of any data. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539. Publisher Copyright: © 2021, The Author(s). Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

PY - 2021/2/6

Y1 - 2021/2/6

N2 - Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.

AB - Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.

KW - Gaps

KW - Inverted repeat

KW - IUPAC

KW - Mismatches

KW - Palindrome

KW - Software

UR - http://www.scopus.com/inward/record.url?scp=85100685290&partnerID=8YFLogxK

U2 - 10.1186/s12859-021-03983-2

DO - 10.1186/s12859-021-03983-2

M3 - Article

AN - SCOPUS:85100685290

SN - 1471-2105

VL - 22

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 51

ER -
