Document: NeuralBeds: Neural embeddings for efficient DNA data compression and optimized similarity search

Title: NeuralBeds: Neural embeddings for efficient DNA data compression and optimized similarity search
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=68377
URN (NBN): urn:nbn:de:hbz:061-20250131-103507-5
Collection: Publications
Language: English
Document type: Scholarly texts » Article, essay
Media type: Text
Authors: Sarumi, Oluwafemi A. [Author]
Hahn, Maximilian [Author]
Heider, Dominik [Author]
Files: Adobe PDF, 1.26 MB in one file
Files dated 31.01.2025 / modified 31.01.2025
Keywords: DNA similarity, Neural embeddings, Artificial intelligence
Description: The availability of high-throughput sequencing tools, coupled with the declining cost of producing DNA sequences, has led to the generation of enormous amounts of omics data curated in databases such as NCBI and EMBL. Identifying similar DNA sequences in these databases is one of the fundamental tasks in bioinformatics. It is essential for discovering homologous sequences in organisms, for phylogenetic studies of evolutionary relationships among biological entities, or for detecting pathogens. Improving DNA similarity search is of utmost importance because of the increasing complexity of the ever-growing sequence repositories. Therefore, instead of the conventional approach of comparing raw sequences, e.g., in FASTA format, a numerical representation of the sequences can be used to calculate their similarities and optimize the search process. In this study, we analyzed different approaches for numerical embeddings, including Chaos Game Representation, hashing, and neural networks, and compared them with classical approaches such as principal component analysis. Neural networks generated embeddings that capture the similarity between DNA sequences as a distance measure and significantly outperformed the other approaches on DNA similarity search.
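As an illustration of the general idea in the description (replace raw sequences with fixed-length numerical vectors, then compare vectors by distance), the following is a minimal Python sketch of a frequency Chaos Game Representation. It is not the authors' implementation and does not reproduce their neural embeddings. The function names fcgr_embedding and nearest, the corner layout for A/C/G/T, the default k=3, and the use of Euclidean distance are all assumptions made for this example.

```python
import math

# Corner bits (x, y) for each base. This A/C/G/T layout is a common
# convention and an assumption here, not taken from the paper.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr_embedding(seq, k=3):
    """Return a normalized frequency-CGR vector of length 4**k.

    The chaos game is run in k-bit fixed point: each base moves the point
    halfway toward its corner, so the grid cell at resolution 2**k is
    determined by the last k bases (most recent base = highest bit).
    """
    size = 1 << k
    counts = [0] * (size * size)
    ix = iy = 0
    run = 0  # consecutive valid bases since the last ambiguous symbol
    for base in seq.upper():
        bits = CORNER_BITS.get(base)
        if bits is None:
            run = 0  # e.g. 'N': skip it and wait for k fresh bases
            continue
        ix = (ix >> 1) | (bits[0] << (k - 1))
        iy = (iy >> 1) | (bits[1] << (k - 1))
        run += 1
        if run >= k:
            counts[iy * size + ix] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def nearest(query, database, k=3):
    """Return the key of the database sequence closest to the query
    under Euclidean distance between embeddings (hypothetical helper)."""
    q = fcgr_embedding(query, k)
    return min(
        database,
        key=lambda name: math.dist(q, fcgr_embedding(database[name], k)),
    )


if __name__ == "__main__":
    db = {"near": "ACGTACGTACGTACGA", "far": "TTTTTTTTGGGGGGGG"}
    print(nearest("ACGTACGTACGTACGT", db))
```

In a real search setting the database embeddings would be computed once and stored, so that each query needs only one embedding plus distance computations; the sketch recomputes them for brevity.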
Legal notes: Original publication:
Sarumi, O. A., Hahn, M., & Heider, D. (2024). NeuralBeds: Neural embeddings for efficient DNA data compression and optimized similarity search. Computational and Structural Biotechnology Journal, 23, 732–741. https://doi.org/10.1016/j.csbj.2023.12.046
License: Creative Commons license
This work is licensed under a Creative Commons Attribution 4.0 International License
Department / Institution: Faculty of Mathematics and Natural Sciences (Mathematisch-Naturwissenschaftliche Fakultät)
Document created on: 31.01.2025
Files modified on: 31.01.2025