Document: SWGTS—a platform for stream-based host DNA depletion

Title: SWGTS—a platform for stream-based host DNA depletion
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=68456
URN (NBN): urn:nbn:de:hbz:061-20250205-141149-0
Collection: Publications
Language: English
Document type: Scholarly texts » Article, essay
Media type: Text
Authors: Spohr, Philipp [Author]
Ried, Max [Author]
Kühle, Laura [Author]
Dilthey, Alexander [Author]
Files: Adobe PDF, 1.81 MB in one file
Files dated 05.02.2025 / modified 05.02.2025
Description:
Motivation

Microbial sequencing data from clinical samples is often contaminated with human sequences, which have to be removed prior to sharing. Existing methods for human read removal, however, can be applied only after the target dataset has been retrieved in its entirety. This puts the recipient at least temporarily in control of a potentially identifiable genetic dataset, with possible implications under regulatory frameworks such as the GDPR. In some instances, the ability to carry out stream-based host depletion as part of the data transfer process may be preferable.
Results

We present SWGTS, a client–server application for the transfer and stream-based host depletion of sequencing reads. SWGTS enforces a robust upper bound on the maximum amount of human genetic data from any one client held in memory at any point in time by storing all incoming sequencing data in a limited-size, client-specific intermediate processing buffer, and by throttling the rate of incoming data if it exceeds the speed of host depletion carried out on the SWGTS server in the background. SWGTS exposes an HTTP REST interface, is implemented using docker-compose, Redis and traefik, and requires less than 8 GB of RAM for deployment. We demonstrate high filtering accuracy of SWGTS; incoming data transfer rates of up to 1.65 megabases per second in a conservative configuration; and mitigation of re-identification risks by the ability to limit the number of SNPs present on a popular population-scale genotyping array covered by reads in the SWGTS buffer to a low user-defined number, such as 10 or 100.
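The buffering idea described above can be illustrated with a minimal Python sketch. It is not the SWGTS implementation, which relies on Redis and an HTTP REST interface; the class `BoundedReadBuffer`, its methods `offer` and `take`, and the `max_bases` parameter are hypothetical names chosen for this example. The sketch only shows how a size-limited, per-client buffer forces the sender to wait whenever the filter has not yet drained earlier reads.

```python
from collections import deque


class BoundedReadBuffer:
    """Hypothetical per-client buffer that never holds more than max_bases bases."""

    def __init__(self, max_bases: int) -> None:
        if max_bases <= 0:
            raise ValueError("max_bases must be positive")
        self.max_bases = max_bases
        self.held_bases = 0          # bases currently waiting for host depletion
        self._reads = deque()

    def offer(self, read: str) -> bool:
        """Accept a read only if the bound still holds; False means 'retry later'."""
        if self.held_bases + len(read) > self.max_bases:
            return False             # back-pressure: the client has to slow down
        self._reads.append(read)
        self.held_bases += len(read)
        return True

    def take(self):
        """Hand the oldest read to the filter and free its share of the buffer."""
        if not self._reads:
            return None
        read = self._reads.popleft()
        self.held_bases -= len(read)
        return read


buffer = BoundedReadBuffer(max_bases=10)
accepted_first = buffer.offer("ACGTACGT")   # 8 bases fit into the empty buffer
accepted_second = buffer.offer("ACGT")      # 8 + 4 would exceed 10, so it is refused
buffer.take()                               # the filter drains the first read
accepted_retry = buffer.offer("ACGT")       # the retried read now fits
```

In this sketch a refused `offer` plays the role of throttling: the client can only send as fast as the filter empties the buffer, so the amount of unfiltered data held for that client stays at or below `max_bases`.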
Legal notes: Original publication:
Spohr, P., Ried, M., Kühle, L., & Dilthey, A. (2024). SWGTS—a platform for stream-based host DNA depletion. Bioinformatics, 40(6), Article btae332. https://doi.org/10.1093/bioinformatics/btae332
License: Creative Commons license
This work is licensed under a Creative Commons Attribution 4.0 International License
Department / Institution: Faculty of Mathematics and Natural Sciences
Document created on: 05.02.2025
Files modified on: 05.02.2025