Abstract:
We introduce Pantheon-DNA, an end-to-end processing pipeline for DNA data storage designed to scale to large datasets; in our tests it maintained ≥99.996% retrievability at 10× coverage under both LER and HER. Repetitive patterns in DNA sequences can cause chimeras at the molecular level and hinder clustering algorithms; to prevent them, we propose a data arrangement scheme and a randomization procedure applied during encoding. We use a block data architecture to enhance parallel processing and retrieval. The proposed sequencing data preprocessing pipeline uses prior knowledge of the data structure encoded in the DNA sequences to simplify conventional clustering routines and reduce computational complexity. The system’s robustness and reliability are validated through an actual synthesis and sequencing experiment that encodes and decodes 1.59 MB of data comprising multiple files. Future enhancements will focus on refining error correction capabilities, particularly for indel recovery, and on optimizing preprocessing efficiency and sensitivity.
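The randomization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Pantheon-DNA implementation: the seeded XOR keystream, the 2-bits-per-base mapping, and every function name below are assumptions chosen only to show how randomizing the payload before encoding breaks up repetitive sequences.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def keystream(seed: int, n: int) -> bytes:
    """Reproducible pseudo-random bytes (illustrative, not cryptographic)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def randomize(data: bytes, seed: int) -> bytes:
    """XOR the payload with a seeded keystream; applying it twice restores the data."""
    return bytes(b ^ k for b, k in zip(data, keystream(seed, len(data))))


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer run in a sequence."""
    best = run = 0
    prev = None
    for ch in seq:
        run = run + 1 if ch == prev else 1
        best = max(best, run)
        prev = ch
    return best


if __name__ == "__main__":
    payload = b"\x00" * 8          # highly repetitive input
    seed = 42                      # hypothetical seed shared by encoder and decoder
    plain = bytes_to_dna(payload)
    mixed = bytes_to_dna(randomize(payload, seed))
    print(longest_run(plain), longest_run(mixed))
    # Decoding reverses both steps.
    assert randomize(dna_to_bytes(mixed), seed) == payload
```

Because XOR with the same keystream is its own inverse, the decoder only needs the seed to recover the original bytes. A production system would also have to handle constraints this sketch ignores, such as GC content, primer regions, and error correction.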
Reference:
LEAL, Adriano Galindo; AOYAGI, Thiago Yuji; COSTA-MARTINS, André Guilherme; SOUZA, Diego Trindade de; SILVA, Cristina Maria Ferreira; UEDA, Eduardo Takeo; PARADA, Marcelo Gonzaga de Oliveira; FEITOSA, Allan Eduardo; FUJITA, André. Pantheon-DNA: versatile encoding-decoding system with integrated adaptive NGS preprocessing algorithms for DNA data storage. Computational and Structural Biotechnology Journal, v.27, p.3951-3951, 2025.