BIO-03 Diversity of marine host-associated microbiomes
Symbiont-Screener: a reference-free tool to separate host sequences from symbionts for error-prone long reads
Mengyang Xu*, BGI-Qingdao, Qingdao, 266555, China; BGI-Shenzhen, Shenzhen, 518083, China
Lidong Guo, BGI-Qingdao, Qingdao, 266555, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Chengcheng Shi, BGI-Qingdao, Qingdao, 266555, China
Xiaochuan Liu, BGI-Qingdao, Qingdao, 266555, China
Yanwei Qi, BGI-Qingdao, Qingdao, 266555, China
Jianwei Chen, BGI-Qingdao, Qingdao, 266555, China
Jinglin Han, BGI-Qingdao, Qingdao, 266555, China
Li Deng, BGI-Qingdao, Qingdao, 266555, China
Xin Liu, BGI-Qingdao, Qingdao, 266555, China; BGI-Shenzhen, Shenzhen, 518083, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China
Guangyi Fan, BGI-Qingdao, Qingdao, 266555, China; BGI-Shenzhen, Shenzhen, 518083, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China

Decontamination is necessary to eliminate the effect of foreign genomes on symbiosis research and biomedical discoveries, yet directly extracting host reads from error-prone long-read data without reference genomes remains challenging. Here we present Symbiont-Screener, a reference-free approach that identifies host raw long reads using a trio-based screening model, which exploits strobemers and unsupervised clustering to overcome high error rates. When applied to simulated and real contaminated datasets, it outperforms other de novo decontamination tools and achieves decontamination precision and recall comparable to those of state-of-the-art reference-based classifiers, thus promising high-quality reconstruction of the host genome. The code for the analysis is available at https://github.com/BGI-Qingdao/Symbiont-Screener. This research was supported by the National Natural Science Foundation of China (Grant No. 32100514).
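
The idea of screening reads against parental markers can be illustrated with a toy sketch. This is not the Symbiont-Screener implementation: it swaps in plain k-mers for strobemers, a fixed density threshold for unsupervised clustering, and, as an illustrative choice, treats k-mers seen in only one parent as markers. All function names, the parameter `min_density`, and the marker definition are assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_markers(paternal_reads, maternal_reads, k):
    """Collect k-mers observed in exactly one parent (illustrative marker choice)."""
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    return paternal ^ maternal  # symmetric difference: seen in one parent only


def marker_density(read, markers, k):
    """Fraction of k-mer positions in the read that match a marker."""
    total = len(read) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(1 for i in range(total) if read[i:i + k] in markers)
    return hits / total


def screen_reads(reads, markers, k, min_density):
    """Split reads into putative host reads and everything else.

    A fixed threshold stands in for the clustering step here (assumption).
    """
    host, other = [], []
    for read in reads:
        if marker_density(read, markers, k) >= min_density:
            host.append(read)
        else:
            other.append(read)
    return host, other
```

In this sketch, a read carrying many parent-derived k-mers is kept as host, while a read sharing none is set aside. Exact k-mer matching degrades quickly at long-read error rates, which is the kind of limitation that strobemers and clustering are meant to address.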