Microbiome composition and implications for ballast water classification using machine learning.

Abstract

Ballast water is a vector for the global translocation of microorganisms and should be monitored to protect human and environmental health. This study uses high-throughput sequencing (HTS) and machine learning to examine the bacterial and fungal microbiomes of ballast water and to identify associations between the 16S and 18S rRNA genes and the fungal ITS region. These sequencing regions were examined using the SILVA v132 and UNITE reference databases. The highest correlation was found between the communities in Silva_16S and UNITE_ITS (0.74). The proportion of positive inter-kingdom correlations was higher than the proportion of positive intra-kingdom correlations (p = 0.032). Understanding the reasons for this difference requires additional research under more controlled conditions. Finally, a machine learning model was used to compare classification accuracy across each sequencing region and reference database when identifying ballast water residence time and ballast sample location. Accuracy was significantly higher using SILVA (0.843) than using UNITE (0.614) (p < 0.001). In the short term, future research aimed at classifying ballast water samples by location or ballast water residence time should use the 16S rRNA gene and the SILVA reference database. Further research to curate other sequencing regions or the UNITE reference database for aquatic ecosystems may improve the utility of these tools.

DOI
10.1016/j.scitotenv.2019.07.053
Year