BGC-03 Biogeochemistry of DOM
Improved understanding of photochemical processing of DOM using machine learning approaches
Chen Zhao*, Department of Ocean Science and the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
Xinyue Xu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Yifu Hou, Department of Ocean Science and the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
Yuanbi Yi, Department of Ocean Science and the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
Chen He, State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Changping District, Beijing 102249, China
Quan Shi, State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Changping District, Beijing 102249, China
Xiaomeng Li, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Ding He, Department of Ocean Science and the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China

As one of the largest active carbon pools in the ocean, dissolved organic matter (DOM) is closely tied to regional and global carbon cycling through its chemical composition, reactivity, and lability. In particular, photochemical reactions (photo-production and photodegradation) are essential components of the biogeochemical processing of DOM and alter its chemical composition. State-of-the-art ultra-high-resolution mass spectrometry, including Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) and Orbitrap mass spectrometry (Orbitrap MS), can resolve several thousand formulae within a single sample and effectively depict DOM at the molecular level. Previous studies have demonstrated that DOM molecular composition changes significantly after photo-incubation. The formulae linked to photochemical processing are typically classified as photo-resistant, photo-labile, or photo-product types. However, current studies still struggle to discriminate among the many co-occurring formulae with different photochemical reactivities. One obstacle is that molecular matching alone is likely to yield biased estimates of the photo-reactivity of specific formulae across different samples. Given the complexity of DOM transformation, it is reasonable to assume that the relationship between the classes of photochemistry-related formulae and their molecular composition is complex and non-linear, and so cannot be fully elucidated through molecular matching. Machine learning methodologies have proven to be promising tools for addressing traditional geochemical questions. Here, we established classification models for photochemistry-related formulae using various machine learning algorithms (e.g., random forest, XGBoost, deep neural networks), based on several recognized molecular pools generated from multiple photochemistry experiments.
The environmental specificity of photochemistry-related formulae was further assessed to provide a more precise constraint on the photochemical alteration of DOM in coastal bay ecosystems. This work aims to offer new insight into how the molecular information provided by ultra-high-resolution mass spectrometry can be exploited.
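To make the molecular matching baseline concrete, the sketch below labels formulae from a single before/after irradiation pair and derives simple elemental ratios that could serve as classifier inputs. It is only an illustration under stated assumptions: the presence/absence labeling rule, the choice of H/C and O/C as features, the example formulae, and all function names are hypothetical and are not taken from the models described in this abstract.

```python
import re


def parse_formula(formula):
    """Return element counts, e.g. 'C10H12O5' -> {'C': 10, 'H': 12, 'O': 5}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def label_by_matching(before, after):
    """Assumed matching rule: label each formula by presence before/after irradiation."""
    before, after = set(before), set(after)
    labels = {}
    for formula in before | after:
        if formula in before and formula in after:
            labels[formula] = "photo-resistant"   # detected in both samples
        elif formula in before:
            labels[formula] = "photo-labile"      # lost during irradiation
        else:
            labels[formula] = "photo-product"     # newly detected after irradiation
    return labels


def elemental_ratios(formula):
    """Hypothetical feature set: H/C and O/C ratios of one formula."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError(f"formula without carbon: {formula}")
    return {"H/C": counts.get("H", 0) / carbon, "O/C": counts.get("O", 0) / carbon}


# Made-up example formulae, for illustration only.
before = ["C10H12O5", "C9H10O4"]
after = ["C9H10O4", "C8H8O6"]
labels = label_by_matching(before, after)
features = {formula: elemental_ratios(formula) for formula in labels}
```

Because such labels come from one sample pair, the same formula can receive different labels in different experiments, which is the bias that motivates learning the class from molecular composition instead.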