How to model environmentally relevant chemical reactions with machine learning

By Huichun Zhang | 2022-08-11


Prof. Huichun Zhang

Frank H. Neff Professor

Department of Civil and Environmental Engineering, Case Western Reserve University




Abstract: Environmental chemical reactions have been investigated extensively for various purposes; however, accurately modeling reaction kinetics under different conditions remains challenging. Most existing studies model reaction kinetics with traditional quantitative structure-activity relationships (QSARs), which generally require extensive feature engineering. Recently, machine learning (ML) has become a promising tool for modeling chemical reactions because it can achieve better performance and can make use of diverse chemical representations. In this talk, we use two examples to demonstrate how to use ML to model environmental reactions with different sample sizes. In the first example, we compiled a large database of 12,750 records for aerobic biodegradability, covering both ready and inherent biodegradation under different conditions, and then developed regression and classification models using different chemical representations and ML algorithms. The best regression model (R² = 0.54, root mean square error of 0.25) and the best classification model (prediction accuracy of 85.1%) achieved very good performance. The models also showed large applicability domains and provided reasonable predictions for the biodegradability of more than 98% of over 850,000 environmentally relevant chemicals in the Distributed Structure-Searchable Toxicity (DSSTox) database. In the second example, we proposed two approaches to model the reactivity of organic contaminants toward four oxidants (SO4•-, HClO, O3 and ClO2), all with small sample sizes: combining small datasets and transferring knowledge between them. We first merged these datasets and developed unified ML models, which showed better predictive performance than the individual models because the unified model ‘corrected’ wrongly learned effects of several atom groups. We then developed knowledge transfer models between two datasets and observed different predictive performance.
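For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how the coefficient of determination, root mean square error, and prediction accuracy are conventionally defined. The function names and the toy values are illustrative assumptions; they are not taken from the talk or its datasets.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def accuracy(labels_true, labels_pred):
    """Fraction of class labels that were predicted correctly."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Toy values (hypothetical, for illustration only).
observed = [0.10, 0.40, 0.70, 0.90]
predicted = [0.20, 0.35, 0.60, 0.95]
print("R^2 :", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))

# Hypothetical class labels: 1 = biodegradable, 0 = not biodegradable.
print("Accuracy:", accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

RMSE is expressed in the same units as the modeled quantity, so it is only meaningful relative to the range of the target variable, whereas accuracy applies to the classification setting alone.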


Host: Prof. Cheng Gu

Executive Editor

Nanjing University


Time: 09:00 pm, August 11, 2022 (Beijing time)

Zoom ID: 816 9975 7155

Bilibili: 25002335

Video: How to model environmentally relevant chemical reactions with machine learning