Learning chemical representation in environmental data science

By Feng Gao

2022-10-13


Dr Feng Gao

Columbia University, USA

Abstract: Assessing the fate, transport, and toxicity of contaminants is vital to evaluating potential exposures and risks to humans. Emerging machine learning and deep learning models have been used to predict chemical ecotoxicity and important fate properties such as bioconcentration, complementing time-consuming and labor-intensive experiments. The performance of these models relies heavily on the numerical representation of chemicals. Representation learning is a class of machine learning and deep learning approaches that learn representations of data for use in downstream tasks such as regression, classification, or clustering. In this talk, I’ll discuss three common ways of representing molecules: molecular fingerprints, molecular physicochemical properties, and molecular graphs. I’ll share our recent work on using both supervised and unsupervised machine learning/deep learning methods to learn chemical representations and predict the fate and toxicity of organic contaminants. Specifically, I’ll discuss our work on linking molecular substructures with bioconcentration through molecular fingerprints, and on learning chemical representations from hundreds of physicochemical properties for toxicity prediction. Finally, I’ll talk about a novel unsupervised graph learning method we developed, the geometric scattering transform (GST). GST learns representations from graph-structured data, and we comprehensively tested its performance on seven biochemistry datasets. Our results demonstrate that learning chemical representations can provide unique perspectives and is important for building predictive models that accurately assess the fate and toxicity of organic contaminants.
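
As a rough illustration of two of the representations named in the abstract, the sketch below encodes a molecule as a graph and derives a toy binary fingerprint from it. This is not the speaker's method or any cheminformatics library's algorithm: the molecule (the heavy atoms of ethanol), the names `ATOMS`, `BONDS`, `adjacency`, and `toy_fingerprint`, the 64-bit length, and the CRC32 hashing scheme are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical toy molecule: the heavy atoms of ethanol (C-C-O), hydrogens omitted.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]


def adjacency(n_atoms, bonds):
    """Build an adjacency list for an undirected molecular graph."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def toy_fingerprint(atoms, bonds, n_bits=64):
    """Set one bit per atom environment (atom symbol plus sorted neighbor symbols).

    Illustrative assumption only: real fingerprints use richer atom
    invariants and larger neighborhoods.
    """
    neighbors = adjacency(len(atoms), bonds)
    bits = [0] * n_bits
    for i, symbol in enumerate(atoms):
        # Describe the atom by its element and the elements bonded to it.
        env = symbol + "(" + "".join(sorted(atoms[j] for j in neighbors[i])) + ")"
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        bits[zlib.crc32(env.encode()) % n_bits] = 1
    return bits


graph = adjacency(len(ATOMS), BONDS)
fp = toy_fingerprint(ATOMS, BONDS)
```

Here `graph` is the molecular-graph view (which atoms are bonded to which), while `fp` is a fixed-length vector that a downstream regression or classification model could take as input. Distinct atom environments may collide on the same bit, which is a known trade-off of hashed fingerprints.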

Host: Assist. Prof. Yanbin Zhao

EEH Early Career Board Member

Shanghai Jiao Tong University

Time: 9:00 am, Oct 13, 2022 (Beijing time)

Zoom ID: 816 9975 7155

Bilibili: 25002335
