Curating Novel Datasets for Machine Learning Applications in Organic Optoelectronics and Reaction Mechanism Prediction
Name
Joonyoung F. Joung
Affiliation
Department of Chemical Engineering, Massachusetts Institute of Technology
Abstract
In the data-driven paradigm, the importance of high-quality datasets cannot be overstated. Many machine learning researchers rely on well-curated, pre-polished datasets, but this often restricts them to the problems those datasets were designed for. Tackling new and more diverse challenges requires novel datasets, which makes their creation and publication a critical endeavor. In this presentation, I will introduce two datasets I have developed: (1) a collection of seven experimentally measured optical properties and HOMO-LUMO energy levels of organic molecules, and (2) a dataset of organic reaction mechanisms. These datasets offer new opportunities for advancing machine learning applications. Specifically, the first dataset can accelerate the development of organic optoelectronic devices such as OLEDs, while the second can aid in predicting potential impurities formed during chemical reactions. Through these contributions, I aim to emphasize the value of creating and sharing new datasets to address emerging challenges in both materials science and organic chemistry.