
Active Learning-Closed-Loop Optimisation for Organic Chemistry and Formulations Research


Authors

Pomberger, Alexander (ORCID: https://orcid.org/0000-0003-2267-7090)

Abstract

The discipline of chemistry emerged in the 18th century and has since been a major driver of today's technologies, contributing to the comfort and resilience of the human species. Organic chemistry in particular aims to deliver tailored molecules for purposes such as medicines to treat diseases, agricultural chemicals to ensure sufficient nutrition, and monomers to manufacture materials. Finding optimal methods and pathways toward target molecules has long been guided by intuition, requiring expert chemists to select chemical process parameters based on experience. This thesis focuses on strategies to automate the optimisation of chemical reactions and their conditions by employing active learning-driven closed-loop optimisation and by merging algorithmic efforts with robotic technologies.

First, I explored the effect of different machine-readable representations on closed-loop optimisation performance for an organic reaction towards a small molecule. Chemical descriptors with increasing information content were calculated, ranging from generic one-hot encoding (OHE) to bespoke and computationally expensive descriptors derived from density functional theory (DFT). Moreover, the balance between descriptor complexity and dataset size for initialising the optimisation was investigated. In this study, complex descriptors did not outperform the simple OHE representation, and larger initial datasets delivered better performance than smaller initial datasets containing highly informative descriptors.
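
As an illustration of the simplest representation mentioned above, the following is a minimal sketch of one-hot encoding for categorical reaction conditions. The condition names, category lists, and helper functions are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def one_hot_encode(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


def encode_reaction(conditions, vocabulary):
    """Concatenate the one-hot vectors of all conditions in a fixed (sorted) order."""
    vector = []
    for name in sorted(vocabulary):
        vector.extend(one_hot_encode(conditions[name], vocabulary[name]))
    return vector


# Hypothetical vocabulary of categorical reaction conditions (illustrative only).
vocabulary = {
    "solvent": ["DMF", "MeCN", "toluene"],
    "base": ["K2CO3", "Et3N"],
}

# Sorted order is "base" then "solvent", so the vector has 2 + 3 entries.
features = encode_reaction({"solvent": "MeCN", "base": "Et3N"}, vocabulary)
```

Such a vector carries no chemical information beyond the identity of each component, which is what makes it the generic baseline against which richer descriptors can be compared.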

Second, I investigated the efficiency of an active learning algorithm for adjusting the pH of aqueous multi-buffered polyprotic buffer systems. These solutions are of great importance in biological systems and are hard to model mathematically. By applying data-driven optimisation, I was able to model the pH response of the system after acid/base addition and to control a purpose-built liquid-handling robotic platform that conducted the experiments. Ultimately, I demonstrated that transfer learning, a method in which the model leverages prior data from a task similar to the target task, could increase efficiency by up to 40%.
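
The general shape of such a measure, model, propose loop can be sketched as follows. This is a toy example under strong assumptions: `measure_ph` is a hypothetical stand-in for the robotic platform that returns a smooth, monotonic titration curve, and the surrogate is plain linear interpolation between observations rather than the active learning model used in the thesis.

```python
import math


def measure_ph(volume_ml):
    """Hypothetical stand-in for the robot: a smooth, monotonic titration curve."""
    return 3.0 + 8.0 / (1.0 + math.exp(-(volume_ml - 5.0)))


def propose_volume(observations, target_ph):
    """Interpolate linearly between the two observations that bracket the target pH."""
    points = sorted(observations)
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 <= target_ph <= p1:
            return v0 + (target_ph - p0) * (v1 - v0) / (p1 - p0)
    raise ValueError("target pH is outside the observed range")


def closed_loop(target_ph, tolerance=0.05, max_iterations=20):
    """Repeat: propose a volume from the data so far, measure, and update the data."""
    # Two initial measurements that bracket the range of interest (assumed values).
    observations = [(0.0, measure_ph(0.0)), (10.0, measure_ph(10.0))]
    for _ in range(max_iterations):
        volume = propose_volume(observations, target_ph)
        ph = measure_ph(volume)
        observations.append((volume, ph))
        if abs(ph - target_ph) <= tolerance:
            break
    return volume, ph, len(observations)


volume, ph, n_measurements = closed_loop(target_ph=7.4)
```

Each new measurement tightens the bracket around the target, so the number of experiments needed depends on how well the surrogate captures the response curve. Prior data from a similar system could, in principle, seed `observations` with more than two points.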

Finally, I aimed to combine the learnings from the first two projects and investigate the feasibility of applying transfer learning to closed-loop chemical reaction optimisation workflows. Specifically, an XGBoost (XGB) model was trained on source data from the Pistachio online database with the aim of boosting performance on a target task that required finding the ideal reaction parameters. I found that both the weighting of the data and the exclusion of low-quality source data (after a specified number of iterations) matter for the performance of the active learning. However, the overall benefits so far do not justify the added effort, and the approach requires further investigation.
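
The idea of weighting source data and excluding it after a set number of iterations can be sketched as below. Note the substitutions: a weighted linear least-squares fit stands in for the XGB model, and the data points, the weight of 0.3, and the cut-off of 5 iterations are all hypothetical values chosen for illustration, not settings from the thesis.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    w_sum = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def source_weight(iteration, initial_weight=0.3, drop_after=5):
    """Down-weight source data, then exclude it once `drop_after` iterations have passed."""
    return initial_weight if iteration < drop_after else 0.0


def fit_with_transfer(source, target, iteration):
    """Fit on source and target data together, with target points at full weight."""
    w_src = source_weight(iteration)
    xs = [x for x, _ in source] + [x for x, _ in target]
    ys = [y for _, y in source] + [y for _, y in target]
    weights = [w_src] * len(source) + [1.0] * len(target)
    return weighted_linear_fit(xs, ys, weights)


# Hypothetical data: the source task follows y = 1 + 2x, the target task y = 3x.
source = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
target = [(0.0, 0.0), (1.0, 3.0), (2.0, 6.0)]

early = fit_with_transfer(source, target, iteration=0)   # source still contributes
late = fit_with_transfer(source, target, iteration=10)   # source excluded
```

In the early fit the source data pulls the slope away from the target behaviour, while the late fit depends on the target data alone. This is the trade-off that the weighting and the exclusion point control.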

Date

2023-12-22

Advisors

Lapkin, Alexei

Keywords

Machine learning, Reaction Optimization, Synthetic Chemistry

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

BASF PhD Scholarship