Algorithmic Approaches for Context-Informed Reaction Prediction


Authors

Abstract

Understanding the chemical reactivity of small organic molecules is one of the key challenges in chemistry, and accelerating molecular synthesis can unlock considerable value in the pharmaceutical industry and beyond. Computational approaches are becoming increasingly important, and an ecosystem of tools for understanding chemical reactions is under development with the ultimate aim of accelerating lab-based workflows. This thesis makes contributions across this ecosystem of computational tools, including knowledge representation, data preparation, and model development. The first chapter discusses the representation of molecules in detail, laying the foundation for all the following chapters. The discussion of data representation continues with a showcase of how the Unified Data Model (UDM) can be used as a language for closed-loop optimisation. UDM is a standard for storage of chemical data, much like the Open Reaction Database (ORD). The ORD contains millions of reactions, and to prepare machine learning datasets from these reactions, I present the Python package ORDerly. Datasets generated by ORDerly were used in a subsequent chapter to experiment with novel model architectures exploiting the hierarchical nature of chemical reaction classes. The final two chapters discuss how a deeper understanding of the mechanistic details of a reaction can accelerate optimisation through multi-task Bayesian optimisation and enable out-of-sample predictions of the rate of protodeboronation, the primary degradation pathway of boronic acids. This thesis touches on numerous topics, all with the ultimate goal of developing computational tools to accelerate synthesis.

Description

Date

2023-12-07

Advisors

Lapkin, Alexei

Keywords

big data, machine learning, organic chemistry, reaction optimisation

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge
Sponsorship

- Engineering and Physical Sciences Research Council (2276995)
- UCB Pharma
- Engineering and Physical Sciences Research Council via project EP/S024220/1, EPSRC Centre for Doctoral Training in Automated Chemical Synthesis Enabled by Digital Molecular Technologies
- European Regional Development Fund via the project “Innovation Centre in Digital Molecular Technologies”