Improving de novo molecule generation for structure-based drug design

Thomas, Morgan

Repository URI

https://www.repository.cam.ac.uk/handle/1810/367394

Repository DOI

https://doi.org/10.17863/CAM.107998

Files

Primary Thesis (16.6 MB)

Type

Thesis

Authors

Thomas, Morgan

https://orcid.org/0000-0002-1610-3499

Abstract

De novo molecule generation for drug design has seen a resurgence in recent years, mostly due to the rapid advances in machine learning (ML) algorithms that utilise deep neural networks, resulting in a plethora of ML-based generative models. However, there is often a large disparity between the published evaluations and applications of such approaches and the practical needs of real drug design projects (for example, optimizing QED versus optimizing binding affinity, which is commonly approximated by structure-based approaches). Moreover, the density of approaches and the frequent lack of relevant, standardized objectives make it difficult to truly discern “state-of-the-art”. The work in this thesis aims to address some of these issues and improve the applicability and evaluation of de novo molecule generation for practical drug design.

The first research chapter outlines the design and use of an open-source, Python-based software package named MolScore. This configurable suite of scoring functions (including an interface to 5 docking algorithms and ~2,300 trained bioactivity models) can be used to design difficult yet relevant drug design objectives, either for standardized comparison or for practical use with generative models. In addition, MolScore includes a graphical user interface to improve usability and a suite of common metrics for evaluating de novo generated molecules.

Next, MolScore was used to compare docking, a more difficult objective function, with the more commonly used predictive models of molecule bioactivity as scoring functions for REINVENT (a generative model for goal-directed de novo molecule generation). Using docking resulted in increased diversity of de novo molecules and improved coverage of known bioactive chemical space. However, the added computational expense required for generative model optimization is a practical disadvantage of docking as a scoring function.

To address the computational expense of optimizing docking scores, a hybrid reinforcement learning algorithm (Augmented Hill-Climb) is proposed to improve the learning efficiency of language-based generative models. This approach significantly reduced the computational runtime while maintaining the chemical desirability of de novo molecules. Augmented Hill-Climb displayed superior efficiency against four other commonly used reinforcement learning algorithms, a result that was also observed with an alternative model architecture. It was then benchmarked against 22 generative models, showing the best sample efficiency when additionally constraining for chemical desirability.

Overall, the work outlined in this thesis contributes to the field of computational drug design by providing software, algorithmic developments, and benchmark results for different de novo molecule generation approaches.

Date

2023-08-03

Advisors

Bender, Andreas

Keywords

De novo molecule generation, Drug design, Generative models, Reinforcement learning, Structure-based drug design

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

This PhD was funded by Nxera Therapeutics (previously branded Sosei Heptares).

Collections

Theses - Chemistry