Question Answering on Dynamic Knowledge Graph for Chemistry

Zhou, Xiaochi

Repository URI

https://www.repository.cam.ac.uk/handle/1810/366843

Repository DOI

https://doi.org/10.17863/CAM.107599

Files

Primary Thesis (7 MB)

Type

Thesis

Authors

Zhou, Xiaochi

https://orcid.org/0000-0002-4008-9965

Abstract

Chemistry relies heavily on access to diverse types of data and information, both for human users seeking information and for implementing applications. While the chemistry Knowledge Graph provides a way to represent this data and information, human users find it difficult to access efficiently. A Knowledge Graph Question Answering system is one solution. However, because of the specific nature of the chemistry Knowledge Graph, off-the-shelf Knowledge Graph Question Answering solutions may be less effective. This thesis therefore explores implementing Knowledge Graph Question Answering on the chemistry Knowledge Graph and addresses the challenges of querying it: its non-shallow structure, its semantic heterogeneity, the embedding of numerical values, and its large scale. The thesis studies the two main methods for Knowledge Graph Question Answering, Semantic Parsing and Knowledge Graph embedding, against the chemistry Knowledge Graph. It also investigates models including LDA-based topic modelling, StarSpace-based text classification, CRF-based Named Entity Recognition, Knowledge Graph embedding models (TransE, ComplEx, TransR, and TransRA), relation prediction, and score alignment.

The first study implements a Semantic Parsing-based Question Answering system that uses CRF-based Named Entity Recognition to extract key components from questions. The system applies an ontology lookup service to ground these components to semantic representations, and StarSpace-based text classification selects suitable SPARQL query templates. SPARQL queries are formed by filling the semantic representations into the templates, which enables data retrieval. Evaluation results show that the system outperforms the Wolfram Alpha and Google Search Engine baselines on some question types.

The second study explores integrating semantic agents to expand system coverage. Modifications to the OntoAgent ontology enable agent discovery, matching, and invocation. Semantic agent descriptions include question templates for automated generation of training questions. Evaluation results show a 1.0 F1 score for StarSpace question classification and a 0.95 F1 score for CRF-based Named Entity classification. For agent-related questions, 83% have correct requests and 81% have correct answers.

The final study investigates Knowledge Graph embedding-based Question Answering methods and various embedding techniques. Knowledge Graph embedding represents information, while BERT-based relation prediction predicts question embeddings. Answer candidates are ranked by triple likelihood, calculated from the topic entity, predicted relation, and candidate embeddings. A novel Knowledge Graph embedding algorithm, TransRA, achieves a numerically higher filtered mean reciprocal rank than the other embedding methods. A BERT-based score alignment model integrates and re-ranks answers, increasing mean reciprocal rank by 0.41. Evaluation results show filtered mean reciprocal rank ranging from 0.53 to 0.88 across domains.
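
The embedding-based ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a TransE-style score (the negative distance between head + relation and tail) and two-dimensional toy embeddings. The function names, entity names, and vectors are hypothetical and are not taken from the thesis, which also covers ComplEx, TransR, and TransRA. The sketch computes a plain reciprocal rank; a filtered variant would first remove other known-correct candidates from the list before ranking.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance of (head + relation) from tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_candidates(head, relation, candidates):
    """Return candidate names ordered from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


def reciprocal_rank(ranked, correct):
    """1 / (1-based position of the correct answer in the ranked list)."""
    return 1.0 / (ranked.index(correct) + 1)


# Toy embeddings (hypothetical values chosen for illustration only).
topic_entity = [1.0, 0.0]
predicted_relation = [0.0, 1.0]
candidates = {
    "candidate_a": [1.0, 1.0],  # equals topic_entity + predicted_relation
    "candidate_b": [0.0, 0.0],
    "candidate_c": [1.0, 0.5],
}

ranked = rank_candidates(topic_entity, predicted_relation, candidates)
print(ranked)
print(reciprocal_rank(ranked, "candidate_c"))
```

Averaging the reciprocal rank over a set of test questions gives the mean reciprocal rank reported in the abstract.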

Date

2023-09-29

Advisors

Kraft, Markus

Keywords

Knowledge Graph, Question Answering

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights

Attribution 4.0 International (CC BY 4.0)

Collections

Theses - Chemical Engineering and Biotechnology