Analysis of Environmental Treaty Design: A Data Science Approach
Abstract
There are hundreds if not thousands of international agreements governing all sorts of environmental problems, from endangered species and pollution to stratospheric ozone depletion and climate change. Analysing and describing the provisions of all these treaties using the traditional 'reading and writing' approach has become all but impossible. The main proposals for solving this epistemic challenge involve either time-consuming manual approaches to building datasets, or statistical natural language processing (NLP) for a different kind of content analysis. This thesis proposes an intermediate approach, leveraging rule-based NLP for dataset construction and employing statistics and machine learning only for downstream analysis. Traditional legal research can thus be supported and complemented while taking advantage of data science and automation. The approach is developed with a set of about 120 open multilateral environmental agreements and about 50 treaty design variables. Regular expression pattern matching is found to be well suited for accurate and precise extraction of information from common treaty provisions such as those on entry into force, amendment, supplementary agreements, treaty organs, withdrawal, termination and dispute settlement. Implementation-related provisions, including national reporting, international verification of compliance, treaty progress review, non-compliance procedures and sanctions, are more difficult to capture and compare across treaties, but this difficulty is itself of interest for the analysis of treaty design. The variables, their distributions and associations are described, and the speed of entry into force is predicted using various techniques, including linear regression and neural networks.
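As a minimal sketch of the rule-based extraction idea, the snippet below matches a typical entry-into-force clause with a regular expression and captures its two design parameters, the waiting period and the ratification threshold. The pattern and the example clause are illustrative stand-ins, not the patterns actually used in the thesis.

```python
import re

# Hypothetical pattern for a common entry-into-force formula:
# "... enter into force on the Nth day after ... deposit of the Mth instrument ..."
ENTRY_INTO_FORCE = re.compile(
    r"enter into force on the (?P<delay>\w+) day"
    r".*?deposit of the (?P<threshold>\w+) instrument",
    re.IGNORECASE | re.DOTALL,
)

clause = (
    "This Convention shall enter into force on the ninetieth day after "
    "the date of deposit of the fiftieth instrument of ratification, "
    "acceptance, approval or accession."
)

match = ENTRY_INTO_FORCE.search(clause)
if match:
    # Extracted design variables, still as ordinal words at this stage.
    print(match.group("delay"), match.group("threshold"))
```

A full pipeline would then normalise the captured ordinals ("ninetieth" → 90) and record them as structured variables per treaty.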
Regarding the larger epistemic challenge, the scalability of the approach is assessed and limitations of existing treaty databases and research practices are identified. Drawing on the achievements of the bioinformatics and linked open data communities, I argue that a collaborative, incrementally expanding database, or findable, accessible, interoperable and reusable (FAIR) datasets, would make the approach scalable. Both rely on a standardised vocabulary or formal ontology for data integration. Accordingly, the thesis builds a proof-of-concept Public International Law Ontology and an NLP pipeline to populate the ontology with data gathered from treaty texts and participation records. Output formats and interfaces are designed for wide accessibility, without requiring programming skills. All software and data accompanying this thesis are available under a free and open source licence.
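To illustrate what populating such an ontology might look like, the sketch below serialises extracted treaty variables as RDF Turtle under a hypothetical `pilo:` namespace. The namespace URI, class and property names, and the example treaty are all illustrative placeholders, not the thesis's actual schema.

```python
# Assumed prefixes for the illustrative "pilo:" namespace and XML Schema datatypes.
PREFIXES = (
    "@prefix pilo: <https://example.org/pilo#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def treaty_to_turtle(treaty_id: str, variables: dict) -> str:
    """Render one treaty's extracted design variables as Turtle triples."""
    lines = [f"pilo:{treaty_id} a pilo:Treaty ;"]
    for name, value in variables.items():
        lines.append(f'    pilo:{name} "{value}"^^xsd:integer ;')
    # Terminate the final statement with "." instead of ";".
    lines[-1] = lines[-1].rstrip(" ;") + " ."
    return "\n".join(lines)


print(PREFIXES + treaty_to_turtle(
    "ExampleConvention",
    {"ratificationThreshold": 50, "entryIntoForceDelayDays": 90},
))
```

Emitting a standard serialisation such as Turtle is one way to keep the extracted data interoperable with other linked open data tooling without tying consumers to any particular programming language.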