
Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment


Type

Thesis

Authors

von Kügelgen, Julius (ORCID: https://orcid.org/0000-0001-6469-4118)

Abstract

This thesis brings together ideas from causality and representation learning. Causal models provide rich descriptions of complex systems as sets of mechanisms by which each variable is influenced by its direct causes. They support reasoning about manipulating parts of the system, capture a whole range of interventional distributions, and thus hold promise for addressing some of the open challenges of artificial intelligence (AI), such as planning, transferring knowledge in changing environments, or robustness to distribution shifts. However, a key obstacle to a more widespread use of causal models in AI is the requirement that the relevant variables need to be specified a priori, which is typically not the case for the high-dimensional, unstructured data processed by modern AI systems. At the same time, machine learning (ML) has proven quite successful at automatically extracting useful and compact representations of such complex data. Causal representation learning (CRL) aims to combine the core strengths of ML and causality by learning representations in the form of latent variables endowed with causal model semantics.

In this thesis, we study and present new results for different CRL settings. A central theme is the question of identifiability: Given infinite data, when are representations satisfying the same learning objective guaranteed to be equivalent? This is arguably an important prerequisite for CRL, as it formally characterises if and when a learning task is, at least in principle, feasible. Since learning causal models—even without a representation learning component—is notoriously difficult, we require additional assumptions on the model class or rich data beyond the classical i.i.d. setting. For unsupervised representation learning from i.i.d. data, we develop independent mechanism analysis, a constraint on the mixing function mapping latent to observed variables, which is shown to promote the identifiability of independent latents. For a multi-view setting of learning from pairs of non-independent observations, we prove that the invariant block of latents that are always shared across views can be identified. Finally, for a multi-environment setting of learning from non-identically distributed datasets arising from perfect single-node interventions, we show that the latents and their causal graph are identifiable.
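As a minimal sketch of the unsupervised setting and the independent mechanism analysis (IMA) constraint (notation introduced here for illustration, not quoted from the thesis): observations are generated by a mixing function applied to independent latents,
\[
\mathbf{x} = f(\mathbf{z}), \qquad p(\mathbf{z}) = \prod_{i=1}^{n} p_i(z_i),
\]
and IMA constrains the mixing $f$ via the non-negative contrast
\[
c_{\mathrm{IMA}}(f, \mathbf{z}) \;=\; \sum_{i=1}^{n} \log \left\lVert \frac{\partial f}{\partial z_i}(\mathbf{z}) \right\rVert \;-\; \log \left\lvert \det J_f(\mathbf{z}) \right\rvert \;\ge\; 0,
\]
which vanishes exactly when the columns of the Jacobian $J_f$ are orthogonal, i.e., when each latent influences the observations through a geometrically "independent" direction.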

By studying and partially characterising identifiability for different settings, this thesis investigates what is possible and impossible for CRL without direct supervision, and thus contributes to its theoretical foundations. Ideally, the developed insights can help inform data collection practices or inspire the design of new practical estimation methods and algorithms.

Date

2023-11-27

Advisors

Weller, Adrian

Keywords

AI, artificial intelligence, causal inference, causal representation learning, causality, identifiability, latent variable model, machine learning, representation learning, unsupervised learning

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge