On the Optimality of the Lexicon


Type

Thesis

Authors

Pimentel Martins Da Silva, Tiago 

Abstract

The principle of least effort posits that a pressure towards communicative efficiency shapes natural languages. In this thesis, we investigate the existence, the nature, and the impact of such a pressure in natural languages' lexicons. We investigate the existence of this pressure by (i) estimating what optimal word lengths would be using coding theory, (ii) proposing pressure-free baselines and estimating their word lengths, and then (iii) comparing natural lexicons to both these optimal and pressure-free artificial lexicons. We investigate the nature of this pressure by comparing multiple ways in which communicative efficiency can be operationalised; we formalise it as either a pressure to shorten utterances, or a pressure to keep information rates as close as possible to an unknown communication channel capacity. Finally, we study the impact of this pressure on cross-linguistic differences in word lengths and on the ratio of homophones in natural languages. Overall, our results support a Zipfian view of communicative efficiency, in which lexicons are pressured towards having utterances that are as short as possible. Our results, however, also highlight the existence of competing constraints and pressures in how lexicons are structured: (i) a language's phonotactic complexity seems to bottleneck the extent to which economy of expression can optimise a lexicon, and (ii) a pressure for clarity seems to keep the ratio of homophones in a language close to chance.

Date

2023-10-29

Advisors

Teufel, Simone

Keywords

computational linguistics, information-theoretic linguistics, language efficiency, lexical optimisation, word length

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge