Title: Exploring nonlinear regression methods, with application to association studies
Authors: Speed, Douglas Christopher
Supervisors: Tavaré, Simon
Keywords: Nonlinear regression; Association studies; Bayesian; Statistical genetics
Issue Date: 12-Jul-2011
Abstract: The field of nonlinear regression is a long way from reaching a consensus. Once a method decides to explore nonlinear combinations of predictors, a number of questions are raised, such as what nonlinear combinations to permit and how best to search the resulting model space. Genetic Association Studies comprise an area that stands to gain greatly from the development of more sophisticated regression methods. While these studies’ ability to interrogate the genome has advanced rapidly over recent years, it is thought that a lack of suitable regression tools prevents them from achieving their full potential. I have tried to investigate the area of regression in a methodical manner. In Chapter 1, I explain the regression problem and outline existing methods. I observe that both linear and nonlinear methods can be categorised according to the restrictions enforced by their underlying model assumptions and speculate that a method with as few restrictions as possible might prove more powerful. In order to design such a method, I begin by assuming each predictor is tertiary (takes no more than three distinct values). In Chapters 2 and 3, I propose the method Sparse Partitioning. Its name derives from the way it searches for high scoring partitions of the predictor set, where each partition defines groups of predictors that jointly contribute towards the response. A sparsity assumption supposes most predictors belong in the “null group” indicating they have no effect on the outcome. In Chapter 4, I compare the performance of Sparse Partitioning to existing methods using simulated and real data. The results highlight how greatly a method’s power depends on the validity of its model assumptions. For this reason, Sparse Partitioning appears to offer a robust alternative to current methods, as its lack of restrictions allows it to maintain power in scenarios where other methods will fail. 
Sparse Partitioning relies on Markov chain Monte Carlo estimation, which limits the size of problem on which it can be used. Therefore, in Chapter 5, I propose a deterministic version of the method which, although less powerful, is not affected by convergence issues. In Chapter 6, I describe Bayesian Projection Pursuit, which adds spline fitting into the method to cope with non-tertiary predictors.
URI: http://www.dspace.cam.ac.uk/handle/1810/241092
Appears in Collections: Theses - DAMTP

Files in This Item:

File: dspace_thesis_doug_speed.pdf
Size: 11.77 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License

This item has been accessed 590 times.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.