On the use of resampling tests for evaluating statistical significance of binding-site co-occurrence.


Authors

Huen, David S 
Russell, Steven 

Abstract

BACKGROUND: In eukaryotes, most DNA-binding proteins exert their action as members of large effector complexes. The presence of these complexes is revealed in high-throughput genome-wide assays by the co-occurrence of the binding sites of different complex components. Resampling tests are one route by which the statistical significance of apparent co-occurrence can be assessed. RESULTS: We have investigated two resampling approaches for evaluating the statistical significance of binding-site co-occurrence. The permutation test approach was found to yield overly favourable p-values, while the independent resampling approach had the opposite effect and is of little use in practical terms. We have developed a new, pragmatically devised hybrid approach that, when applied to the experimental results of a Polycomb/Trithorax study, yielded p-values consistent with the findings of that study. We extended our investigations to the FL method developed by Haiminen et al., which derives its null distribution from all binding sites within a dataset, and show that the p-value computed for a pair of factors by this method can depend on which other factors are included in that dataset. Both our hybrid method and the FL method appeared to yield plausible estimates of the statistical significance of co-occurrences, although our hybrid method was more conservative when applied to the Polycomb/Trithorax dataset. A high-performance parallelized implementation of the hybrid method is available. CONCLUSIONS: We propose a new resampling-based co-occurrence significance test and demonstrate that it performs as well as or better than existing methods on a large experimentally derived dataset. We believe it can be usefully applied to data from high-throughput genome-wide techniques such as ChIP-chip or DamID. The Cooccur package, which implements our approach, accompanies this paper.
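
For readers unfamiliar with resampling tests, the general idea of a permutation test for co-occurrence can be sketched as follows. This is a generic textbook illustration only: it is not the hybrid method proposed in the paper, not the FL method, and not the interface of the Cooccur package. The binned presence/absence representation, the function names (`cooccurrence_count`, `permutation_p_value`) and the toy data are all assumptions made for this example.

```python
import random


def cooccurrence_count(a, b):
    """Number of bins in which both factors are marked as bound."""
    return sum(1 for x, y in zip(a, b) if x and y)


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed co-occurrence.

    a, b: equal-length 0/1 sequences (hypothetical binned binding calls).
    The null distribution is built by shuffling b across bins, which
    assumes bins are exchangeable (a simplification for illustration).
    """
    rng = random.Random(seed)
    observed = cooccurrence_count(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cooccurrence_count(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented): 200 bins, factor A bound in bins 0-19,
# factor B bound in bins 5-24, so the two overlap in 15 bins.
a = [1 if i < 20 else 0 for i in range(200)]
b = [1 if 5 <= i < 25 else 0 for i in range(200)]
observed, p = permutation_p_value(a, b)
print(observed, p)
```

Shuffling one factor's calls across all bins treats every bin as equally likely to be bound, which is a strong simplification. The abstract reports that the permutation approach the authors examined yielded overly favourable p-values, so a sketch like this should be read as a conceptual aid rather than a recommended test.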

Keywords

Animals, Binding Sites, DNA-Binding Proteins, Drosophila, Genome-Wide Association Study, Oligonucleotide Array Sequence Analysis

Journal Title

BMC Bioinformatics

Journal ISSN

1471-2105

Publisher

Springer Science and Business Media LLC

Sponsorship

Biotechnology and Biological Sciences Research Council (BB/E015492/1)
Medical Research Council (G8225539)
BBSRC (G18877)