Catch me if you can: signal localization with knockoff e-values

  • Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305-4020, USA.
Journal of the Royal Statistical Society. Series B, Statistical Methodology

Summary

This summary is machine-generated.

This study introduces a new method for precise hypothesis testing with false discovery rate (FDR) control, especially useful in genomics. It enables adaptive discovery across multiple resolutions, ensuring reliable findings in complex data analyses.

Area Of Science

  • Statistical genetics
  • Bioinformatics
  • Computational biology

Background

  • Testing numerous, often redundant, hypotheses is common in scientific research, particularly in genome-wide association studies (GWAS).
  • Researchers aim to identify precise rejections while controlling the false discovery rate (FDR), especially when analyzing data at multiple resolutions.
  • Existing methods struggle to guarantee FDR control when the resolution of each reported finding is chosen adaptively from the data.

Purpose Of The Study

  • To develop a multiple comparison procedure that allows for adaptive selection of hypothesis testing resolution while maintaining FDR control.
  • To adapt existing methods, leveraging e-values and linear programming, for problems involving individual and group hypotheses.
  • To address the challenge of assuring FDR control in adaptive searches where signal strength varies.

Main Methods

  • Utilized e-values and linear programming to design a novel multiple comparison procedure.
  • Adapted the approach for scenarios where knockoffs and group knockoffs are applicable for testing conditional independence.
  • Applied the developed method to analyze real-world data from the UK Biobank.
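
To make the role of e-values concrete, here is a minimal sketch of the standard e-BH procedure, which rejects the hypotheses with the largest e-values while controlling the FDR at a target level. It is not the procedure developed in this study: it omits the multi-resolution and linear-programming components entirely and only illustrates the basic e-value rejection rule. The function name `ebh` and the example e-values are hypothetical.

```python
from typing import List, Sequence


def ebh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Standard e-BH rule (illustrative only, not the paper's method).

    Rejects the k* hypotheses with the largest e-values, where k* is the
    largest k such that the k-th largest e-value is at least n / (alpha * k).
    Returns the indices of the rejected hypotheses in increasing order.
    """
    n = len(e_values)
    # Indices sorted from largest to smallest e-value.
    order = sorted(range(n), key=lambda i: e_values[i], reverse=True)

    k_star = 0
    for k, idx in enumerate(order, start=1):
        # The k-th largest e-value must clear the threshold n / (alpha * k).
        if e_values[idx] >= n / (alpha * k):
            k_star = k

    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses, tested at FDR level 0.1.
example_e_values = [60.0, 1.0, 30.0, 0.5, 8.0]
rejected = ebh(example_e_values, alpha=0.1)
print(rejected)
```

With these made-up inputs the thresholds for k = 1, 2, 3 are 50, 25 and about 16.7, so only the two largest e-values (indices 0 and 2) are rejected.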

Main Results

  • The proposed method enables an adaptive choice of resolution for hypothesis testing.
  • The procedure controlled false discoveries across different analysis resolutions.
  • The approach proved effective when applied to genetic association data from the UK Biobank.

Conclusions

  • The developed procedure offers a robust solution for FDR-controlled hypothesis testing in adaptive search scenarios.
  • This method enhances the ability to report precise discoveries in fields like genomics where signal resolution varies.
  • The findings pave the way for more reliable and interpretable results in large-scale biological data analysis.