significant single-SNP associations in case-control studies
gpass.pl ${input1.extra_files_path}/${input1.metadata.base_name}.map ${input1.extra_files_path}/${input1.metadata.base_name}.ped $output $fdr
gpass
**Dataset formats**
The input dataset must be in lped_ format, and the output is tabular_.
(`Dataset missing?`_)
.. _lped: ./static/formatHelp.html#lped
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html
-----
**What it does**
GPASS (Genome-wide Poisson Approximation for Statistical Significance)
detects significant single-SNP associations in case-control studies at a user-specified false discovery rate (FDR). Unlike previous methods, it accurately approximates the genome-wide significance and FDR of SNP associations while adjusting for millions of multiple comparisons, and it does so within seconds or minutes.
The program has two main functions:
1. Detect significant single-SNP associations at a user-specified false
   discovery rate (FDR).

   *Note*: a "typical" definition of FDR is

       FDR = E(# of false positive SNPs / # of significant SNPs)

   This definition, however, is inappropriate for association mapping because
   SNPs are highly correlated. GPASS defines its FDR differently to account for
   SNP correlations, so the reported FDR can be read as the "proportion of
   false positive loci".

2. Approximate the significance of a list of candidate SNPs, adjusting for
   multiple comparisons. If you have isolated a few SNPs of interest and want
   to know their significance in a GWAS, you can supply the GWAS data and let
   the program test those SNPs specifically.
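
To illustrate why a per-SNP count can mislead when SNPs are correlated, the following sketch compares the false discovery proportion counted per SNP with the proportion counted per locus. The positions, the truth labels, and the rule that merges SNPs within a fixed window into one locus are all hypothetical. This is not GPASS's FDR definition, which is not spelled out here, and a proportion in one data set is only a stand-in for the expectation that an FDR refers to.

```python
def group_into_loci(positions, window):
    """Group positions into loci; a gap larger than `window` starts a new locus.

    The fixed-window rule is an illustrative assumption, not GPASS's method.
    """
    loci = []
    for pos in sorted(positions):
        if loci and pos - loci[-1][-1] <= window:
            loci[-1].append(pos)
        else:
            loci.append([pos])
    return loci


def false_discovery_proportions(significant, true_positions, window):
    """Return (per-SNP, per-locus) false discovery proportions."""
    false_snps = [p for p in significant if p not in true_positions]
    loci = group_into_loci(significant, window)
    # A locus counts as false only if none of its SNPs is a true association.
    false_loci = [locus for locus in loci
                  if not any(p in true_positions for p in locus)]
    return len(false_snps) / len(significant), len(false_loci) / len(loci)


# Hypothetical data: eight correlated SNPs tag one true locus,
# and two isolated SNPs are false positives.
true_positions = {1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700}
significant = sorted(true_positions) + [500000, 900000]

snp_fdp, locus_fdp = false_discovery_proportions(
    significant, true_positions, window=10000
)
print(f"per-SNP: {snp_fdp:.2f}, per-locus: {locus_fdp:.2f}")
```

In this toy case two of the ten significant SNPs are false, but two of the three loci are false, so the per-SNP figure understates how many reported regions are spurious.
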
*Also note*: the SNPs in a study should not be both few in number and
clustered in a single local region. A few hundred SNPs, or tens of SNPs
spread across different regions, will be fine. The sample size should not be
too small either; around 100 or more individuals (cases and controls combined)
will be fine. Otherwise, use permutation-based testing instead.
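
A quick pre-flight check along these lines can be run on the input before submitting it. The sketch below assumes the usual lped layout (four columns per map row; six leading columns followed by two alleles per SNP in each ped row). The function names are hypothetical, the 100-individual threshold comes from the note above, and the 100-SNP threshold is only a rough stand-in for "a few hundred SNPs, or tens of SNPs spread across regions".

```python
def summarize_lped(map_lines, ped_lines):
    """Return (number of SNPs, number of individuals) for lped-style input."""
    snps = [line.split() for line in map_lines if line.strip()]
    people = [line.split() for line in ped_lines if line.strip()]
    expected = 6 + 2 * len(snps)  # 6 leading columns + 2 alleles per SNP
    for fields in people:
        if len(fields) != expected:
            raise ValueError(
                f"ped row has {len(fields)} columns, expected {expected}"
            )
    return len(snps), len(people)


def size_warnings(n_snps, n_individuals, min_snps=100, min_individuals=100):
    """Collect warnings; both thresholds are rough, adjustable assumptions."""
    warnings = []
    if n_snps < min_snps:
        warnings.append(f"only {n_snps} SNPs; check they span several regions")
    if n_individuals < min_individuals:
        warnings.append(f"only {n_individuals} individuals; consider permutation")
    return warnings


# Tiny hypothetical input: 3 SNPs, 2 individuals.
map_lines = ["1 rs0 0 738547", "1 rs1 0 5597094", "1 rs2 0 9424115"]
ped_lines = ["1 1 0 0 1 1 G G A A A A", "2 1 0 0 1 2 G G A G G G"]

n_snps, n_individuals = summarize_lped(map_lines, ped_lines)
warnings = size_warnings(n_snps, n_individuals)
for message in warnings:
    print("warning:", message)
```
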
-----
**Example**
- input map file::

    1 rs0 0 738547
    1 rs1 0 5597094
    1 rs2 0 9424115
    etc.

- input ped file::

    1 1 0 0 1 1 G G A A A A A A A A A G A A G G G G A A G G G G G G A A A A A G A A G G A G A G A A G G A A G G A A G G A G A A G G A A G G A A A G A G G G A G G G G G A A A G A A G G G G G G G G A G A A A A A A A A
    1 1 0 0 1 1 G G A G G G A A A A A G A A G G G G G G A A G G A G A G G G G G A G G G A G A A G G A G G G A A G G G G A G A G G G A G A A A A G G G G A G A G G G A G A A A A A G G G A G G G A G G G G G A A G G A G
    etc.

- output dataset, showing significant SNPs and their p-values and FDR::

    #ID   chr   position   Statistics  adj-Pvalue  FDR
    rs35  chr1  136606952  4.890849    0.991562    0.682138
    rs36  chr1  137748344  4.931934    0.991562    0.795827
    rs44  chr2  14423047   7.712832    0.665086    0.218776
    etc.

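
The tabular output can be post-processed with a few lines of code. The sketch below parses rows like the ones in the example and keeps SNPs at or below an FDR cut-off. It assumes whitespace-separated columns with the header names shown above; the function name and the 0.25 cut-off are hypothetical choices for illustration.

```python
def read_gpass_table(lines):
    """Parse whitespace-separated output rows into a list of dicts.

    Column names are taken from the header line (leading '#' removed).
    """
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = [name.lstrip("#") for name in fields]
            continue
        row = dict(zip(header, fields))
        row["position"] = int(row["position"])
        for key in ("Statistics", "adj-Pvalue", "FDR"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


# The three rows from the example output above.
lines = [
    "#ID chr position Statistics adj-Pvalue FDR",
    "rs35 chr1 136606952 4.890849 0.991562 0.682138",
    "rs36 chr1 137748344 4.931934 0.991562 0.795827",
    "rs44 chr2 14423047 7.712832 0.665086 0.218776",
]

rows = read_gpass_table(lines)
hits = [row for row in rows if row["FDR"] <= 0.25]  # hypothetical cut-off
for row in hits:
    print(row["ID"], row["chr"], row["position"], row["FDR"])
```

With the example rows, only rs44 passes the 0.25 cut-off.
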
-----
**Reference**
Zhang Y, Liu JS. (2010)
Fast and accurate significance approximation for genome-wide association studies.
Submitted.