significant transcription factor binding sites from ChIP data
pass_wrapper.sh "$input" "$min_window" "$max_window" "$false_num" "$output"
pass
sed
**Dataset formats**
The input is in GFF_ format, and the output is tabular_.
(`Dataset missing?`_)
.. _GFF: ./static/formatHelp.html#gff
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html
-----
**What it does**
PASS (Poisson Approximation for Statistical Significance) detects
significant transcription factor binding sites in the genome from
ChIP data. This is probably the only peak-calling method that
accurately controls both the false-positive rate and the false
discovery rate (FDR) in ChIP data, which matters given the large
discrepancies among results from different peak-calling algorithms.
At the same time, the method achieves power similar to or better
than that of previous methods.
-----
**Hints**
- ChIP-Seq data:
If the data are from ChIP-Seq, convert the ChIP-Seq values into
z-scores before running this program. It is also recommended to pool
read counts within a neighborhood, e.g. in tiled windows of 30 bp,
so that the ChIP-Seq data resemble ChIP-chip data in format.
- Choosing window size options:
The appropriate window sizes depend on the probe tiling density. For
example, if probes are tiled every 100 bp, setting the smallest
window to 2 and the largest window to 6 is appropriate, because the
DNA fragment size is around 300-500 bp, which corresponds to roughly
3 to 5 probes.
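
The ChIP-Seq hint above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of PASS: the function names, the ``ChIPSeq`` source label, and the choice of z-score (plain standardization of window counts across one chromosome) are all hypothetical, and read positions are assumed to be 1-based start coordinates.

```python
import math

def bin_read_counts(positions, window=30):
    """Count reads in tiled, non-overlapping windows.

    positions: 1-based read start coordinates on one chromosome (assumed).
    Returns (start, end, count) tuples covering the whole observed range,
    including windows that contain no reads.
    """
    if not positions:
        return []
    first = (min(positions) - 1) // window
    last = (max(positions) - 1) // window
    counts = [0] * (last - first + 1)
    for pos in positions:
        counts[(pos - 1) // window - first] += 1
    return [((first + i) * window + 1, (first + i + 1) * window, c)
            for i, c in enumerate(counts)]

def standardize(values):
    """Assumed z-score definition: (value - mean) / population std. dev."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]

def to_gff_lines(chrom, positions, window=30, source="ChIPSeq", feature="ID"):
    """Emit one nine-column, tab-separated GFF line per window."""
    bins = bin_read_counts(positions, window)
    if not bins:
        return []
    scores = standardize([count for _, _, count in bins])
    return ["\t".join([chrom, source, feature, str(start), str(end),
                       "%.6f" % score, ".", ".", "."])
            for (start, end, _), score in zip(bins, scores)]
```

Other normalizations may suit a given experiment better; the point is only that each window ends up as one GFF line with a z-score in the score column.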
-----
**Example**
- input file::

    chr7 Nimblegen ID 40307603 40307652 1.668944 . . .
    chr7 Nimblegen ID 40307703 40307752 0.8041307 . . .
    chr7 Nimblegen ID 40307808 40307865 -1.089931 . . .
    chr7 Nimblegen ID 40307920 40307969 1.055044 . . .
    chr7 Nimblegen ID 40308005 40308068 2.447853 . . .
    chr7 Nimblegen ID 40308125 40308174 0.1638694 . . .
    chr7 Nimblegen ID 40308223 40308275 -0.04796628 . . .
    chr7 Nimblegen ID 40308318 40308367 0.9335709 . . .
    chr7 Nimblegen ID 40308526 40308584 0.5143972 . . .
    chr7 Nimblegen ID 40308611 40308660 -1.089931 . . .
    etc.

In GFF, a value of dot '.' is used to mean "not applicable".
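
As an illustration of the nine-column layout and the '.' convention, a minimal Python sketch for reading one input line might look like this. The function name and the returned dictionary keys are hypothetical, and splitting on arbitrary whitespace is an assumption that holds for lines like the example above but not for GFF attribute fields that contain spaces.

```python
def parse_gff_line(line):
    """Parse one GFF line into a dict, mapping '.' to None."""
    fields = line.split()  # assumption: no field contains whitespace
    if len(fields) != 9:
        raise ValueError("expected 9 GFF columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attribute = fields

    def na(value):
        # '.' means "not applicable" in GFF
        return None if value == "." else value

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": na(strand),
        "frame": na(frame),
        "attribute": na(attribute),
    }
```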
- output file::

    ID  Chr   Start     End       WinSz  PeakValue  # of FPs  FDR
    1   chr7  40310931  40311266  4      1.663446   0.248817  0.248817

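
Downstream, the peak table can be filtered on its FDR column. The sketch below assumes the tabular output is tab-separated and begins with the header row shown above; both helper names are hypothetical and not part of PASS.

```python
import csv
import io

def read_peaks(handle):
    """Read a tab-separated peak table whose first row is the header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader if row]

def filter_by_fdr(peaks, max_fdr):
    """Keep peaks whose FDR column is at most max_fdr."""
    return [peak for peak in peaks if float(peak["FDR"]) <= max_fdr]

# Usage with the example row from above (tabs assumed as separators)
example = (
    "ID\tChr\tStart\tEnd\tWinSz\tPeakValue\t# of FPs\tFDR\n"
    "1\tchr7\t40310931\t40311266\t4\t1.663446\t0.248817\t0.248817\n"
)
peaks = read_peaks(io.StringIO(example))
kept = filter_by_fdr(peaks, 0.25)
```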
-----
**References**
Zhang Y. (2008)
Poisson approximation for significance in genome-wide ChIP-chip tiling arrays.
Bioinformatics 24(24):2825-2831. Epub 2008 Oct 25.

Chen KB, Zhang Y. (2010)
A varying threshold method for ChIP peak calling using multiple sources of information.
Submitted.