Find significant transcription factor binding sites from ChIP data.

Command::

    pass_wrapper.sh "$input" "$min_window" "$max_window" "$false_num" "$output"

Requirements: ``pass``, ``sed``.

**Dataset formats**

The input is in GFF_ format, and the output is tabular_.
(`Dataset missing?`_)

.. _GFF: ./static/formatHelp.html#gff
.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

PASS (Poisson Approximation for Statistical Significance) detects
significant transcription factor binding sites in the genome from ChIP
data. This is probably the only peak-calling method that accurately
controls the false-positive rate and FDR in ChIP data. That matters because
different peak-calling algorithms produce widely divergent results. At the
same time, PASS achieves similar or better power than previous methods.

-----

**Hints**

- ChIP-Seq data:

  If the data is from ChIP-Seq, convert the ChIP-Seq values into z-scores
  before using this program. It is also recommended that you group read
  counts within a neighborhood, e.g. in tiled windows of 30bp, so that the
  ChIP-Seq data resembles ChIP-chip data in format.

- Choosing window size options:

  Window sizes are counted in probes, so the right values depend on the
  probe tiling density. For example, if the probes are tiled every 100bp,
  then setting the smallest window = 2 and the largest window = 6 is
  appropriate. The DNA fragment size is around 300-500bp, which is roughly
  3-5 probes.

-----

**Example**

- input file::

    chr7  Nimblegen  ID  40307603  40307652  1.668944     .  .  .
    chr7  Nimblegen  ID  40307703  40307752  0.8041307    .  .  .
    chr7  Nimblegen  ID  40307808  40307865  -1.089931    .  .  .
    chr7  Nimblegen  ID  40307920  40307969  1.055044     .  .  .
    chr7  Nimblegen  ID  40308005  40308068  2.447853     .  .  .
    chr7  Nimblegen  ID  40308125  40308174  0.1638694    .  .  .
    chr7  Nimblegen  ID  40308223  40308275  -0.04796628  .  .  .
    chr7  Nimblegen  ID  40308318  40308367  0.9335709    .  .  .
    chr7  Nimblegen  ID  40308526  40308584  0.5143972    .  .  .
    chr7  Nimblegen  ID  40308611  40308660  -1.089931    .  .  .
    etc.

  In GFF, a value of dot '.' is used to mean "not applicable".

- output file::

    ID  Chr   Start     End       WinSz  PeakValue  # of FPs  FDR
    1   chr7  40310931  40311266  4      1.663446   0.248817  0.248817

-----

**References**

Zhang Y. (2008) Poisson approximation for significance in genome-wide
ChIP-chip tiling arrays. *Bioinformatics* 24(24):2825-31. Epub 2008 Oct 25.

Chen KB, Zhang Y. (2010) A varying threshold method for ChIP peak calling
using multiple sources of information. Submitted.
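The ChIP-Seq preprocessing hint under **Hints** can be sketched in Python. This is a minimal illustration, not part of PASS. It makes three assumptions. First, reads arrive as hypothetical ``(chromosome, position)`` pairs with 1-based positions. Second, counts are binned into fixed 30bp tiles. Third, each tile's count is turned into a z-score using the mean and standard deviation over all tiles. That is only one possible normalization; a score computed against a control sample may suit real data better. The function name ``reads_to_gff`` and the ``source``/``feature`` column values are hypothetical. The output mimics the 9-column GFF layout of the example input.

```python
from collections import Counter
from statistics import mean, pstdev


def reads_to_gff(reads, tile=30, source="ChIPSeq", feature="tile"):
    """Hypothetical helper: bin reads into fixed tiles and emit GFF lines
    whose score column is a z-score of the tile's read count.

    reads: iterable of (chrom, pos) pairs, 1-based positions (assumption).
    """
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, (pos - 1) // tile)] += 1

    # Build a contiguous tile grid per chromosome, including empty tiles,
    # so that background (zero-count) windows contribute to the z-scores.
    grid = []
    for chrom in sorted({c for c, _ in counts}):
        idx = [i for c, i in counts if c == chrom]
        for i in range(min(idx), max(idx) + 1):
            grid.append((chrom, i, counts.get((chrom, i), 0)))
    if not grid:
        return []

    values = [n for _, _, n in grid]
    mu, sd = mean(values), pstdev(values)  # pooled over all tiles

    lines = []
    for chrom, i, n in grid:
        z = (n - mu) / sd if sd > 0 else 0.0
        start, end = i * tile + 1, (i + 1) * tile  # 1-based, inclusive
        # GFF columns: seqname, source, feature, start, end, score,
        # strand, frame, group ('.' = not applicable)
        lines.append("\t".join([chrom, source, feature, str(start), str(end),
                                f"{z:.6g}", ".", ".", "."]))
    return lines
```

For example, ``reads_to_gff([("chr7", 1), ("chr7", 5), ("chr7", 31), ("chr7", 95)])`` yields four tiles (1-30, 31-60, 61-90, 91-120) with counts 2, 1, 0, 1. The empty tile is kept so that the z-scores are not inflated by ignoring background.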
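The window-size hint under **Hints** is a small calculation. Window sizes are counted in probes, so dividing the expected fragment-size range by the probe spacing gives the number of probes a binding event spans. The sketch below is a hypothetical helper, not a PASS option. Padding the range by one probe on each side is an assumption, chosen because it reproduces the documented example: 100bp tiling gives windows 2 and 6.

```python
import math


def suggest_windows(probe_spacing_bp, frag_min_bp=300, frag_max_bp=500):
    """Hypothetical heuristic: convert a DNA fragment-size range (bp) into
    a (smallest, largest) window size measured in probes, padded by one
    probe on each side (assumption)."""
    lo = max(1, math.floor(frag_min_bp / probe_spacing_bp) - 1)
    hi = math.ceil(frag_max_bp / probe_spacing_bp) + 1
    return lo, hi
```

With 30bp ChIP-Seq tiles, as suggested in the ChIP-Seq hint, the same heuristic gives ``suggest_windows(30) == (9, 18)``. Treat that as a starting point to tune, not a recommended setting.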
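The tabular output shown in the example can be post-processed, for instance to keep only peaks below an FDR cutoff. The following is a minimal sketch. It assumes the output is tab-separated with exactly the header row shown in the example. Splitting on tabs rather than on arbitrary whitespace matters here, because the ``# of FPs`` header itself contains spaces. The function name ``read_pass_peaks`` and the dictionary keys are hypothetical.

```python
import csv
import io


def read_pass_peaks(text, max_fdr=0.05):
    """Hypothetical helper: parse PASS tabular output (assumed tab-separated,
    with the header from the example) and keep peaks with FDR <= max_fdr."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    peaks = []
    for row in reader:
        peak = {
            "id": int(row["ID"]),
            "chrom": row["Chr"],
            "start": int(row["Start"]),
            "end": int(row["End"]),
            "win_size": int(row["WinSz"]),   # window size, in probes
            "peak_value": float(row["PeakValue"]),
            "fps": float(row["# of FPs"]),
            "fdr": float(row["FDR"]),
        }
        if peak["fdr"] <= max_fdr:
            peaks.append(peak)
    return peaks
```

Applied to the example output, a cutoff of 0.05 discards the single peak (FDR 0.248817), while a cutoff of 0.3 keeps it.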