<tool id="cshl_fastq_quality_filter" name="Filter by quality">
    <description></description>
    <requirements><requirement type="package">fastx_toolkit</requirement></requirements>

    <command>zcat -f '$input' | fastq_quality_filter -q $quality -p $percent -v -o '$output'</command>

    <inputs>
        <param format="fastqsolexa,fastqsanger" name="input" type="data" label="Library to filter" />

        <param name="quality" size="4" type="integer" value="20">
            <label>Quality cut-off value</label>
        </param>

        <param name="percent" size="4" type="integer" value="90">
            <label>Percent of bases in sequence that must have quality equal to / higher than cut-off value</label>
        </param>
    </inputs>

    <tests>
        <test>
            <!-- Test1: 100% of bases with quality 33 or higher (pretty steep requirement...) -->
            <param name="input" value="fastq_qual_filter1.fastq" />
            <param name="quality" value="33"/>
            <param name="percent" value="100"/>
            <output name="output" file="fastq_qual_filter1a.out" />
        </test>
        <test>
            <!-- Test2: 80% of bases with quality 20 or higher -->
            <param name="input" value="fastq_qual_filter1.fastq" />
            <param name="quality" value="20"/>
            <param name="percent" value="80"/>
            <output name="output" file="fastq_qual_filter1b.out" />
        </test>
    </tests>

    <outputs>
        <data format="input" name="output" metadata_source="input" />
    </outputs>

<help>
**What it does**

This tool filters reads based on quality scores.

.. class:: infomark

Using **percent = 100** requires all cycles of a read to be at least the quality cut-off value.

.. class:: infomark

Using **percent = 50** requires the median quality of the cycles (in each read) to be at least the quality cut-off value.

--------

For each read, the tool computes the percentage of cycles whose quality is equal to / higher than the quality cut-off value. If that percentage is lower than **percent**, the read is discarded.


**Example**::

    @CSHL_4_FC042AGOOII:1:2:214:584
    GACAATAAAC
    +CSHL_4_FC042AGOOII:1:2:214:584
    30 30 30 30 30 30 30 30 20 10

Using **percent = 50** and **cut-off = 30** - This read will not be discarded (80% of the cycles have quality equal to / higher than 30, so the median quality is at least 30).

Using **percent = 90** and **cut-off = 30** - This read will be discarded (only 80% of the cycles have quality equal to / higher than 30).

Using **percent = 100** and **cut-off = 20** - This read will be discarded (not all cycles have quality equal to / higher than 20).

------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/
</help>
</tool>
<!-- FASTQ-Quality-Filter is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->
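<!--
The help text above describes the filter as a per-read percentage test. The
Python sketch below illustrates that rule on the example read from the help.
It is an illustration only, not the FASTX-toolkit implementation: the function
name passes_filter, the integer comparison at the boundary, and the handling
of empty reads are assumptions, and quality scores are assumed to be already
decoded to integers.

```python
def passes_filter(qualities, cutoff, percent):
    """Return True if at least `percent` percent of scores are >= cutoff."""
    if not qualities:
        # Assumption: a read with no quality scores never passes.
        return False
    good = sum(1 for q in qualities if q >= cutoff)
    # Compare as integers to avoid floating point rounding at the boundary.
    return good * 100 >= percent * len(qualities)


# Quality scores of the example read shown in the help text.
read = [30, 30, 30, 30, 30, 30, 30, 30, 20, 10]

print(passes_filter(read, cutoff=30, percent=50))   # True: 80% of cycles >= 30
print(passes_filter(read, cutoff=30, percent=90))   # False: only 80% >= 30
print(passes_filter(read, cutoff=20, percent=100))  # False: one cycle is 10
```

The three calls mirror the three cases listed in the help: the read is kept
with percent = 50 and discarded with percent = 90 or percent = 100.
-->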