<tool id="cshl_fastx_quality_statistics" name="Compute quality statistics">
    <description></description>
    <requirements><requirement type="package">fastx_toolkit</requirement></requirements>
    <command>zcat -f $input | fastx_quality_stats -o $output -Q 33</command>

    <inputs>
        <param format="fastqsanger" name="input" type="data" label="Library to analyse" />
    </inputs>

    <tests>
        <test>
            <param name="input" value="fastq_stats1.fastq" ftype="fastqsanger"/>
            <output name="output" file="fastq_stats1.out" />
        </test>
    </tests>

    <outputs>
        <data format="txt" name="output" metadata_source="input" />
    </outputs>

<help>

**What it does**

Creates a quality statistics report for the given Solexa/FASTQ library.

.. class:: infomark

**TIP:** This statistics report can be used as input for the **Quality Score** and **Nucleotides Distribution** tools.

-----

**The output file will contain the following fields:**

* column = Column number (1 to 36 for a 36-cycle read Solexa file).
* count = Number of bases found in this column.
* min = Lowest quality score value found in this column.
* max = Highest quality score value found in this column.
* sum = Sum of quality score values for this column.
* mean = Mean quality score value for this column.
* Q1 = 1st quartile quality score.
* med = Median quality score.
* Q3 = 3rd quartile quality score.
* IQR = Inter-Quartile range (Q3-Q1).
* lW = 'Left-Whisker' value (for boxplotting).
* rW = 'Right-Whisker' value (for boxplotting).
* A_Count = Count of 'A' nucleotides found in this column.
* C_Count = Count of 'C' nucleotides found in this column.
* G_Count = Count of 'G' nucleotides found in this column.
* T_Count = Count of 'T' nucleotides found in this column.
* N_Count = Count of 'N' nucleotides found in this column.


For example::

    1 6362991 -4 40 250734117 39.41 40 40 40 0 40 40 1396976 1329101 678730 2958184 0
    2 6362991 -5 40 250531036 39.37 40 40 40 0 40 40 1786786 1055766 1738025 1782414 0
    3 6362991 -5 40 248722469 39.09 40 40 40 0 40 40 2296384 984875 1443989 1637743 0
    4 6362991 -4 40 248214827 39.01 40 40 40 0 40 40 2536861 1167423 1248968 1409739 0
    36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 4074444 1402980 63287 822035 245

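As a rough sketch of how a report row could be read downstream, the Python snippet below maps one data row onto the fields listed above. The names FIELDS and parse_stats_line are hypothetical and not part of the FASTX-toolkit. The sketch assumes whitespace-separated data rows like the ones in the example, so any header line would have to be skipped first.

```python
# Column names follow the field list above. FIELDS and parse_stats_line are
# illustrative names, not part of the FASTX-toolkit.
FIELDS = (
    "column", "count", "min", "max", "sum", "mean", "Q1", "med", "Q3",
    "IQR", "lW", "rW", "A_Count", "C_Count", "G_Count", "T_Count", "N_Count",
)


def parse_stats_line(line):
    """Split one whitespace-separated report row into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    # Assumption: 'mean' is the only fractional field, as in the example rows.
    return {
        name: float(value) if name == "mean" else int(value)
        for name, value in zip(FIELDS, values)
    }


# Last row of the example above (cycle 36).
row = parse_stats_line(
    "36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 4074444 1402980 63287 822035 245"
)
# Fraction of uncalled bases (N) among all bases in this cycle.
n_fraction = row["N_Count"] / row["count"]
```

For this row the IQR field equals Q3 minus Q1, and the five nucleotide counts add up to the count field, which makes both useful sanity checks when reading a report.
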
------

This tool is based on `FASTX-toolkit`__ by Assaf Gordon.

.. __: http://hannonlab.cshl.edu/fastx_toolkit/

</help>
</tool>
<!-- FASTQ-Statistics is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->