<tool id="fastq_stats" name="FASTQ Summary Statistics" version="1.0.0">
  <description>by column</description>
  <command interpreter="python">fastq_stats.py '$input_file' '$output_file' '${input_file.extension[len( 'fastq' ):]}'</command>
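<!--
A minimal sketch, in plain Python rather than the template syntax of the command line, of how the third argument appears to be derived: the expression slices the leading "fastq" off the input datatype extension, leaving only the quality-score variant. The helper name fastq_variant is hypothetical and exists only for this illustration.

```python
def fastq_variant(extension):
    """Drop the leading 'fastq' from a datatype extension (same slice as the command)."""
    return extension[len("fastq"):]


# The four extensions below are the formats accepted by the input_file parameter.
for ext in ("fastqsanger", "fastqillumina", "fastqsolexa", "fastqcssanger"):
    print(ext, "->", fastq_variant(ext))
```

How fastq_stats.py interprets that argument is not shown in this file.
-->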
  <inputs>
    <param name="input_file" type="data" format="fastqsanger,fastqillumina,fastqsolexa,fastqcssanger" label="FASTQ File"/>
  </inputs>
  <outputs>
    <data name="output_file" format="tabular" />
  </outputs>
  <tests>
    <test>
      <param name="input_file" value="fastq_stats1.fastq" ftype="fastqsanger" />
      <output name="output_file" file="fastq_stats_1_out.tabular" />
    </test>
  </tests>
  <help>
This tool creates summary statistics on a FASTQ file.

.. class:: infomark

**TIP:** This statistics report can be used as input for the **Boxplot** and **Nucleotides Distribution** tools.

-----

**The output file will contain the following fields:**

* column = Column number (1 to 36 for a 36-cycle Solexa read).
* count = Number of bases found in this column.
* min = Lowest quality score value found in this column.
* max = Highest quality score value found in this column.
* sum = Sum of quality score values for this column.
* mean = Mean quality score value for this column.
* Q1 = First-quartile quality score.
* med = Median quality score.
* Q3 = Third-quartile quality score.
* IQR = Interquartile range (Q3 - Q1).
* lW = 'Left-Whisker' value (for boxplotting).
* rW = 'Right-Whisker' value (for boxplotting).
* outliers = Scores falling beyond the left and right whiskers (comma-separated list).
* A_Count = Count of 'A' nucleotides found in this column.
* C_Count = Count of 'C' nucleotides found in this column.
* G_Count = Count of 'G' nucleotides found in this column.
* T_Count = Count of 'T' nucleotides found in this column.
* N_Count = Count of 'N' nucleotides found in this column.
* other_bases = Comma-separated list of other nucleotides found in this column.
* other_base_count = Comma-separated counts of the other nucleotides found in this column.

For example::

    #column count min max sum mean Q1 med Q3 IQR lW rW outliers A_Count C_Count G_Count T_Count N_Count other_bases other_base_count
    1 14336356 2 33 450600675 31.4306281875 32.0 33.0 33.0 1.0 31 33 2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30 4482314 2199633 4425957 3208745 19707
    2 14336356 2 34 441135033 30.7703737965 30.0 33.0 33.0 3.0 26 34 2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25 4419184 2170537 4627987 3118567 81
    3 14336356 2 34 433659182 30.2489127642 29.0 32.0 33.0 4.0 23 34 2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22 4310988 2941988 3437467 3645784 129
    4 14336356 2 34 433635331 30.2472490917 29.0 32.0 33.0 4.0 23 34 2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22 4110637 3007028 3671749 3546839 103
    5 14336356 2 34 432498583 30.167957813 29.0 32.0 33.0 4.0 23 34 2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22 4348275 2935903 3293025 3759029 124

-----

.. class:: warningmark

Adapter bases in color space reads are excluded from statistics.

  </help>
</tool>
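<!--
The boxplot fields described in the help text (IQR, lW, rW, outliers) can be sketched as follows. This is an illustration under stated assumptions, not the code of fastq_stats.py: it assumes the conventional Tukey rule, where each whisker is the most extreme observed score within 1.5 x IQR of the nearer quartile. That rule is consistent with the example rows in the help but is not confirmed by this file. The function name boxplot_fields and its signature are hypothetical.

```python
def boxplot_fields(q1, q3, observed_scores):
    """Return (iqr, left_whisker, right_whisker, outliers) for one column.

    Assumes Tukey's 1.5 x IQR rule; the real script may differ.
    """
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    scores = sorted(set(observed_scores))
    # Scores that lie inside the fences; the whiskers are the extremes of these.
    inside = [s for s in scores if low_fence <= s <= high_fence]
    if not inside:
        # Fallback chosen for this sketch only: no observed score inside the fences.
        return iqr, q1, q3, scores
    left_whisker, right_whisker = inside[0], inside[-1]
    outliers = [s for s in scores if s < left_whisker or s > right_whisker]
    return iqr, left_whisker, right_whisker, outliers


# Column 2 of the example output: Q1 = 30.0 and Q3 = 33.0. The set of observed
# scores (2, then 4 through 34) is an assumption inferred from that row.
observed = [2] + list(range(4, 35))
iqr, lw, rw, outliers = boxplot_fields(30.0, 33.0, observed)
print(iqr, lw, rw, ",".join(str(s) for s in outliers))
```

Under those assumptions the sketch gives IQR 3.0, lW 26, rW 34 and outliers 2,4,5,...,25, matching column 2 of the example.
-->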