<tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
	<description>chart</description>
	<requirements><requirement type="package">fastx_toolkit</requirement></requirements>
	<command>fasta_clipping_histogram.pl $input $outfile</command>

	<inputs>
		<param format="fasta" name="input" type="data" label="Library to analyze" />
	</inputs>

	<outputs>
		<data format="png" name="outfile" metadata_source="input" />
	</outputs>
	<help>

**What it does**

This tool creates a histogram image of the sequence-length distribution in a given FASTA dataset.

**TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results.
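
A minimal Python sketch of the tally that such a histogram plots, for readers who want to check the numbers. It assumes single-line FASTA input (see **Input Formats** below) and counts each sequence once, ignoring multiplicity counts; the helper name ``length_distribution`` is hypothetical, and drawing the PNG image is left out.

```python
from collections import Counter


def length_distribution(fasta_text):
    """Count how many single-line FASTA records have each sequence length.

    Hypothetical helper: assumes every sequence sits on one line directly
    after its '>' header line, as this tool requires.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and header lines; every other line is one sequence.
        if not line or line.startswith(">"):
            continue
        counts[len(line)] += 1
    # Return the bins ordered by sequence length.
    return dict(sorted(counts.items()))


example = """>seq1
GGATCC
>seq2
GGTCATGGGTTTAAA
>seq3
GGGATATATCCCCACACACACACAC
"""
print(length_distribution(example))
```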

-----

**Output Examples**

In the following library, most sequences are 24-mers to 27-mers.
This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png


In the following library, most sequences are 19-, 22- or 23-mers.
This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png


-----


**Input Formats**

This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so::

    >sequence1
    AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
    >sequence2
    GTGTGTGTGGGAAGTTGACACAGTA
    >sequence3
    CCTTGAGATTAACGCTAATCAAGTAAAC


If the sequences span multiple lines::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
    TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
    aactggtctttacctTTAAGTTG

Use the **FASTA Width Formatter** tool to reformat the FASTA into single-line sequences::

    >sequence1
    CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
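
The reformatting step can be sketched in a few lines of Python. This is an illustration only, not the FASTA Width Formatter's code: the hypothetical ``join_fasta_lines`` helper concatenates the sequence lines under each header and keeps letter case unchanged, as in the example above.

```python
def join_fasta_lines(fasta_text):
    """Rewrite multi-line FASTA records so each sequence sits on one line.

    Hypothetical helper for illustration: headers are kept as-is and the
    sequence lines that follow each header are concatenated unchanged.
    """
    output = []
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                output.append("".join(chunks))
                chunks = []
            output.append(line)
        else:
            chunks.append(line)
    # Flush the sequence of the last record.
    if chunks:
        output.append("".join(chunks))
    return "\n".join(output) + "\n" if output else ""


multi_line = """>sequence1
CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
aactggtctttacctTTAAGTTG
"""
print(join_fasta_lines(multi_line), end="")
```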


-----


**Multiplicity counts (a.k.a. read counts)**

If the sequence identifier (the text after the '>') contains a dash followed by a number (e.g. seq1-2), that number is treated as a multiplicity count, i.e. how many times that sequence appeared in the original FASTA file before collapsing.
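
One way to read this rule, sketched in Python: the count is taken as the digits after a dash at the very end of the identifier, and identifiers without such a suffix count as one. The regular expression and the helper name ``multiplicity`` are assumptions for illustration; the exact pattern ``fasta_clipping_histogram.pl`` matches may differ.

```python
import re

# Assumed pattern: a dash followed by digits at the very end of the identifier.
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def multiplicity(header):
    """Return the multiplicity count encoded in a FASTA header line.

    Hypothetical helper: 'header' may include the leading '>'. Headers
    without a trailing '-NUMBER' are treated as a single read.
    """
    identifier = header.lstrip(">").strip()
    match = COUNT_SUFFIX.search(identifier)
    return int(match.group(1)) if match else 1


print(multiplicity(">seq1-2"))   # 2
print(multiplicity(">seq2-10"))  # 10
print(multiplicity(">seq1"))     # 1
```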

Example 1 - The following FASTA file *does not* have multiplicity counts::

    >seq1
    GGATCC
    >seq2
    GGTCATGGGTTTAAA
    >seq3
    GGGATATATCCCCACACACACACAC

Each sequence counts as one, producing the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png


Example 2 - The following FASTA file has multiplicity counts::

    >seq1-2
    GGATCC
    >seq2-10
    GGTCATGGGTTTAAA
    >seq3-3
    GGGATATATCCCCACACACACACAC

The first sequence counts as 2, the second as 10 and the third as 3, producing the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png
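
The weighting amounts to adding each record's count, rather than 1, to the bin for its sequence length. A self-contained Python sketch of that arithmetic for Example 2, assuming the count is a trailing dash-and-number suffix on the identifier (the helper name ``weighted_length_distribution`` is hypothetical):

```python
import re
from collections import Counter

# Assumed rule: a trailing "-NUMBER" in the identifier is the read count.
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def weighted_length_distribution(fasta_text):
    """Sum multiplicity counts per sequence length for single-line FASTA."""
    totals = Counter()
    weight = 1
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Headers without a count suffix weigh 1.
            match = COUNT_SUFFIX.search(line[1:])
            weight = int(match.group(1)) if match else 1
        else:
            # Each sequence line adds its header's count to its length bin.
            totals[len(line)] += weight
    return dict(sorted(totals.items()))


example2 = """>seq1-2
GGATCC
>seq2-10
GGTCATGGGTTTAAA
>seq3-3
GGGATATATCCCCACACACACACAC
"""
print(weighted_length_distribution(example2))
```

Each length bin receives the sum of the counts of the sequences with that length, matching the 2, 10 and 3 described above.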

Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
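
Collapsing amounts to counting identical sequences and writing each distinct sequence once, with its count appended to the identifier. The Python sketch below is not the FASTA Collapser's implementation: the hypothetical ``collapse`` helper names records in the seqN-COUNT style of Example 2 and lists the most frequent sequences first, both of which are assumptions.

```python
from collections import Counter


def collapse(sequences):
    """Collapse identical reads into single-line FASTA with multiplicity counts.

    Hypothetical helper: identifiers follow the 'seqN-COUNT' style of the
    examples above; the real FASTA Collapser may name records differently.
    """
    counts = Counter(sequences)
    lines = []
    # most_common() lists the most frequent sequences first.
    for rank, (sequence, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">seq{rank}-{count}")
        lines.append(sequence)
    return "\n".join(lines) + "\n" if lines else ""


reads = ["GGATCC", "GGTCATGGGTTTAAA", "GGATCC", "GGTCATGGGTTTAAA", "GGTCATGGGTTTAAA"]
print(collapse(reads), end="")
```

With the five reads above it prints ``>seq1-3``, ``GGTCATGGGTTTAAA``, ``>seq2-2`` and ``GGATCC``, which this tool would then count as 3 reads of length 15 and 2 reads of length 6.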

	</help>
</tool>
<!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->