| 1 | <tool id="cshl_fasta_clipping_histogram" name="Length Distribution"> |
|---|
| 2 | <description>chart</description> |
|---|
| 3 | <requirements><requirement type="package">fastx_toolkit</requirement></requirements> |
|---|
| 4 | <command>fasta_clipping_histogram.pl $input $outfile</command> |
|---|
| 5 | |
|---|
| 6 | <inputs> |
|---|
| 7 | <param format="fasta" name="input" type="data" label="Library to analyze" /> |
|---|
| 8 | </inputs> |
|---|
| 9 | |
|---|
| 10 | <outputs> |
|---|
| 11 | <data format="png" name="outfile" metadata_source="input" /> |
|---|
| 12 | </outputs> |
|---|
| 13 | <help> |
|---|
| 14 | |
|---|
| 15 | **What it does** |
|---|
| 16 | |
|---|
| 17 | This tool creates a histogram image of the sequence length distribution in a given FASTA dataset. |
|---|
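As a rough illustration of what the chart summarizes, the following Python sketch tallies sequence lengths from single-line FASTA text. It is not the implementation used by ``fasta_clipping_histogram.pl``; the function name ``length_distribution`` is hypothetical and exists only for this example.

```python
from collections import Counter


def length_distribution(fasta_text):
    """Count how many sequences of each length appear in single-line FASTA text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and identifier lines (those starting with '>').
        if not line or line.startswith(">"):
            continue
        counts[len(line)] += 1
    return dict(counts)


example = """>sequence1
AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
>sequence2
GTGTGTGTGGGAAGTTGACACAGTA
>sequence3
CCTTGAGATTAACGCTAATCAAGTAAAC
"""

# Print one "length count" pair per line, shortest length first.
for length, count in sorted(length_distribution(example).items()):
    print(length, count)
```

Each printed pair corresponds to one bar of the histogram: the sequence length and the number of sequences with that length.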
| 18 | |
|---|
| 19 | **TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results. |
|---|
| 20 | |
|---|
| 21 | ----- |
|---|
| 22 | |
|---|
| 23 | **Output Examples** |
|---|
| 24 | |
|---|
| 25 | In the following library, most sequences are 24-mers to 27-mers. |
|---|
| 26 | This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place). |
|---|
| 27 | |
|---|
| 28 | .. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png |
|---|
| 29 | |
|---|
| 30 | |
|---|
| 31 | In the following library, most sequences are 19-, 22-, or 23-mers. |
|---|
| 32 | This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place). |
|---|
| 33 | |
|---|
| 34 | .. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png |
|---|
| 35 | |
|---|
| 36 | |
|---|
| 37 | ----- |
|---|
| 38 | |
|---|
| 39 | |
|---|
| 40 | **Input Formats** |
|---|
| 41 | |
|---|
| 42 | This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence must be on a single line, like so:: |
|---|
| 43 | |
|---|
| 44 | >sequence1 |
|---|
| 45 | AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG |
|---|
| 46 | >sequence2 |
|---|
| 47 | GTGTGTGTGGGAAGTTGACACAGTA |
|---|
| 48 | >sequence3 |
|---|
| 49 | CCTTGAGATTAACGCTAATCAAGTAAAC |
|---|
| 50 | |
|---|
| 51 | |
|---|
| 52 | If the sequences span multiple lines:: |
|---|
| 53 | |
|---|
| 54 | >sequence1 |
|---|
| 55 | CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG |
|---|
| 56 | TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG |
|---|
| 57 | aactggtctttacctTTAAGTTG |
|---|
| 58 | |
|---|
| 59 | Use the **FASTA Width Formatter** tool to reformat the FASTA file into single-line sequences:: |
|---|
| 60 | |
|---|
| 61 | >sequence1 |
|---|
| 62 | CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG |
|---|
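To make the single-line requirement concrete, here is a minimal Python sketch of the reformatting step. It only illustrates the idea of joining wrapped sequence lines; it is not how the **FASTA Width Formatter** tool is implemented, and the function name ``join_sequence_lines`` is hypothetical.

```python
def join_sequence_lines(fasta_text):
    """Return FASTA text in which every sequence occupies exactly one line."""
    output = []
    current = []  # pieces of the sequence currently being assembled
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier ends the previous sequence, if there is one.
            if current:
                output.append("".join(current))
                current = []
            output.append(line)
        else:
            current.append(line)
    # Flush the last sequence in the file.
    if current:
        output.append("".join(current))
    return "\n".join(output) + "\n"


multi_line = """>sequence1
CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
aactggtctttacctTTAAGTTG
"""

print(join_sequence_lines(multi_line), end="")
```

The sketch preserves letter case and identifier text unchanged; it only removes the line breaks inside each sequence.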
| 63 | |
|---|
| 64 | |
|---|
| 65 | ----- |
|---|
| 66 | |
|---|
| 67 | |
|---|
| 68 | |
|---|
| 69 | **Multiplicity counts (a.k.a. read counts)** |
|---|
| 70 | |
|---|
| 71 | If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e., how many times that sequence was repeated in the original FASTA file before collapsing). |
|---|
| 72 | |
|---|
| 73 | Example 1 - The following FASTA file *does not* have multiplicity counts:: |
|---|
| 74 | |
|---|
| 75 | >seq1 |
|---|
| 76 | GGATCC |
|---|
| 77 | >seq2 |
|---|
| 78 | GGTCATGGGTTTAAA |
|---|
| 79 | >seq3 |
|---|
| 80 | GGGATATATCCCCACACACACACAC |
|---|
| 81 | |
|---|
| 82 | Each sequence counts as one, to produce the following chart: |
|---|
| 83 | |
|---|
| 84 | .. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png |
|---|
| 85 | |
|---|
| 86 | |
|---|
| 87 | Example 2 - The following FASTA file has multiplicity counts:: |
|---|
| 88 | |
|---|
| 89 | >seq1-2 |
|---|
| 90 | GGATCC |
|---|
| 91 | >seq2-10 |
|---|
| 92 | GGTCATGGGTTTAAA |
|---|
| 93 | >seq3-3 |
|---|
| 94 | GGGATATATCCCCACACACACACAC |
|---|
| 95 | |
|---|
| 96 | The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart: |
|---|
| 97 | |
|---|
| 98 | .. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png |
|---|
| 99 | |
|---|
| 100 | Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts. |
|---|
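The multiplicity rule described in this section can be sketched in Python as follows. This is an illustration under a stated assumption, not the tool's actual parsing code: it assumes the count is the run of digits after a dash at the very end of the identifier, and identifiers without such a suffix count as one. The names ``multiplicity`` and ``weighted_length_distribution`` are hypothetical.

```python
import re
from collections import Counter

# Assumption: the count is a run of digits after a dash at the end of the identifier.
COUNT_SUFFIX = re.compile(r"-(\d+)$")


def multiplicity(identifier):
    """Return the multiplicity encoded in an identifier, or 1 if none is present."""
    match = COUNT_SUFFIX.search(identifier.strip())
    return int(match.group(1)) if match else 1


def weighted_length_distribution(fasta_text):
    """Tally sequence lengths in single-line FASTA text, weighted by multiplicity."""
    counts = Counter()
    weight = 1
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Remember the weight for the sequence line that follows.
            weight = multiplicity(line[1:])
        else:
            counts[len(line)] += weight
    return dict(counts)


collapsed = """>seq1-2
GGATCC
>seq2-10
GGTCATGGGTTTAAA
>seq3-3
GGGATATATCCCCACACACACACAC
"""

for length, count in sorted(weighted_length_distribution(collapsed).items()):
    print(length, count)
```

With the collapsed input from Example 2, the three sequences contribute 2, 10, and 3 to their respective length bins instead of 1 each.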
| 101 | |
|---|
| 102 | </help> |
|---|
| 103 | </tool> |
|---|
| 104 | <!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> |
|---|