<tool id="cshl_fasta_clipping_histogram" name="Length Distribution">
    <description>chart</description>
    <requirements><requirement type="package">fastx_toolkit</requirement></requirements>
    <command>fasta_clipping_histogram.pl $input $outfile</command>

    <inputs>
        <param format="fasta" name="input" type="data" label="Library to analyze" />
    </inputs>

    <outputs>
        <data format="png" name="outfile" metadata_source="input" />
    </outputs>
<help>

**What it does**

This tool creates a histogram image of the sequence length distribution in a given FASTA dataset.

**TIP:** Use this tool after clipping your library (with the **FASTX Clipper** tool) to visualize the clipping results.

-----

**Output Examples**

In the following library, most sequences are 24-mers to 27-mers.
This could indicate an abundance of endo-siRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png


In the following library, most sequences are 19-, 22-, or 23-mers.
This could indicate an abundance of miRNAs (depending, of course, on what you tried to sequence in the first place).

.. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png


-----


**Input Formats**

This tool accepts short-read FASTA files. The reads don't have to be short, but each sequence does have to be on a single line, like so::

   >sequence1
   AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
   >sequence2
   GTGTGTGTGGGAAGTTGACACAGTA
   >sequence3
   CCTTGAGATTAACGCTAATCAAGTAAAC


If the sequences span multiple lines::

   >sequence1
   CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
   TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
   aactggtctttacctTTAAGTTG

Use the **FASTA Width Formatter** tool to reformat the FASTA file into single-line sequences::

   >sequence1
   CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
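
As a rough illustration of what "single-line" means here, the following Python sketch joins wrapped sequence lines under each identifier. It is not the implementation of the FASTA Width Formatter; the function name ``unwrap_fasta`` is hypothetical and exists only for this example.

```python
def unwrap_fasta(lines):
    """Join wrapped sequence lines so each record has one sequence line."""
    records = []  # list of [header, sequence] pairs, in input order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append([line, ""])  # start a new record
        elif records:
            records[-1][1] += line  # extend the current record's sequence
    out = []
    for header, seq in records:
        out.append(header)
        out.append(seq)
    return out


wrapped = [
    ">sequence1",
    "CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG",
    "TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG",
    "aactggtctttacctTTAAGTTG",
]
single = unwrap_fasta(wrapped)
print("\n".join(single))
```

The sketch preserves the case of each base and keeps records in their original order. For real datasets, the **FASTA Width Formatter** tool remains the intended way to do this.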


-----



**Multiplicity counts (a.k.a. read counts)**

If the sequence identifier (the text after the '>') contains a dash and a number, the number is treated as a multiplicity count (i.e. how many times that individual sequence was repeated in the original FASTA file, before collapsing).

Example 1 - The following FASTA file *does not* have multiplicity counts::

   >seq1
   GGATCC
   >seq2
   GGTCATGGGTTTAAA
   >seq3
   GGGATATATCCCCACACACACACAC

Each sequence counts as one, to produce the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png


Example 2 - The following FASTA file has multiplicity counts::

   >seq1-2
   GGATCC
   >seq2-10
   GGTCATGGGTTTAAA
   >seq3-3
   GGGATATATCCCCACACACACACAC

The first sequence counts as 2, the second as 10, and the third as 3, to produce the following chart:

.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png

Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
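
The counting rule described in this section can be sketched in Python as follows. This is only an illustration, not the code the tool runs: it assumes the count is the number after the last dash at the end of the identifier, and that every sequence sits on a single line. The names ``COUNT_RE`` and ``length_histogram`` are hypothetical.

```python
import re
from collections import Counter

# Assumption: the multiplicity count is a trailing "-N" in the identifier.
COUNT_RE = re.compile(r"-(\d+)$")


def length_histogram(lines):
    """Return a Counter mapping sequence length to total read count."""
    histogram = Counter()
    count = 1
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            match = COUNT_RE.search(line)
            # Identifiers without a trailing "-N" count as one read.
            count = int(match.group(1)) if match else 1
        else:
            # Assumes one sequence line per record (single-line FASTA).
            histogram[len(line)] += count
    return histogram


plain = [">seq1", "GGATCC", ">seq2", "GGTCATGGGTTTAAA",
         ">seq3", "GGGATATATCCCCACACACACACAC"]
collapsed = [">seq1-2", "GGATCC", ">seq2-10", "GGTCATGGGTTTAAA",
             ">seq3-3", "GGGATATATCCCCACACACACACAC"]

for length, total in sorted(length_histogram(collapsed).items()):
    print(length, total)
```

With the two example files from this section, ``plain`` gives each length a count of one, while ``collapsed`` weights each length by the number in its identifier.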

</help>
</tool>
<!-- FASTA-Clipping-Histogram is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->