<tool id="cshl_fastx_collapser" name="Collapse">
  <description>sequences</description>
  <requirements><requirement type="package">fastx_toolkit</requirement></requirements>
  <command>zcat -f '$input' | fastx_collapser -v -o '$output' </command>

  <inputs>
    <param format="fastqsolexa,fasta" name="input" type="data" label="Library to collapse" />
  </inputs>

  <!-- The order of sequences in the test output differs between 32-bit and 64-bit machines.
  <tests>
    <test>
      <param name="input" value="fasta_collapser1.fasta" />
      <output name="output" file="fasta_collapser1.out" />
    </test>
  </tests>
  -->
  <outputs>
    <data format="fasta" name="output" metadata_source="input" />
  </outputs>
<help>

**What it does**

This tool collapses identical sequences in a FASTA file into a single sequence.

--------

**Example**

Example Input File (Sequence "ATAT" appears multiple times)::

    >CSHL_2_FC0042AGLLOO_1_1_605_414
    TGCG
    >CSHL_2_FC0042AGLLOO_1_1_537_759
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_774_520
    TGGC
    >CSHL_2_FC0042AGLLOO_1_1_742_502
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_781_514
    TGAG
    >CSHL_2_FC0042AGLLOO_1_1_757_487
    TTCA
    >CSHL_2_FC0042AGLLOO_1_1_903_769
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_724_499
    ATAT

Example Output file::

    >1-1
    TGCG
    >2-4
    ATAT
    >3-1
    TGGC
    >4-1
    TGAG
    >5-1
    TTCA

.. class:: infomark

Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded.

The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value.

The following output::

    >2-4
    ATAT

means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file.
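
The naming scheme described above can be illustrated with a minimal Python sketch. This is not the fastx_collapser implementation: the function name ``collapse_fasta`` is hypothetical, sequences are assumed to occupy a single line each, and unique sequences are numbered in order of first appearance to match the example above. The real tool may order its output differently.

```python
from collections import OrderedDict


def collapse_fasta(lines):
    """Collapse identical sequences, keeping first-appearance order.

    Illustrative only: assumes every sequence sits on a single line.
    """
    counts = OrderedDict()  # maps each sequence to how many times it was seen
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines; original names are discarded.
        if not line or line.startswith(">"):
            continue
        counts[line] = counts.get(line, 0) + 1
    # Name each unique sequence "number-multiplicity", e.g. "2-4".
    return [
        (f"{number}-{count}", seq)
        for number, (seq, count) in enumerate(counts.items(), start=1)
    ]


# Small demonstration with made-up read names.
records = [">read_a", "TGCG", ">read_b", "ATAT", ">read_c", "ATAT"]
for name, seq in collapse_fasta(records):
    print(f">{name}")
    print(seq)
```

Running the sketch on the example input file above yields the same five records as the example output, with "ATAT" reported as ``2-4``.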

</help>
</tool>