| 1 | <tool id="cshl_fastx_collapser" name="Collapse"> |
---|
| 2 | <description>sequences</description> |
---|
| 3 | <requirements><requirement type="package">fastx_toolkit</requirement></requirements> |
---|
| 4 | <command>zcat -f '$input' | fastx_collapser -v -o '$output' </command> |
---|
| 5 | |
---|
| 6 | <inputs> |
---|
| 7 | <param format="fastqsolexa,fasta" name="input" type="data" label="Library to collapse" /> |
---|
| 8 | </inputs> |
---|
| 9 | |
---|
| 10 | <!-- The order of sequences in the test output differs between 32-bit and 64-bit machines. |
---|
| 11 | <tests> |
---|
| 12 | <test> |
---|
| 13 | <param name="input" value="fasta_collapser1.fasta" /> |
---|
| 14 | <output name="output" file="fasta_collapser1.out" /> |
---|
| 15 | </test> |
---|
| 16 | </tests> |
---|
| 17 | --> |
---|
| 18 | <outputs> |
---|
| 19 | <data format="fasta" name="output" metadata_source="input" /> |
---|
| 20 | </outputs> |
---|
| 21 | <help> |
---|
| 22 | |
---|
| 23 | **What it does** |
---|
| 24 | |
---|
| 25 | This tool collapses identical sequences in a FASTA file into a single sequence and records how many times each one appeared. |
---|
| 26 | |
---|
| 27 | -------- |
---|
| 28 | |
---|
| 29 | **Example** |
---|
| 30 | |
---|
| 31 | Example Input File (Sequence "ATAT" appears multiple times):: |
---|
| 32 | |
---|
| 33 | >CSHL_2_FC0042AGLLOO_1_1_605_414 |
---|
| 34 | TGCG |
---|
| 35 | >CSHL_2_FC0042AGLLOO_1_1_537_759 |
---|
| 36 | ATAT |
---|
| 37 | >CSHL_2_FC0042AGLLOO_1_1_774_520 |
---|
| 38 | TGGC |
---|
| 39 | >CSHL_2_FC0042AGLLOO_1_1_742_502 |
---|
| 40 | ATAT |
---|
| 41 | >CSHL_2_FC0042AGLLOO_1_1_781_514 |
---|
| 42 | TGAG |
---|
| 43 | >CSHL_2_FC0042AGLLOO_1_1_757_487 |
---|
| 44 | TTCA |
---|
| 45 | >CSHL_2_FC0042AGLLOO_1_1_903_769 |
---|
| 46 | ATAT |
---|
| 47 | >CSHL_2_FC0042AGLLOO_1_1_724_499 |
---|
| 48 | ATAT |
---|
| 49 | |
---|
| 50 | Example Output File:: |
---|
| 51 | |
---|
| 52 | >1-1 |
---|
| 53 | TGCG |
---|
| 54 | >2-4 |
---|
| 55 | ATAT |
---|
| 56 | >3-1 |
---|
| 57 | TGGC |
---|
| 58 | >4-1 |
---|
| 59 | TGAG |
---|
| 60 | >5-1 |
---|
| 61 | TTCA |
---|
| 62 | |
---|
| 63 | .. class:: infomark |
---|
| 64 | |
---|
| 65 | Original sequence names / lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded. |
---|
| 66 | |
---|
| 67 | The output sequence name is composed of two numbers separated by a hyphen: the first is the sequence's number in the output, the second is its multiplicity (how many times it appeared in the input). |
---|
| 68 | |
---|
| 69 | The following output:: |
---|
| 70 | |
---|
| 71 | >2-4 |
---|
| 72 | ATAT |
---|
| 73 | |
---|
| 74 | means that the sequence "ATAT" is the second sequence in the output file and that it appeared 4 times in the input FASTA file. |
---|
| 75 | |
---|
| 76 | </help> |
---|
| 77 | </tool> |
---|
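
The collapsing behaviour described in the help text can be illustrated with a minimal Python sketch. This is not the `fastx_collapser` implementation. The function name `collapse_fasta` is hypothetical, and numbering sequences in order of first appearance is an assumption taken from the help example; the commented-out test note indicates that the real tool's output order can vary between machines.

```python
def collapse_fasta(lines):
    """Collapse identical sequences from FASTA lines.

    Returns a list of output lines in which each distinct sequence gets a
    header of the form '>number-multiplicity'.
    Assumption: sequences are numbered in order of first appearance, as in
    the help example; the real tool may order its output differently.
    """
    counts = {}      # sequence -> occurrences (dicts keep insertion order, Python 3.7+)
    seq_parts = []   # pieces of the record currently being read

    def flush():
        # Finish the current record, if any, and count its sequence.
        if seq_parts:
            seq = "".join(seq_parts)
            counts[seq] = counts.get(seq, 0) + 1
            seq_parts.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()              # a new header ends the previous record
        else:
            seq_parts.append(line)
    flush()                      # count the last record

    out = []
    for number, (seq, count) in enumerate(counts.items(), start=1):
        out.append(f">{number}-{count}")   # original names are discarded
        out.append(seq)
    return out


if __name__ == "__main__":
    example = [
        ">CSHL_2_FC0042AGLLOO_1_1_605_414", "TGCG",
        ">CSHL_2_FC0042AGLLOO_1_1_537_759", "ATAT",
        ">CSHL_2_FC0042AGLLOO_1_1_742_502", "ATAT",
    ]
    print("\n".join(collapse_fasta(example)))
```

Keying a dictionary on the sequence string keeps the sketch short. Very large libraries would need a more memory-conscious approach.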