<tool id="cshl_fastx_collapser" name="Collapse">
    <description>sequences</description>
    <requirements><requirement type="package">fastx_toolkit</requirement></requirements>
    <command>zcat -f '$input' | fastx_collapser -v -o '$output'</command>

    <inputs>
        <param format="fastqsolexa,fasta" name="input" type="data" label="Library to collapse" />
    </inputs>

    <!-- The order of sequences in the test output differs between 32-bit and 64-bit machines.
    <tests>
        <test>
            <param name="input" value="fasta_collapser1.fasta" />
            <output name="output" file="fasta_collapser1.out" />
        </test>
    </tests>
    -->
    <outputs>
        <data format="fasta" name="output" metadata_source="input" />
    </outputs>
    <help>

**What it does**

This tool collapses identical sequences in a FASTA file into a single sequence.

--------

**Example**

Example Input File (Sequence "ATAT" appears multiple times)::

    >CSHL_2_FC0042AGLLOO_1_1_605_414
    TGCG
    >CSHL_2_FC0042AGLLOO_1_1_537_759
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_774_520
    TGGC
    >CSHL_2_FC0042AGLLOO_1_1_742_502
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_781_514
    TGAG
    >CSHL_2_FC0042AGLLOO_1_1_757_487
    TTCA
    >CSHL_2_FC0042AGLLOO_1_1_903_769
    ATAT
    >CSHL_2_FC0042AGLLOO_1_1_724_499
    ATAT

Example Output File::

    >1-1
    TGCG
    >2-4
    ATAT
    >3-1
    TGGC
    >4-1
    TGAG
    >5-1
    TTCA

.. class:: infomark

Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded.

The output sequence name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value.

The following output::

    >2-4
    ATAT

means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file.

    </help>
</tool>