<tool id="MAF_To_Fasta1" name="MAF to FASTA" version="1.0.1">
  <description>Converts a MAF-formatted file to FASTA format</description>
  <command interpreter="python">
    #if $fasta_target_type.fasta_type == "multiple" #maf_to_fasta_multiple_sets.py $input1 $out_file1 $fasta_target_type.species $fasta_target_type.complete_blocks
    #else #maf_to_fasta_concat.py $fasta_target_type.species $input1 $out_file1
    #end if#
  </command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <conditional name="fasta_target_type">
      <param name="fasta_type" type="select" label="Type of FASTA Output">
        <option value="multiple" selected="true">Multiple Blocks</option>
        <option value="concatenated">One Sequence per Species</option>
      </param>
      <when value="multiple">
        <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="checked taxa will be included in the output">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
        <param name="complete_blocks" type="select" label="Choose to">
          <option value="partial_allowed">include blocks with missing species</option>
          <option value="partial_disallowed">exclude blocks with missing species</option>
        </param>
      </when>
      <when value="concatenated">
        <param name="species" type="select" label="Species to extract" display="checkboxes" multiple="true">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="fasta" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="3.maf" ftype="maf"/>
      <param name="species" value="canFam1"/>
      <param name="fasta_type" value="concatenated"/>
      <output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta"/>
    </test>
    <test>
      <param name="input1" value="4.maf" ftype="maf"/>
      <param name="species" value="hg17,panTro1,rheMac2,rn3,mm7,canFam2,bosTau2,dasNov1"/>
      <param name="complete_blocks" value="partial_allowed"/>
      <param name="fasta_type" value="multiple"/>
      <output name="out_file1" file="cf_maf2fasta_new.dat" ftype="fasta"/>
    </test>
  </tests>
  <help>

**Types of MAF to FASTA conversion**

* **Multiple Blocks** converts each MAF block to its own FASTA block. For example, 6 MAF blocks are converted to 6 FASTA blocks.
* **One Sequence per Species** converts all MAF blocks to a single aggregated FASTA block. For example, 6 MAF blocks are converted and concatenated into a single FASTA block with one sequence per species.

-------

**What it does**

This tool converts MAF blocks to FASTA format and either concatenates them into a single FASTA block or outputs multiple FASTA blocks separated by empty lines.

The interface for this tool contains two pages (steps):

* **Step 1 of 2**. Choose the multiple alignments from your history to be converted to FASTA format.
* **Step 2 of 2**. Choose the type of output as well as the species from the alignment to be included in the output.

Multiple Block output has additional options:

* **Select species** - the tool reads the alignment provided during Step 1 and generates a list of the species contained in that alignment. Use the checkboxes to specify the taxa to be included in the output (all species are selected by default).
* **Choose to include/exclude blocks with missing species** - if an alignment block is missing any of the species you selected in the **Select species** menu and this option is set to **exclude blocks with missing species**, the block **will not** be included in the output (see **Example 2b** below). For example, if you extract human, mouse, and rat from a series of alignments and one of the blocks does not contain the mouse sequence, that block is not converted to FASTA and is not returned.
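
The block-selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the code of the conversion scripts: the names ``species_of`` and ``keep_block`` are hypothetical, and the behavior when partial blocks are allowed (keep a block if at least one selected species is present) is an assumption.

```python
def species_of(src):
    """Return the species prefix of a MAF source name such as 'hg18.chr20'."""
    return src.split(".", 1)[0]


def keep_block(block_sources, wanted_species, allow_partial):
    """Decide whether one alignment block is written to the output.

    block_sources  -- source names taken from the block's "s" lines
    wanted_species -- species ticked in the species checkbox list
    allow_partial  -- True for "include blocks with missing species"
    """
    present = {species_of(src) for src in block_sources}
    selected_present = present.intersection(wanted_species)
    if allow_partial:
        # Assumption: a block holding none of the selected species
        # has nothing to write, so it is skipped.
        return bool(selected_present)
    # Exclude mode: every selected species must occur in the block.
    return selected_present == set(wanted_species)


# The second block of the examples below has no mouse (mm8) row.
second_block = ["hg18.chr20", "panTro2.chr20", "rheMac2.chr10"]
print(keep_block(second_block, ["hg18", "mm8"], allow_partial=False))
```

With **exclude blocks with missing species** the call above rejects the block, which is the situation shown in **Example 2b**.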

-----

**Example 1**:

In the concatenated approach, the following alignment::

    ##maf version=1
    a score=68686.000000
    s hg18.chr20    56827368  75 + 62435964  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s panTro2.chr20 56528685  75 + 62293572  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s rheMac2.chr10 89144112  69 - 94855758  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
    s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
    s canFam2.chr24 46551822  67 + 50763139  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

    a score=10289.000000
    s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to (**note** that because mm8 (mouse) and canFam2 (dog) are absent from the second block, they are replaced with gaps after concatenation)::

    >canFam2
    CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C-------------------------------------
    >hg18
    GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    >mm8
    AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC--------------------------------------------
    >panTro2
    GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    >rheMac2
    GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
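
The gap padding seen in Example 1 can be illustrated with a short Python sketch. It is not the code of the concatenation script: ``concatenate_blocks`` is a hypothetical helper, and each block is assumed to be already reduced to a dictionary that maps a species name to its aligned text.

```python
def concatenate_blocks(blocks, species_list):
    """Join per-block alignment rows into one sequence per species.

    blocks       -- list of dicts mapping species name to aligned text
    species_list -- species to report, in output order
    A species absent from a block is padded with gaps of that block's width.
    """
    parts = {species: [] for species in species_list}
    for block in blocks:
        if not block:
            continue  # an empty block contributes no columns
        # All rows of one alignment block share the same width.
        width = len(next(iter(block.values())))
        for species in species_list:
            parts[species].append(block.get(species, "-" * width))
    return {species: "".join(chunks) for species, chunks in parts.items()}


# Toy data: mm8 is missing from the second block.
toy_blocks = [
    {"hg18": "ACG-", "mm8": "A-GT"},
    {"hg18": "TT"},
]
for name, text in sorted(concatenate_blocks(toy_blocks, ["hg18", "mm8"]).items()):
    print(">" + name)
    print(text)
```

In the toy data the mm8 row ends in two gap characters, mirroring how mm8 and canFam2 are padded in the output above.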

------

**Example 2a**: Multiple Block Approach **Include all species** and **include blocks with missing species**:

The following alignment::

    ##maf version=1
    a score=68686.000000
    s hg18.chr20    56827368  75 + 62435964  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s panTro2.chr20 56528685  75 + 62293572  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s rheMac2.chr10 89144112  69 - 94855758  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
    s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
    s canFam2.chr24 46551822  67 + 50763139  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

    a score=10289.000000
    s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to::

    >hg18.chr20(+):56827368-56827443|hg18_0
    GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    >panTro2.chr20(+):56528685-56528760|panTro2_0
    GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    >rheMac2.chr10(-):89144112-89144181|rheMac2_0
    GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
    >mm8.chr2(+):173910832-173910893|mm8_0
    AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
    >canFam2.chr24(+):46551822-46551889|canFam2_0
    CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

    >hg18.chr20(+):56827443-56827480|hg18_1
    ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    >panTro2.chr20(+):56528760-56528797|panTro2_1
    ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    >rheMac2.chr10(-):89144181-89144218|rheMac2_1
    ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
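
The FASTA headers in the output above follow a pattern that can be read off the example: source name, strand in parentheses, the start and start-plus-size coordinates from the "s" line, then the species name and a zero-based block counter. The Python sketch below reproduces that pattern; ``fasta_header`` is a hypothetical name and the format is inferred from the example only, not taken from the script.

```python
def fasta_header(src, start, size, strand, block_index):
    """Build a header like '>hg18.chr20(+):56827368-56827443|hg18_0'.

    src, start, size and strand come from a MAF "s" line;
    block_index counts the alignment blocks from zero.
    """
    species = src.split(".", 1)[0]
    end = start + size  # the example prints start + size, also for "-" strand rows
    return ">{}({}):{}-{}|{}_{}".format(src, strand, start, end, species, block_index)


print(fasta_header("hg18.chr20", 56827368, 75, "+", 0))
print(fasta_header("rheMac2.chr10", 89144181, 37, "-", 1))
```

The two calls rebuild the first header of the first block and the last header of the second block shown above.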

-----

**Example 2b**: Multiple Block Approach **Include hg18 and mm8** and **exclude blocks with missing species**:

The following alignment::

    ##maf version=1
    a score=68686.000000
    s hg18.chr20    56827368  75 + 62435964  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s panTro2.chr20 56528685  75 + 62293572  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
    s rheMac2.chr10 89144112  69 - 94855758  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
    s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
    s canFam2.chr24 46551822  67 + 50763139  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

    a score=10289.000000
    s hg18.chr20    56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
    s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to (**note** that the second MAF block, which does not have mm8, is not included in the output, and that columns containing only gaps in the two selected species are absent from the output)::

    >hg18.chr20(+):56827368-56827443|hg18_0
    GACAGGGTGCATCTGGGAGGGCCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC
    >mm8.chr2(+):173910832-173910893|mm8_0
    AGAAGGATCCACCT---------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------
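
In the output above the hg18 and mm8 rows are shorter than in the input because the columns where both rows hold a gap are gone. A minimal Python sketch of that column filtering follows; ``drop_all_gap_columns`` is a hypothetical helper and the sketch assumes rows of equal length, it is not the script's own code.

```python
def drop_all_gap_columns(rows):
    """Remove alignment columns in which every row has a gap ('-').

    rows -- list of equal-length aligned strings
    Returns a new list of strings in the same order.
    """
    kept = [column for column in zip(*rows)
            if any(char != "-" for char in column)]
    if not kept:
        # Nothing but gaps (or empty input rows): every row becomes empty.
        return ["" for _ in rows]
    # Transpose the kept columns back into rows.
    return ["".join(chars) for chars in zip(*kept)]


# Columns 3 and 4 are gaps in both rows and are dropped.
print(drop_all_gap_columns(["AC--G-", "A---GT"]))
```

Columns that contain a gap in only one of the rows are kept, which is why the hg18 row in Example 2b still contains a single gap.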

------

.. class:: infomark

**About formats**

**MAF format** (multiple alignment format) stores multiple alignments at the DNA level between entire genomes.

- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.
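
To make the "s" line layout concrete, here is a small Python sketch that splits one such line into its whitespace-separated fields (source name, start, size, strand, source size, aligned text). ``SLine`` and ``parse_s_line`` are hypothetical names used only for this illustration; the tool itself may read MAF files differently.

```python
from collections import namedtuple

# One record per "s" line of an alignment block.
SLine = namedtuple("SLine", "src start size strand src_size text")


def parse_s_line(line):
    """Parse a MAF "s" line into an SLine record.

    Raises ValueError when the line is not an "s" line with seven fields.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a MAF 's' line: %r" % line)
    _, src, start, size, strand, src_size, text = fields
    return SLine(src, int(start), int(size), strand, int(src_size), text)


record = parse_s_line(
    "s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"
)
print(record.src, record.start, record.size, record.strand)
```

The size field counts the bases in the row without gap characters, so it can be shorter than the aligned text.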

  </help>
</tool>