<tool id="MAF_To_BED1" name="Maf to BED" force_history_refresh="True">
  <description>Converts a MAF-formatted file to the BED format</description>
  <command interpreter="python">maf_to_bed.py $input1 $out_file1 $species $complete_blocks $__new_file_path__</command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="a separate history item will be created for each checked species">
      <options>
        <filter type="data_meta" ref="input1" key="species" />
      </options>
    </param>
    <param name="complete_blocks" type="select" label="Exclude blocks which have a requested species missing">
      <option value="partial_allowed">include blocks with missing species</option>
      <option value="partial_disallowed">exclude blocks with missing species</option>
    </param>
  </inputs>
  <outputs>
    <data format="bed" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.maf"/>
      <param name="species" value="hg17"/>
      <param name="complete_blocks" value="partial_disallowed"/>
      <output name="out_file1" file="cf_maf_to_bed.dat"/>
    </test>
  </tests>
  <help>

**What it does**

This tool converts every MAF block to an interval line (in BED format; scroll down for a description of the MAF and BED formats) describing the position of that alignment block within the corresponding genome.

The interface for this tool contains two pages (steps):

* **Step 1 of 2**. Choose multiple alignments from history to be converted to BED format.
* **Step 2 of 2**. Choose species from the alignment to be included in the output and specify how to deal with alignment blocks that lack one or more species:

  * **Choose species** - the tool reads the alignment provided during Step 1 and generates a list of species contained within that alignment. Using the checkboxes, you can specify the taxa to be included in the output (only the reference genome, shown in **bold**, is selected by default). If you select more than one species, a separate history item will be created for each.
  * **Choose to include/exclude blocks with missing species** - if an alignment block lacks any of the species you selected in the **Choose species** menu and this option is set to **exclude blocks with missing species**, then the coordinates of that block **will not** be included in the output (see **Example 2** below).


-----

**Example 1**: **Include only reference genome** (hg18 in this case) and **include blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **a single** history item containing the following (**note** that field 4 is added to the output and is numbered iteratively: hg18_0, hg18_1 etc.)::

  chr20 56827368 56827443 hg18_0 0 +
  chr20 56827443 56827480 hg18_1 0 +

-----

**Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **two** history items (one for hg18 and one for mm8) containing the following (**note** that each history item contains only one line, describing the first alignment block. The second MAF block is not included in the output because it does not contain mm8):

History item **1** (for hg18)::

  chr20 56827368 56827443 hg18_0 0 +

History item **2** (for mm8)::

  chr2 173910832 173910893 mm8_0 0 +

-------

.. class:: infomark

**About formats**

**MAF format** (multiple alignment format). This format stores multiple alignments at the DNA level between entire genomes.

- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.

**BED format** (Browser Extensible Data) was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and a number of additional optional ones.

The first three BED fields (required) are::

  1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
  2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
  3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).

Additional (optional) fields are::

  4. name - The name of the BED line.
  5. score - A score between 0 and 1000.
  6. strand - Defines the strand - either '+' or '-'.

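The coordinate arithmetic behind the examples above can be sketched in a few lines of Python. This is only an illustration, not the code of maf_to_bed.py: the function name ``s_line_to_bed`` is hypothetical, and the minus-strand branch assumes that intervals are reported in forward-strand coordinates, which the examples (all on the + strand) do not confirm.

```python
def s_line_to_bed(line, block_num):
    """Turn one MAF "s" line into a six-field, tab-separated BED line."""
    # MAF "s" line layout: s src start size strand srcSize text
    fields = line.split()
    src = fields[1]
    start = int(fields[2])
    size = int(fields[3])
    strand = fields[4]
    src_size = int(fields[5])

    # src is "species.chromosome", e.g. "hg18.chr20"
    species, chrom = src.split(".", 1)

    if strand == "-":
        # MAF minus-strand starts are relative to the reverse-complemented
        # source; converting to forward-strand coordinates is an assumption
        # about the tool, not something stated in this help text.
        start = src_size - start - size
    end = start + size  # BED intervals are half-open: chromEnd = start + size

    # Field 4 is species_blockNumber; the score is fixed at 0 as in the examples.
    name = "%s_%d" % (species, block_num)
    return "\t".join([chrom, str(start), str(end), name, "0", strand])


print(s_line_to_bed("s hg18.chr20 56827368 75 + 62435964 GACAGG", 0))
```

Applied to the mm8 line of Example 2 with block number 0, the same function yields the fields chr2, 173910832, 173910893, mm8_0, 0 and +, matching history item 2.
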
  </help>
  <code file="maf_to_bed_code.py"/>
</tool>