| 1 | <tool id="MAF_To_Interval1" name="MAF to Interval" force_history_refresh="True"> |
---|
| 2 | <description>Converts a MAF formatted file to the Interval format</description> |
---|
| 3 | <command interpreter="python">maf_to_interval.py $input1 $out_file1 $out_file1.id $__new_file_path__ $input1.dbkey $species $input1.metadata.species $complete_blocks $remove_gaps</command> |
---|
| 4 | <inputs> |
---|
| 5 | <param format="maf" name="input1" type="data" label="MAF file to convert"/> |
---|
| 6 | <param name="species" type="select" label="Select additional species" display="checkboxes" multiple="true" help="The species matching the dbkey of the alignment is always included. A separate history item will be created for each species."> |
---|
| 7 | <options> |
---|
| 8 | <filter type="data_meta" ref="input1" key="species" /> |
---|
| 9 | <filter type="remove_value" meta_ref="input1" key="dbkey" /> |
---|
| 10 | </options> |
---|
| 11 | </param> |
---|
| 12 | <param name="complete_blocks" type="select" label="Exclude blocks which have a species missing"> |
---|
| 13 | <option value="partial_allowed">include blocks with missing species</option> |
---|
| 14 | <option value="partial_disallowed">exclude blocks with missing species</option> |
---|
| 15 | </param> |
---|
| 16 | <param name="remove_gaps" type="select" label="Remove Gap characters from sequences"> |
---|
| 17 | <option value="keep_gaps">keep gaps</option> |
---|
| 18 | <option value="remove_gaps">remove gaps</option> |
---|
| 19 | </param> |
---|
| 20 | </inputs> |
---|
| 21 | <outputs> |
---|
| 22 | <data format="interval" name="out_file1" /> |
---|
| 23 | </outputs> |
---|
| 24 | <tests> |
---|
| 25 | <test> |
---|
| 26 | <param name="input1" value="4.maf" dbkey="hg17"/> |
---|
| 27 | <param name="complete_blocks" value="partial_disallowed"/> |
---|
| 28 | <param name="remove_gaps" value="keep_gaps"/> |
---|
| 29 | <param name="species" value="panTro1" /> |
---|
| 30 | <output name="out_file1" file="maf_to_interval_out_hg17.interval"/> |
---|
| 31 | <output name="out_file1" file="maf_to_interval_out_panTro1.interval"/> |
---|
| 32 | </test> |
---|
| 33 | </tests> |
---|
| 34 | <help> |
---|
| 35 | |
---|
| 36 | **What it does** |
---|
| 37 | |
---|
| 38 | This tool converts every MAF block to a set of genomic intervals describing the position of that alignment block within a corresponding genome. Sequences from aligning species are also included in the output. |
---|
| 39 | |
---|
| 40 | The interface for this tool contains several options: |
---|
| 41 | |
---|
| 42 | * **MAF file to convert**. Choose a multiple alignment dataset from your history to be converted to Interval format. |
---|
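| 43 | * **Select additional species**. Choose additional species from the alignment to be included in the output. The species matching the dbkey of the alignment is always included, and a separate history item is created for each species. |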
---|
| 44 | * **Exclude blocks which have a species missing**. If an alignment block lacks any one of the species found in the alignment set and this option is set to **exclude blocks with missing species**, the coordinates of that block **will not** be included in the output (see **Example 2** below). |
---|
| 45 | * **Remove Gap characters from sequences**. Gaps can be removed from sequences before they are output. |
---|
| 46 | |
---|
| 47 | |
---|
| 48 | ----- |
---|
| 49 | |
---|
| 50 | **Example 1**: **Include only reference genome** (hg18 in this case) and **include blocks with missing species**: |
---|
| 51 | |
---|
| 52 | For the following alignment:: |
---|
| 53 | |
---|
| 54 | ##maf version=1 |
---|
| 55 | a score=68686.000000 |
---|
| 56 | s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- |
---|
| 57 | s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- |
---|
| 58 | s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA------- |
---|
| 59 | s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- |
---|
| 60 | s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C |
---|
| 61 | |
---|
| 62 | a score=10289.000000 |
---|
| 63 | s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 64 | s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 65 | s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 66 | |
---|
| 67 | the tool will create **a single** history item containing the following. **Note** that the name field is numbered iteratively (hg18_0_0, hg18_1_0, etc.): the first number is the block number and the second is the iteration through the block (if a species appears twice in a block, that interval is repeated). Sequences for each species are included in the order specified in the header, and a field is left empty when no sequence is available for that species:: |
---|
| 68 | |
---|
| 69 | #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2 |
---|
| 70 | chr20 56827368 56827443 + 68686.0 hg18_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA------- |
---|
| 71 | chr20 56827443 56827480 + 10289.0 hg18_1_0 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 72 | |
---|
| 73 | |
---|
| 74 | ----- |
---|
| 75 | |
---|
| 76 | **Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**: |
---|
| 77 | |
---|
| 78 | For the following alignment:: |
---|
| 79 | |
---|
| 80 | ##maf version=1 |
---|
| 81 | a score=68686.000000 |
---|
| 82 | s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- |
---|
| 83 | s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- |
---|
| 84 | s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA------- |
---|
| 85 | s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- |
---|
| 86 | s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C |
---|
| 87 | |
---|
| 88 | a score=10289.000000 |
---|
| 89 | s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 90 | s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 91 | s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG |
---|
| 92 | |
---|
| 93 | the tool will create **two** history items (one for hg18 and one for mm8). **Note** that both history items contain only one line, describing the first alignment block; the second MAF block is omitted from the output because it does not contain mm8. |
---|
| 94 | |
---|
| 95 | History item **1** (for hg18):: |
---|
| 96 | |
---|
| 97 | #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2 |
---|
| 98 | chr20 56827368 56827443 + 68686.0 hg18_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA------- |
---|
| 99 | |
---|
| 100 | |
---|
| 101 | History item **2** (for mm8):: |
---|
| 102 | |
---|
| 103 | #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2 |
---|
| 104 | chr2 173910832 173910893 + 68686.0 mm8_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA------- |
---|
| 105 | |
---|
| 106 | |
---|
| 107 | ------- |
---|
| 108 | |
---|
| 109 | .. class:: infomark |
---|
| 110 | |
---|
| 111 | **About formats** |
---|
| 112 | |
---|
| 113 | **MAF format** is a multiple alignment format. It stores multiple alignments at the DNA level between entire genomes. |
---|
| 114 | |
---|
| 115 | - The .maf format is line-oriented. Each multiple alignment ends with a blank line. |
---|
| 116 | - Each sequence in an alignment is on a single line. |
---|
| 117 | - Lines starting with # are considered to be comments. |
---|
| 118 | - Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment. |
---|
| 119 | - Some MAF files may contain two optional line types: |
---|
| 120 | |
---|
| 121 | - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line; |
---|
| 122 | - An "e" line containing information about the size of the gap between the alignments that span the current block. |
---|
| 123 | |
---|
| 124 | |
---|
| 125 | </help> |
---|
| 126 | </tool> |
---|
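The coordinate arithmetic behind the examples in the help text can be sketched in a few lines of Python. This is a minimal illustration, not the code in maf_to_interval.py: the function name `s_line_to_interval` is hypothetical, and reporting minus-strand components in forward-strand coordinates is an assumption about the desired output rather than something the tool definition states.

```python
def s_line_to_interval(line):
    """Turn one MAF "s" line into (species, chrom, start, end, strand, text).

    Hypothetical helper for illustration only.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a MAF 's' line: %r" % line)
    _, src, start, size, strand, src_size, text = fields
    start, size, src_size = int(start), int(size), int(src_size)
    # "hg18.chr20" -> species "hg18", chromosome "chr20"
    species, _, chrom = src.partition(".")
    if strand == "-":
        # MAF gives minus-strand starts relative to the reverse-complemented
        # source; flip to forward-strand coordinates (an assumption here).
        start = src_size - start - size
    # Interval end is start plus the ungapped size of the aligned sequence.
    return species, chrom, start, start + size, strand, text


line = "s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"
print(s_line_to_interval(line)[:5])
# ('hg18', 'chr20', 56827443, 56827480, '+')
```

The printed start and end match the second output line of Example 1, where the 37-base hg18 component yields the interval 56827443 to 56827480.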
| 127 | |
---|
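The two select options, **Exclude blocks which have a species missing** and **Remove Gap characters from sequences**, amount to a filter and a string substitution. The sketch below shows one way to express them. The block representation (a dict mapping species name to aligned text) and both function names are hypothetical, chosen for illustration; only the option values `partial_disallowed` and `remove_gaps` come from the tool definition, and which species count as required is left to the caller.

```python
def keep_block(block, required_species, complete_blocks):
    """Return True if the block should be written to the output.

    block: dict mapping species name -> aligned text (hypothetical layout).
    """
    if complete_blocks == "partial_disallowed":
        # Drop the block as soon as one required species is absent.
        return all(sp in block for sp in required_species)
    return True


def format_sequences(block, all_species, remove_gaps):
    """One column per species, in header order; empty string when absent."""
    columns = []
    for sp in all_species:
        text = block.get(sp, "")
        if remove_gaps == "remove_gaps":
            # Gap characters in MAF alignment text are dashes.
            text = text.replace("-", "")
        columns.append(text)
    return columns


block2 = {"hg18": "ATGTG", "panTro2": "ATGTG", "rheMac2": "ATG-G"}
print(keep_block(block2, ["hg18", "mm8"], "partial_disallowed"))
# False
print(format_sequences(block2, ["hg18", "mm8", "rheMac2"], "remove_gaps"))
# ['ATGTG', '', 'ATGG']
```

As in Example 2, a block without mm8 is rejected when hg18 and mm8 are both required, and the empty string mirrors the empty field the tool leaves for a species with no sequence.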