<tool id="Extract genomic DNA 1" name="Extract Genomic DNA" version="2.2.1">
  <description>using coordinates from assembled/unassembled genomes</description>
  <command interpreter="python">
    extract_genomic_dna.py $input $out_file1 -d $dbkey -o $out_format -g ${GALAXY_DATA_INDEX_DIR}
    #if isinstance( $input.datatype, $__app__.datatypes_registry.get_datatype_by_extension('gff').__class__):
      -1 1,4,5,7 --gff
    #else:
      -1 ${input.metadata.chromCol},${input.metadata.startCol},${input.metadata.endCol},${input.metadata.strandCol}
    #end if
  </command>
  <inputs>
    <param format="interval,gff" name="input" type="data" label="Fetch sequences corresponding to Query">
      <validator type="unspecified_build" />
      <validator type="dataset_metadata_in_file" filename="alignseq.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="seq" />
    </param>
    <param name="out_format" type="select" label="Output data type">
      <option value="fasta">FASTA</option>
      <option value="interval">Interval</option>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input">
      <change_format>
        <when input="out_format" value="fasta" format="fasta" />
      </change_format>
    </data>
  </outputs>
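  <!--
  Maintainer note (illustrative only): the Python sketch below mirrors the record
  shapes shown in this tool's help example. It is not the code of
  extract_genomic_dna.py. The helper names fasta_record and interval_record are
  hypothetical, and the 50-character line width and the end-minus-start sequence
  length are assumptions read off that example.

```python
def fasta_record(dbkey, chrom, start, end, strand, seq, width=50):
    """Format one FASTA record using the header shape from the help example."""
    # Header shape assumed from the example: >build_chrom_start_end_strand
    header = ">%s_%s_%d_%d_%s" % (dbkey, chrom, start, end, strand)
    # Wrap the sequence at a fixed width (50 in the help example).
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines)


def interval_record(fields, seq):
    """Append the sequence as one extra TAB-separated column."""
    return "\t".join(list(fields) + [seq])


if __name__ == "__main__":
    # First line of the help example; strand "+" is also the documented default.
    fields = ["chr7", "127475281", "127475310", "NM_000230", "0", "+"]
    seq = "GTAGGAATCGCAGCGCCAGCGGTTGCAAG"
    start, end = int(fields[1]), int(fields[2])
    # In the example, the sequence length equals end minus start.
    assert len(seq) == end - start
    print(fasta_record("hg17", fields[0], start, end, fields[5], seq))
    print(interval_record(fields, seq))
```

  With out_format set to fasta, each record would look like the first print; with
  interval, the sequence is appended to the input line as an extra column.
  -->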
  <tests>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out1.fasta" />
    </test>
    <test>
      <param name="input" value="droPer1.bed" dbkey="droPer1" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out2.fasta" />
    </test>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out3.interval" />
    </test>
    <!-- Test GFF file support. -->
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out4.gff" />
    </test>
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out5.fasta" />
    </test>
  </tests>
  <help>

.. class:: warningmark

This tool requires tab-delimited data. If your data is not TAB delimited, use *Text Manipulation->Convert*.

.. class:: warningmark

Make sure that the genome build is specified for the dataset from which you are extracting sequences (click the pencil icon in the history item if it is not specified).

.. class:: warningmark

Any of the following will cause a line from the input dataset to be skipped and a warning to be generated. The number of warnings and skipped lines is reported in the resulting history item.

- Lines that do not contain at least 3 columns: a chromosome, and numerical start and end coordinates.
- Sequences that fall outside of the range of a line's start and end coordinates.
- Chromosome, start, or end coordinates that are invalid for the specified build.
- Lines whose data columns are not separated by a **TAB** character (other white-space characters are invalid).

.. class:: infomark

**Extract genomic DNA using coordinates from ASSEMBLED genomes and UNassembled genomes** were previously provided by two separate tools.

-----

**What it does**

This tool uses coordinate, strand, and build information to fetch genomic DNA sequences in FASTA or interval format.

If strand is not defined, the default value is "+".

-----

**Example**

If the input dataset is::

    chr7 127475281 127475310 NM_000230 0 +
    chr7 127485994 127486166 NM_000230 0 +
    chr7 127486011 127486166 D49487 0 +

Extracting sequences with **FASTA** output data type returns::

    >hg17_chr7_127475281_127475310_+
    GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    >hg17_chr7_127485994_127486166_+
    GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCG
    GATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATC
    CAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAG
    GATCAATGACATTTCACACACG
    >hg17_chr7_127486011_127486166_+
    TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGG
    CCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGA
    CACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCAC
    ACACG

Extracting sequences with **Interval** output data type returns::

    chr7 127475281 127475310 NM_000230 0 + GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    chr7 127485994 127486166 NM_000230 0 + GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG
    chr7 127486011 127486166 D49487 0 + TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG

  </help>
</tool>