<tool id="Extract genomic DNA 1" name="Extract Genomic DNA" version="2.2.1">
  <description>using coordinates from assembled/unassembled genomes</description>
  <command interpreter="python">
    extract_genomic_dna.py $input $out_file1 -d $dbkey -o $out_format -g ${GALAXY_DATA_INDEX_DIR}
    #if isinstance( $input.datatype, $__app__.datatypes_registry.get_datatype_by_extension('gff').__class__):
      -1 1,4,5,7 --gff
    #else:
      -1 ${input.metadata.chromCol},${input.metadata.startCol},${input.metadata.endCol},${input.metadata.strandCol}
    #end if
  </command>
  <inputs>
    <param format="interval,gff" name="input" type="data" label="Fetch sequences corresponding to Query">
      <validator type="unspecified_build" />
      <validator type="dataset_metadata_in_file" filename="alignseq.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="seq" />
    </param>
    <param name="out_format" type="select" label="Output data type">
      <option value="fasta">FASTA</option>
      <option value="interval">Interval</option>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input">
      <change_format>
        <when input="out_format" value="fasta" format="fasta" />
      </change_format>
    </data>
  </outputs>
  <tests>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out1.fasta" />
    </test>
    <test>
      <param name="input" value="droPer1.bed" dbkey="droPer1" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out2.fasta" />
    </test>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out3.interval" />
    </test>
    <!-- Test GFF file support. -->
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out4.gff" />
    </test>
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out5.fasta" />
    </test>
  </tests>
  <help>

.. class:: warningmark

This tool requires tab-delimited data. If your data is not TAB delimited, use *Text Manipulation->Convert*.

.. class:: warningmark

Make sure that the genome build is specified for the dataset from which you are extracting sequences (click the pencil icon in the history item if it is not specified).

.. class:: warningmark

Each of the following causes a line from the input dataset to be skipped and a warning to be generated. The number of warnings and skipped lines is reported in the resulting history item.

- Any line that does not contain at least 3 columns: a chromosome and numerical start and end coordinates.
- Sequences that fall outside the range of a line's start and end coordinates.
- Chromosome, start, or end coordinates that are invalid for the specified build.
- Any line whose data columns are not separated by a **TAB** character (other white-space characters are invalid).

.. class:: infomark

**Extract genomic DNA using coordinates from ASSEMBLED genomes and UNassembled genomes** were previously handled by two separate tools; this tool now covers both.

-----

**What it does**

This tool uses coordinate, strand, and build information to fetch genomic DNA sequences in FASTA or interval format.

If strand is not defined, the default value is "+".

-----

**Example**

If the input dataset is::

    chr7  127475281  127475310  NM_000230  0  +
    chr7  127485994  127486166  NM_000230  0  +
    chr7  127486011  127486166  D49487     0  +

Extracting sequences with **FASTA** output data type returns::

    >hg17_chr7_127475281_127475310_+
    GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    >hg17_chr7_127485994_127486166_+
    GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCG
    GATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATC
    CAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAG
    GATCAATGACATTTCACACACG
    >hg17_chr7_127486011_127486166_+
    TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGG
    CCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGA
    CACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCAC
    ACACG

Extracting sequences with **Interval** output data type returns::

    chr7  127475281  127475310  NM_000230  0  +  GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    chr7  127485994  127486166  NM_000230  0  +  GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG
    chr7  127486011  127486166  D49487     0  +  TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG

  </help>
</tool>