<tool id="Extract genomic DNA 1" name="Extract Genomic DNA" version="2.2.1">
  <description>using coordinates from assembled/unassembled genomes</description>
  <command interpreter="python">
    extract_genomic_dna.py $input $out_file1 -d $dbkey -o $out_format -g ${GALAXY_DATA_INDEX_DIR}
    #if isinstance( $input.datatype, $__app__.datatypes_registry.get_datatype_by_extension('gff').__class__):
      -1 1,4,5,7 --gff
    #else:
      -1 ${input.metadata.chromCol},${input.metadata.startCol},${input.metadata.endCol},${input.metadata.strandCol}
    #end if
  </command>
  <inputs>
    <param format="interval,gff" name="input" type="data" label="Fetch sequences corresponding to Query">
      <validator type="unspecified_build" />
      <validator type="dataset_metadata_in_file" filename="alignseq.loc" metadata_name="dbkey" metadata_column="1" message="Sequences are not currently available for the specified build." line_startswith="seq" />
    </param>
    <param name="out_format" type="select" label="Output data type">
      <option value="fasta">FASTA</option>
      <option value="interval">Interval</option>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input">
      <change_format>
        <when input="out_format" value="fasta" format="fasta" />
      </change_format>
    </data>
  </outputs>
  <tests>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out1.fasta" />
    </test>
    <test>
      <param name="input" value="droPer1.bed" dbkey="droPer1" ftype="bed" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out2.fasta" />
    </test>
    <test>
      <param name="input" value="1.bed" dbkey="hg17" ftype="bed" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out3.interval" />
    </test>
    <!-- Test GFF file support. -->
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="interval"/>
      <output name="out_file1" file="extract_genomic_dna_out4.gff" />
    </test>
    <test>
      <param name="input" value="gff_filter_by_attribute_out1.gff" dbkey="mm9" ftype="gff" />
      <param name="out_format" value="fasta"/>
      <output name="out_file1" file="extract_genomic_dna_out5.fasta" />
    </test>
  </tests>
  <help>

.. class:: warningmark

This tool requires tab-delimited (tabular) data. If your data is not TAB delimited, use *Text Manipulation->Convert*.

.. class:: warningmark

Make sure that the genome build is specified for the dataset from which you are extracting sequences (click the pencil icon in the history item if it is not specified).

.. class:: warningmark

All of the following will cause a line from the input dataset to be skipped and a warning to be generated. The number of warnings and skipped lines is reported in the resulting history item.

- Any line that does not contain at least 3 columns: a chromosome plus numerical start and end coordinates.
- Sequences that fall outside of the range of a line's start and end coordinates.
- Chromosome, start, or end coordinates that are invalid for the specified build.
- Any line whose data columns are not separated by a **TAB** character (other white-space characters are invalid).

.. class:: infomark

Extracting genomic DNA using coordinates from **ASSEMBLED** genomes and from **UNassembled** genomes was previously handled by two separate tools; this tool covers both.

-----

**What it does**

This tool uses coordinate, strand, and build information to fetch genomic DNA sequences in FASTA or interval format.

If strand is not defined, the default value is "+".

-----

**Example**

If the input dataset is::

    chr7 127475281 127475310 NM_000230 0 +
    chr7 127485994 127486166 NM_000230 0 +
    chr7 127486011 127486166 D49487 0 +

Extracting sequences with **FASTA** output data type returns::

    >hg17_chr7_127475281_127475310_+
    GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    >hg17_chr7_127485994_127486166_+
    GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCG
    GATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATC
    CAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAG
    GATCAATGACATTTCACACACG
    >hg17_chr7_127486011_127486166_+
    TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGG
    CCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGA
    CACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCAC
    ACACG

Extracting sequences with **Interval** output data type returns::

    chr7 127475281 127475310 NM_000230 0 + GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    chr7 127485994 127486166 NM_000230 0 + GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG
    chr7 127486011 127486166 D49487 0 + TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG

  </help>
</tool>
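<!--
Illustrative note, not part of the tool. The help text describes which input
lines are skipped, how an undefined strand defaults to "+", and what the FASTA
and interval outputs look like. The Python sketch below mirrors only those
documented behaviours. It is not the code in extract_genomic_dna.py: the names
parse_interval_line, fasta_record, interval_row and FASTA_LINE_WIDTH are
hypothetical, the 50-column wrap is inferred from the example output, and
treating any strand value other than "+" or "-" as undefined is an assumption.
Checks that need the genome build (invalid chromosomes or coordinates) are
left out, and the sequence is passed in rather than fetched.

```python
FASTA_LINE_WIDTH = 50  # assumption: inferred from the wrapped example output


def parse_interval_line(line, chrom_col=0, start_col=1, end_col=2, strand_col=5):
    """Return (chrom, start, end, strand), or None if the line would be skipped."""
    # Only TAB separates columns; other white space does not count.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        return None
    try:
        chrom = fields[chrom_col]
        start = int(fields[start_col])
        end = int(fields[end_col])
    except (IndexError, ValueError):
        # Missing column or non-numeric coordinate: skip the line.
        return None
    # Strand defaults to "+" when it is not defined.
    strand = "+"
    if len(fields) > strand_col and fields[strand_col] in ("+", "-"):
        strand = fields[strand_col]
    return chrom, start, end, strand


def fasta_record(build, chrom, start, end, strand, sequence, width=FASTA_LINE_WIDTH):
    """Format one record as build_chrom_start_end_strand plus wrapped sequence."""
    header = f">{build}_{chrom}_{start}_{end}_{strand}"
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + wrapped)


def interval_row(line, sequence):
    """Append the sequence to the original line as one extra TAB-separated column."""
    return line.rstrip("\r\n") + "\t" + sequence


if __name__ == "__main__":
    line = "chr7\t127475281\t127475310\tNM_000230\t0\t+"
    seq = "GTAGGAATCGCAGCGCCAGCGGTTGCAAG"  # first sequence from the help example
    parsed = parse_interval_line(line)
    if parsed is not None:
        print(fasta_record("hg17", *parsed, seq))
        print(interval_row(line, seq))
```

Run on the first row of the help example, the sketch builds the header and
interval row from the same fields the help text shows; a real implementation
would also need the build-specific sequence lookup.
-->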