using coordinates from assembled/unassembled genomes

extract_genomic_dna.py $input $out_file1 -d $dbkey -o $out_format -g ${GALAXY_DATA_INDEX_DIR}
#if isinstance( $input.datatype, $__app__.datatypes_registry.get_datatype_by_extension('gff').__class__):
    -1 1,4,5,7 --gff
#else:
    -1 ${input.metadata.chromCol},${input.metadata.startCol},${input.metadata.endCol},${input.metadata.strandCol}
#end if

.. class:: warningmark

This tool requires tabular formatted data. If your data is not TAB delimited, use *Text Manipulation->Convert*.

.. class:: warningmark

Make sure that the genome build is specified for the dataset from which you are extracting sequences (click the pencil icon in the history item if it is not specified).

.. class:: warningmark

Any of the following will cause a line from the input dataset to be skipped and a warning to be generated. The number of warnings and skipped lines is reported in the resulting history item.

- Any line that does not contain at least 3 columns: a chromosome and numerical start and end coordinates.
- Sequences that fall outside of the range of a line's start and end coordinates.
- Chromosome, start or end coordinates that are invalid for the specified build.
- Any line whose data columns are not separated by a **TAB** character (other white-space characters are invalid).

.. class:: infomark

**Extract genomic DNA using coordinates from ASSEMBLED genomes and UNassembled genomes** was previously handled by two separate tools.

-----

**What it does**

This tool uses coordinate, strand, and build information to fetch genomic DNA sequences in FASTA or interval format.

If strand is not defined, the default value is "+".

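The skip rules and the strand default described above can be illustrated with a minimal Python sketch. It is not the code of ``extract_genomic_dna.py``; the function name ``parse_region``, its zero-based column arguments, and the check that the end is not smaller than the start are assumptions made for illustration. Build-specific checks (unknown chromosome, coordinates beyond the chromosome length) are left out because they need the genome data.

```python
def parse_region(line, chrom_col=0, start_col=1, end_col=2, strand_col=None):
    """Return (chrom, start, end, strand), or None if the line would be skipped.

    Hypothetical helper: column indexes are zero-based here.
    """
    # Columns must be separated by TAB; other white space does not split fields.
    fields = line.rstrip("\r\n").split("\t")

    # A chromosome, a start and an end column are the minimum requirement.
    if len(fields) <= max(chrom_col, start_col, end_col):
        return None

    # Start and end must be numerical.
    try:
        start = int(fields[start_col])
        end = int(fields[end_col])
    except ValueError:
        return None

    # Assumption: negative or inverted coordinates are treated as invalid.
    if start < 0 or end < start:
        return None

    # If strand is not defined, the default value is "+".
    strand = "+"
    if strand_col is not None and strand_col < len(fields):
        if fields[strand_col] in ("+", "-"):
            strand = fields[strand_col]

    return fields[chrom_col], start, end, strand


if __name__ == "__main__":
    print(parse_region("chr7\t127475281\t127475310\tNM_000230\t0\t+", strand_col=5))
    print(parse_region("chr7 127475281 127475310"))  # space-separated, so skipped
```

A caller would count the ``None`` results to report the number of skipped lines.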

-----

**Example**

If the input dataset is::

    chr7    127475281    127475310    NM_000230    0    +
    chr7    127485994    127486166    NM_000230    0    +
    chr7    127486011    127486166    D49487       0    +

Extracting sequences with **FASTA** output data type returns::

    >hg17_chr7_127475281_127475310_+
    GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    >hg17_chr7_127485994_127486166_+
    GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCG
    GATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATC
    CAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAG
    GATCAATGACATTTCACACACG
    >hg17_chr7_127486011_127486166_+
    TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGG
    CCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGA
    CACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCAC
    ACACG

Extracting sequences with **Interval** output data type returns::

    chr7    127475281    127475310    NM_000230    0    +    GTAGGAATCGCAGCGCCAGCGGTTGCAAG
    chr7    127485994    127486166    NM_000230    0    +    GCCCAAGAAGCCCATCCTGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG
    chr7    127486011    127486166    D49487       0    +    TGGGAAGGAAAATGCATTGGGGAACCCTGTGCGGATTCTTGTGGCTTTGGCCCTATCTTTTCTATGTCCAAGCTGTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACG
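
The two output layouts in the example can be mimicked with a short Python sketch. It is an illustration under stated assumptions, not the tool's implementation: the in-memory ``genome`` dictionary and the names ``fetch``, ``as_fasta`` and ``as_interval`` are hypothetical, and reverse-complementing for the "-" strand is assumed rather than stated in this help text. The slice ``[start:end]`` reflects the example, where each returned sequence has exactly ``end - start`` bases (29 for the first region), and the 50-character line width matches the FASTA output shown.

```python
# Complement table used for the assumed "-" strand handling.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def fetch(genome, chrom, start, end, strand="+"):
    """Slice a region from a dict of chromosome name -> sequence (hypothetical store)."""
    seq = genome[chrom][start:end]  # yields end - start bases, as in the example
    if strand == "-":
        # Assumption: the minus strand is returned as the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def as_fasta(build, chrom, start, end, strand, seq, width=50):
    """Header in the style of the example, sequence wrapped to `width` characters."""
    header = f">{build}_{chrom}_{start}_{end}_{strand}"
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines)


def as_interval(fields, seq):
    """Interval output: the original columns with the sequence as an extra column."""
    return "\t".join(list(fields) + [seq])


if __name__ == "__main__":
    genome = {"chrT": "ACGTACGTAC"}  # toy data, not a real build
    seq = fetch(genome, "chrT", 1, 5)
    print(as_fasta("toy", "chrT", 1, 5, "+", seq))
    print(as_interval(["chrT", "1", "5"], seq))
```

Keeping the fetch step separate from the two formatters means the same sequence can be written in either output type without being looked up twice.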