Search translated nucleotide database with protein query sequence(s)

## The command is a Cheetah template which allows some Python-based syntax.
## Lines starting with hash hash are comments. Galaxy will turn newlines into spaces.
tblastn
-query "$query"
#if $db_opts.db_opts_selector == "db":
  -db "$db_opts.database"
#else:
  -subject "$db_opts.subject"
#end if
-evalue $evalue_cutoff
$adv_opts.filter_query
$adv_opts.matrix
-out $output1
$out_format
-num_threads 8
## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object, not a string
## Note -max_target_seqs overrides -num_descriptions and -num_alignments
#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):
-max_target_seqs $adv_opts.max_hits
#end if
#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):
-word_size $adv_opts.word_size
#end if
##Ungapped disabled for now - see comments below
##$adv_opts.ungapped

.. class:: warningmark

**Note**. Database searches may take a substantial amount of time. For large input datasets it is advisable to allow overnight processing.

-----

**What it does**

Search a *translated nucleotide database* using a *protein query*, with the NCBI BLAST+ tblastn command line tool.

-----

**Output format**

Because Galaxy focuses on processing tabular data, the default output of this tool is tabular. It contains 12 columns:

1. ID of your query sequence
2. ID of the database hit
3. % identity
4. Alignment length
5. # mismatches
6. # gap openings
7. Start position in your sequence
8. End position in your sequence
9. Start position in database hit
10. End position in database hit
11. E-value
12. Bit score

The second option is BLAST XML output, which is designed to be parsed by another program and is understood by other Galaxy tools.

You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program). The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.
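If you post-process the default tabular output outside Galaxy, the following minimal Python sketch reads the 12 columns listed above into named records. The field names (`qseqid`, `sseqid`, `gapopen`, ...) are an assumption borrowed from the BLAST+ standard tabular format (`-outfmt 6`). The names `BlastHit` and `parse_tabular` are hypothetical and are not part of Galaxy or BLAST+. The example row is invented purely for illustration.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    """One row of 12-column tabular BLAST output (assumed -outfmt 6 order)."""
    qseqid: str      # 1. ID of the query sequence
    sseqid: str      # 2. ID of the database hit
    pident: float    # 3. % identity
    length: int      # 4. alignment length
    mismatch: int    # 5. number of mismatches
    gapopen: int     # 6. number of gap openings
    qstart: int      # 7. start position in the query
    qend: int        # 8. end position in the query
    sstart: int      # 9. start position in the database hit
    send: int        # 10. end position in the database hit
    evalue: float    # 11. E-value
    bitscore: float  # 12. bit score


# One converter per column, in the same order as the fields above.
_CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_tabular(lines: Iterable[str]) -> List[BlastHit]:
    """Parse tab-separated BLAST rows, skipping blank lines and '#' comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(_CONVERTERS):
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        # float()/int() tolerate surrounding spaces, e.g. a padded bit score.
        hits.append(BlastHit(*(conv(v) for conv, v in zip(_CONVERTERS, fields))))
    return hits


if __name__ == "__main__":
    # Invented example row, not real BLAST output.
    example = ["query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t301\t900\t1e-50\t380\n"]
    best = max(parse_tabular(example), key=lambda hit: hit.bitscore)
    print(best.sseqid, best.evalue)  # prints: subject1 1e-50
```

For tblastn, the database-hit coordinates (columns 9 and 10) are nucleotide positions. A start greater than the end indicates a hit on the reverse strand, so take care when comparing them with the protein query coordinates.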
The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query. The two query-anchored outputs show a multiple sequence alignment between the query and all the matches. They differ in how insertions are shown: either marked as insertions, or with gap characters added to the other sequences.

-----

**References**

Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.