bwa_wrapper.py
--threads="4"
#if $genomeSource.refGenomeSource == "history":
--ref=$genomeSource.ownFile
#else:
--ref=$genomeSource.indices
#end if
--fastq=$paired.input1
#if $paired.sPaired == "paired":
--rfastq=$paired.input2
#else:
--rfastq="None"
#end if
--output=$output --genAlignType=$paired.sPaired --params=$params.source_select --fileSource=$genomeSource.refGenomeSource
#if $params.source_select == "pre_set":
--maxEditDist="None" --fracMissingAligns="None" --maxGapOpens="None" --maxGapExtens="None" --disallowLongDel="None" --disallowIndel="None" --seed="None" --maxEditDistSeed="None" --mismatchPenalty="None" --gapOpenPenalty="None" --gapExtensPenalty="None" --suboptAlign="None" --noIterSearch="None" --outputTopN="None" --maxInsertSize="None" --maxOccurPairing="None"
#else:
--maxEditDist=$params.maxEditDist --fracMissingAligns=$params.fracMissingAligns --maxGapOpens=$params.maxGapOpens --maxGapExtens=$params.maxGapExtens --disallowLongDel=$params.disallowLongDel --disallowIndel=$params.disallowIndel --seed=$params.seed --maxEditDistSeed=$params.maxEditDistSeed --mismatchPenalty=$params.mismatchPenalty --gapOpenPenalty=$params.gapOpenPenalty --gapExtensPenalty=$params.gapExtensPenalty --suboptAlign=$params.suboptAlign --noIterSearch=$params.noIterSearch --outputTopN=$params.outputTopN --maxInsertSize=$params.maxInsertSize --maxOccurPairing=$params.maxOccurPairing
#end if
#if $genomeSource.refGenomeSource == "history":
--dbkey=$dbkey
#else:
--dbkey="None"
#end if
--suppressHeader=$suppressHeader
bwa
**What it does**
BWA is a fast, lightweight tool that aligns relatively short sequences (queries) to a large sequence database, such as the human reference genome. It was developed by Heng Li at the Sanger Institute. Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-60.
------
**Know what you are doing**
.. class:: warningmark
There is no such thing (yet) as an automated gearshift in short read mapping. It is all like stick-shift driving in San Francisco. In other words, running this tool with default parameters will probably not give you meaningful results. A way to deal with this is to **understand** the parameters by carefully reading the `documentation`__ and experimenting. Fortunately, Galaxy makes experimenting easy.
.. __: http://bio-bwa.sourceforge.net/
------
**Input formats**
BWA accepts files in Sanger FASTQ format. Use the FASTQ Groomer to prepare your files.
------
**Outputs**
The output is in SAM format, and has the following columns::
Column Description
-------- --------------------------------------------------------
1 QNAME Query (pair) NAME
2 FLAG bitwise FLAG
3 RNAME Reference sequence NAME
4 POS 1-based leftmost POSition/coordinate of clipped sequence
5 MAPQ MAPping Quality (Phred-scaled)
6 CIGAR extended CIGAR string
7 MRNM Mate Reference sequence NaMe ('=' if same as RNAME)
8 MPOS 1-based Mate POSition
9 ISIZE Inferred insert SIZE
10 SEQ query SEQuence on the same strand as the reference
11 QUAL query QUALity (ASCII-33 gives the Phred base quality)
12 OPT variable OPTional fields in the format TAG:VTYPE:VALUE
The flags are as follows::
Flag Description
------ -------------------------------------
0x0001 the read is paired in sequencing
0x0002 the read is mapped in a proper pair
0x0004 the query sequence itself is unmapped
0x0008 the mate is unmapped
0x0010 strand of the query (1 for reverse)
0x0020 strand of the mate
0x0040 the read is the first read in a pair
0x0080 the read is the second read in a pair
0x0100 the alignment is not primary
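
The FLAG column is a bitwise OR of the values above, so one integer can carry several of these properties at once. As a minimal sketch (the names ``SAM_FLAGS`` and ``decode_flag`` are illustrative and not part of BWA or Galaxy), the bits that are set can be listed in Python like this::

    # Flag bits and descriptions, copied from the table above.
    SAM_FLAGS = [
        (0x0001, "the read is paired in sequencing"),
        (0x0002, "the read is mapped in a proper pair"),
        (0x0004, "the query sequence itself is unmapped"),
        (0x0008, "the mate is unmapped"),
        (0x0010, "strand of the query (1 for reverse)"),
        (0x0020, "strand of the mate"),
        (0x0040, "the read is the first read in a pair"),
        (0x0080, "the read is the second read in a pair"),
        (0x0100, "the alignment is not primary"),
    ]

    def decode_flag(flag):
        """Return the descriptions of all bits set in a SAM FLAG value."""
        return [desc for bit, desc in SAM_FLAGS if flag & bit]

    # A FLAG of 4 has only the 0x0004 bit set: the read is unmapped.
    print(decode_flag(4))

    # 83 = 0x0040 + 0x0010 + 0x0002 + 0x0001, so four bits are set.
    for description in decode_flag(83):
        print(description)
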
The output looks like this (scroll sideways to see the entire example)::
QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT
HWI-EAS91_1_30788AAXX:1:1:1761:343 4 * 0 0 * * 0 0 AAAAAAANNAAAAAAAAAAAAAAAAAAAAAAAAAAACNNANNGAGTNGNNNNNNNGCTTCCCACAGNNCTGG hhhhhhh;;hhhhhhhhhhh^hOhhhhghhhfhhhgh;;h;;hhhh;h;;;;;;;hhhhhhghhhh;;Phhh
HWI-EAS91_1_30788AAXX:1:1:1578:331 4 * 0 0 * * 0 0 GTATAGANNAATAAGAAAAAAAAAAATGAAGACTTTCNNANNTCTGNANNNNNNNTCTTTTTTCAGNNGTAG hhhhhhh;;hhhhhhhhhhhhhhhhhhhhhhhhhhhh;;h;;hhhh;h;;;;;;;hhhhhhhhhhh;;hhVh
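
Records like the ones above can be split into the columns described earlier with a few lines of Python. This is only a sketch: the helper names ``parse_sam_line`` and ``phred_scores`` and the shortened record are made up for illustration, and it assumes the tab-separated layout that SAM files use::

    # Names of the eleven mandatory columns, in order.
    SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
                   "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]

    def parse_sam_line(line):
        """Split one tab-separated SAM alignment line into a dictionary."""
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(SAM_COLUMNS, fields[:11]))
        # These columns hold integers.
        for key in ("FLAG", "POS", "MAPQ", "MPOS", "ISIZE"):
            record[key] = int(record[key])
        # Anything after the eleventh column is an optional TAG:VTYPE:VALUE field.
        record["OPT"] = fields[11:]
        return record

    def phred_scores(qual):
        """Convert a QUAL string to Phred scores (ASCII code minus 33)."""
        return [ord(ch) - 33 for ch in qual]

    # Hypothetical, shortened record for an unmapped read.
    line = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTN\tIIII#\n"
    rec = parse_sam_line(line)
    print(rec["QNAME"], "unmapped:", bool(rec["FLAG"] & 0x0004))
    print(phred_scores(rec["QUAL"]))
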
-------
**BWA settings**
Every option has a default value, and you can change any of them. All of the options in BWA have been implemented here.
------
**BWA parameter list**
This is an exhaustive list of BWA options:
For **aln**::
-n NUM Maximum edit distance if the value is INT, or the fraction of missing
alignments given 2% uniform base error rate if FLOAT. In the latter
case, the maximum edit distance is automatically chosen for different
read lengths. [0.04]
-o INT Maximum number of gap opens [1]
-e INT Maximum number of gap extensions, -1 for k-difference mode
(disallowing long gaps) [-1]
-d INT Disallow a long deletion within INT bp towards the 3'-end [16]
-i INT Disallow an indel within INT bp towards the ends [5]
-l INT Take the first INT subsequence as seed. If INT is larger than the
query sequence, seeding will be disabled. For long reads, this option
typically ranges from 25 to 35 for '-k 2'. [inf]
-k INT Maximum edit distance in the seed [2]
-t INT Number of threads (multi-threading mode) [1]
-M INT Mismatch penalty. BWA will not search for suboptimal hits with a score
lower than (bestScore-misMsc). [3]
-O INT Gap open penalty [11]
-E INT Gap extension penalty [4]
-c Reverse query but not complement it, which is required for alignment
in the color space.
-R Proceed with suboptimal alignments even if the top hit is a repeat. By
default, BWA only searches for suboptimal alignments if the top hit is
unique. Using this option has no effect on accuracy for single-end
reads. It is mainly designed for improving the alignment accuracy of
paired-end reads. However, the pairing procedure will be slowed down,
especially for very short reads (~32bp).
-N Disable iterative search. All hits with no more than maxDiff
differences will be found. This mode is much slower than the default.
For **samse**::
-n INT Output up to INT top hits. Value -1 to disable outputting multiple
hits. NOTE: Entering a value other than -1 will result in output that
is not in SAM format, and therefore not usable further down the
pipeline. Check the BWA documentation for details on the format of
the output. [-1]
For **sampe**::
-a INT Maximum insert size for a read pair to be considered as being mapped
properly. Since version 0.4.5, this option is only used when there
are not enough good alignments to infer the distribution of insert
sizes. [500]
-o INT Maximum occurrences of a read for pairing. A read with more
occurrences will be treated as a single-end read. Reducing this
parameter speeds up pairing. [100000]
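
To see how these options fit together, the sketch below assembles the three BWA command lines for a paired-end run as Python argument lists. It only builds and prints the commands; it does not run BWA. The function name, the file names, and the chosen option values are placeholders, and the way ``bwa_wrapper.py`` actually constructs its commands may differ::

    def build_paired_commands(ref, fastq1, fastq2, threads=4, max_insert=500):
        """Return argument lists for two 'bwa aln' runs and one 'bwa sampe' run."""
        # A few aln options from the list above, set to their documented
        # defaults, plus the thread count.
        aln_opts = ["-n", "0.04", "-o", "1", "-k", "2", "-t", str(threads)]
        # Hypothetical names for the intermediate alignment files. 'bwa aln'
        # writes to standard output, so a caller would redirect each run
        # into the matching .sai file.
        sai1 = fastq1 + ".sai"
        sai2 = fastq2 + ".sai"
        aln1 = ["bwa", "aln"] + aln_opts + [ref, fastq1]
        aln2 = ["bwa", "aln"] + aln_opts + [ref, fastq2]
        sampe = ["bwa", "sampe", "-a", str(max_insert),
                 ref, sai1, sai2, fastq1, fastq2]
        return aln1, aln2, sampe

    # Placeholder reference prefix and FASTQ file names.
    aln1, aln2, sampe = build_paired_commands("ref.fa", "fwd.fastq", "rev.fastq")
    for command in (aln1, aln2, sampe):
        print(" ".join(command))

For single-end data, one ``bwa aln`` run followed by ``bwa samse`` takes the place of the last two steps.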