Index: galaxy-central/tools/unix_tools/remove_ending.xml
===================================================================
--- galaxy-central/tools/unix_tools/remove_ending.xml (revision 3)
+++ galaxy-central/tools/unix_tools/remove_ending.xml (revision 3)
@@ -0,0 +1,43 @@
+<tool id="Remove ending" name="Remove ending">
+  <description>of a file</description>
+  <command interpreter="sh">remove_ending.sh $num_lines $input $out_file1</command>
+  <inputs>
+    <param name="num_lines" size="5" type="integer" value="1" label="Remove last" help="lines"/>
+    <param format="txt" name="input" type="data" label="from"/>
+  </inputs>
+  <tests>
+	  <test>
+		  <param name="input" value="remove_ending_input1.txt" />
+		  <output name="out_file1" file="remove_ending_output1.txt" />
+		  <param name="num_lines" value="2" />
+	  </test>
+  </tests>
+  <outputs>
+    <data format="input" name="out_file1" metadata_source="input"/>
+  </outputs>
+  <help>
+
+**What it does**
+
+This tool removes a specified number of lines from the end of a dataset.
+
+-----
+
+**Example**
+
+Input File::
+
+    chr7  56632  56652   D17003_CTCF_R6  310  +
+    chr7  56736  56756   D17003_CTCF_R7  354  +
+    chr7  56761  56781   D17003_CTCF_R4  220  +
+    chr7  56772  56792   D17003_CTCF_R7  372  +
+    chr7  56775  56795   D17003_CTCF_R4  207  +
+
+After removing the last 2 lines the dataset will look like this::
+
+    chr7  56632  56652   D17003_CTCF_R6  310  +
+    chr7  56736  56756   D17003_CTCF_R7  354  +
+    chr7  56761  56781   D17003_CTCF_R4  220  +
+
+</help>
+</tool>
Index: galaxy-central/tools/unix_tools/word_list_grep.xml
===================================================================
--- galaxy-central/tools/unix_tools/word_list_grep.xml (revision 3)
+++ galaxy-central/tools/unix_tools/word_list_grep.xml (revision 3)
@@ -0,0 +1,106 @@
+<tool id="cshl_word_list_grep" name="Select lines">
+<description>by word list</description>
+<command interpreter="perl">
+	word_list_grep.pl 
+	#if $searchwhere.choice == "column":
+		-c $searchwhere.column
+	#end if
+	-o $output 
+	$inverse 
+	$caseinsensitive 
+	$wholewords 
+	$skip_first_line
+	$wordlist 
+	$input
+</command>
+
+<inputs>
+	<param name="input" format="txt" type="data" label="input file" />
+	<param name="wordlist" format="txt" type="data" label="word list file" />
+
+
+	<param name="inverse" type="boolean" checked="false" truevalue="-v" falsevalue="" label="Inverse filter" 
+		help="Report lines NOT matching the word list" />
+
+	<param name="caseinsensitive" type="boolean" checked="false" truevalue="-i" falsevalue="" label="Case-Insensitive search" 
+		help="" />
+
+	<param name="wholewords" type="boolean" checked="false" truevalue="-w" falsevalue="" label="find whole-words" 
+		help="ignore partial matches (e.g. 'apple' will not match 'snapple') " />
+
+	<param name="skip_first_line" type="boolean" checked="false" truevalue="-s" falsevalue="" label="Ignore first line" 
+		help="Select this option if the first line contains column headers. First line will not be filtered. " />
+
+	<conditional name="searchwhere">
+		<param name="choice" type="select" label="Search words in">
+			<option value="line" selected="true">entire line</option>
+			<option value="column">specific column</option>
+		</param>
+
+		<when value="line">
+		</when>
+
+		<when value="column">
+    			<param name="column" label="in column" type="data_column" data_ref="input" accept_default="true" />
+		</when>
+	</conditional>
+
+</inputs>
+
+<outputs>
+	<data name="output" format="input" metadata_source="input" />
+</outputs>
+
+<help>
+**What it does**
+
+This tool selects lines that match words from a word list.
+
+--------
+
+**Example**
+
+Input file (UCSC's rmsk track from dm3)::
+
+    585	787	66	241	11	chrXHet	2860	3009	-201103	-	DNAREP1_DM	LINE	Penelope	0	594	435	1
+    585	1383	78	220	0	chrXHet	3012	3320	-200792	-	DNAREP1_DM	LINE	Penelope	-217	377	2	1
+    585	244	103	0	0	chrXHet	3737	3776	-200336	-	DNAREP1_DM	LINE	Penelope	-555	39	1	1
+    585	2270	83	144	0	chrXHet	7907	8426	-195686	+	DNAREP1_DM	LINE	Penelope	1	594	0	1
+    585	617	189	73	68	chrXHet	10466	10671	-193441	+	DNAREP1_DM	LINE	Penelope	368	573	-21	1
+    586	1122	71	185	0	chrXHet	173138	173322	-30790	-	PROTOP	DNA	P	-4033	447	230	1
+    ...
+    ...
+
+
+Word list file::
+
+  STALKER
+  PROTOP
+
+ 
+
+Output sequence (searching in column 11)::
+
+    586	1122	71	185	0	chrXHet	173138	173322	-30790	        -	PROTOP	DNA	P	-4033	447	230	1
+    586	228	162	0	0	chrXHet	181026	181063	-23049	        +	STALKER4_I	LTR	Gypsy	9	45	-6485	1
+    585	245	105	26	0	chr3R	41609	41647	-27863406	+	PROTOP_B	DNA	P	507	545	-608	4
+    586	238	91	0	0	chr3R	140224	140257	-27764796	-	PROTOP_B	DNA	P	-617	536	504	4
+    ...
+    ...
+
+( With **find whole-words** not selected, *PROTOP* matched *PROTOP_B*, *STALKER* matched *STALKER4_I* )
+
+
+
+
+Output sequence (searching in column 11, and whole-words only)::
+
+    586	670	90	38	57	chrXHet	168356	168462	-35650	-	PROTOP	DNA	P	-459	4021	3918	1
+    586	413	139	70	0	chrXHet	168462	168548	-35564	-	PROTOP	DNA	P	-3406	1074	983	1
+    586	1122	71	185	0	chrXHet	173138	173322	-30790	-	PROTOP	DNA	P	-4033	447	230	1
+    ...
+    ...
+
+</help>
+
+</tool>
Index: galaxy-central/tools/unix_tools/sort_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/sort_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/sort_tool.xml (revision 3)
@@ -0,0 +1,134 @@
+<tool id="cshl_sort_tool" name="Sort">
+  <!-- 
+   	note 1:
+	  the 'version' sort (or natural order sort)
+	  requires GNU Coreutils 7.1 or later
+
+	note 2:
+	  for greater efficiency, sort buffer size is very large.
+	  If your Galaxy server doesn't have so much memory (or the
+	  sorts you use don't require it) - you can decrease the memory size.
+	  (argument is "-S 2G")
+  -->
+  <command>sort -S 2G $unique 
+      #for $key in $sortkeys
+       '-k ${key.column},${key.column}${key.order}${key.style}'
+      #end for
+  	$input > $out_file1
+  </command>
+
+  <inputs>
+	<param format="txt" name="input" type="data" label="Sort Query" />
+		
+	<param name="unique" type="select" label="Output only unique values?">
+		<option value="">No</option>
+		<option value="-u">Yes</option>
+	</param>
+
+	<repeat name="sortkeys" title="sort key">
+	    <param name="column" label="on column" type="data_column" data_ref="input" accept_default="true" />
+	    <param name="order" type="select" display="radio" label="in">
+	      <option value="r">Descending order</option>
+	      <option value="">Ascending order</option>
+	    </param>
+	    <param name="style" type="select" display="radio" label="Flavor">
+	      <option value="n">Fast numeric sort ([-n])</option>
+	      <option value="g">General numeric sort ( scientific notation [-g])</option>
+	      <option value="V">Natural/Version sort ([-V]) </option>
+	      <option value="">Alphabetical sort</option>
+	    </param>
+	</repeat>
+  </inputs>
+  <tests>
+	  <test>
+		  <!-- Sort Descending numerical order,
+		       with scientific notation -->
+		  <param name="input" value="unix_sort_input1.txt" />
+		  <output name="out_file1" file="unix_sort_output1.txt" />
+		  <param name="unique" value="No" />
+		  <param name="column" value="2" />
+		  <param name="order"  value="r" />
+		  <param name="style"  value="g" />
+	  </test>
+	  <test>
+		  <!-- Sort Ascending numerical order,
+		  with scientific notation - outputting unique values only
+
+		  The catch:
+		  	chr15 appears twice, with the same value (0.0314 and 3.14e-2).
+			In the output, it should appear only once because of the unique flag
+		  -->
+		  <param name="input" value="unix_sort_input1.txt" />
+		  <output name="out_file1" file="unix_sort_output2.txt" />
+		  <param name="unique" value="Yes" />
+		  <param name="column" value="2" />
+		  <param name="order"  value="" />
+		  <param name="style"  value="g" />
+	  </test>
+	  <test>
+		  <!-- Sort Ascending 'natural' order -->
+		  <param name="input" value="unix_sort_input1.txt" />
+		  <output name="out_file1" file="unix_sort_output3.txt" />
+		  <param name="unique" value="No" />
+		  <param name="column" value="1" />
+		  <param name="order"  value="" />
+		  <param name="style"  value="V" />
+	  </test>
+  </tests>
+  <outputs>
+    <data format="input" name="out_file1" metadata_source="input"/>
+  </outputs>
+  <help>
+
+**What it does**
+
+This tool runs the unix **sort** command on the selected data file.
+
+-----
+
+**Sorting Styles**
+
+* **Fast Numeric**: sort by numeric values. Handles integer values (e.g. 43, 134) and decimal-point values (e.g. 3.14). *Does not* handle scientific notation (e.g. -2.32e2).
+* **General Numeric**: sort by numeric values. Handles all numeric notations (including scientific notation). Slower than *fast numeric*, so use only when necessary.
+* **Natural Sort**: Sort in 'natural' order (natural to humans, not to computers). See example below.
+* **Alphabetical sort**: Sort in strict alphabetical order. See example below.
+
+
+
+
+**Sorting Examples**
+
+Given the following list::
+
+    chr4
+    chr13
+    chr1
+    chr10
+    chr20
+    chr2
+
+**Alphabetical sort** would produce the following sorted list::
+
+    chr1
+    chr10
+    chr13
+    chr2
+    chr20
+    chr4
+
+**Natural Sort** would produce the following sorted list::
+
+    chr1
+    chr2
+    chr4
+    chr10
+    chr13
+    chr20
+
+
+.. class:: infomark
+
+If you're planning to use the file with another tool that expects sorted files (such as *join*), you should use the **Alphabetical sort**, not the **Natural Sort**. Natural sort order is easier for humans, but is unnatural for computer programs.
+
+  </help>
+</tool>
Index: galaxy-central/tools/unix_tools/sed_wrapper.sh
===================================================================
--- galaxy-central/tools/unix_tools/sed_wrapper.sh (revision 3)
+++ galaxy-central/tools/unix_tools/sed_wrapper.sh (revision 3)
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+##
+## Galaxy wrapper for SED command
+##
+
+##
+## command line arguments:
+##   input_file
+##   output_file
+##   sed-program
+##   [other parameters passed on to sed]
+
+INPUT="$1"
+OUTPUT="$2"
+PROG="$3"
+
+shift 3
+
+if [ -z "$PROG" ]; then
+	echo "usage: $0 INPUTFILE OUTPUTFILE SED-PROGRAM [other sed parameters]" >&2
+	exit 1
+fi
+
+if [ ! -r "$INPUT" ]; then
+	echo "error: input file ($INPUT) not found!" >&2
+	exit 1
+fi
+
+# Messages printed to STDOUT will be displayed in the "INFO" field in the galaxy dataset.
+# This way the user can tell which command was run
+echo "sed" "$@" "$PROG"
+
+sed -r --sandbox "$@" "$PROG" "$INPUT" > "$OUTPUT"
+status=$?; if [ "$status" -ne 0 ]; then exit "$status"; fi
+
+exit 0
Index: galaxy-central/tools/unix_tools/cut_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/cut_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/cut_tool.xml (revision 3)
@@ -0,0 +1,94 @@
+<tool id="cshl_cut_tool" name="cut">
+  <description>columns from files</description>
+  <command interpreter="sh">
+  	cut_wrapper.sh '$complement' '$cutwhat' '$list' '$input' '$output'
+  </command>
+
+  <inputs>
+	<param format="txt" name="input" type="data" label="file to cut" />
+		
+    	<param name="complement" type="select" label="Operation">
+	      <option value="">Keep</option>
+	      <option value="--complement">Discard</option>
+	</param>
+
+    	<param name="cutwhat" type="select" label="Cut by">
+	      <option value="-f">fields</option>
+	      <option value="-c">characters</option>
+	</param>
+
+	<param name="list" type="text" size="20" label="List of Fields/Characters/Bytes" help="These will be kept/discarded (depending on 'operation'). &lt;BR /&gt; Examples: 1,3,4 or 2-5" value = "" />
+  </inputs>
+
+  <tests>
+	  <test>
+		  <param name="input" value="unix_cut_input1.txt" />
+		  <output name="output" file="unix_cut_output1.txt" />
+		  <param name="complement" value="Keep" />
+		  <param name="cutwhat" value="fields" />
+		  <param name="list"  value="1,3,4" />
+	  </test>
+	  <test>
+		  <param name="input" value="unix_cut_input1.txt" />
+		  <output name="output" file="unix_cut_output1.txt" />
+		  <param name="complement" value="Discard" />
+		  <param name="cutwhat" value="fields" />
+		  <param name="list"  value="2" />
+	  </test>
+  </tests>
+
+  <outputs>
+    <data format="input" name="output" metadata_source="input"/>
+  </outputs>
+  <help>
+
+**What it does**
+
+This tool runs the **cut** unix command, which extracts or deletes columns from a file.
+
+-----
+
+Field List Example:
+
+**1,3,7** - Cut specific fields/characters.
+
+**3-**    - Cut from the third field/character to the end of the line.
+
+**2-5**   - Cut from the second to the fifth field/character.
+
+**-8**    - Cut from the first to the eighth field/character.
+
+
+
+
+Input Example::
+
+    fruit	color	price	weight
+    apple	red	1.4	0.5
+    orange	orange	1.5	0.3
+    banana	yellow	0.9	0.3
+
+
+Output Example ( **Keeping fields 1,3,4** )::
+
+    fruit	price	weight
+    apple	1.4	0.5
+    orange	1.5	0.3
+    banana	0.9	0.3
+
+Output Example ( **Discarding field 2** )::
+
+    fruit	price	weight
+    apple	1.4	0.5
+    orange	1.5	0.3
+    banana	0.9	0.3
+
+Output Example ( **Keeping 3 characters** )::
+
+    fru
+    app
+    ora
+    ban
+
+  </help>
+</tool>
Index: galaxy-central/tools/unix_tools/grep_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/grep_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/grep_tool.xml (revision 3)
@@ -0,0 +1,130 @@
+<tool id="cshl_grep_tool" name="grep">
+  <description></description>
+  <command interpreter="sh">grep_wrapper.sh $input $output '$url_paste' $color -A $lines_after -B $lines_before $invert $case_sensitive</command>
+  <inputs>
+    <param format="txt" name="input" type="data" label="Select lines from" />
+
+    <param name="invert" type="select" label="that">
+      <option value="">Match</option>
+      <option value="-v">Don't Match</option>
+    </param>
+
+    <!-- Note: the parameter name MUST BE 'url_paste' -
+         This is a hack in the galaxy library (see ./lib/galaxy/util/__init__.py line 142)
+	 If the name is 'url_paste' the string won't be sanitized, and all the non-alphanumeric characters 
+	 will be passed to the shell script -->
+    <param name="url_paste" type="text" size="40" label="Regular Expression" help=""> 
+    	<validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
+    </param>
+
+    <param name="case_sensitive" type="select"  label="Match type"> 
+      <option value="-i">case insensitive</option>
+      <option value="">case sensitive</option>
+    </param>
+
+    <param name="lines_before" type="integer"  label="Show lines preceding the matched line" help="(same as grep -B, leave it at zero unless you know what you're doing)" value="0" /> 
+    <param name="lines_after" type="integer"  label="Show lines trailing the matched line" help="(same as grep -A, leave it at zero unless you know what you're doing)" value="0" /> 
+
+    <param name="color" type="select"  label="Output"> 
+      <option value="NOCOLOR">text file (for further processing)</option>
+      <option value="COLOR">Highlighted HTML (for easier viewing)</option>
+    </param>
+
+  </inputs>
+  <tests>
+	  <test>
+		  <!-- grep a FASTA file for sequences with specific motif -->
+		  <param name="input" value="unix_grep_input1.txt" />
+		  <output name="output" file="unix_grep_output1.txt" />
+		  <param name="case_sensitive" value="case sensitive" />
+		  <param name="invert" value="" />
+		  <param name="url_paste" value="AA.{2}GT" />
+		  <param name="lines_before" value="1" />
+		  <param name="lines_after" value="0" />
+		  <param name="color" value="NOCOLOR" />
+	  </test>
+	  <test>
+		  <!-- grep a FASTA file for sequences with specific motif -
+		 	show highlighted output -->
+		  <param name="input" value="unix_grep_input1.txt" />
+		  <output name="output" file="unix_grep_output2.html" />
+		  <param name="case_sensitive" value="case sensitive" />
+		  <param name="invert" value="" />
+		  <param name="url_paste" value="AA.{2}GT" />
+		  <param name="lines_before" value="0" />
+		  <param name="lines_after" value="0" />
+		  <param name="color" value="COLOR" />
+	  </test>
+  </tests>
+  <outputs>
+	  <data format="input" name="output" metadata_source="input" >
+		<change_format>
+			<when input="color" value="COLOR" format="html" />
+		</change_format>
+ 	  </data>
+  </outputs>
+<help>
+
+**What it does**
+
+This tool runs the unix **grep** command on the selected data file.
+
+.. class:: infomark
+
+**TIP:** This tool uses the **perl** regular expression syntax (same as running 'grep -P'). This is **NOT** the POSIX or POSIX-extended syntax (unlike the awk/sed tools).
+
+
+**Further reading**
+
+- Wikipedia's Regular Expression page (http://en.wikipedia.org/wiki/Regular_expression)
+- Regular Expressions cheat-sheet (PDF) (http://www.addedbytes.com/cheat-sheets/download/regular-expressions-cheat-sheet-v2.pdf)
+- Grep Tutorial (http://www.panix.com/~elflord/unix/grep.html)
+
+-----
+
+**Grep Examples**
+
+- **AGC.AAT** would match lines with AGC followed by any character, followed by AAT (e.g. **AGCQAAT**, **AGCPAAT**, **AGCwAAT**)
+- **C{2,5}AGC** would match lines with 2 to 5 consecutive Cs followed by AGC
+- **TTT.{4,10}AAA** would match lines with 3 Ts, followed by 4 to 10 characters (any characters), followed by 3 As.
+- **^chr([0-9A-Za-z])+** would match lines that begin with chromosome names, such as lines in a BED format file.
+- **(ACGT){1,5}** would match at least 1 "ACGT" and at most 5 "ACGT" consecutively.
+- **hsa|mmu** would match lines containing "hsa" or "mmu" (or both).
+ 
+-----
+
+**Regular Expression Syntax**
+
+This tool searches the data for lines containing or not containing a match to the given pattern. A Regular Expression is a pattern describing a certain amount of text. 
+
+- **( ) { } [ ] . * ? + \\ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for.
+- **^** matches the beginning of a string (but not an internal line).
+- **\\d** matches a digit, same as [0-9].
+- **\\D** matches a non-digit.
+- **\\s** matches a whitespace character.
+- **\\S** matches anything BUT a whitespace.
+- **\\t** matches a tab.
+- **\\w** matches an alphanumeric character ( A to Z, 0 to 9 and underscore )
+- **\\W** matches anything but an alphanumeric character.
+- **(** .. **)** groups a particular pattern.
+- **\\Z** matches the end of a string (but not an internal line).
+- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.
+
+  - **{n}** The preceding item is matched exactly n times.
+  - **{n,}** The preceding item is matched n or more times. 
+  - **{n,m}** The preceding item is matched at least n times but not more than m times. 
+
+- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
+- **.** Matches any single character except a newline.
+- ***** The preceding item will be matched zero or more times.
+- **?** The preceding item is optional and matched at most once.
+- **+** The preceding item will be matched one or more times.
+- **^** has two meanings:
+  - matches the beginning of a line or string. 
+  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
+- **$** matches the end of a line or string.
+- **\|** Separates alternate possibilities. 
+
+
+</help>
+</tool>
Index: galaxy-central/tools/unix_tools/remove_ending.sh
===================================================================
--- galaxy-central/tools/unix_tools/remove_ending.sh (revision 3)
+++ galaxy-central/tools/unix_tools/remove_ending.sh (revision 3)
@@ -0,0 +1,69 @@
+#!/bin/sh
+
+# Version 0.1 ,  15aug08
+# Written by Assaf Gordon (gordon@cshl.edu)
+#
+
+LINES="$1"
+INFILE="$2"
+OUTFILE="$3"
+
+if [ "$LINES" = "" ]; then
+	cat >&2 <<EOF 
+Remove Ending Lines
+
+Usage: $0 LINES [INFILE] [OUTFILE]
+
+   LINES - number of lines to remove from the end of the file
+   [INFILE] - input file (if not specified - defaults to STDIN)
+   [OUTFILE]- output file (if not specified - defaults to STDOUT)
+
+Input Example:
+
+#Chr	Start	End
+chr1	10	15
+chr1	40	20
+chr1	21	14
+total   3 chromosomes
+
+Removing 1 line (the last line) produces:
+
+#Chr	Start	End
+chr1	10	15
+chr1	40	20
+chr1	21	14
+
+Usage Example:
+   
+   \$ $0 1 < my_input_file.txt > my_output_file.txt
+
+EOF
+	
+	exit 1
+fi
+
+#Validate line argument - remove non-digit characters
+LINES=$(printf '%s' "$LINES" | tr -cd '[:digit:]')
+
+#Make sure the line string isn't empty
+#(after the filtering above, it will either contain digits or be empty)
+if [ -z "$LINES" ]; then
+	echo "Error: bad line value (must be numeric)" >&2
+	exit 1
+fi
+
+# Use default (stdin/out) values if infile / outfile not specified
+[ -z "$INFILE" ] && INFILE="/dev/stdin"
+[ -z "$OUTFILE" ] && OUTFILE="/dev/stdout"
+
+#Make sure the input file (if specified) exists.
+if [ ! -r "$INFILE" ]; then
+	echo "Error: input file ($INFILE) not found!" >&2
+	exit 1
+fi
+
+
+# The "gunzip -f" trick allows
+# piping a file (gzip or plain text, real file name or "/dev/stdin") to sed 
+gunzip -f <"$INFILE" | sed -n -e :a -e "1,${LINES}!{P;N;D;};N;ba" > "$OUTFILE"
+
Index: galaxy-central/tools/unix_tools/join_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/join_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/join_tool.xml (revision 3)
@@ -0,0 +1,54 @@
+<tool id="cshl_join_tool" name="join">
+  <description>two files</description>
+  <command interpreter="sh">join_tool.sh "$jointype" "$output_format" 
+  				"$empty_string_filler" "$delimiter"
+				"$ignore_case"
+				"$input1" "$column1"
+				"$input2" "$column2"
+				"$output"
+  </command>
+  
+  <inputs>
+	<param format="txt" name="input1" type="data" label="1st file" />
+	<param name="column1" label="Column to use from 1st file" type="data_column" data_ref="input1" accept_default="true" />
+
+	<param format="txt" name="input2" type="data" label="2nd File" />
+	<param name="column2" label="Column to use from 2nd file" type="data_column" data_ref="input2" accept_default="true" />
+
+	<param name="jointype" type="select" label="Output lines appearing in">
+	      <option value=" ">BOTH 1st &amp; 2nd file.</option>
+	      <option value="-v 1">1st but not in 2nd file. [-v 1]</option>
+	      <option value="-v 2">2nd but not in 1st file. [-v 2]</option>
+	      <option value="-a 1">both 1st &amp; 2nd file, plus unpairable lines from 1st file. [-a 1]</option>
+	      <option value="-a 2">both 1st &amp; 2nd file, plus unpairable lines from 2nd file. [-a 2]</option>
+	      <option value="-a 1 -a 2">All Lines [-a 1 -a 2]</option>
+	</param>
+
+	    <param name="delimiter" type="select" label="field-separator [-t]">
+		<option value=",">comma (,)</option>
+		<option value=":">colon (:) </option>
+		<option value=" ">single space</option>
+		<option value=".">dot (.)</option>
+		<option value="-">dash (-)</option>
+		<option value="|">pipe (|)</option>
+		<option value="_">underscore (_)</option>
+		<option selected="True" value="tab">tab</option>
+	    </param>
+
+	<param name="ignore_case" type="select" label="Case sensitivity">
+	      <option value="">Case sensitive</option>
+	      <option value="-i">Case INsensitive [-i]</option>
+	</param>
+
+	<param name="empty_string_filler" type="text" size="20" label="String replacement for empty fields [-e EMPTY]" help="Leave empty unless you know what you're doing. Use this when specifying output format" /> 
+
+	<param name="output_format" type="text" size="30" label="Output line format [-o FORMAT]" help="Leave empty unless you know what you're doing. Example: 1.1,2.1,2.1" /> 
+
+  </inputs>
+  <outputs>
+    <data name="output" format="input" metadata_source="input1" />
+  </outputs>
+  
+<help>
+</help>
+</tool>
Index: galaxy-central/tools/unix_tools/awk_wrapper.sh
===================================================================
--- galaxy-central/tools/unix_tools/awk_wrapper.sh (revision 3)
+++ galaxy-central/tools/unix_tools/awk_wrapper.sh (revision 3)
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+##
+## Galaxy wrapper for AWK command
+##
+
+##
+## command line arguments:
+##   input_file
+##   output_file
+##   awk-program
+##   input-field-separator
+##   output-field-separator
+
+INPUT="$1"
+OUTPUT="$2"
+PROG="$3"
+FS="$4"
+OFS="$5"
+
+shift 5
+
+if [ -z "$OFS" ]; then
+	echo "usage: $0 INPUTFILE OUTPUTFILE AWK-PROGRAM FS OFS" >&2
+	exit 1
+fi
+
+if [ ! -r "$INPUT" ]; then
+	echo "error: input file ($INPUT) not found!" >&2
+	exit 1
+fi
+
+if [ "$FS" = "tab" ]; then
+	FS="\t"
+fi
+if [ "$OFS" = "tab" ]; then
+	OFS="\t"
+fi
+
+# Messages printed to STDOUT will be displayed in the "INFO" field in the galaxy dataset.
+# This way the user can tell which command was run
+echo "awk" "$PROG"
+
+awk --sandbox -v OFS="$OFS" -v FS="$FS" --re-interval "$PROG" "$INPUT" > "$OUTPUT"
+status=$?; if [ "$status" -ne 0 ]; then exit "$status"; fi
+
+exit 0
Index: galaxy-central/tools/unix_tools/find_and_replace.xml
===================================================================
--- galaxy-central/tools/unix_tools/find_and_replace.xml (revision 3)
+++ galaxy-central/tools/unix_tools/find_and_replace.xml (revision 3)
@@ -0,0 +1,154 @@
+<tool id="cshl_find_and_replace" name="Find and Replace">
+  <description>text</description>
+  <command interpreter="perl">
+	find_and_replace.pl
+	#if $searchwhere.choice == "column":
+		-c $searchwhere.column
+	#end if
+	-o $output 
+	$caseinsensitive 
+	$wholewords 
+	$skip_first_line
+	$is_regex
+	'$url_paste'
+	'$file_data'
+	'$input'
+  </command>
+  <inputs>
+    <param format="txt" name="input" type="data" label="File to process" />
+
+    <!-- Note: the parameter name MUST BE 'url_paste' -
+         This is a hack in the galaxy library (see ./lib/galaxy/util/__init__.py line 142)
+	 If the name is 'url_paste' the string won't be sanitized, and all the non-alphanumeric characters 
+	 will be passed to the shell script -->
+	 <param name="url_paste" type="text" size="20" label="Find pattern" help="Use simple text, or a valid regular expression (without the enclosing slashes / / ) " > 
+    		<validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
+	</param>
+
+	 <param name="file_data" type="text" size="20" label="Replace with" help="Use simple text, or &amp; (ampersand) and \\1 \\2 \\3 to refer to matched text. See examples below." >
+    		<validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
+	</param>
+
+	<param name="is_regex" type="boolean" checked="false" truevalue="-r" falsevalue="" label="Find-Pattern is a regular expression" 
+		help="see help section for details." />
+
+	<param name="caseinsensitive" type="boolean" checked="false" truevalue="-i" falsevalue="" label="Case-Insensitive search" 
+		help="" />
+
+	<param name="wholewords" type="boolean" checked="false" truevalue="-w" falsevalue="" label="find whole-words" 
+		help="ignore partial matches (e.g. 'apple' will not match 'snapple') " />
+
+	<param name="skip_first_line" type="boolean" checked="false" truevalue="-s" falsevalue="" label="Ignore first line" 
+		help="Select this option if the first line contains column headers. Text in the line will not be replaced. " />
+
+	<conditional name="searchwhere">
+		<param name="choice" type="select" label="Replace text in">
+			<option value="line" selected="true">entire line</option>
+			<option value="column">specific column</option>
+		</param>
+
+		<when value="line">
+		</when>
+
+		<when value="column">
+    			<param name="column" label="in column" type="data_column" data_ref="input" accept_default="true" />
+		</when>
+	</conditional>
+  </inputs>
+
+  <outputs>
+    <data format="input" name="output" metadata_source="input" />
+  </outputs>
+
+<help>
+
+**What it does**
+
+This tool finds &amp; replaces text in an input dataset.
+
+.. class:: infomark
+
+The **pattern to find** can be a simple text string, or a Perl **regular expression** string (depending on the *Find-Pattern is a regular expression* check-box).
+
+.. class:: infomark
+
+When using regular expressions, the **replace pattern** can contain back-references ( e.g. \\1 )
+
+.. class:: infomark
+
+This tool uses Perl regular expression syntax.
+
+-----
+
+**Examples of *regular-expression* Find Patterns**
+
+- **HELLO**     The word 'HELLO' (case sensitive).
+- **AG.T**      The letters A,G followed by any single character, followed by the letter T.
+- **A{4,}**     Four or more consecutive A's.
+- **chr2[012]\\t**       The words 'chr20' or 'chr21' or 'chr22' followed by a tab character.
+- **hsa-mir-([^ ]+)**        The text 'hsa-mir-' followed by one-or-more non-space characters. When using parentheses, the matched content of the parentheses can be accessed with **\\1** in the **replace** pattern.
+
+
+**Examples of Replace Patterns**
+
+- **WORLD**  The word 'WORLD' will be placed wherever the find pattern was found.
+- **FOO-&amp;-BAR**  Each time the find pattern is found, it will be surrounded with 'FOO-' at the beginning and '-BAR' at the end. **&amp;** (ampersand) represents the matched find pattern.
+- **\\1**   The text matched by the first parenthesized group in the Find Pattern.
+
+
+-----
+
+**Example 1**
+
+**Find Pattern:** HELLO
+**Replace Pattern:** WORLD
+**Regular Expression:** no
+**Replace what:** entire line
+
+Every time the word HELLO is found, it will be replaced with the word WORLD. 
+
+-----
+
+**Example 2**
+
+**Find Pattern:** ^chr 
+**Replace Pattern:** (empty)
+**Regular Expression:** yes
+**Replace what:** column 11
+
+If column 11 (of every line) begins with the letters 'chr', they will be removed. Effectively, it'll turn "chr4" into "4" and "chrXHet" into "XHet".
+
+
+-----
+
+**Perl's Regular Expression Syntax**
+
+The Find &amp; Replace tool searches the data for text matching the given pattern. A Regular Expression is a pattern describing a certain amount of text. 
+
+- **( ) { } [ ] . * ? + \\ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for.
+- **^** matches the beginning of a string (but not an internal line).
+- **(** .. **)** groups a particular pattern.
+- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.
+
+  - **{n}** The preceding item is matched exactly n times.
+  - **{n,}** The preceding item is matched n or more times. 
+  - **{n,m}** The preceding item is matched at least n times but not more than m times. 
+
+- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
+- **.** Matches any single character except a newline.
+- ***** The preceding item will be matched zero or more times.
+- **?** The preceding item is optional and matched at most once.
+- **+** The preceding item will be matched one or more times.
+- **^** has two meanings:
+  - matches the beginning of a line or string. 
+  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
+- **$** matches the end of a line or string.
+- **\|** Separates alternate possibilities.
+- **\\d** matches a single digit.
+- **\\w** matches a single letter, digit, or underscore.
+- **\\s** matches a single whitespace character (space or tab).
+
+
+</help>
+
+</tool>
Index: galaxy-central/tools/unix_tools/word_list_grep.pl
===================================================================
--- galaxy-central/tools/unix_tools/word_list_grep.pl (revision 3)
+++ galaxy-central/tools/unix_tools/word_list_grep.pl (revision 3)
@@ -0,0 +1,182 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+use Getopt::Std;
+
+sub parse_command_line();
+sub load_word_list();
+sub compile_regex(@);
+sub usage();
+
+my $word_list_file;
+my $input_file ;
+my $output_file;
+my $find_complete_words ;
+my $find_inverse; 
+my $find_in_specific_column ;
+my $find_case_insensitive ;
+my $skip_first_line ;
+
+
+##
+## Program Start
+##
+usage() if @ARGV==0;
+parse_command_line();
+
+my @words = load_word_list();
+
+my $regex = compile_regex(@words);
+
+# Allow first line to pass without filtering?
+if ( $skip_first_line ) {
+	my $line = <$input_file>;
+	print $output_file $line if defined $line ;
+}
+
+
+##
+## Main loop
+##
+while ( <$input_file> ) {
+	my $target = $_;
+
+
+	# If searching in a specific column (and not in the entire line)
+	# extract the content of that one column
+	if ( $find_in_specific_column ) {
+		my @columns = split ;
+
+		#not enough columns in this line - skip it
+		next if ( @columns < $find_in_specific_column ) ;
+
+		$target = $columns [ $find_in_specific_column - 1 ] ;
+	}
+
+	# Match ?
+	if ( ($target =~ $regex) xor $find_inverse ) {
+		print $output_file $_ ;
+	}
+}
+
+close $input_file;
+close $output_file;
+
+##
+## Program end
+##
+
+
+sub parse_command_line()
+{
+	my %opts ;
+	getopts('siwvc:o:', \%opts) or die "$0: Invalid option specified\n";
+
+	die "$0: missing word-list file name\n" if (@ARGV==0); 
+
+	$word_list_file = $ARGV[0];
+	die "$0: Word-list file '$word_list_file' not found\n" unless -e $word_list_file ;
+
+	$find_complete_words = ( exists $opts{w} ) ;
+	$find_inverse = ( exists $opts{v} ) ;
+	$find_case_insensitive = ( exists $opts{i} ) ;
+	$skip_first_line = ( exists $opts{s} ) ;
+
+
+	# Search in specific column ?
+	if ( defined $opts{c} ) {
+		$find_in_specific_column = $opts{c};
+
+		die "$0: invalid column number ($find_in_specific_column).\n"
+			unless $find_in_specific_column =~ /^\d+$/ ;
+			
+		die "$0: invalid column number ($find_in_specific_column).\n"
+			if $find_in_specific_column <= 0; 
+	}
+	else {
+		$find_in_specific_column = 0 ;
+	}
+
+
+	# Output File specified (instead of STDOUT) ?
+	if ( defined $opts{o} ) {
+		my $filename = $opts{o};
+		open $output_file, '>', $filename or die "$0: Failed to create output file '$filename': $!\n" ;
+	} else {
+		$output_file = *STDOUT ;
+	}
+
+
+
+	# Input file Specified (instead of STDIN) ?
+	if ( @ARGV>1 ) {
+		my $filename = $ARGV[1];
+		open $input_file, '<', $filename or die "$0: Failed to open input file '$filename': $!\n" ;
+	} else {
+		$input_file = *STDIN;
+	}
+}
+
+sub load_word_list()
+{
+	open WORDLIST, '<', $word_list_file or die "$0: Failed to open word-list file '$word_list_file': $!\n" ;
+	my @words ;
+	while ( <WORDLIST> ) {
+		chomp ;
+		s/^\s+//;
+		s/\s+$//;
+		next if length==0;
+		push @words,quotemeta $_;
+	}
+	close WORDLIST;
+
+	die "$0: Error: word-list file '$word_list_file' is empty!\n"
+		unless @words;
+
+	return @words;	
+}
+
+sub compile_regex(@)
+{
+	my @words = @_;
+
+	my $regex_string = join ( '|', @words ) ;
+	if ( $find_complete_words ) {
+		$regex_string = "\\b($regex_string)\\b"; 
+	}
+	my $regex;
+
+	if ( $find_case_insensitive ) {
+		$regex = qr/$regex_string/i ;
+	} else {
+		$regex = qr/$regex_string/;
+	}
+
+	return $regex;
+}
+
+sub usage()
+{
+print <<EOF;
+
+Word-List Grep
+Copyright (C) 2009 - by A. Gordon ( gordon at cshl dot edu )
+
+Usage: $0 [-o OUTPUT] [-s] [-w] [-i] [-c N] [-v] WORD-LIST-FILE [INPUT-FILE]
+
+   -s   - do not filter first line - always output the first line from the input file.
+   -w   - search for complete words (not partial sub-strings).
+   -i   - case insensitive search.
+   -v   - inverse - output lines NOT matching the word list.
+   -c N - check only column N, instead of entire line (line split by whitespace).
+   -o OUT - specify output file (default = STDOUT).
+   WORD-LIST-FILE - file containing one word per line. These will be used
+          for the search. 
+   INPUT-FILE - (optional) read from file (default = from STDIN).
+
+
+
+EOF
+
+	exit;
+}
Index: galaxy-central/tools/unix_tools/cut_wrapper.sh
===================================================================
--- galaxy-central/tools/unix_tools/cut_wrapper.sh (revision 3)
+++ galaxy-central/tools/unix_tools/cut_wrapper.sh (revision 3)
@@ -0,0 +1,52 @@
+#!/bin/sh
+
+##
+## Galaxy wrapper for cut command.
+##
+
+##
+## command line arguments:
+##   complement flag (might be empty string)
+##   what to cut (fields or characters)
+##   cut list (e.g. 1,2,3,4)
+##   input_file
+##   output_file
+
+COMPLEMENT="$1"
+CUTWHAT="$2"
+CUTLIST="$3"
+INPUT="$4"
+OUTPUT="$5"
+
+if [ -z "$OUTPUT" ]; then
+	echo "This script should be run from inside galaxy!" >&2
+	exit 1
+fi
+
+if [ ! -r "$INPUT" ]; then
+	echo "error: input file ($INPUT) not found!" >&2
+	exit 1
+fi
+
+# Messages printed to STDOUT will be displayed in the "INFO" field in the galaxy dataset.
+# This way the user can tell which command was run.
+if [ -z "$COMPLEMENT" ]; then
+	echo -n "Extracting " 
+else
+	echo -n "Deleting "
+fi
+
+case $CUTWHAT in
+	-f)	echo -n "field(s) "
+		;;
+		
+	-c)	echo -n "character(s) "
+		;;
+esac
+
+echo "$CUTLIST"
+
+
+cut $COMPLEMENT $CUTWHAT "$CUTLIST" < "$INPUT" > "$OUTPUT"
+
+exit 
Index: galaxy-central/tools/unix_tools/join_tool.sh
===================================================================
--- galaxy-central/tools/unix_tools/join_tool.sh (revision 3)
+++ galaxy-central/tools/unix_tools/join_tool.sh (revision 3)
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+#
+# NOTE:
+#  This is a wrapper for GNU join under Galaxy,
+#  not meant to be used from the command line (if you're using the command line, simply run 'join' directly...)
+#
+# All parameters must be supplied.
+# the join_tool.xml file takes care of that.
+
+JOINTYPE="$1"
+OUTPUT_FORMAT="$2"
+EMPTY_STRING="$3"
+DELIMITER="$4"
+IGNORE_CASE="$5"
+
+INPUT1="$6"
+COLUMN1="$7"
+INPUT2="$8"
+COLUMN2="$9"
+OUTPUT="${10}"
+
+if [ -z "$OUTPUT" ]; then
+	echo "This script is part of galaxy. Don't run it manually." >&2
+	exit 1;
+fi
+
+#This is a TAB hack for galaxy (which can't transfer a "\t" as a parameter)
+[ "$DELIMITER" = "tab" ] && DELIMITER=$(printf '\t')
+
+#Remove spaces from the output format (if the user entered any)
+OUTPUT_FORMAT=$(printf '%s' "$OUTPUT_FORMAT" | tr -d ' ')
+[ "$OUTPUT_FORMAT" != "" ] && OUTPUT_FORMAT="-o $OUTPUT_FORMAT"
+
+echo join $OUTPUT_FORMAT -t "$DELIMITER" -e "$EMPTY_STRING" $IGNORE_CASE $JOINTYPE -1 "$COLUMN1" -2 "$COLUMN2" 
+#echo join $OUTPUT_FORMAT -t "$DELIMITER" -e "$EMPTY_STRING" $IGNORE_CASE $JOINTYPE -1 "$COLUMN1" -2 "$COLUMN2" "$INPUT1" "$INPUT2" \> "$OUTPUT" 
+join $OUTPUT_FORMAT -t "$DELIMITER" -e "$EMPTY_STRING" $IGNORE_CASE $JOINTYPE -1 "$COLUMN1" -2 "$COLUMN2" "$INPUT1" "$INPUT2" > "$OUTPUT" || exit 1
Index: galaxy-central/tools/unix_tools/grep_wrapper.sh
===================================================================
--- galaxy-central/tools/unix_tools/grep_wrapper.sh (revision 3)
+++ galaxy-central/tools/unix_tools/grep_wrapper.sh (revision 3)
@@ -0,0 +1,62 @@
+#!/bin/sh
+
+##
+## Galaxy wrapper for GREP command.
+##
+
+##
+## command line arguments:
+##   input_file
+##   output_file
+##   regex
+##   COLOR or NOCOLOR
+##   [other parameters passed on to grep]
+
+INPUT="$1"
+OUTPUT="$2"
+REGEX="$3"
+COLOR="$4"
+
+shift 4
+
+if [ -z "$COLOR" ]; then
+	echo "usage: $0 INPUTFILE OUTPUTFILE REGEX COLOR|NOCOLOR [other grep parameters]" >&2
+	exit 1
+fi
+
+if [ ! -r "$INPUT" ]; then
+	echo "error: input file ($INPUT) not found!" >&2
+	exit 1
+fi
+
+# Messages printed to STDOUT will be displayed in the "INFO" field in the galaxy dataset.
+# This way the user can tell what was the command
+echo "grep" "$@" "$REGEX"
+
+if [ "$COLOR" = "COLOR" ]; then
+	#
+	# What the heck is going on here???
+	# 1. "GREP_COLORS" is an environment variable, telling GREP which ANSI colors to use.
+	# 2. "--color=always" tells grep to actually use colors (according to the GREP_COLORS variable)
+	# 3. first sed command translates the ANSI color to a <FONT> tag with blue color (and a <B> tag, too)
+	# 4. second sed command translates the no-color ANSI command to a </FONT> tag (and a </B> tag, too)
+	# 5. the htmlize_pre.sh script takes a text input and wraps it in <HTML><BODY><PRE> tags, making it a fixed-font HTML file.
+
+	GREP_COLORS="ms=31" grep --color=always -P "$@" -- "$REGEX" "$INPUT" | \
+		grep -v "^\[36m\[K--\[m\[K$" | \
+		sed -r 's/\[[0123456789;]+m\[K?/<font color="blue"><b>/g' | \
+		sed -r 's/\[m\[K?/<\/b><\/font>/g' | \
+		htmlize_pre.sh > "$OUTPUT"
+
+
+	if [ $? -ne 0 ]; then exit; fi
+
+elif [ "$COLOR" = "NOCOLOR" ]; then
+	grep -P "$@" -- "$REGEX" "$INPUT" | grep -v "^--$" > "$OUTPUT"
+	if [ $? -ne 0 ]; then exit; fi
+else
+	echo "Error: fourth parameter must be COLOR or NOCOLOR" >&2
+	exit 1
+fi
+
+exit 0
Index: galaxy-central/tools/unix_tools/sed_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/sed_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/sed_tool.xml (revision 3)
@@ -0,0 +1,92 @@
+<tool id="cshl_sed_tool" name="sed">
+  <description></description>
+  <!-- NOTE
+  	  'sandbox' is a patched SED program,
+	  which blocks executing shell commands and file reading/writing.
+
+	  Hopefully, it is safe enough to allow users to execute their own SED commands
+	  -->
+  <command interpreter="sh">sed_wrapper.sh $silent $input $output '$url_paste'</command>
+  <inputs>
+    <param format="txt" name="input" type="data" label="File to process" />
+
+    <!-- Note: the parameter name MUST BE 'url_paste' -
+         This is a hack in the galaxy library (see ./lib/galaxy/util/__init__.py line 142)
+	 If the name is 'url_paste' the string won't be sanitized, and all the non-alphanumeric characters 
+	 will be passed to the shell script -->
+    <param name="url_paste" type="text" area="true" size="5x35" label="SED Program" help=""> 
+    	<validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
+    </param>
+
+    <param name="silent" type="select"  label="operation mode" help="(Same as 'sed -n', leave at 'normal' unless you know what you're doing)" > 
+      <option value="">normal</option>
+      <option value="-n">silent</option>
+    </param>
+
+  </inputs>
+  <outputs>
+    <data format="input" name="output" metadata_source="input" />
+  </outputs>
+<help>
+
+**What it does**
+
+This tool runs the unix **sed** command on the selected data file.
+
+.. class:: infomark
+
+**TIP:** This tool uses the **extended regular** expression syntax (same as running 'sed -r').
+
+
+
+**Further reading**
+
+- Short sed tutorial (http://www.linuxhowtos.org/System/sed_tutorial.htm)
+- Long sed tutorial (http://www.grymoire.com/Unix/Sed.html)
+- sed faq with good examples (http://sed.sourceforge.net/sedfaq.html)
+- sed cheat-sheet (http://www.catonmat.net/download/sed.stream.editor.cheat.sheet.pdf)
+- Collection of useful sed one-liners (http://student.northpark.edu/pemente/sed/sed1line.txt)
+
+-----
+
+**Sed commands**
+
+The most useful sed command is **s** (substitute).
+
+**Examples**
+
+- **s/hsa//**  will remove the first instance of 'hsa' in every line.
+- **s/hsa//g**  will remove all instances (because of the **g**) of 'hsa' in every line.
+- **s/A{4,}/--&amp;--/g**  will find sequences of 4 or more consecutive A's, and once found, will surround them with two dashes on each side. The **&amp;** marker is a placeholder for 'whatever matched the regular expression'.
+- **s/hsa-mir-([^ ]+)/short name: \\1 full name: &amp;/**  will find strings such as 'hsa-mir-43a' (the regular expression is 'hsa-mir-' followed by non-space characters) and will replace them with a string such as 'short name: 43a full name: hsa-mir-43a'.  The **\\1** marker is a placeholder for 'whatever matched the first parenthesis' (similar to perl's **$1**).
+
+
+**sed's Regular Expression Syntax**
+
+sed matches each line of the data against the given pattern. A Regular Expression is a pattern describing a certain amount of text.
+
+- **( ) { } [ ] . * ? + \\ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for.
+- **^** matches the beginning of a string (but not an internal line).
+- **(** .. **)** groups a particular pattern.
+- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.
+
+  - **{n}** The preceding item is matched exactly n times.
+  - **{n,}** The preceding item is matched n or more times.
+  - **{n,m}** The preceding item is matched at least n times but not more than m times. 
+
+- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
+- **.** Matches any single character except a newline.
+- ***** The preceding item will be matched zero or more times.
+- **?** The preceding item is optional and matched at most once.
+- **+** The preceding item will be matched one or more times.
+- **^** has two meanings:
+  - matches the beginning of a line or string. 
+  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
+- **$** matches the end of a line or string.
+- **\|** Separates alternate possibilities. 
+
+
+**Note**: SED uses extended regular expression syntax, not Perl syntax. **\\d**, **\\w**, **\\s** etc. are **not** supported.
+
+</help>
+</tool>
Index: galaxy-central/tools/unix_tools/find_and_replace.pl
===================================================================
--- galaxy-central/tools/unix_tools/find_and_replace.pl (revision 3)
+++ galaxy-central/tools/unix_tools/find_and_replace.pl (revision 3)
@@ -0,0 +1,202 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+use Getopt::Std;
+
+sub parse_command_line();
+sub build_regex_string();
+sub usage();
+
+my $input_file ;
+my $output_file;
+my $find_pattern ;
+my $replace_pattern ;
+my $find_complete_words ;
+my $find_pattern_is_regex ;
+my $find_in_specific_column ;
+my $find_case_insensitive ;
+my $replace_global ;
+my $skip_first_line ;
+
+
+##
+## Program Start
+##
+usage() if @ARGV<2;
+parse_command_line();
+my $regex_string = build_regex_string() ;
+
+# Allow first line to pass without filtering?
+if ( $skip_first_line ) {
+	my $line = <$input_file>;
+	print $output_file $line if defined $line ;
+}
+
+
+##
+## Main loop
+##
+
+## I LOVE PERL (and hate it, at the same time...)
+##
+## So what's going on with the self-compiling perl code?
+##
+## 1. The program gets the find-pattern and the replace-pattern from the user (as strings).
+## 2. If both the find-pattern and replace-pattern are simple strings (not regex), 
+##    it would be possible to pre-compile a regex (with qr//) and use it in a 's///'
+## 3. If the find-pattern is a regex but the replace-pattern is a simple text string (without back-references)
+##    it is still possible to pre-compile the regex and use it in a 's///'
+## However,
+## 4. If the replace-pattern contains back-references, pre-compiling is not possible.
+##    (in perl, you can't precompile a substitute regex).
+##    See these examples:
+##    http://www.perlmonks.org/?node_id=84420
+##    http://stackoverflow.com/questions/125171/passing-a-regex-substitution-as-a-variable-in-perl
+##
+##    The solution:
+##    we build the regex string as valid perl code (in 'build_regex_string()', stored in $regex_string ),
+##    Then eval() a new perl code that contains the substitution regex as inlined code.
+##    Gotta love perl!
+
+my $perl_program ;
+if ( $find_in_specific_column ) {
+	# Find & replace in specific column
+
+	$perl_program = <<EOF;
+	while ( <STDIN> ) {
+		chomp ;
+		my \@columns = split ;
+
+		#not enough columns in this line - skip it
+		next if ( \@columns < $find_in_specific_column ) ;
+
+		\$columns [ $find_in_specific_column - 1 ] =~ $regex_string ;
+
+		print STDOUT join("\t", \@columns), "\n" ;
+	}
+EOF
+
+} else {
+	# Find & replace the entire line
+	$perl_program = <<EOF;
+		while ( <STDIN> ) {
+			$regex_string ;
+			print STDOUT;
+		}
+EOF
+}
+
+
+# The dynamic perl code reads from STDIN and writes to STDOUT,
+# so connect these handles (if the user didn't specify input / output
+# file names, these might already be STDIN/STDOUT, so the whole thing could be a no-op).
+*STDIN = $input_file ;
+*STDOUT = $output_file ;
+eval $perl_program ;
+die "$0: find-and-replace failed: $@" if $@ ;
+
+##
+## Program end
+##
+
+
+sub parse_command_line()
+{
+	my %opts ;
+	getopts('grsiwc:o:', \%opts) or die "$0: Invalid option specified\n";
+
+	die "$0: missing Find-Pattern argument\n" if (@ARGV==0); 
+	$find_pattern = $ARGV[0];
+	die "$0: missing Replace-Pattern argument\n" if (@ARGV==1); 
+	$replace_pattern = $ARGV[1];
+
+	$find_complete_words = ( exists $opts{w} ) ;
+	$find_case_insensitive = ( exists $opts{i} ) ;
+	$skip_first_line = ( exists $opts{s} ) ;
+	$find_pattern_is_regex = ( exists $opts{r} ) ;
+	$replace_global = ( exists $opts{g} ) ;
+
+	# Search in specific column ?
+	if ( defined $opts{c} ) {
+		$find_in_specific_column = $opts{c};
+
+		die "$0: invalid column number ($find_in_specific_column).\n"
+			unless $find_in_specific_column =~ /^\d+$/ ;
+			
+		die "$0: invalid column number ($find_in_specific_column).\n"
+			if $find_in_specific_column <= 0; 
+	}
+	else {
+		$find_in_specific_column = 0 ;
+	}
+
+	# Output File specified (instead of STDOUT) ?
+	if ( defined $opts{o} ) {
+		my $filename = $opts{o};
+		open $output_file, '>', $filename or die "$0: Failed to create output file '$filename': $!\n" ;
+	} else {
+		$output_file = *STDOUT ;
+	}
+
+
+	# Input file Specified (instead of STDIN) ?
+	if ( @ARGV>2 ) {
+		my $filename = $ARGV[2];
+		open $input_file, '<', $filename or die "$0: Failed to open input file '$filename': $!\n" ;
+	} else {
+		$input_file = *STDIN;
+	}
+}
+
+sub build_regex_string()
+{
+	my $find_string ;
+	my $replace_string ;
+
+	if ( $find_pattern_is_regex ) {
+		$find_string = $find_pattern ;
+		$replace_string = $replace_pattern ;
+	} else {
+		$find_string = quotemeta $find_pattern ;
+		$replace_string = quotemeta $replace_pattern;
+	}
+
+	if ( $find_complete_words ) {
+		$find_string = "\\b($find_string)\\b"; 
+	}
+
+	my $regex_string = "s/$find_string/$replace_string/";
+
+	$regex_string .= "i" if ( $find_case_insensitive );
+	$regex_string .= "g" if ( $replace_global ) ;
+	
+
+	return $regex_string;
+}
+
+sub usage()
+{
+print <<EOF;
+
+Find and Replace
+Copyright (C) 2009 - by A. Gordon ( gordon at cshl dot edu )
+
+Usage: $0 [-o OUTPUT] [-g] [-r] [-w] [-i] [-c N] [-s] FIND-PATTERN REPLACE-PATTERN [INPUT-FILE]
+
+   -g   - Global replace - replace all occurrences in line/column.
+          Default - replace just the first instance.
+   -w   - search for complete words (not partial sub-strings).
+   -i   - case insensitive search.
+   -c N - check only column N, instead of entire line (line split by whitespace).
+   -s   - skip first line (don't replace anything in it)
+   -r   - FIND-PATTERN and REPLACE-PATTERN are perl regular expressions,
+          usable inside a 's///' statement.
+          By default, they are used as verbatim text strings.
+   -o OUT - specify output file (default = STDOUT).
+   INPUT-FILE - (optional) read from file (default = from STDIN).
+
+
+EOF
+
+	exit;
+}
Index: galaxy-central/tools/unix_tools/uniq_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/uniq_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/uniq_tool.xml (revision 3)
@@ -0,0 +1,25 @@
+<tool id="cshl_uniq_tool" name="uniq">
+  <command>
+  	uniq -f $skipfields $count $repeated $ignorecase $uniqueonly $input $output
+  </command>
+
+  <inputs>
+	<param format="txt" name="input" type="data" label="file to scan for unique values" />
+		
+	<param name="count" type="boolean" label="count [-c]" help="prefix lines by the number of occurrences" truevalue="-c" falsevalue="" />
+
+	<param name="repeated" type="boolean" label="repeated [-d]" help="only print duplicate lines" truevalue="-d" falsevalue="" />
+
+	<param name="ignorecase" type="boolean" label="ignore case [-i]" help="ignore differences in case when comparing" truevalue="-i" falsevalue="" />
+
+	<param name="uniqueonly" type="boolean" label="unique only [-u]" help="only print unique lines" truevalue="-u" falsevalue="" />
+
+	<param name="skipfields" type="integer" label="skip fields [-f]" help="avoid comparing the first N fields. (use zero to start from the first field)" size="2" value="0" />
+  </inputs>
+
+  <outputs>
+    <data format="input" name="output" metadata_source="input"/>
+  </outputs>
+  <help>
+  </help>
+</tool>
Index: galaxy-central/tools/unix_tools/awk_tool.xml
===================================================================
--- galaxy-central/tools/unix_tools/awk_tool.xml (revision 3)
+++ galaxy-central/tools/unix_tools/awk_tool.xml (revision 3)
@@ -0,0 +1,138 @@
+<tool id="cshl_awk_tool" name="awk">
+  <description></description>
+  <command interpreter="sh">awk_wrapper.sh $input $output '$file_data' '$FS' '$OFS'</command>
+  <inputs>
+    <param format="txt" name="input" type="data" label="File to process" />
+
+    <param name="FS" type="select" label="Input field-separator">
+	<option value=",">comma (,)</option>
+	<option value=":">colons (:) </option>
+	<option value=" ">single space</option>
+	<option value=".">dot (.)</option>
+	<option value="-">dash (-)</option>
+	<option value="|">pipe (|)</option>
+	<option value="_">underscore (_)</option>
+	<option selected="True" value="tab">tab</option>
+    </param>
+
+    <param name="OFS" type="select" label="Output field-separator">
+	<option value=",">comma (,)</option>
+	<option value=":">colons (:)</option>
+	<option value=" ">space ( )</option>
+	<option value="-">dash (-)</option>
+	<option value=".">dot (.)</option>
+	<option value="|">pipe (|)</option>
+	<option value="_">underscore (_)</option>
+	<option selected="True" value="tab">tab</option>
+    </param>
+
+
+    <!-- Note: the parameter name MUST BE 'url_paste' -
+         This is a hack in the galaxy library (see ./lib/galaxy/util/__init__.py line 142)
+	 If the name is 'url_paste' the string won't be sanitized, and all the non-alphanumeric characters 
+	 will be passed to the shell script -->
+    <param name="file_data" type="text" area="true" size="5x35" label="AWK Program" help=""> 
+    	<validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
+    </param>
+
+  </inputs>
+  <tests>
+	  <test>
+		  <param name="input" value="unix_awk_input1.txt" />
+		  <output name="output" file="unix_awk_output1.txt" />
+		  <param name="FS" value="tab" />
+		  <param name="OFS" value="tab" />
+		  <param name="file_data"  value="$2>0.5 { print $2*9, $1 }" />
+	  </test>
+  </tests>
+  <outputs>
+    <data format="input" name="output" metadata_source="input" />
+  </outputs>
+<help>
+
+**What it does**
+
+This tool runs the unix **awk** command on the selected data file.
+
+.. class:: infomark
+
+**TIP:** This tool uses the **extended regular** expression syntax (not the perl syntax).
+
+
+**Further reading**
+
+- Awk by Example (http://www.ibm.com/developerworks/linux/library/l-awk1.html)
+- Long AWK tutorial (http://www.grymoire.com/Unix/Awk.html)
+- Learn AWK in 1 hour (http://www.selectorweb.com/awk.html)
+- awk cheat-sheet (http://cbi.med.harvard.edu/people/peshkin/sb302/awk_cheatsheets.pdf)
+- Collection of useful awk one-liners (http://student.northpark.edu/pemente/awk/awk1line.txt)
+
+-----
+
+**AWK programs**
+
+Most AWK programs consist of **patterns** (i.e. rules that match lines of text) and **actions** (i.e. commands to execute when a pattern matches a line).
+
+The basic form of AWK program is::
+
+    pattern { action 1; action 2; action 3; }
+
+
+
+
+
+**Pattern Examples**
+
+- **$2 == "chr3"**  will match lines whose second column is the string 'chr3'
+- **$5-$4>23**  will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
+- **/AG..AG/** will match lines that contain the regular expression **AG..AG** (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to GREP.)
+- **$7 ~ /A{4}U/**  will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
+- **10000 &lt; $4 &amp;&amp; $4 &lt; 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000
+- If no pattern is specified, all lines match (meaning the **action** part will be executed on all lines).
+
+
+
+**Action Examples**
+
+- **{ print }** or **{ print $0 }**   will print the entire input line (the line that matched in **pattern**). **$0** is a special marker meaning 'the entire line'.
+- **{ print $1, $4, $5 }** will print only the first, fourth and fifth fields of the input line.
+- **{ print $4, $5-$4 }** will print the fourth column and the difference between the fifth and fourth column. (If the fourth column was start-position in the input file, and the fifth column was end-position - the output file will contain the start-position, and the length).
+- If no action part is specified (not even the curly brackets) - the default action is to print the entire line.
+
+
+
+
+
+
+
+
+
+**AWK's Regular Expression Syntax**
+
+AWK matches each line (or field) of the data against the given pattern. A Regular Expression is a pattern describing a certain amount of text.
+
+- **( ) { } [ ] . * ? + \\ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for.
+- **^** matches the beginning of a string (but not an internal line).
+- **(** .. **)** groups a particular pattern.
+- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.
+
+  - **{n}** The preceding item is matched exactly n times.
+  - **{n,}** The preceding item is matched n or more times.
+  - **{n,m}** The preceding item is matched at least n times but not more than m times. 
+
+- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
+- **.** Matches any single character except a newline.
+- ***** The preceding item will be matched zero or more times.
+- **?** The preceding item is optional and matched at most once.
+- **+** The preceding item will be matched one or more times.
+- **^** has two meanings:
+  - matches the beginning of a line or string. 
+  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
+- **$** matches the end of a line or string.
+- **\|** Separates alternate possibilities. 
+
+
+**Note**: AWK uses extended regular expression syntax, not Perl syntax. **\\d**, **\\w**, **\\s** etc. are **not** supported.
+
+</help>
+</tool>
