<tool id="Grouping1" name="Group" version="1.9.1">
  <description>data by a column and perform aggregate operations on other columns.</description>
  <command interpreter="python">
    grouping.py
      $out_file1
      $input1
      $groupcol
      $ignorecase
      #for $op in $operations
       '${op.optype}
        ${op.opcol}
        ${op.opround}'
      #end for
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Query missing? See TIP below."/>
    <param name="groupcol" label="Group by column" type="data_column" data_ref="input1" />
    <param name="ignorecase" type="boolean" truevalue="1" falsevalue="0">
      <label>Ignore case while grouping?</label>
    </param>
    <repeat name="operations" title="Operation">
      <param name="optype" type="select" label="Type">
        <option value="mean">Mean</option>
        <option value="median">Median</option>
        <option value="Mode">Mode</option>
        <option value="max">Maximum</option>
        <option value="min">Minimum</option>
        <option value="sum">Sum</option>
        <option value="length">Count</option>
        <option value="unique">Count Distinct</option>
        <option value="c">Concatenate</option>
        <option value="cuniq">Concatenate Distinct</option>
        <option value="random">Randomly pick</option>
      </param>
      <param name="opcol" label="On column" type="data_column" data_ref="input1" />
      <param name="opround" type="select" label="Round result to nearest integer?">
        <option value="no">NO</option>
        <option value="yes">YES</option>
      </param>
    </repeat>
  </inputs>
  <outputs>
    <data format="tabular" name="out_file1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <!-- Test valid data -->
    <test>
      <param name="input1" value="1.bed"/>
      <param name="groupcol" value="1"/>
      <param name="ignorecase" value="true"/>
      <param name="optype" value="mean"/>
      <param name="opcol" value="2"/>
      <param name="opround" value="no"/>
      <output name="out_file1" file="groupby_out1.dat"/>
    </test>

    <!-- Test data with an invalid value in a column -->
    <test>
      <param name="input1" value="1.tabular"/>
      <param name="groupcol" value="1"/>
      <param name="ignorecase" value="true"/>
      <param name="optype" value="mean"/>
      <param name="opcol" value="2"/>
      <param name="opround" value="no"/>
      <output name="out_file1" file="groupby_out2.dat"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert*

-----

**Syntax**

This tool allows you to group the input dataset by a particular column and perform aggregate functions such as Mean, Median, Mode, Sum, Max, Min, Count, Random draw, and Concatenate on other columns.

- All invalid, blank and comment lines are skipped when performing the aggregate functions. The number of skipped lines is displayed in the resulting history item.

- If multiple modes are present, all are reported.

-----

**Example**

- For the following input::

   chr22  1000  1003  TTT
   chr22  2000  2003  aaa
   chr10  2200  2203  TTT
   chr10  1200  1203  ttt
   chr22  1600  1603  AAA

- **Grouping on column 4** while ignoring case, and performing operation **Count on column 1** will return::

   AAA  2
   TTT  3

- **Grouping on column 4** while not ignoring case, and performing operation **Count on column 1** will return::

   aaa  1
   AAA  1
   ttt  1
   TTT  2

  </help>
</tool>
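
The grouping behaviour described in the help text can be illustrated with a small, self-contained Python sketch. This is not the code in `grouping.py` (which is not shown here and depends on `rpy`); the function name `group_count`, its parameters, and the choice to upper-case keys when ignoring case are all assumptions made for illustration. It implements only the Count operation and skips blank, comment, and too-short lines, returning how many lines were skipped.

```python
from collections import Counter

# The five-row example from the help text, as tab-delimited lines.
EXAMPLE_ROWS = [
    "chr22\t1000\t1003\tTTT",
    "chr22\t2000\t2003\taaa",
    "chr10\t2200\t2203\tTTT",
    "chr10\t1200\t1203\tttt",
    "chr22\t1600\t1603\tAAA",
]


def group_count(lines, group_col, op_col, ignore_case=False):
    """Count rows per distinct value of group_col (columns are 1-based).

    Hypothetical helper: returns (counts, skipped), where skipped is the
    number of blank, comment, or too-short lines that were ignored.
    """
    counts = Counter()
    skipped = 0
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            skipped += 1
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Skip lines that lack either the grouping or the operation column.
        if len(fields) < max(group_col, op_col):
            skipped += 1
            continue
        key = fields[group_col - 1]
        if ignore_case:
            # Assumption: report case-folded groups in upper case,
            # matching the keys shown in the help example.
            key = key.upper()
        counts[key] += 1
    return dict(counts), skipped


if __name__ == "__main__":
    counts, skipped = group_count(EXAMPLE_ROWS, 4, 1, ignore_case=True)
    for key in sorted(counts):
        print(f"{key}\t{counts[key]}")
```

Calling `group_count(EXAMPLE_ROWS, 4, 1)` without `ignore_case` keeps `aaa`, `AAA`, `ttt`, and `TTT` as separate groups, mirroring the second example in the help text. The row order of the real tool's output is not reproduced by this sketch.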