<tool id="Grouping1" name="Group" version="1.9.1">
  <description>data by a column and perform aggregate operations on other columns.</description>
  <command interpreter="python">
    grouping.py
      $out_file1
      $input1
      $groupcol
      $ignorecase
      #for $op in $operations
       '${op.optype}
        ${op.opcol}
        ${op.opround}'
      #end for
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="groupcol" label="Group by column" type="data_column" data_ref="input1" />
    <param name="ignorecase" type="boolean" truevalue="1" falsevalue="0">
      <label>Ignore case while grouping?</label>
    </param>
    <repeat name="operations" title="Operation">
      <param name="optype" type="select" label="Type">
        <option value="mean">Mean</option>
        <option value="median">Median</option>
        <option value="Mode">Mode</option>
        <option value="max">Maximum</option>
        <option value="min">Minimum</option>
        <option value="sum">Sum</option>
        <option value="length">Count</option>
        <option value="unique">Count Distinct</option>
        <option value="c">Concatenate</option>
        <option value="cuniq">Concatenate Distinct</option>
        <option value="random">Randomly pick</option>
      </param>
      <param name="opcol" label="On column" type="data_column" data_ref="input1" />
      <param name="opround" type="select" label="Round result to nearest integer?">
        <option value="no">NO</option>
        <option value="yes">YES</option>
      </param>
    </repeat>
  </inputs>
  <outputs>
    <data format="tabular" name="out_file1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <!-- Test valid data -->
    <test>
      <param name="input1" value="1.bed"/>
      <param name="groupcol" value="1"/>
      <param name="ignorecase" value="true"/>
      <param name="optype" value="mean"/>
      <param name="opcol" value="2"/>
      <param name="opround" value="no"/>
      <output name="out_file1" file="groupby_out1.dat"/>
    </test>

    <!-- Test data with an invalid value in a column -->
    <test>
      <param name="input1" value="1.tabular"/>
      <param name="groupcol" value="1"/>
      <param name="ignorecase" value="true"/>
      <param name="optype" value="mean"/>
      <param name="opcol" value="2"/>
      <param name="opround" value="no"/>
      <output name="out_file1" file="groupby_out2.dat"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation->Convert*

-----

**Syntax**

This tool groups the input dataset by a chosen column and applies aggregate functions (Mean, Median, Mode, Maximum, Minimum, Sum, Count, Count Distinct, Concatenate, Concatenate Distinct, and Randomly pick) to other columns.

- All invalid, blank, and comment lines are skipped when performing the aggregate functions. The number of skipped lines is displayed in the resulting history item.

- If multiple modes are present, all are reported.

-----

**Example**

- For the following input::

    chr22  1000  1003  TTT
    chr22  2000  2003  aaa
    chr10  2200  2203  TTT
    chr10  1200  1203  ttt
    chr22  1600  1603  AAA

- **Grouping on column 4** while ignoring case, and performing operation **Count on column 1** will return::

    AAA  2
    TTT  3

- **Grouping on column 4** while not ignoring case, and performing operation **Count on column 1** will return::

    aaa  1
    AAA  1
    ttt  1
    TTT  2

  </help>
</tool>