<tool id="BestSubsetsRegression1" name="Perform Best-subsets Regression">
    <description> </description>
    <command interpreter="python">
        best_regression_subsets.py
            $input1
            $response_col
            $predictor_cols
            $out_file1
            $out_file2
            1>/dev/null
            2>/dev/null
    </command>
    <inputs>
        <param format="tabular" name="input1" type="data" label="Select data" help="Query missing? See TIP below."/>
        <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" />
        <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" multiple="true" >
            <validator type="no_options" message="Please select at least one column."/>
        </param>
    </inputs>
    <outputs>
        <data format="input" name="out_file1" metadata_source="input1" />
        <data format="pdf" name="out_file2" />
    </outputs>
    <requirements>
        <requirement type="python-module">rpy</requirement>
    </requirements>
    <tests>
        <!-- This tool cannot be tested because it produces a PDF output file.
        -->
    </tests>
    <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Queries->Convert characters*.

-----

.. class:: infomark

**What it does**

This tool uses the 'regsubsets' function from the R package leaps to perform best-subsets regression. It writes two output files: a table listing the best subsets with their summary statistics, and a PDF containing a graphical representation of the results.
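
To illustrate what a best-subsets search computes, here is a minimal pure-Python sketch. It is not the code in best_regression_subsets.py, which, judging by its rpy requirement, hands the work to R. The helper names parse_rows, solve, rss and best_subsets are hypothetical. The sketch assumes tab-separated rows, 1-based column numbers, and predictors that are not perfectly collinear; it enumerates every subset by brute force and keeps, for each model size, the subset with the smallest residual sum of squares (RSS).

```python
from itertools import combinations


def parse_rows(lines, response_col, predictor_cols):
    """Return (xs, ys), keeping only rows whose chosen columns all parse as numbers.

    Column numbers are 1-based, as in the tool form (c1, c2, ...).
    """
    xs, ys = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            y = float(fields[response_col - 1])
            x = [float(fields[c - 1]) for c in predictor_cols]
        except (ValueError, IndexError):
            continue  # non-numeric or missing value: skip the whole row
        xs.append(x)
        ys.append(y)
    return xs, ys


def solve(a, b):
    """Solve the square linear system a * beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(xs, ys, subset):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given predictors."""
    design = [[1.0] + [row[j] for j in subset] for row in xs]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[k] for d in design) for k in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, d))) ** 2 for d, y in zip(design, ys))


def best_subsets(xs, ys):
    """Map each model size k to (rss, subset) for the k-predictor subset with the lowest RSS."""
    n_pred = len(xs[0])
    return {
        k: min((rss(xs, ys, s), s) for s in combinations(range(n_pred), k))
        for k in range(1, n_pred + 1)
    }
```

Brute-force enumeration fits 2^p - 1 models for p predictors, so it only suits a handful of columns; regsubsets is understood to use a branch-and-bound search so that it does not have to fit every subset.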

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor and response variables as continuous.

- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.

- The output table has the following 6 columns:

  - Column 1 (Vars): the number of predictor variables in the model
  - Column 2 ([c2 c3 c4...]): the list of user-selected predictor variables (the full model); an asterisk marks each predictor included in the selected model
  - Column 3 (R-sq): the fraction of variance in the response explained by the model
  - Column 4 (Adj. R-sq): R-squared adjusted to penalize a larger number of predictors (p)
  - Column 5 (Cp): Mallows' Cp statistic
  - Column 6 (bic): Bayesian Information Criterion
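
The statistics in columns 3 to 6 can all be derived from residual sums of squares. The sketch below uses the standard textbook definitions for a model with k predictors plus an intercept fitted to n rows, with the error variance for Cp estimated from the full model. The function name subset_statistics is hypothetical, and regsubsets may normalize BIC differently; any such difference is assumed to be a constant that leaves the ranking of models on the same data unchanged.

```python
import math


def subset_statistics(rss, k, tss, n, sigma2_full):
    """Summary statistics for a least-squares model with k predictors plus an intercept.

    rss         -- residual sum of squares of this model
    tss         -- total sum of squares of the response about its mean
    n           -- number of rows used in the fit
    sigma2_full -- error variance estimated from the full model: RSS_full / (n - K - 1),
                   where K is the number of predictors in the full model
    """
    r_sq = 1.0 - rss / tss  # fraction of variance explained
    adj_r_sq = 1.0 - (1.0 - r_sq) * (n - 1) / (n - k - 1)  # penalizes extra predictors
    cp = rss / sigma2_full - n + 2 * (k + 1)  # Mallows' Cp, counting the intercept in p
    # Gaussian BIC up to an additive constant that is the same for every model on this data
    bic = n * math.log(rss / n) + (k + 1) * math.log(n)
    return {"R-sq": r_sq, "Adj. R-sq": adj_r_sq, "Cp": cp, "bic": bic}
```

Higher Adj. R-sq and lower Cp and BIC indicate a better trade-off between fit and model size; a model with little bias has Cp close to k + 1, and the full model has Cp equal to K + 1 by construction.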

    </help>
</tool>