LASSO-Patternsearch algorithm
lps_tool_wrapper.sh $lambda_fac $input_file $label_column $output_file $log_file
Initialization 0
#if $advanced.options == "true":
Sample $advanced.sample
Verbosity $advanced.verbosity
Standardize $advanced.standardize
initialLambda $advanced.initialLambda
#if $advanced.continuation.continuation == "1":
Continuation $advanced.continuation.continuation
continuationSteps $advanced.continuation.continuationSteps
accurateIntermediates $advanced.continuation.accurateIntermediates
#end if
printFreq $advanced.printFreq
#if $advanced.newton.newton == "1":
Newton $advanced.newton.newton
NewtonThreshold $advanced.newton.newtonThreshold
#end if
HessianSampleFraction $advanced.hessianSampleFraction
BB 0
Monotone 0
FullGradient $advanced.fullGradient
GradientFraction $advanced.gradientFraction
InitialAlpha $advanced.initialAlpha
AlphaIncrease $advanced.alphaIncrease
AlphaDecrease $advanced.alphaDecrease
AlphaMax $advanced.alphaMax
c1 $advanced.c1
MaxIter $advanced.maxIter
StopTol $advanced.stopTol
IntermediateTol $advanced.intermediateTol
FinalOnly $advanced.finalOnly
#end if
lps_tool
**Dataset formats**
The input and output datasets are tabular_. The columns are described below.
There is a second output dataset (a log) that is in text_ format.
(`Dataset missing?`_)
.. _tabular: ./static/formatHelp.html#tab
.. _text: ./static/formatHelp.html#text
.. _Dataset missing?: ./static/formatHelp.html
-----
**What it does**
The LASSO-Patternsearch algorithm fits an L1-regularized logistic
regression model to your dataset. A benefit of L1-regularization is
that it typically yields a weight vector with relatively few non-zero
coefficients.
For example, say you have a dataset containing M rows (subjects)
and N columns (attributes), where one of these N attributes is binary,
indicating whether or not the subject has some property of interest P.
In simple terms, LPS calculates a weight for each of the other attributes
in your dataset. This weight indicates how "relevant" that attribute
is for predicting whether or not a given subject has property P.
The L1-regularization causes most of these weights to be equal to zero,
which means LPS will find a "small" subset of the remaining N-1 attributes
in your dataset that can be used to predict P.
In other words, LPS can be used for feature selection.
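The feature-selection step described above can be sketched in Python. This is an illustrative helper, not part of the LPS tool; the function name and the floating-point tolerance are assumptions:

```python
def selected_features(weights, tol=1e-8):
    """Return the indices of attributes with a non-zero weight.

    weights: one weight column from the results file, excluding the
    final intercept row.  tol guards against floating-point noise
    when testing for zero.
    """
    return [i for i, w in enumerate(weights) if abs(w) > tol]
```

Attributes whose index is returned form the "small" subset that LPS selects for predicting P.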
The input dataset is tabular, and must contain a label column which
indicates whether or not a given row has property P. In the current
version of this tool, P must be encoded using +1 and -1. The Lambda_fac
parameter ranges from 0 to 1, and controls how sparse the weight
vector will be. At the low end, when Lambda_fac = 0, there will be
no regularization. At the high end, when Lambda_fac = 1, there will be
"too much" regularization, and all of the weights will equal zero.
The LPS tool creates two output datasets. The first, called the results
file, is a tabular dataset containing one column of weights for each
value of the regularization parameter lambda that was tried. The weight
columns are in order from left to right by decreasing values of lambda.
The first N-1 rows in each column are the weights for the N-1 attributes
in your input dataset. The final row is a constant, the intercept.
Let **x** be a row from your input dataset and let **b** be a column
from the results file. To compute the probability that row **x** has
a label value of +1:
Probability(row **x** has label value = +1) = 1 / [1 + exp{**x** \* **b**\[1..N-1\] + **b**\[N\]}]
where **x** \* **b**\[1..N-1\] is the inner (dot) product of the row **x** with the first N-1 weights.
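A minimal sketch of this computation in Python (hypothetical helper name; the sign convention inside the exponential follows the formula exactly as written above):

```python
import math

def prob_label_plus_one(x, b):
    """Probability that row x has label +1, per the formula above.

    x: the N-1 attribute values of one input row (label column removed)
    b: one column of the results file; b[:-1] are the attribute
       weights and b[-1] is the intercept
    """
    # inner product of the row with the attribute weights, plus intercept
    z = sum(xi * wi for xi, wi in zip(x, b[:-1])) + b[-1]
    return 1.0 / (1.0 + math.exp(z))
```

With all-zero weights and a zero intercept this returns 0.5, as expected for an uninformative model.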
The second output dataset, called the log file, is a text file which
contains additional data about the fitted L1-regularized logistic
regression model. These data include the number of features, the
computed value of lambda_max, the actual values of lambda used, the
optimal values of the log-likelihood and regularized log-likelihood
functions, the number of non-zeros, and the number of iterations.
Website: http://pages.cs.wisc.edu/~swright/LPS/
-----
**Example**
- input file::
+1 1 0 0 0 0 1 0 1 1 ...
+1 1 1 1 0 0 1 0 1 1 ...
+1 1 0 1 0 1 0 1 0 1 ...
etc.
- output results file::
0
0
0
0
0.025541
etc.
- output log file::
Data set has 100 vectors with 50 features.
calculateLambdaMax: n=50, m=100, m+=50, m-=50
computed value of lambda_max: 5.0000e-01
lambda=2.96e-02 solution:
optimal log-likelihood function value: 6.46e-01
optimal *regularized* log-likelihood function value: 6.79e-01
number of nonzeros at the optimum: 5
number of iterations required: 43
etc.
-----
**References**
Koh K, Kim S-J, Boyd S. (2007)
An interior-point method for large-scale l1-regularized logistic regression.
Journal of Machine Learning Research. 8:1519-1555.
Shi W, Wahba G, Wright S, Lee K, Klein R, Klein B. (2008)
LASSO-Patternsearch algorithm with application to ophthalmology and genomic data.
Stat Interface. 1(1):137-153.