Regularized Networks with Convex and Non-convex Penalties
Source: R/network.regularization.R
A general function to estimate Gaussian graphical models using regularization penalties. All non-convex penalties are implemented using the Local Linear Approximation (LLA; Fan & Li, 2001; Zou & Li, 2008).
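As a sketch of the idea behind the LLA (a standard statement of the approximation, not the package's exact internals), each step replaces the non-convex penalty with its tangent line at the current estimate of each edge \(\omega_{ij}\), reducing the problem to a weighted L1 (GLASSO) fit:

$$P_\lambda(|\omega_{ij}|) \approx P_\lambda(|\omega_{ij}^{(t)}|) + P_\lambda'(|\omega_{ij}^{(t)}|) \cdot \left(|\omega_{ij}| - |\omega_{ij}^{(t)}|\right)$$

so that iteration \(t + 1\) solves an L1-penalized problem with edge-specific weights \(P_\lambda'(|\omega_{ij}^{(t)}|)\). With LLA = FALSE, only a single such step is taken from the initial estimate (Zou & Li, 2008).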
Usage
network.regularization(
data,
n = NULL,
corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
na.data = c("pairwise", "listwise"),
penalty = c("atan", "bridge", "cauchy", "exp", "gumbel", "l1", "l2", "mcp", "scad",
"weibull"),
gamma = NULL,
lambda = NULL,
adaptive.gamma = FALSE,
nlambda = 50,
lambda.min.ratio,
penalize.diagonal = TRUE,
ic = c("AIC", "AICc", "BIC", "BIC0", "EBIC", "MBIC"),
ebic.gamma = 0.5,
fast = TRUE,
LLA = FALSE,
LLA.threshold = 0.0001,
LLA.iter = 100,
network.only = TRUE,
verbose = FALSE,
...
)
Arguments
- data
Matrix or data frame. Should consist only of variables to be used in the analysis
- n
Numeric (length = 1). Sample size. Must be provided if data is a correlation matrix
- corr
Character (length = 1). Method to compute correlations. Defaults to "auto". Available options:
"auto" — Automatically computes appropriate correlations for the data: Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary paired with continuous. To change the number of categories that are considered ordinal, use ordinal.categories (see polychoric.matrix for more details)
"cor_auto" — Uses cor_auto to compute correlations. Arguments can be passed along to the function
"cosine" — Uses cosine to compute cosine similarity
"pearson" — Pearson's correlation is computed for all variables regardless of categories
"spearman" — Spearman's rank-order correlation is computed for all variables regardless of categories
For other similarity measures, compute them first and input them into data with the sample size (n)
- na.data
Character (length = 1). How should missing data be handled? Defaults to "pairwise". Available options:
"pairwise" — Computes correlation for all available cases between two variables
"listwise" — Computes correlation for all complete cases in the dataset
- penalty
Character (length = 1). Available options:
"atan" — Arctangent (Wang & Zhu, 2016): $$\lambda \cdot \left(\gamma + \frac{2}{\pi}\right) \cdot \arctan\left(\frac{|x|}{\gamma}\right)$$
"bridge" — Bridge (Fu, 1998): $$\lambda \cdot |x|^\gamma$$
"cauchy" — Cauchy: $$\lambda \cdot \left(\frac{1}{\pi} \cdot \arctan\left(\frac{|x|}{\gamma}\right) + 0.5\right)$$
"exp" — EXP (Wang, Fan, & Zhu, 2018): $$\lambda \cdot (1 - e^{-\frac{|x|}{\gamma}})$$
"gumbel" — Gumbel: $$\lambda \cdot e^{-e^{-\frac{|x|}{\gamma}}}$$
"l1" — LASSO (Tibshirani, 1996): $$\lambda \cdot |x|$$
"l2" — Ridge (Hoerl & Kennard, 1970): $$\lambda \cdot x^2$$
"mcp" — Minimax Concave Penalty (Zhang, 2010): $$ P(x; \lambda, \gamma) = \begin{cases} \lambda |x| - \frac{x^2}{2\gamma} & \text{if } |x| \leq \gamma\lambda \\ \frac{\gamma \lambda^2}{2} & \text{if } |x| > \gamma\lambda \end{cases} $$
"scad" — Smoothly Clipped Absolute Deviation (Fan & Li, 2001): $$ P(x; \lambda, \gamma) = \begin{cases} \lambda |x| & \text{if } |x| \leq \lambda \\ -\frac{|x|^2 - 2\gamma\lambda|x| + \lambda^2}{2(\gamma - 1)} & \text{if } \lambda < |x| \leq \gamma\lambda \\ \frac{(\gamma + 1)\lambda^2}{2} & \text{if } |x| > \gamma\lambda \end{cases} $$
"weibull" — Weibull: $$\lambda \cdot (1 - e^{-\left(\frac{|x|}{\gamma}\right)^k})$$
- gamma
Numeric (length = 1). Adjusts the shape of the penalty. Defaults:
"atan" = 0.01
"bridge" = 1
"cauchy" = 0.01
"exp" = 0.01
"gumbel" = 0.01
"mcp" = 3
"scad" = 3.7
"weibull" = 0.01
- lambda
Numeric. Value(s) of the penalty tuning parameter provided to the penalty function. Defaults to NULL, in which case a sequence of nlambda values is generated automatically
- adaptive.gamma
Boolean (length = 1). Whether data-adaptive (gamma) parameters should be used. Defaults to FALSE. Set to TRUE to apply data-adaptive parameters based on the empirical partial correlation matrix. Available options:
"cauchy" — uses half of the interquartile range of the absolute empirical partial correlations (Bloch, 1966)
"exp" — uses the median of the distribution for the scale parameter (\(\frac{\log{(2)}}{\lambda}\))
"gumbel" — uses the mean of the distribution for the scale parameter (\(\gamma \cdot \beta\), where \(\beta\) is the Euler-Mascheroni constant)
"weibull" — uses the MLE of the shape parameter and the median of the distribution for the scale parameter (\(\lambda \cdot (\log{(2)})^{1/k}\))
- nlambda
Numeric (length = 1). Number of lambda values to test. Defaults to 50
- lambda.min.ratio
Numeric (length = 1). Ratio of the lowest lambda value to the maximal lambda. Defaults to 0.01 for all methods except "exp" and "weibull", where it defaults to 0.001
- penalize.diagonal
Boolean (length = 1). Should the diagonal be penalized? Defaults to
TRUE
- ic
Character (length = 1). What information criterion should be used for model selection? Available options include:
"AIC" — Akaike's information criterion: \(-2L + 2E\)
"AICc" — AIC corrected: \(AIC + \frac{2E^2 + 2E}{n - E - 1}\)
"BIC" — Bayesian information criterion: \(-2L + E \cdot \log{(n)}\)
"BIC0" — Bayesian information criterion zero (Dicker et al., 2013): \(\log{\left(\frac{D}{n - E}\right)} + \left(\frac{\log{(n)}}{n}\right) \cdot E\)
"EBIC" — Extended BIC: \(BIC + 4E \cdot \gamma \cdot \log{(p)}\)
"MBIC" — Modified Bayesian information criterion (Wang et al., 2018): \(\log{\left(\frac{D}{n - E}\right)} + \left(\frac{\log{(n)} \cdot E}{n}\right) \cdot \log{(\log{(p)})}\)
Term definitions:
\(n\) — sample size
\(p\) — number of variables
\(E\) — edges
\(S\) — empirical correlation matrix
\(K\) — estimated inverse covariance matrix (network)
\(L = \frac{n}{2} \cdot \left(\log \det K - \sum_{i=1}^p (SK)_{ii}\right)\)
\(D = n \cdot \left(\sum_{i=1}^p (SK)_{ii} - \log \det K\right)\)
Defaults to "BIC"
- ebic.gamma
Numeric (length = 1). Value to set the gamma parameter in EBIC (see above). Defaults to 0.50. Only used if ic = "EBIC"
- fast
Boolean (length = 1). Whether the glassoFast version should be used to estimate the GLASSO. Defaults to TRUE. The fast results may differ from the original GLASSO implemented by glasso by less than floating-point precision and should not impact reproducibility much (set to FALSE if concerned)
- LLA
Boolean (length = 1). Should the Local Linear Approximation be used to find the optimal minimum? Defaults to FALSE, which uses a single-pass (one-step) approximation that can be significantly faster (Zou & Li, 2008). Set to TRUE to iterate the LLA until convergence (see LLA.threshold)
- LLA.threshold
Numeric (length = 1). When performing the Local Linear Approximation, the maximum threshold for change before convergence is met. Defaults to 1e-04
- LLA.iter
Numeric (length = 1). Maximum number of iterations to perform to reach convergence. Defaults to 100
- network.only
Boolean (length = 1). Whether only the network should be output. Defaults to TRUE. Set to FALSE to obtain all output for the network estimation method
- verbose
Boolean (length = 1). Whether messages and (insignificant) warnings should be output. Defaults to FALSE (silent calls). Set to TRUE to see all messages and warnings for every function call
- ...
Additional arguments to be passed on to auto.correlate
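As an illustration only (not code from the package; the function names below are made up for the sketch), the penalty formulas above can be written out directly in R to compare their shapes:
# Sketch: Atan versus LASSO penalty, following the formulas above
atan_penalty <- function(x, lambda, gamma = 0.01) {
  lambda * (gamma + 2 / pi) * atan(abs(x) / gamma)
}
lasso_penalty <- function(x, lambda) {
  lambda * abs(x)
}
x <- seq(-0.5, 0.5, length.out = 101)
# The LASSO penalty grows linearly in |x|, while the Atan penalty
# levels off near lambda, penalizing large edges less severely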
References
Half IQR for \(\gamma\) in Cauchy
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1970).
Continuous univariate distributions (Vol. 1).
New York, NY: John Wiley & Sons.
BIC0
Dicker, L., Huang, B., & Lin, X. (2013).
Variable selection and estimation with the seamless-L0 penalty.
Statistica Sinica, 23(2), 929–962.
SCAD penalty and Local Linear Approximation
Fan, J., & Li, R. (2001).
Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association, 96(456), 1348–1360.
Bridge penalty
Fu, W. J. (1998).
Penalized regressions: The bridge versus the lasso.
Journal of Computational and Graphical Statistics, 7(3), 397–416.
L2 penalty
Hoerl, A. E., & Kennard, R. W. (1970).
Ridge regression: Biased estimation for nonorthogonal problems.
Technometrics, 12(1), 55–67.
L1 penalty
Tibshirani, R. (1996).
Regression shrinkage and selection via the lasso.
Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
EXP penalty
Wang, Y., Fan, Q., & Zhu, L. (2018).
Variable selection and estimation using a continuous approximation to the L0 penalty.
Annals of the Institute of Statistical Mathematics, 70(1), 191–214.
Atan penalty
Wang, Y., & Zhu, L. (2016).
Variable selection and parameter estimation with the Atan regularization method.
Journal of Probability and Statistics, 2016, 1–12.
Original simulation in psychometric networks
Williams, D. R. (2020).
Beyond lasso: A survey of nonconvex regularization in Gaussian graphical models.
PsyArXiv.
MCP penalty
Zhang, C.-H. (2010).
Nearly unbiased variable selection under minimax concave penalty.
Annals of Statistics, 38(2), 894–942.
One-step Local Linear Approximation
Zou, H., & Li, R. (2008).
One-step sparse estimates in nonconcave penalized likelihood models.
Annals of Statistics, 36(4), 1509–1533.
Author
Alexander P. Christensen <alexpaulchristensen at gmail.com> and Hudson Golino <hfg9s at virginia.edu>
Examples
# Obtain data
wmt <- wmt2[,7:24]
# Obtain network
l1_network <- network.regularization(data = wmt)
# Obtain Atan network
atan_network <- network.regularization(data = wmt, penalty = "atan")
# Obtain data-adaptive EXP network
exp_network <- network.regularization(data = wmt, penalty = "exp", adaptive.gamma = TRUE)
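# A further illustrative call using arguments documented above
# Obtain SCAD network with EBIC model selection
scad_network <- network.regularization(
  data = wmt, penalty = "scad",
  ic = "EBIC", ebic.gamma = 0.5
)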