Estimate the Generalizability of Network
Source:R/network.generalizability.R
network.generalizability.Rd
General function to compute a network's predictive power on new data, following Haslbeck and Waldorp (2018) and Williams and Rodriguez (2022), using the generalizability methods of data splitting, k-folds cross-validation, and leave-one-out cross-validation.
Uses network.predictability as the basis for these generalizability methods
Usage
network.generalizability(
data,
method = c("split", "cv", "loocv"),
number,
corr = c("auto", "cor_auto", "pearson", "spearman"),
na.data = c("pairwise", "listwise"),
model = c("BGGM", "glasso", "TMFG"),
algorithm = c("leiden", "louvain", "walktrap"),
uni.method = c("expand", "LE", "louvain"),
seed = NULL,
...
)
Arguments
- data
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix
- method
Character (length = 1). Generalizability method. Available options:
"split"
— Performs train/test data split on the data usingnumber
to adjust the size of the training split"cv"
— (default) Performs k-folds cross-validation usingnumber
to adjust the number of folds (e.g., 5 = 80/20 splits; 10 = 90/10 splits)"loocv"
— Performs leave-one-out cross-validation. Leave-one-out has a tendency to overestimate the generalizability of the model and is not recommended (k-folds cross-validation should be preferred)
- number
Numeric (length = 1). Parameter to adjust the method argument. Ranges 0-1 for method = "split" and 1-N for method = "cv". Defaults to 0.80 and 5, respectively
- corr
Character (length = 1). Method to compute correlations. Defaults to "auto". Available options:
"auto" — Automatically computes appropriate correlations for the data using Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary with continuous. To change the number of categories that are considered ordinal, use ordinal.categories (see polychoric.matrix for more details)
"cor_auto" — Uses cor_auto to compute correlations. Arguments can be passed along to the function
"pearson" — Pearson's correlation is computed for all variables regardless of categories
"spearman" — Spearman's rank-order correlation is computed for all variables regardless of categories
For other similarity measures, compute them first and input them into data with the sample size (n)
- na.data
Character (length = 1). How should missing data be handled? Defaults to "pairwise". Available options:
"pairwise" — Computes correlation for all available cases between two variables (see the sketch after the argument list)
"listwise" — Computes correlation for all complete cases in the dataset
- model
Character (length = 1). Defaults to "glasso". Available options:
"BGGM" — Computes the Bayesian Gaussian Graphical Model. Set argument ordinal.categories to determine levels allowed for a variable to be considered ordinal. See ?BGGM::estimate for more details
"glasso" — Computes the GLASSO with EBIC model selection. See EBICglasso.qgraph for more details
"TMFG" — Computes the TMFG method. See TMFG for more details
- algorithm
Character or igraph cluster_* function (length = 1). Defaults to "walktrap". Three options are listed below but all are available (see community.detection for other options):
"leiden" — See cluster_leiden for more details
"louvain" — By default, "louvain" will implement the Louvain algorithm using the consensus clustering method (see community.consensus for more information). This function will implement consensus.method = "most_common" and consensus.iter = 1000 unless specified otherwise
"walktrap" — See cluster_walktrap for more details
- uni.method
Character (length = 1). What unidimensionality method should be used? Defaults to "louvain". Available options:
"expand" — Expands the correlation matrix with four variables correlated 0.50. If the number of dimensions returns 2 or fewer in the check, then the data are unidimensional; otherwise, regular EGA with no matrix expansion is used. This method was used in Golino et al.'s (2020) Psychological Methods simulation
"LE" — Applies the Leading Eigenvector algorithm (cluster_leading_eigen) on the empirical correlation matrix. If the number of dimensions is 1, then the Leading Eigenvector solution is used; otherwise, regular EGA is used. This method was used in Christensen et al.'s (2023) Behavior Research Methods simulation
"louvain" — Applies the Louvain algorithm (cluster_louvain) on the empirical correlation matrix. If the number of dimensions is 1, then the Louvain solution is used; otherwise, regular EGA is used. This method was validated in Christensen's (2022) PsyArXiv simulation. Consensus clustering can be used by specifying either "consensus.method" or "consensus.iter"
- seed
Numeric (length = 1). Defaults to NULL (random results). Set for reproducible results. See Reproducibility and PRNG for more details on random number generation in EGAnet. A reproducibility example appears at the end of Examples
- ...
Additional arguments to be passed on to auto.correlate, network.estimation, community.detection, community.consensus, and community.unidimensional
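For intuition, the following base-R sketch shows roughly what the method, number, and na.data settings correspond to in terms of row indices and correlation handling. This is illustrative only; object names such as train_rows and folds are not part of EGAnet.

# Assumes EGAnet's wmt2 example data is available (illustration only, not EGAnet internals)
n <- nrow(wmt2[, 7:24])

# method = "split", number = 0.80: ~80% of rows for training, the rest for testing
train_rows <- sample(n, size = floor(0.80 * n))
test_rows  <- setdiff(seq_len(n), train_rows)

# method = "cv", number = 5: rows assigned to 5 folds (each fold held out is a ~80/20 split)
folds <- sample(rep(seq_len(5), length.out = n))
table(folds)

# method = "loocv": every row is its own fold (n folds of size 1)

# na.data parallels base R's cor() 'use' argument:
# "pairwise" ~ cor(x, use = "pairwise.complete.obs"); "listwise" ~ cor(x, use = "complete.obs")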
Value
Returns a list containing:
- node
Node-wise metrics output from network.predictability
- community
Community-wise metrics output from tefi
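A minimal sketch of accessing these components, assuming the call succeeds and the element names match the list described above:

results <- network.generalizability(
  data = wmt2[,7:24], method = "cv", number = 5
)
results$node       # node-wise metrics (from network.predictability)
results$community  # community-wise metrics (from tefi)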
Details
This implementation of network predictability proceeds in several steps with important assumptions:
1. Network was estimated using (partial) correlations (not regression like the mgm package!)
2. The original data used to estimate the network in step 1 are necessary to apply the original scaling to the new data
3. (Linear) regression-like coefficients are obtained by reverse engineering the inverse covariance matrix using the network's partial correlations (i.e., by setting the diagonal of the network to -1 and computing the inverse of the opposite-signed partial correlation matrix; see EGAnet:::pcor2inv). A small sketch of this recovery appears after this list
4. Predicted values are obtained by matrix multiplying the new data with these coefficients
5. For dichotomous and polytomous data, thresholds are derived from the original data and used to convert the continuous predicted values into their corresponding categorical values
6. Evaluation metrics:
dichotomous — Accuracy, or the percent correctly predicted for the 0s and 1s
polytomous — Accuracy based on correctly predicting the ordinal category exactly (i.e., 1 = 1, 2 = 2, etc.) and a weighted accuracy such that the absolute distance between the predicted and actual value (e.g., |prediction - actual| = 1) is used as the exponent of 0.5. This weighted approach penalizes each unit of distance from the actual value more harshly (absolute difference = accuracy value): 0 = 1.000, 1 = 0.500, 2 = 0.250, 3 = 0.125, 4 = 0.0625, etc.
continuous — R-squared and root mean square error
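The sketch below illustrates the coefficient recovery described in steps 3-4 and the weighted accuracy from step 6. It follows the logic stated above and Williams and Rodriguez (2022, Formula 3), not the internal EGAnet:::pcor2inv code; all object names are illustrative and the data are simulated.

# Simulated, standardized continuous data (illustration only)
set.seed(1)
latent <- rnorm(500)
X <- scale(sapply(1:4, function(i) 0.7 * latent + rnorm(500)))

# Partial correlation "network" from the inverse correlation matrix
K <- solve(cor(X))
pcors <- -cov2cor(K)
diag(pcors) <- 0

# Step 3: set the network's diagonal to -1, flip signs, and invert;
# the diagonal of the result recovers the precision scale needed to
# form regression-like weights from the partial correlations
theta_diag <- diag(solve(diag(ncol(pcors)) - pcors))
betas <- pcors * sqrt(outer(1 / theta_diag, theta_diag))

# Step 4: predicted values come from matrix multiplying the (scaled) data
# by these weights; for standardized data the weights match ordinary least squares
round(cbind(network = betas[1, -1],
            ols = coef(lm(X[, 1] ~ X[, -1]))[-1]), 3)

# Step 6: weighted accuracy for polytomous items, 0.5^|predicted - actual|
0.5 ^ abs(c(2, 3, 1) - c(2, 1, 4))  # 1.000 0.250 0.125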
References
Original Implementation of Node Predictability
Haslbeck, J. M., & Waldorp, L. J. (2018).
How well do network models predict observations? On the importance of predictability in network models.
Behavior Research Methods, 50(2), 853–861.
Derivation of Regression Coefficients Used (Formula 3)
Williams, D. R., & Rodriguez, J. E. (2022).
Why overfitting is not (usually) a problem in partial correlation networks.
Psychological Methods, 27(5), 822–840.
Author
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>
Examples
# Data splitting
network.generalizability(
data = wmt2[,7:24], method = "split",
number = 0.80 # 80/20 training/testing
)
# k-folds cross-validation
network.generalizability(
data = wmt2[,7:24], method = "cv",
number = 5 # 5-fold cross-validation
)
if (FALSE) { # \dontrun{
# Leave-one-out cross-validation
network.generalizability(
data = wmt2[,7:24], method = "loocv"
)} # }
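# A further example, assuming the seed argument behaves as documented above
# (set for reproducible folds across runs)
network.generalizability(
  data = wmt2[,7:24], method = "cv",
  number = 5, seed = 42
)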