# Estimate the Generalizability of Network

Source: `R/network.generalizability.R`

General function to compute a network's predictive power on new data,
following Haslbeck and Waldorp (2018) and Williams and Rodriguez (2022), using the
generalizability methods of data splitting, *k*-folds cross-validation,
and leave-one-out cross-validation.

Uses `network.predictability` as the basis for performing these generalizability methods.

## Usage

```
network.generalizability(
  data,
  method = c("split", "cv", "loocv"),
  number,
  corr = c("auto", "cor_auto", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  seed = NULL,
  ...
)
```

## Arguments

- data
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix

- method
Character (length = 1). Generalizability method. Available options:

  - `"split"` --- Performs a train/test split on the data, using `number` to adjust the size of the **training** split
  - `"cv"` --- (default) Performs *k*-folds cross-validation, using `number` to adjust the number of folds (e.g., 5 = 80/20 splits; 10 = 90/10 splits)
  - `"loocv"` --- Performs leave-one-out cross-validation. Leave-one-out tends to **overestimate** the generalizability of the model and is not recommended (*k*-folds cross-validation should be preferred)

- number
Numeric (length = 1). Parameter to adjust the `method` argument. Ranges 0-1 for `method = "split"` and 1-N for `method = "cv"`. Defaults to `0.80` and `5`, respectively

- corr
Character (length = 1). Method to compute correlations. Defaults to `"auto"`. Available options:

  - `"auto"` --- Automatically computes appropriate correlations for the data using Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary with continuous. To change the number of categories that are considered ordinal, use `ordinal.categories` (see `polychoric.matrix` for more details)
  - `"cor_auto"` --- Uses `cor_auto` to compute correlations. Arguments can be passed along to the function
  - `"pearson"` --- Pearson's correlation is computed for all variables regardless of categories
  - `"spearman"` --- Spearman's rank-order correlation is computed for all variables regardless of categories

  For other similarity measures, compute them first and input them into `data` with the sample size (`n`)

- na.data
Character (length = 1). How should missing data be handled? Defaults to `"pairwise"`. Available options:

  - `"pairwise"` --- Computes correlation for all available cases between two variables
  - `"listwise"` --- Computes correlation for all complete cases in the dataset

- model
Character (length = 1). Defaults to `"glasso"`. Available options:

  - `"BGGM"` --- Computes the Bayesian Gaussian Graphical Model. Set argument `ordinal.categories` to determine levels allowed for a variable to be considered ordinal. See `?BGGM::estimate` for more details
  - `"glasso"` --- Computes the GLASSO with EBIC model selection. See `EBICglasso.qgraph` for more details
  - `"TMFG"` --- Computes the TMFG method. See `TMFG` for more details

- algorithm
Character or `igraph` `cluster_*` function (length = 1). Defaults to `"walktrap"`. Three options are listed below but all are available (see `community.detection` for other options):

  - `"leiden"` --- See `cluster_leiden` for more details
  - `"louvain"` --- By default, `"louvain"` will implement the Louvain algorithm using the consensus clustering method (see `community.consensus` for more information). This function will implement `consensus.method = "most_common"` and `consensus.iter = 1000` unless specified otherwise
  - `"walktrap"` --- See `cluster_walktrap` for more details

- uni.method
Character (length = 1). What unidimensionality method should be used? Defaults to `"louvain"`. Available options:

  - `"expand"` --- Expands the correlation matrix with four variables correlated 0.50. If the number of dimensions returned in the check is 2 or fewer, then the data are unidimensional; otherwise, regular EGA with no matrix expansion is used. This method was used in Golino et al.'s (2020) *Psychological Methods* simulation
  - `"LE"` --- Applies the Leading Eigenvector algorithm (`cluster_leading_eigen`) on the empirical correlation matrix. If the number of dimensions is 1, then the Leading Eigenvector solution is used; otherwise, regular EGA is used. This method was used in Christensen et al.'s (2023) *Behavior Research Methods* simulation
  - `"louvain"` --- Applies the Louvain algorithm (`cluster_louvain`) on the empirical correlation matrix. If the number of dimensions is 1, then the Louvain solution is used; otherwise, regular EGA is used. This method was validated in Christensen's (2022) *PsyArXiv* simulation. Consensus clustering can be used by specifying either `"consensus.method"` or `"consensus.iter"`

- seed
Numeric (length = 1). Defaults to `NULL`, which gives random results. Set a value for reproducible results. See Reproducibility and PRNG for more details on random number generation in `EGAnet`

- ...
Additional arguments to be passed on to `auto.correlate`, `network.estimation`, `community.detection`, `community.consensus`, and `community.unidimensional`
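As a rough illustration of how `method` and `number` relate, the sketch below builds the case indices for a data split and for *k*-folds cross-validation in base R. It is not EGAnet's internal code: the sample size `n` and the object names are hypothetical, and base R's `set.seed()` is used only to make the sketch repeatable (EGAnet documents its own random number generation for `seed`).

```r
# Illustrative only: base R index construction, not EGAnet internals
set.seed(1234)  # for a repeatable sketch
n <- 100        # hypothetical number of cases

# method = "split", number = 0.80: 80% of cases train, 20% test
train_index <- sample(n, size = round(n * 0.80))
test_index <- setdiff(seq_len(n), train_index)

# method = "cv", number = 5: assign every case to one of 5 folds;
# each fold is held out once as the test set (an 80/20 split each time)
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
test_sizes <- as.vector(table(fold))

# method = "loocv" corresponds to the special case k = n
```

With `k = 5` each held-out fold contains about `n / k` cases, which is where the "5 = 80/20 splits" shorthand in the `method` description comes from.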

## Value

Returns a list containing:

- node
Node-wise metrics output from `network.predictability`

- community
Community-wise metrics output from `tefi`

## Details

This implementation of network predictability proceeds in several steps with important assumptions:

1. Network was estimated using (partial) correlations (not regression like the `mgm` package!)

2. The original data used to estimate the network in step 1 are necessary to apply the original scaling to the new data

3. (Linear) regression-like coefficients are obtained by reverse engineering the
inverse covariance matrix using the network's partial correlations (i.e.,
by setting the diagonal of the network to -1 and computing the inverse
of the opposite signed partial correlation matrix; see `EGAnet:::pcor2inv`)

4. Predicted values are obtained by matrix multiplying the new data with these coefficients

5. **Dichotomous and polytomous** data are given categorical values based
on the **original data's** thresholds and these thresholds are used to
convert the continuous predicted values into their corresponding categorical values

6. Evaluation metrics:

   - dichotomous --- Accuracy, or the percentage of 0s and 1s correctly predicted

   - polytomous --- Accuracy based on correctly predicting the ordinal category exactly (i.e., 1 = 1, 2 = 2, etc.) and a weighted accuracy in which the absolute distance of the predicted value from the actual value (e.g., |prediction - actual| = 1) is used as the exponent of 0.5. This weighted approach gives an overall measure of accuracy in which each additional category of distance from the actual value incurs a harsher penalty (absolute difference = accuracy value): 0 = 1.000, 1 = 0.500, 2 = 0.2500, 3 = 0.1250, 4 = 0.0625, etc.

   - continuous --- R-squared and root mean square error
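The steps above can be sketched in base R for the continuous case. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data, the object names, and the helper `weighted_accuracy()` are hypothetical, the network is an unregularized partial correlation matrix, and the R-squared shown is one common definition (1 - SSE/SST on the new data) that may differ from what `network.predictability` reports.

```r
# Illustrative only: base R sketch of steps 1-4 and 6, not EGAnet internals
set.seed(42)
n <- 300
p <- 4
sigma <- 0.5^abs(outer(1:p, 1:p, "-"))  # hypothetical population correlations
train_raw <- matrix(rnorm(n * p), n, p) %*% chol(sigma)
new_raw <- matrix(rnorm(50 * p), 50, p) %*% chol(sigma)

# Step 1: partial correlation network (zero diagonal) from the training data
train <- scale(train_raw)
K <- solve(cor(train))
network <- -cov2cor(K)
diag(network) <- 0

# Step 3: set the diagonal to -1, flip the sign, invert, rescale
P <- network
diag(P) <- -1
R_hat <- cov2cor(solve(-P))  # recovers the training correlation matrix
K_hat <- solve(R_hat)        # inverse covariance of the standardized data

# Regression-like coefficients: beta[i, j] = -K[i, j] / K[i, i]
beta <- -K_hat / diag(K_hat)  # divides row i by K[i, i]
diag(beta) <- 0

# Steps 2 and 4: apply the training scaling to new data, then predict
new <- scale(new_raw,
             center = attr(train, "scaled:center"),
             scale = attr(train, "scaled:scale"))
predicted <- new %*% t(beta)

# Step 6 (continuous): root mean square error and R-squared per node
rmse <- sqrt(colMeans((new - predicted)^2))
sst <- colSums(sweep(new, 2, colMeans(new))^2)
r_squared <- 1 - colSums((new - predicted)^2) / sst

# Step 6 (polytomous): weighted accuracy, 0.5 raised to the absolute distance
weighted_accuracy <- function(predicted, actual) {
  mean(0.5^abs(predicted - actual))
}
weighted_accuracy(c(1, 2, 3, 5), c(1, 3, 3, 2))  # distances 0, 1, 0, 3 -> 0.65625
```

On the training data these coefficients coincide with ordinary least squares on the standardized variables, which is why the network can be used for prediction without refitting node-wise regressions.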

## References

**Original Implementation of Node Predictability**

Haslbeck, J. M., & Waldorp, L. J. (2018).
How well do network models predict observations? On the importance of predictability in network models.
*Behavior Research Methods*, *50*(2), 853–861.

**Derivation of Regression Coefficients Used (Formula 3)**

Williams, D. R., & Rodriguez, J. E. (2022).
Why overfitting is not (usually) a problem in partial correlation networks.
*Psychological Methods*, *27*(5), 822–840.

## Author

Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>

## Examples

```
# Load EGAnet (provides the `wmt2` data)
library(EGAnet)

# Data splitting
network.generalizability(
  data = wmt2[,7:24], method = "split",
  number = 0.80 # 80/20 training/testing
)
#> Argument 'seed' is set to `NULL`. Results will not be reproducible. Set 'seed' for reproducible results
#> Data Split (80/20)
#>
#> Dichotomous
#>
#> wmt1 wmt2 wmt3 wmt4 wmt5 wmt6 wmt7 wmt8 wmt9 wmt10 wmt11
#> Accuracy 0.755 0.857 0.781 0.692 0.759 0.772 0.696 0.688 0.684 0.705 0.696
#> Kappa 0.377 0.495 0.392 0.372 0.390 0.491 0.377 0.358 0.364 0.367 0.197
#> wmt12 wmt13 wmt14 wmt15 wmt16 wmt17 wmt18
#> Accuracy 0.755 0.717 0.654 0.700 0.713 0.819 0.776
#> Kappa 0.290 0.400 0.290 0.262 0.297 0.194 0.232
#>
#> ----
#>
#> Community Metrics
#>
#> TEFI
#> 0
# k-folds cross-validation
network.generalizability(
  data = wmt2[,7:24], method = "cv",
  number = 5 # 5-fold cross-validation
)
#> Argument 'seed' is set to `NULL`. Results will not be reproducible. Set 'seed' for reproducible results
#> 5-fold Cross-validation
#>
#> Node Metrics
#>
#> wmt1 wmt2 wmt3 wmt4 wmt5 wmt6 wmt7 wmt8 wmt9 wmt10 wmt11
#> Accuracy 0.739 0.854 0.799 0.711 0.743 0.755 0.723 0.701 0.749 0.733 0.705
#> Kappa 0.32 0.464 0.408 0.401 0.327 0.451 0.426 0.396 0.487 0.421 0.233
#> wmt12 wmt13 wmt14 wmt15 wmt16 wmt17 wmt18
#> Accuracy 0.731 0.677 0.697 0.739 0.712 0.795 0.749
#> Kappa 0.309 0.325 0.383 0.353 0.304 0.134 0.183
#>
#> ----
#>
#> Community Metrics
#>
#> Mean SD Median
#> TEFI -10.90097 0.4068079 -10.9325
if (FALSE) {
  # Leave-one-out cross-validation
  network.generalizability(
    data = wmt2[,7:24], method = "loocv"
  )
}
```