bootEGA
Estimates the number of dimensions across iter bootstraps, generated either from the empirical zero-order correlation matrix ("parametric") or by "resampling" from the empirical dataset (non-parametric). bootEGA also estimates a typical median network structure, which is formed by the median or mean pairwise (partial) correlations over the iter bootstraps (see Details for information about the typical median network structure).
Usage
bootEGA(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  iter = 500,
  type = c("parametric", "resampling"),
  ncores,
  EGA.type = c("EGA", "EGA.fit", "hierEGA", "riEGA"),
  plot.itemStability = TRUE,
  typicalStructure = FALSE,
  plot.typicalStructure = FALSE,
  seed = NULL,
  verbose = TRUE,
  ...
)
Arguments
- data
Matrix or data frame. Should consist only of variables to be used in the analysis
- n
Numeric (length = 1). Sample size if the data provided is a correlation matrix
- corr
Character (length = 1). Method to compute correlations. Defaults to "auto". Available options:
  - "auto": Automatically computes appropriate correlations for the data using Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary with continuous. To change the number of categories that are considered ordinal, use ordinal.categories (see polychoric.matrix for more details)
  - "cor_auto": Uses cor_auto to compute correlations. Arguments can be passed along to the function
  - "cosine": Uses cosine to compute cosine similarity
  - "pearson": Pearson's correlation is computed for all variables regardless of categories
  - "spearman": Spearman's rank-order correlation is computed for all variables regardless of categories
For other similarity measures, compute them first and input them into data with the sample size (n)
- na.data
Character (length = 1). How should missing data be handled? Defaults to "pairwise". Available options:
  - "pairwise": Computes correlation for all available cases between two variables
  - "listwise": Computes correlation for all complete cases in the dataset
- model
Character (length = 1). Defaults to "glasso". Available options:
  - "BGGM": Computes the Bayesian Gaussian Graphical Model. Set argument ordinal.categories to determine levels allowed for a variable to be considered ordinal. See ?BGGM::estimate for more details
  - "glasso": Computes the GLASSO with EBIC model selection. See EBICglasso.qgraph for more details
  - "TMFG": Computes the TMFG method. See TMFG for more details
- algorithm
Character or igraph cluster_* function (length = 1). Defaults to "walktrap". Three options are listed below, but others are available (see community.detection for other options):
  - "leiden": See cluster_leiden for more details
  - "louvain": By default, "louvain" will implement the Louvain algorithm using the consensus clustering method (see community.consensus for more information). This function will implement consensus.method = "most_common" and consensus.iter = 1000 unless specified otherwise
  - "walktrap": See cluster_walktrap for more details
- uni.method
Character (length = 1). What unidimensionality method should be used? Defaults to "louvain". Available options:
  - "expand": Expands the correlation matrix with four variables correlated 0.50. If the number of dimensions returned in the check is 2 or fewer, then the data are unidimensional; otherwise, regular EGA with no matrix expansion is used. This method was used in Golino et al.'s (2020) Psychological Methods simulation
  - "LE": Applies the Leading Eigenvector algorithm (cluster_leading_eigen) on the empirical correlation matrix. If the number of dimensions is 1, then the Leading Eigenvector solution is used; otherwise, regular EGA is used. This method was used in Christensen et al.'s (2023) Behavior Research Methods simulation
  - "louvain": Applies the Louvain algorithm (cluster_louvain) on the empirical correlation matrix. If the number of dimensions is 1, then the Louvain solution is used; otherwise, regular EGA is used. This method was validated in Christensen's (2022) PsyArXiv simulation. Consensus clustering can be used by specifying either "consensus.method" or "consensus.iter"
- iter
Numeric (length = 1). Number of replica samples to generate from the bootstrap analysis. Defaults to 500 (recommended)
- type
Character (length = 1). What type of bootstrap should be performed? Defaults to "parametric". Available options:
  - "parametric": Generates iter new datasets from multivariate normal random distributions based on the original dataset, using mvrnorm
  - "resampling": Generates iter new datasets from random subsamples of the original data
- ncores
Numeric (length = 1). Number of cores to use in computing results. Defaults to ceiling(parallel::detectCores() / 2), that is, half of your computer's cores. Set to 1 to not use parallel computing.
If you're unsure how many cores your computer has, then type: parallel::detectCores()
- EGA.type
Character (length = 1). Type of EGA model to use. Defaults to "EGA". Available options:
  - "EGA": Uses standard exploratory graph analysis
  - "EGA.fit": Uses EGA.fit (see that function for details)
  - "hierEGA": Uses hierarchical exploratory graph analysis
  - "riEGA": Uses random-intercept exploratory graph analysis
Arguments for EGA.type can be added (see links for details on specific function arguments)
- plot.itemStability
Boolean (length = 1). Should the plot be produced for item.replication? Defaults to TRUE
- typicalStructure
Boolean (length = 1). If TRUE, returns the median ("glasso" or "BGGM") or mean ("TMFG") network structure and estimates its dimensions (see Details for more information). Defaults to FALSE
- plot.typicalStructure
Boolean (length = 1). If TRUE, returns a plot of the typical network structure. Defaults to FALSE
- seed
Numeric (length = 1). Defaults to NULL, which gives random (non-reproducible) results. Set a seed for reproducible results. See Reproducibility and PRNG for more details on random number generation in EGAnet
- verbose
Boolean (length = 1). Should progress be displayed? Defaults to TRUE. Set to FALSE to not display progress
- ...
Additional arguments that can be passed on to auto.correlate, network.estimation, community.detection, community.consensus, EGA, EGA.fit, hierEGA, and riEGA
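The two options for the type argument can be pictured with a minimal base R sketch. This illustrates the general idea only and is not EGAnet's internal code: the toy data, the object names, and the choice to resample rows with replacement are assumptions made for the example.

```r
# Sketch only: illustrates the two bootstrap types, not EGAnet's internals
set.seed(1)

# Hypothetical toy data: 200 cases, 4 continuous variables
toy <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
colnames(toy) <- paste0("V", 1:4)

# "parametric": draw a new dataset from a multivariate normal distribution
# whose covariance matrix is the empirical correlation matrix
R <- cor(toy)
parametric_sample <- MASS::mvrnorm(
  n = nrow(toy), mu = rep(0, ncol(toy)), Sigma = R
)

# "resampling": draw cases (rows) from the original data
# (sampling with replacement is an assumption of this sketch)
resampled_rows <- sample(nrow(toy), replace = TRUE)
resampling_sample <- toy[resampled_rows, , drop = FALSE]

# Both replicas have the same shape as the original data
dim(parametric_sample)
dim(resampling_sample)
```

In bootEGA, a step of this kind would be repeated iter times, with a network and its dimensions estimated on each replica dataset.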
Value
Returns a list containing:
- iter
Number of replica samples in bootstrap
- bootGraphs
A list containing the networks of each replica sample
- boot.wc
A matrix of membership assignments for each replica network with variables down the columns and replicas across the rows
- boot.ndim
Number of dimensions identified in each replica sample
- summary.table
A data frame containing number of replica samples, median, standard deviation, standard error, 95% confidence intervals, and quantiles (lower = 2.5% and upper = 97.5%)
- frequency
A data frame containing the proportion of times the number of dimensions was identified (e.g., .85 of 1,000 = 850 times that specific number of dimensions was found)
- TEFI
tefi value for each replica sample
- type
Type of bootstrap used
- EGA
Output of the empirical EGA results (output will vary based on EGA.type)
- EGA.type
Type of *EGA function used
- typicalGraph
A list containing:
  - graph: Network matrix of the median network structure
  - typical.dim.variables: An ordered matrix of item allocation
  - wc: Membership assignments of the median network
- plot.typical.ega
Plot output if plot.typicalStructure = TRUE
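To make the frequency and summary.table entries concrete, the sketch below computes comparable quantities in base R from a made-up vector standing in for boot.ndim. The vector, the object names, and the column names are hypothetical, and the exact layout of the data frames returned by bootEGA may differ.

```r
# Hypothetical stand-in for boot.ndim: dimensions found in 1,000 replicas
boot_ndim <- c(rep(2, 850), rep(3, 150))

# Proportion of replicas in which each number of dimensions was found
frequency <- as.data.frame(prop.table(table(boot_ndim)))
names(frequency) <- c("n.dimensions", "proportion") # hypothetical names
frequency

# Median, standard deviation, and percentile bounds across replicas
median(boot_ndim)
sd(boot_ndim)
quantile(boot_ndim, probs = c(0.025, 0.975))
```

Here a proportion of .85 corresponds to 850 of the 1,000 replicas returning two dimensions, matching the interpretation given for frequency above.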
Details
The typical network structure is derived from the median (or mean) value of each pairwise relationship. These values tend to reflect the "typical" value taken by an edge across the bootstrap networks. Afterward, the same community detection algorithm that was applied to the bootstrap networks is applied to the typical network.
Because the community detection algorithm is applied to the typical network structure, there is a possibility that the community algorithm determines a different number of dimensions than the median number derived from the bootstraps.
The typical network structure (and number of dimensions) may not match the empirical EGA number of dimensions or the median number of dimensions from the bootstrap. This behavior is known and is not a bug.
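The edge-wise median (or mean) described above can be sketched in base R. The list of random networks below is a hypothetical stand-in for bootGraphs, and the helper make_network is invented for the example; the sketch shows only the aggregation step, not the community detection that follows.

```r
set.seed(1)

# Hypothetical helper: a random symmetric network with a zero diagonal
make_network <- function(p = 4) {
  m <- matrix(0, p, p)
  m[upper.tri(m)] <- runif(p * (p - 1) / 2, min = 0, max = 0.5)
  m + t(m)
}

# Hypothetical stand-in for bootGraphs: 100 bootstrap networks
boot_graphs <- replicate(100, make_network(), simplify = FALSE)

# Stack into a p x p x iter array, then summarize each edge across replicas
stacked <- simplify2array(boot_graphs)
typical_median <- apply(stacked, c(1, 2), median) # as for "glasso" or "BGGM"
typical_mean <- apply(stacked, c(1, 2), mean)     # as for "TMFG"

round(typical_median, 2)
```

Because dimensions are then estimated on this aggregated network rather than on any single replica, the resulting count need not equal the median of the per-replica counts.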
References
Original implementation of bootEGA
Christensen, A. P., & Golino, H. (2021). Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial. Psych, 3(3), 479-500.
See also
itemStability to estimate the stability of the variables in the empirical dimensions and dimensionStability to estimate the stability of the dimensions (structural consistency)
Author
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>
Examples
# Load package and data
library(EGAnet)
wmt <- wmt2[, 7:24]

if (FALSE) { # Not run
  # Standard EGA parametric example
  boot.wmt <- bootEGA(
    data = wmt, iter = 500,
    type = "parametric", ncores = 2
  )

  # Standard resampling example
  boot.wmt <- bootEGA(
    data = wmt, iter = 500,
    type = "resampling", ncores = 2
  )

  # Example using {igraph} `cluster_*` function
  boot.wmt.spinglass <- bootEGA(
    data = wmt, iter = 500,
    algorithm = igraph::cluster_spinglass,
    # use any function from {igraph}
    type = "parametric", ncores = 2
  )

  # EGA fit example
  boot.wmt.fit <- bootEGA(
    data = wmt, iter = 500,
    EGA.type = "EGA.fit",
    type = "parametric", ncores = 2
  )

  # Hierarchical EGA example
  boot.wmt.hier <- bootEGA(
    data = wmt, iter = 500,
    EGA.type = "hierEGA",
    type = "parametric", ncores = 2
  )

  # Random-intercept EGA example
  boot.wmt.ri <- bootEGA(
    data = wmt, iter = 500,
    EGA.type = "riEGA",
    type = "parametric", ncores = 2
  )
}