EGA Optimal Model Fit using the Total Entropy Fit Index (tefi)

Estimates the best-fitting model using EGA. A search parameter of the community detection algorithm is varied (the number of steps in cluster_walktrap, or the resolution parameter for Leiden and Louvain), and the unique community solutions are compared using tefi.

Usage

EGA.fit(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  plot.EGA = TRUE,
  verbose = FALSE,
  ...
)

Arguments

data

Matrix or data frame. Should consist only of variables to be used in the analysis

n

Numeric (length = 1). Sample size if data is a correlation matrix

corr

Character (length = 1). Method to compute correlations. Defaults to "auto". Available options:

"auto" — Automatically computes appropriate correlations for the data using Pearson's for continuous, polychoric for ordinal, tetrachoric for binary, and polyserial/biserial for ordinal/binary with continuous. To change the number of categories that are considered ordinal, use ordinal.categories (see polychoric.matrix for more details)
"cor_auto" — Uses cor_auto to compute correlations. Arguments can be passed along to the function
"cosine" — Uses cosine to compute cosine similarity
"pearson" — Pearson's correlation is computed for all variables regardless of categories
"spearman" — Spearman's rank-order correlation is computed for all variables regardless of categories

For other similarity measures, compute them first and input them into data with the sample size (n)
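
As a rough illustration of supplying a precomputed similarity matrix, the sketch below computes Kendall's tau (not one of the built-in corr options) with base R and passes it to EGA.fit together with the sample size. The simulated items, the object names (items, kendall, fit.kendall), and the two-factor structure are assumptions made up for this example. The EGA.fit call only runs if the EGAnet package is installed.

```r
# Simulated data for illustration only: 8 items driven by 2 latent factors
set.seed(1)
n.cases <- 300
f1 <- rnorm(n.cases)
f2 <- rnorm(n.cases)
items <- cbind(
  sapply(1:4, function(i) f1 + rnorm(n.cases)),
  sapply(1:4, function(i) f2 + rnorm(n.cases))
)
colnames(items) <- paste0("V", 1:8)

# Compute the similarity matrix first (Kendall's tau via stats::cor)
kendall <- cor(items, method = "kendall")

# Pass the matrix as data and the number of cases as n
if (requireNamespace("EGAnet", quietly = TRUE)) {
  fit.kendall <- EGAnet::EGA.fit(
    data = kendall,
    n = n.cases,
    plot.EGA = FALSE
  )
}
```

Because data is already a symmetric matrix here, n carries the only information about the number of cases.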

na.data

Character (length = 1). How should missing data be handled? Defaults to "pairwise". Available options:

"pairwise" — Computes correlation for all available cases between two variables
"listwise" — Computes correlation for all complete cases in the dataset
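
The difference between the two options can be seen with base R's stats::cor on a small made-up data set. This is only an analogy for what "pairwise" and "listwise" mean; it is not a claim about how EGA.fit computes correlations internally, and the data frame x is invented for the example.

```r
# Small made-up data set with scattered missing values
x <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, NA, 6),
  c = c(1, 3, 2, NA, 5, 6)
)

# Pairwise: each correlation uses every row where both variables are observed
pairwise <- cor(x, use = "pairwise.complete.obs")

# Listwise: every correlation uses only rows with no missing values at all
listwise <- cor(x, use = "complete.obs")

# Number of cases behind each pairwise correlation
observed <- ifelse(is.na(as.matrix(x)), 0, 1)
pairwise.n <- crossprod(observed)

# Number of cases behind every listwise correlation
listwise.n <- sum(complete.cases(x))
```

In this toy data each pair of variables shares four observed rows, while only three rows are complete across all variables, so the two approaches give different correlation estimates.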

model

Character (length = 1). Defaults to "glasso". Available options:

"BGGM" — Computes the Bayesian Gaussian Graphical Model. Set argument ordinal.categories to determine levels allowed for a variable to be considered ordinal. See ?BGGM::estimate for more details
"glasso" — Computes the GLASSO with EBIC model selection. See EBICglasso.qgraph for more details
"TMFG" — Computes the TMFG method. See TMFG for more details

algorithm

Character or igraph cluster_* function. Three options are described below, but all options in community.detection are available (see community.detection for the others):

"leiden" — See cluster_leiden for more details. Note: The Leiden algorithm will default to the Constant Potts Model objective function (objective_function = "CPM"). Set objective_function = "modularity" to use modularity instead (see examples). By default, searches along resolutions from 0 to max(abs(network)) or the maximum absolute edge weight in the network in 0.01 increments (resolution_parameter = seq.int(0, max(abs(network)), 0.01)). For modularity, searches along resolutions from 0 to 2 in 0.05 increments (resolution_parameter = seq.int(0, 2, 0.05)) by default. Use the argument resolution_parameter to change the search parameters (see examples)
"louvain" — See community.consensus for more details. By default, searches along resolutions from 0 to 2 in 0.05 increments (resolution_parameter = seq.int(0, 2, 0.05)). Use the argument resolution_parameter to change the search parameters (see examples)
"walktrap" — This algorithm is the default. See cluster_walktrap for more details. By default, searches along 3 to 8 steps (steps = 3:8). Use the argument steps to change the search parameters (see examples)
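
Neighboring search values often return the same communities, which is why only unique solutions need to be compared. The base R sketch below shows one way such de-duplication could work. The membership vectors are made up, and the relabeling approach is an assumption for illustration rather than EGA.fit's internal code.

```r
# Hypothetical memberships for steps = 3:6 (made-up values)
memberships <- list(
  "3" = c(1, 1, 2, 2, 3, 3),
  "4" = c(2, 2, 1, 1, 3, 3), # same partition as steps = 3, labels permuted
  "5" = c(1, 1, 1, 2, 2, 2),
  "6" = c(1, 1, 1, 2, 2, 2)
)

# Relabel communities by order of first appearance so permuted labels match
canonical <- lapply(memberships, function(m) match(m, unique(m)))

# Keep the first search value that produced each distinct partition
keep <- !duplicated(canonical)
unique.solutions <- canonical[keep]

# Search values that yielded distinct solutions
names(unique.solutions)
```

Only the retained solutions would then need a fit value, which keeps the comparison small when many search values agree.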

plot.EGA

Boolean. If TRUE, returns a plot of the network and its estimated dimensions. Defaults to TRUE

verbose

Boolean. Whether messages and (insignificant) warnings should be output. Defaults to FALSE (silent calls). Set to TRUE to see all messages and warnings for every function call

...

Additional arguments to be passed on to auto.correlate, network.estimation, community.detection, community.consensus, and EGA.estimate

Value

Returns a list containing:

EGA: EGA results of the best fitting solution
EntropyFit: tefi fit values for each solution
Lowest.EntropyFit: The best fitting solution based on tefi
parameter.space: Parameter values used in search space
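
A minimal sketch of inspecting the returned list, assuming the EGAnet package (which provides EGA.fit and the wmt2 data used in the examples) is installed. The block is skipped otherwise, and the object names fit and best are illustrative.

```r
if (requireNamespace("EGAnet", quietly = TRUE)) {
  fit <- EGAnet::EGA.fit(
    data = EGAnet::wmt2[, 7:24],
    plot.EGA = FALSE
  )

  print(fit$EntropyFit)        # tefi fit values for each solution
  print(fit$Lowest.EntropyFit) # best fitting solution based on tefi
  print(fit$parameter.space)   # parameter values used in the search

  best <- fit$EGA              # EGA results of the best fitting solution
}
```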

References

Entropy fit measures
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (in press). Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables. Multivariate Behavioral Research.

Simulation for EGA.fit
Jamison, L., Christensen, A. P., & Golino, H. (under review). Optimizing Walktrap's community detection in networks using the Total Entropy Fit Index. PsyArXiv.

Leiden algorithm
Traag, V. A., Waltman, L., & Van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1), 1-12.

Louvain algorithm
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

Walktrap algorithm
Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10, 191-218.

Author

Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>

Examples

# Load data (wmt2 ships with the EGAnet package)
library(EGAnet)
wmt <- wmt2[, 7:24]

# Estimate optimal EGA with Walktrap
fit.walktrap <- EGA.fit(
  data = wmt, algorithm = "walktrap",
  steps = 3:8, # default
  plot.EGA = FALSE # no plot for CRAN checks
)

# Estimate optimal EGA with Leiden and CPM
fit.leiden <- EGA.fit(
  data = wmt, algorithm = "leiden",
  objective_function = "CPM", # default
  # resolution_parameter = seq.int(0, max(abs(network)), 0.01),
  # For CPM, the default maximum resolution parameter
  # is the largest absolute edge weight in the network
  plot.EGA = FALSE # no plot for CRAN checks
)

# Estimate optimal EGA with Leiden and modularity
fit.leiden.modularity <- EGA.fit(
  data = wmt, algorithm = "leiden",
  objective_function = "modularity",
  resolution_parameter = seq.int(0, 2, 0.05),
  # default for modularity
  plot.EGA = FALSE # no plot for CRAN checks
)

if (FALSE) { # not run
# Estimate optimal EGA with Louvain
fit.louvain <- EGA.fit(
  data = wmt, algorithm = "louvain",
  resolution_parameter = seq.int(0, 2, 0.05), # default
  plot.EGA = FALSE # no plot for CRAN checks
)
}