Compares Community Detection Solutions Using Permutation

A permutation test that determines whether a community comparison measure differs significantly from zero

Usage

community.compare(
  base,
  comparison,
  method = c("vi", "nmi", "split.join", "rand", "adjusted.rand"),
  iter = 1000,
  shuffle.base = TRUE,
  verbose = TRUE,
  seed = NULL
)

Arguments

base

Character or numeric vector. A vector of characters or numbers that are treated as the baseline communities

comparison

Character or numeric vector (length = length(base)). A vector of characters or numbers that are treated as the communities to compare against the baseline

method

Character (length = 1). Comparison metric from compare (igraph). Defaults to "adjusted.rand". Available options:

"vi" — Variation of information (Meila, 2003)
"nmi" — Normalized mutual information (Danon et al., 2005)
"split.join" — Split-join distance (Dongen, 2000)
"rand" — Rand index (Rand, 1971)
"adjusted.rand" — Adjusted Rand index (Hubert & Arabie, 1985; Steinley, 2004)

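To make the entropy-based options concrete, the following base R sketch computes the variation of information and the normalized mutual information from the cross-tabulation of two membership vectors. The helper name vi_nmi is hypothetical and not part of EGAnet or igraph; the natural logarithm and the 2I/(H1 + H2) normalization are assumptions, so values may differ from what community.compare reports.

```r
# Entropy-based comparison of two membership vectors (base R only).
# `vi_nmi` is a hypothetical helper, not an EGAnet or igraph function.
vi_nmi <- function(base, comparison) {
  stopifnot(length(base) == length(comparison))

  # Joint distribution of the two labelings
  joint <- table(base, comparison) / length(base)

  # Shannon entropy with natural log (assumption), skipping empty cells
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  h_base <- entropy(rowSums(joint))
  h_comp <- entropy(colSums(joint))
  h_joint <- entropy(as.vector(joint))

  # Mutual information shared by the two partitions
  mutual <- h_base + h_comp - h_joint

  c(
    vi = h_base + h_comp - 2 * mutual,
    nmi = if (h_base + h_comp > 0) 2 * mutual / (h_base + h_comp) else 1
  )
}

# Same partition under different labels: vi is 0 and nmi is 1
same_a <- c(1, 1, 1, 2, 2, 2)
same_b <- c("x", "x", "x", "y", "y", "y")
vi_nmi(same_a, same_b)
```

Because both measures depend only on the cross-tabulation, relabeling the communities (for example, numbers versus characters) does not change the result.
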
iter

Numeric (length = 1). Number of permutations to perform. Defaults to 1000 (recommended)

shuffle.base

Boolean (length = 1). Whether the base cluster solution should be shuffled. Defaults to TRUE to remain consistent with the original implementation (Qannari et al., 2014). From a theoretical standpoint, however, it might make sense to shuffle only the comparison to determine whether it differs specifically from the recognized base
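
The role of shuffle.base can be illustrated with a minimal base R sketch of a permutation test on the adjusted Rand index. This is not the EGAnet implementation: the helper names ari and permute_ari are hypothetical, and the two-sided p-value rule is an assumption made for illustration.

```r
# Adjusted Rand index from the contingency table (Hubert & Arabie, 1985).
# `ari` is a hypothetical helper, not an EGAnet function.
ari <- function(x, y) {
  tab <- table(x, y)
  pairs <- function(counts) sum(choose(counts, 2))
  row_pairs <- pairs(rowSums(tab))
  col_pairs <- pairs(colSums(tab))
  expected <- row_pairs * col_pairs / choose(length(x), 2)
  maximum <- (row_pairs + col_pairs) / 2
  (pairs(tab) - expected) / (maximum - expected)
}

# Hypothetical permutation test; argument names mirror community.compare
permute_ari <- function(base, comparison, iter = 1000,
                        shuffle.base = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  empirical <- ari(base, comparison)

  # Null distribution: shuffle the comparison, and optionally the base
  null <- replicate(iter, {
    shuffled_base <- if (shuffle.base) sample(base) else base
    ari(shuffled_base, sample(comparison))
  })

  # Assumed two-sided p-value: share of permuted values at least as extreme
  data.frame(
    Method = "Adjusted Rand Index",
    Empirical = empirical,
    p.value = mean(abs(null) >= abs(empirical))
  )
}

# Toy memberships: three communities of six nodes, three nodes reassigned
toy_base <- rep(1:3, each = 6)
toy_comparison <- toy_base
toy_comparison[c(1, 7, 13)] <- c(2, 3, 1)
permute_ari(toy_base, toy_comparison, iter = 500, shuffle.base = FALSE, seed = 1)
```

In this sketch, shuffling both vectors or only the comparison yields the same null distribution in expectation, since either way the pairing between the two labelings is randomized while the community sizes stay fixed.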

verbose

Boolean (length = 1). Should progress be displayed? Defaults to TRUE. Set to FALSE to not display progress

seed

Numeric (length = 1). Defaults to NULL, which gives random (non-reproducible) results. Set a seed for reproducible results. See Reproducibility and PRNG for more details on random number generation in EGAnet

Value

Returns a data frame containing the method used (Method), the empirical or observed value (Empirical), and the p-value based on the permutation test (p.value)

References

Implementation of Permutation Test
Qannari, E. M., Courcoux, P., & Faye, P. (2014). Significance test of the adjusted Rand index. Application to the free sorting task. Food Quality and Preference, 32, 93-97.

Variation of Information
Meila, M. (2003, August). Comparing clusterings by the variation of information. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings (pp. 173-187). Berlin, DE: Springer Berlin Heidelberg.

Normalized Mutual Information
Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008.

Split-join Distance
Dongen, S. (2000). Performance criteria for graph clustering and Markov cluster experiments. CWI (Centre for Mathematics and Computer Science).

Rand Index
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846-850.

Adjusted Rand Index
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193-218.

Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386-396.

Author

Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>

Examples

# Load data
wmt <- wmt2[,7:24]

# Estimate network
network <- EBICglasso.qgraph(data = wmt)

# Compute Edge Betweenness
edge_between <- community.detection(network, algorithm = "edge_betweenness")
#> Warning: At vendor/cigraph/src/community/edge_betweenness.c:503 : Membership vector will be selected based on the highest modularity score.

# Compute Fast Greedy
fast_greedy <- community.detection(network, algorithm = "fast_greedy")

# Perform permutation test
community.compare(edge_between, fast_greedy)
#> Warning: This implementation of `community.compare` is experimental. 
#> 
#> The underlying function and/or output may change until the results have been appropriately vetted and validated.
#> Warning: Some values were NA. These indices were removed: 3
#> Argument 'seed' is set to `NULL`. Results will not be reproducible. Set 'seed' for reproducible results
#>                Method Empirical p.value
#> 1 Adjusted Rand Index 0.2724426   0.005