A general function to compute several information theory metrics

Usage

information(
  data,
  base = 2.718282,
  bins = floor(sqrt(nrow(data)/5)),
  statistic = c("entropy", "joint.entropy", "conditional.entropy", "total.correlation",
    "dual.total.correlation", "o.information")
)

Arguments

data

Matrix or data frame. Should consist only of variables to be used in the analysis

base

Numeric (length = 1). Base of logarithm to use for entropy. Common options include:

  • 2 — bits

  • 2.718282 — nats

  • 10 — bans

Defaults to exp(1) or 2.718282
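
An entropy computed in one base can be converted to another by dividing by the logarithm of the target base. A minimal sketch (the uniform 4-category variable is hypothetical):

# Convert nats to bits: bits = nats / log(2)
h_nats <- log(4)    # entropy of a uniform 4-category variable, in nats
h_nats / log(2)     # the same entropy expressed in bits
#> [1] 2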

bins

Numeric (length = 1). Number of bins if data are not discrete. Defaults to floor(sqrt(nrow(data) / 5))
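
For example, under this default rule a data set with 500 rows (a hypothetical sample size) would be discretized into 10 bins:

floor(sqrt(500 / 5))
#> [1] 10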

statistic

Character. Information theory statistics to compute. Available options:

  • "entropy" — Shannon's entropy (Shannon, 1948) for each variable in data. Values range from 0 to log(k) where k is the number of categories for the variable

  • "joint.entropy" — shared uncertainty over all variables in data. Values range from the maximum of the individual entropies to the sum of individual entropies

  • "conditional.entropy" — uncertainty remaining after considering all other variables in data. Values range from 0 to the individual entropy of the conditioned variable

  • "total.correlation" — generalization of mutual information to more than two variables (Watanabe, 1960). Quantifies the redundancy of information in data. Values range from 0 to the sum of individual entropies minus the maximum of the individual entropies

  • "dual.total.correlation" — "shared randomness" or total uncertainty remaining in the data (Han, 1978). Values range from 0 to joint entropy

  • "o.information" — quantifies the extent to which the data is represented by lower-order (> 0; redundancy) or higher-order (< 0; synergy) constraint (Crutchfield, 1994)

By default, all statistics are computed
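
To make the entropy bounds concrete, here is a hand computation for a single discrete variable (the toy vector is illustrative):

# Shannon's entropy (in nats) computed by hand
x <- c("a", "a", "b", "c")
p <- table(x) / length(x)    # empirical probabilities
-sum(p * log(p))             # entropy; at most log(k)
#> [1] 1.039721
log(3)                       # upper bound with k = 3 categories
#> [1] 1.098612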

Value

Returns a list containing only the requested statistics

References

Shannon's entropy
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379-423.

Formalization of total correlation
Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4, 66-82.

Applied implementation of total correlation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021). Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults. Archives of Gerontology and Geriatrics, 94, 104337.

Formalization of dual total correlation
Han, T. S. (1978). Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36, 133-156.

Formalization of O-information
Crutchfield, J. P. (1994). The calculi of emergence: Computation, dynamics and induction. Physica D: Nonlinear Phenomena, 75(1-3), 11-54.

Applied implementation of O-information
Marinazzo, D., Van Roozendaal, J., Rosas, F. E., Stella, M., Comolatti, R., Colenbier, N., Stramaglia, S., & Rosseel, Y. (2024). An information-theoretic approach to build hypergraphs in psychometrics. Behavior Research Methods, 1-23.

Author

Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>

Examples

# All measures
information(wmt2[,7:24])
#> $entropy
#>  [1] 0.6248171 0.4848127 0.5703028 0.6852211 0.6248171 0.6540949 0.6795459
#>  [8] 0.6889107 0.6852211 0.6686914 0.6323408 0.6261037 0.6811749 0.6879248
#> [15] 0.6329467 0.6410914 0.5107526 0.5738498
#> 
#> $joint.entropy
#> [1] 6.818678
#> 
#> $conditional.entropy
#>  [1] 0.06491355 0.04776910 0.06099249 0.06055578 0.06512678 0.05807372
#>  [7] 0.06634590 0.07136623 0.06008623 0.05776512 0.09258580 0.07823839
#> [13] 0.09740448 0.07219685 0.08422207 0.06995098 0.08668268 0.09056333
#> 
#> $total.correlation
#> [1] 4.533942
#> 
#> $dual.total.correlation
#> [1] 5.533838
#> 
#> $o.information
#> [1] -0.9998968
#> 
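
These statistics satisfy standard identities: total correlation is the sum of the individual entropies minus the joint entropy, dual total correlation is the joint entropy minus the sum of the conditional entropies, and O-information is their difference. A minimal sketch checking these identities against the output above (the object name results is illustrative):

# Verify identities among the statistics
results <- information(wmt2[,7:24])
sum(results$entropy) - results$joint.entropy                # total correlation
#> [1] 4.533942
results$joint.entropy - sum(results$conditional.entropy)    # dual total correlation
#> [1] 5.533838
results$total.correlation - results$dual.total.correlation  # O-information
#> [1] -0.9998968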

# One measure
information(wmt2[,7:24], statistic = "joint.entropy")
#> $joint.entropy
#> [1] 6.818678
#>