A general function to compute several different information theory metrics
Arguments
- data
Matrix or data frame. Should consist only of variables to be used in the analysis
- base
Numeric (length = 1). Base of logarithm to use for entropy. Common options include:
2 — bits
2.718282 — nats
10 — bans
Defaults to exp(1) or 2.718282
- bins
Numeric (length = 1). Number of bins if the data are not discrete. Defaults to floor(sqrt(nrow(data) / 5))
- statistic
Character. Information theory statistics to compute. Available options:
"entropy"
— Shannon's entropy (Shannon, 1948) for each variable indata
. Values range from0
tolog(k)
wherek
is the number of categories for the variable"joint.entropy"
— shared uncertainty over all variables indata
. Values range from the maximum of the individual entropies to the sum of individual entropies"conditional.entropy"
— uncertainty remaining after considering all other variables indata
. Values range from0
to the individual entropy of the conditioned variable"total.correlation"
— generalization of mutual information to more than two variables (Watanabe, 1960). Quantifies the redundancy of information indata
. Values range from0
to the sum of individual entropies minus the maximum of the individual entropies"dual.total.correlation"
— "shared randomness" or total uncertainty remaining in thedata
(Han, 1978). Values range from0
to joint entropy"o.information"
— quantifies the extent to which thedata
is represented by lower-order (> 0
; redundancy) or higher-order (< 0
; synergy) constraint (Crutchfield, 1994)
By default, all statistics are computed
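The statistics above all reduce to counting and log-probabilities over a contingency table. As a minimal, hedged sketch of the definitions (plain Python on a toy dataset, not the package's R implementation; variable names are illustrative):

```python
from collections import Counter
from math import log

# Toy discrete data: 3 binary variables, one tuple per observation
data = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
    (1, 1, 0), (1, 0, 0), (0, 0, 0), (1, 1, 1),
]
n = len(data)

def entropy(counts, base=2.718282):
    """Shannon entropy from a count table, in the chosen log base (nats here)."""
    return -sum((c / n) * log(c / n, base) for c in counts.values())

# Individual (marginal) entropies, one per variable
marginals = [entropy(Counter(row[i] for row in data)) for i in range(3)]

# Joint entropy over all variables together
joint = entropy(Counter(data))

# Conditional entropy of each variable given all others:
# H(X_i | rest) = H(X) - H(rest)
rests = [
    entropy(Counter(tuple(v for j, v in enumerate(row) if j != i)
                    for row in data))
    for i in range(3)
]
conditionals = [joint - r for r in rests]

# Total correlation: sum of individual entropies minus joint entropy
tc = sum(marginals) - joint

# Dual total correlation: joint entropy minus sum of conditional entropies
dtc = joint - sum(conditionals)

# O-information: total correlation minus dual total correlation
# (> 0: redundancy-dominated; < 0: synergy-dominated)
o_info = tc - dtc
```

Note how the documented ranges follow directly from these formulas: the joint entropy sits between the largest individual entropy and their sum, which keeps the total correlation non-negative.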
References
Shannon's entropy
Shannon, C. E. (1948). A mathematical theory of communication.
The Bell System Technical Journal, 27(3), 379-423.
Formalization of total correlation
Watanabe, S. (1960).
Information theoretical analysis of multivariate correlation.
IBM Journal of Research and Development, 4, 66-82.
Applied implementation of total correlation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021).
Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults.
Archives of Gerontology and Geriatrics, 94, 104337.
Formalization of dual total correlation
Han, T. S. (1978).
Nonnegative entropy measures of multivariate symmetric correlations.
Information and Control, 36, 133-156.
Formalization of O-information
Crutchfield, J. P. (1994). The calculi of emergence: Computation, dynamics and induction.
Physica D: Nonlinear Phenomena, 75(1-3), 11-54.
Applied implementation of O-information
Marinazzo, D., Van Roozendaal, J., Rosas, F. E., Stella, M., Comolatti, R., Colenbier, N., Stramaglia, S., & Rosseel, Y. (2024).
An information-theoretic approach to build hypergraphs in psychometrics.
Behavior Research Methods, 1-23.
Author
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>
Examples
# All measures
information(wmt2[,7:24])
#> $entropy
#> [1] 0.6248171 0.4848127 0.5703028 0.6852211 0.6248171 0.6540949 0.6795459
#> [8] 0.6889107 0.6852211 0.6686914 0.6323408 0.6261037 0.6811749 0.6879248
#> [15] 0.6329467 0.6410914 0.5107526 0.5738498
#>
#> $joint.entropy
#> [1] 6.818678
#>
#> $conditional.entropy
#> [1] 0.06491355 0.04776910 0.06099249 0.06055578 0.06512678 0.05807372
#> [7] 0.06634590 0.07136623 0.06008623 0.05776512 0.09258580 0.07823839
#> [13] 0.09740448 0.07219685 0.08422207 0.06995098 0.08668268 0.09056333
#>
#> $total.correlation
#> [1] 4.533942
#>
#> $dual.total.correlation
#> [1] 5.533838
#>
#> $o.information
#> [1] -0.9998968
#>
# One measure
information(wmt2[,7:24], statistic = "joint.entropy")
#> $joint.entropy
#> [1] 6.818678
#>
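As a quick sanity check, the statistics printed in the first example should satisfy the identity o.information = total.correlation - dual.total.correlation, up to print rounding. A minimal check (values copied from the example output above):

```python
# Values taken from the example output above (rounded by R's print)
joint_entropy = 6.818678
total_correlation = 4.533942
dual_total_correlation = 5.533838
o_information = -0.9998968

# Identity: O-information = total correlation - dual total correlation
assert abs((total_correlation - dual_total_correlation) - o_information) < 1e-5

# Dual total correlation cannot exceed the joint entropy
assert dual_total_correlation <= joint_entropy
```

The negative o.information here indicates that higher-order (synergistic) constraints dominate in these data.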