reduced_mutual_information
graph_tool.inference.reduced_mutual_information(x, y, norm=False, norm_method='asymmetric')
Returns the reduced mutual information between two partitions.
Parameters:
- x : iterable of int values
  First partition.
- y : iterable of int values
  Second partition.
- norm : bool (optional, default: False)
  If True, the result will be pseudo-normalized such that the returned value is one if the two partitions are identical.
- norm_method : "symmetric" or "asymmetric" (optional, default: "asymmetric")
  Method used for pseudo-normalization.
Returns:
- RMI : float
  Reduced mutual information value.
Notes
The reduced mutual information (RMI) [newman_improved_2020] is defined as
\[\text{RMI}(\boldsymbol x, \boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -\ln\Omega(\boldsymbol n, \boldsymbol n')\right],\]

where \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) is the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in the two partitions, and \(\Omega(\boldsymbol n, \boldsymbol n')\) is the total number of contingency tables with the same row and column sums.
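The contingency table \(m_{rs}\), the marginals \(n_r\) and \(n'_s\), and the first (multinomial) term of the definition above can be sketched in plain NumPy. The function name `log_multinomial_term` is hypothetical; the \(\ln\Omega(\boldsymbol n, \boldsymbol n')\) correction has no simple closed form (graph_tool approximates it internally) and is deliberately omitted, so this is not the full RMI.

```python
import numpy as np
from math import lgamma

def log_multinomial_term(x, y):
    """First term of the RMI definition, per item (sketch only).

    The ln Omega(n, n') correction — the log number of contingency
    tables with the given margins — is NOT computed here.
    """
    x, y = np.asarray(x), np.asarray(y)
    N = len(x)
    # Contingency table m_rs: counts of co-occurring label pairs (r, s).
    _, m = np.unique(np.stack([x, y]), axis=1, return_counts=True)
    # Marginals: group sizes n_r and n'_s in each partition.
    _, n_r = np.unique(x, return_counts=True)
    _, n_s = np.unique(y, return_counts=True)
    log_fact = lambda k: lgamma(k + 1)  # ln k!
    val = (log_fact(N) + sum(log_fact(k) for k in m)
           - sum(log_fact(k) for k in n_r)
           - sum(log_fact(k) for k in n_s))
    return val / N
```

For identical partitions the table is diagonal with \(m_{rr}=n_r\), so the term reduces to \((\ln N! - \sum_r \ln n_r!)/N \ge 0\).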
If norm == True, the pseudo-normalized value is returned:

\[\frac{\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x)}\]

if norm_method == "asymmetric", or

\[\frac{2\,\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x) + \text{RMI}(\boldsymbol y, \boldsymbol y)}\]

if norm_method == "symmetric". In both cases the pseudo-normalized value will be unity if the two partitions are identical.

Warning
Note that both the pseudo-normalized and unnormalized RMI values can be negative. In addition, the pseudo-normalized RMI can take values larger than unity for partitions that are not identical. These are issues with the definition of RMI, not bugs in this particular implementation.
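The two pseudo-normalization conventions amount to simple ratios of unnormalized RMI values. A minimal sketch, where `rmi` stands in for any callable computing the unnormalized RMI (e.g. this function with norm=False) and `normalize_rmi` is a hypothetical helper, not part of graph_tool:

```python
def normalize_rmi(rmi, x, y, method="asymmetric"):
    """Pseudo-normalize an RMI value, given any callable rmi(x, y)."""
    if method == "asymmetric":
        # Divide by the self-information of the first partition only.
        return rmi(x, y) / rmi(x, x)
    elif method == "symmetric":
        # Divide by the average self-information of both partitions.
        return 2 * rmi(x, y) / (rmi(x, x) + rmi(y, y))
    raise ValueError(f"unknown norm_method: {method!r}")
```

Since the denominators can themselves be small or negative, both ratios can exceed one or change sign, which is the source of the behavior described in the warning above.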
This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).
References
[newman_improved_2020] M. E. J. Newman, G. T. Cantwell and J.-G. Young, "Improved mutual information measure for classification and community detection", Phys. Rev. E, 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304, arXiv: 1907.12581
Examples
>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.069938...
>>> x = [0, 1, 2, 3, 4, 5, 6]
>>> y = [0, 0, 1, 0, 2, 3, 4]
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="asymmetric")
0.535938...
>>> # Pseudo-normalized values larger than one are possible!
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="symmetric")
82.227299...
>>> gt.reduced_mutual_information(x, x, norm=False)
-0.162538...
>>> gt.reduced_mutual_information(y, y, norm=False)
0.160420...
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="symmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="symmetric")
1.0