graph_tool.inference.reduced_mutual_information#

graph_tool.inference.reduced_mutual_information(x, y, norm=False)[source]#

Returns the reduced mutual information between two partitions.

Parameters:
xiterable of int values

First partition.

yiterable of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized such that the maximum possible value is one.

Returns:
RMIfloat

Reduced mutual information value.

Notes

The reduced mutual information (RMI) [newman_improved_2020] is defined as

\[\text{RMI}(\boldsymbol x,\boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -\ln\Omega(\boldsymbol n, \boldsymbol n')\right],\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in both partitions, and \(\Omega(\boldsymbol n, \boldsymbol n')\) is the total number of contingency tables with fixed row and column sums.

If norm == True, the normalized value is returned:

\[\frac{2\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -2\ln\Omega(\boldsymbol n, \boldsymbol n')} {\ln\frac{N!}{\prod_rn_r!} + \ln\frac{N!}{\prod_rn'_r!} -\ln\Omega(\boldsymbol n, \boldsymbol n) -\ln\Omega(\boldsymbol n', \boldsymbol n')}\]

which can take a maximum value of one.

Note that both the normalized and unormalized RMI values can be negative.

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

[newman_improved_2020]

M. E. J. Newman, G. T. Cantwell and J.-G. Young, “Improved mutual information measure for classification and community detection”, Phys. Rev. E, 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304 [sci-hub, @tor], arXiv: 1907.12581

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.069938...