# reduced_mutual_information

graph_tool.inference.reduced_mutual_information(x, y, norm=True)

Returns the reduced mutual information between two partitions.

Parameters:

x : iterable of int values

First partition.

y : iterable of int values

Second partition.

norm : bool (optional, default: True)

If True, the result will be normalized such that the maximum possible value is one.

Returns:
RMI : float

Reduced mutual information value.

Notes

The reduced mutual information (RMI) [newman_improved_2020] is defined as

$$\text{RMI}(\boldsymbol x,\boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -\ln\Omega(\boldsymbol n, \boldsymbol n')\right],$$

where $m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}$ is the contingency table between $\boldsymbol x$ and $\boldsymbol y$, $n_r=\sum_sm_{rs}$ and $n'_s=\sum_rm_{rs}$ are the group sizes in the two partitions, and $\Omega(\boldsymbol n, \boldsymbol n')$ is the total number of contingency tables with the given fixed row and column sums.
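As a concrete illustration of these definitions, the contingency table and its marginals can be computed directly with NumPy. This is a minimal sketch over hypothetical data, not graph-tool's implementation; it evaluates only the first term of the RMI, since the $\ln\Omega$ correction (counting contingency tables with fixed margins) is the nontrivial part that the library handles internally:

```python
import math
import numpy as np

# Two example partitions of N = 8 items (hypothetical data)
x = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y = np.array([0, 0, 1, 1, 1, 2, 2, 2])
N = len(x)

# Contingency table m_rs = sum_i delta(x_i, r) * delta(y_i, s)
m = np.zeros((x.max() + 1, y.max() + 1), dtype=int)
np.add.at(m, (x, y), 1)

n_r = m.sum(axis=1)  # row sums: group sizes of partition x
n_s = m.sum(axis=0)  # column sums: group sizes of partition y

def log_fact_sum(a):
    """Sum of ln(k!) over all entries k of the array a."""
    return sum(math.lgamma(int(k) + 1) for k in np.ravel(a))

# First term of the RMI (everything except the ln Omega correction):
# (1/N) * ln[ N! * prod_rs m_rs! / (prod_r n_r! * prod_s n_s!) ]
first_term = (math.lgamma(N + 1) + log_fact_sum(m)
              - log_fact_sum(n_r) - log_fact_sum(n_s)) / N
```

Note that the row and column sums of `m` recover the group sizes obtained from `np.bincount(x)` and `np.bincount(y)`.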

If norm == True, the normalized value is returned:

$$\frac{2\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -2\ln\Omega(\boldsymbol n, \boldsymbol n')} {\ln\frac{N!}{\prod_rn_r!} + \ln\frac{N!}{\prod_sn'_s!} -\ln\Omega(\boldsymbol n, \boldsymbol n) -\ln\Omega(\boldsymbol n', \boldsymbol n')}$$

which can take a maximum value of one.

Note that both the normalized and unnormalized RMI values can be negative.

This algorithm runs in time $O(N)$, where $N$ is the length of $\boldsymbol x$ and $\boldsymbol y$.

References

M. E. J. Newman, G. T. Cantwell and J.-G. Young, “Improved mutual information measure for classification and community detection”, Phys. Rev. E 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304, arXiv: 1907.12581

Examples

>>> import numpy as np
>>> import graph_tool.all as gt
>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.069938...