reduced_mutual_information

graph_tool.inference.reduced_mutual_information(x, y, norm=False, norm_method='asymmetric')

Returns the reduced mutual information between two partitions.

Parameters:
x : iterable of int values

First partition.

y : iterable of int values

Second partition.

norm : bool (optional, default: False)

If True, the result will be pseudo-normalized so that the returned value is one if the two partitions are identical.

norm_method"symmetric" or "asymmetric" (optional, default: "asymmetric")

Method used for pseudo-normalization.

Returns:
RMI : float

Reduced mutual information value.

Notes

The reduced mutual information (RMI) [newman_improved_2020] is defined as

\[\text{RMI}(\boldsymbol x, \boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -\ln\Omega(\boldsymbol n, \boldsymbol n')\right],\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) the group sizes in the two partitions, and \(\Omega(\boldsymbol n, \boldsymbol n')\) the total number of contingency tables with fixed row and column sums.
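
As an illustration of this definition, the following is a minimal pure-Python sketch that evaluates the expression above exactly, counting \(\Omega(\boldsymbol n, \boldsymbol n')\) by brute-force enumeration (exponential time, so feasible only for very small partitions). The names n_tables and rmi_sketch are hypothetical, introduced here only for illustration; graph_tool itself evaluates \(\ln\Omega\) with an efficient approximation (hence the \(O(N)\) running time quoted below), so its results will differ from this exact enumeration.

from collections import Counter
from functools import lru_cache
from math import lgamma, log

def n_tables(row_sums, col_sums):
    # Exact number of non-negative integer matrices with the given
    # row and column sums (exponential time -- tiny inputs only).
    @lru_cache(maxsize=None)
    def count(rows, j):
        if j == len(col_sums):
            return 1  # all margins consumed; remaining row sums are zero
        total = 0
        def fill(i, left, new_rows):
            nonlocal total
            if i == len(rows):
                if left == 0:
                    total += count(tuple(sorted(new_rows)), j + 1)
                return
            for a in range(min(left, rows[i]) + 1):
                fill(i + 1, left - a, new_rows + (rows[i] - a,))
        fill(0, col_sums[j], ())
        return total
    return count(tuple(sorted(row_sums)), 0)

def rmi_sketch(x, y):
    N = len(x)
    m = Counter(zip(x, y))   # contingency table m_rs
    n = Counter(x)           # group sizes n_r
    nq = Counter(y)          # group sizes n'_s
    log_num = lgamma(N + 1) + sum(lgamma(c + 1) for c in m.values())
    log_den = (sum(lgamma(c + 1) for c in n.values())
               + sum(lgamma(c + 1) for c in nq.values()))
    log_omega = log(n_tables(tuple(n.values()), tuple(nq.values())))
    return (log_num - log_den - log_omega) / N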

If norm == True, the pseudo-normalized value is returned:

\[\frac{\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x)}\]

if norm_method == "asymmetric", or

\[\frac{2\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x) + \text{RMI}(\boldsymbol y, \boldsymbol y)}\]

if norm_method == "symmetric". In both cases the pseudo-normalized value will be unity if the two partitions are identical.
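
In terms of raw RMI values, the pseudo-normalization can be reproduced directly from the two formulas above; the helper below (rmi_normalized, a hypothetical name used only for illustration) makes this concrete:

import graph_tool.all as gt

def rmi_normalized(x, y, method="asymmetric"):
    # Reproduces the norm=True behavior from unnormalized RMI values.
    rxy = gt.reduced_mutual_information(x, y)
    rxx = gt.reduced_mutual_information(x, x)
    if method == "asymmetric":
        return rxy / rxx
    ryy = gt.reduced_mutual_information(y, y)
    return 2 * rxy / (rxx + ryy)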

Warning

Note that both the pseudo-normalized and unnormalized RMI values can be negative. In addition, the pseudo-normalized RMI can take values larger than unity for different partitions. These are issues with the definition of RMI, not bugs in this particular implementation.

This algorithm runs in time \(O(N)\), where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

[newman_improved_2020]

M. E. J. Newman, G. T. Cantwell, and J.-G. Young, "Improved mutual information measure for classification and community detection", Phys. Rev. E 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304, arXiv: 1907.12581

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.069938...
>>> x = [0, 1, 2, 3, 4, 5, 6]
>>> y = [0, 0, 1, 0, 2, 3, 4]
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="asymmetric")
0.535938...
>>> # Pseudo-normalized values larger than one are possible!
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="symmetric")
82.227299...
>>> gt.reduced_mutual_information(x, x, norm=False)
-0.162538...
>>> gt.reduced_mutual_information(y, y, norm=False)
0.160420...
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="symmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="symmetric")
1.0