variation_information#

graph_tool.inference.variation_information(x, y, norm=False)[source]#

Returns the variation of information between two partitions.

Parameters:

xiterable of int values: First partition.
yiterable of int values: Second partition.
norm(optional, default: False): If True, the result will be normalized in the range \([0,1]\).

Returns:

VIfloat: Variation of information value.

Notes

The variation of information [meila_comparing_2003] is defined as

\[\text{VI}(\boldsymbol x,\boldsymbol y) = -\frac{1}{N}\sum_{rs}m_{rs}\left[\ln\frac{m_{rs}}{n_r} + \ln\frac{m_{rs}}{n_s'}\right],\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in both partitions.

If norm == True, the normalized value is returned:

\[\frac{\text{VI}(\boldsymbol x,\boldsymbol y)}{\ln N}\]

which lies in the unit interval \([0,1]\).

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

[meila_comparing_2003]

Marina Meilă, “Comparing Clusterings by the Variation of Information,” in Learning Theory and Kernel Machines, Lecture Notes in Computer Science No. 2777, edited by Bernhard Schölkopf and Manfred K. Warmuth (Springer Berlin Heidelberg, 2003) pp. 173–187. DOI: 10.1007/978-3-540-45167-9_14 [sci-hub, @tor]

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.variation_information(x, y)
4.525389...

variation_information

Contents

variation_information#