variation_information
- graph_tool.inference.variation_information(x, y, norm=False)
Returns the variation of information between two partitions.
- Parameters:
  - x : iterable of int values
    First partition.
  - y : iterable of int values
    Second partition.
  - norm : (optional, default: False)
    If True, the result will be normalized in the range \([0,1]\).
- Returns:
  - VI : float
    Variation of information value.
Notes
The variation of information [meila_comparing_2003] is defined as
\[\text{VI}(\boldsymbol x,\boldsymbol y) = -\frac{1}{N}\sum_{rs}m_{rs}\left[\ln\frac{m_{rs}}{n_r} + \ln\frac{m_{rs}}{n'_s}\right],\]
where \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) is the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in the two partitions. Terms with \(m_{rs}=0\) contribute nothing to the sum.
If norm == True, the normalized value is returned:
\[\frac{\text{VI}(\boldsymbol x,\boldsymbol y)}{\ln N},\]
which lies in the unit interval \([0,1]\).
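As a quick check of the upper bound, consider an extreme case chosen here only for illustration: \(\boldsymbol x\) places each of the \(N\) items in its own group, and \(\boldsymbol y\) places all items in a single group. Then \(m_{r1}=1\), \(n_r=1\) and \(n'_1=N\) for every \(r\), so
\[\text{VI}(\boldsymbol x,\boldsymbol y) = -\frac{1}{N}\sum_{r=1}^{N}\left[\ln\frac{1}{1}+\ln\frac{1}{N}\right] = \ln N,\]
and the normalized value is exactly \(1\).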
This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).
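The definition in the Notes can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not graph-tool's implementation: the function name `variation_information_sketch` is hypothetical, and because it relabels groups with `np.unique` (which sorts), it does not reproduce the \(O(N)\) running time quoted above.

```python
import numpy as np


def variation_information_sketch(x, y, norm=False):
    """Illustrative VI computation from the contingency table (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.ndim != 1 or x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty 1-D sequences of equal length")
    N = x.size

    # Relabel group ids to contiguous indices 0..R-1 and 0..S-1.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    R, S = xi.max() + 1, yi.max() + 1

    # Contingency table: m[r, s] counts items in group r of x and group s of y.
    m = np.zeros((R, S), dtype=np.int64)
    np.add.at(m, (xi, yi), 1)

    n_r = m.sum(axis=1)  # group sizes in x
    n_s = m.sum(axis=0)  # group sizes in y

    # Only non-empty cells contribute to the sum.
    r, s = np.nonzero(m)
    m_rs = m[r, s].astype(float)
    vi = -np.sum(m_rs * (np.log(m_rs / n_r[r]) + np.log(m_rs / n_s[s]))) / N

    # Normalize by ln N; skipped for N == 1, where ln N = 0 and VI = 0.
    if norm and N > 1:
        vi /= np.log(N)
    return float(vi)
```

For instance, `variation_information_sketch([0, 0, 1, 1], [0, 1, 0, 1])` evaluates to \(2\ln 2 \approx 1.386\), and with `norm=True` the same call gives 1, since \(N=4\) and \(\ln 4 = 2\ln 2\). Two partitions that differ only in their group labels give 0.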
References
[meila_comparing_2003] Marina Meilă, “Comparing Clusterings by the Variation of Information,” in Learning Theory and Kernel Machines, Lecture Notes in Computer Science No. 2777, edited by Bernhard Schölkopf and Manfred K. Warmuth (Springer Berlin Heidelberg, 2003) pp. 173–187. DOI: 10.1007/978-3-540-45167-9_14
Examples
>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.variation_information(x, y)
4.525389...