partition_overlap

partition_overlap#

graph_tool.inference.partition_overlap(x, y, norm=True)[source]#

Returns the maximum overlap between partitions, according to an optimal label alignment.

Parameters:
xiterable of int values or PropertyMap

First partition.

yiterable of int values or PropertyMap

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns:
wfloat or int

Maximum overlap value.

Notes

The maximum overlap between partitions \(\boldsymbol x\) and \(\boldsymbol y\) is defined as

\[\omega(\boldsymbol x,\boldsymbol y) = \underset{\boldsymbol\mu}{\max}\sum_i\delta_{x_i,\mu(y_i)},\]

where \(\boldsymbol\mu\) is a bijective mapping between group labels. It corresponds to solving an instance of the maximum weighted bipartite matching problem, which is done with the Kuhn-Munkres algorithm [kuhn_hungarian_1955] [munkres_algorithms_1957].

If norm == True, the normalized value is returned:

\[\frac{\omega(\boldsymbol x,\boldsymbol y)}{N}\]

which lies in the unit interval \([0,1]\).

This algorithm runs in time \(O[N + (B_x+B_y)E_m]\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\), \(B_x\) and \(B_y\) are the number of labels in partitions \(\boldsymbol x\) and \(\boldsymbol y\), respectively, and \(E_m \le B_xB_y\) is the number of nonzero entries in the contingency table between both partitions.

References

[peixoto-revealing-2021]

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, Phys. Rev. X 11 021003 (2021) DOI: 10.1103/PhysRevX.11.021003 [sci-hub, @tor], arXiv: 2005.13977

[kuhn_hungarian_1955]

H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly 2, 83–97 (1955) DOI: 10.1002/nav.3800020109 [sci-hub, @tor]

[munkres_algorithms_1957]

James Munkres, “Algorithms for the Assignment and Transportation Problems,” Journal of the Society for Industrial and Applied Mathematics 5, 32–38 (1957). DOI: 10.1137/0105003 [sci-hub, @tor]

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> print(gt.partition_overlap(x, y))
0.143