# vertex_similarity#

graph_tool.topology.vertex_similarity(g, sim_type='jaccard', vertex_pairs=None, eweight=None, sim_map=None)[source]#

Return the similarity between pairs of vertices.

Parameters:
gGraph

The graph to be used.

sim_typestr (optional, default: "jaccard")

Type of similarity to use. This must be one of "dice", "salton", "hub-promoted", "hub-suppressed", "jaccard", "inv-log-weight", "resource-allocation" or "leicht-holme-newman".

vertex_pairsiterable of pairs of integers (optional, default: None)

Pairs of vertices to compute the similarity. If omitted, all pairs will be considered.

eweightEdgePropertyMap (optional, default: None)

Edge weights.

sim_mapVertexPropertyMap (optional, default: None)

If provided, and vertex_pairs is None, the vertex similarities will be stored in this vector-valued property. Otherwise, a new one will be created.

Returns:
similarities

If vertex_pairs was supplied, this will be a numpy.ndarray with the corresponding similarities, otherwise it will be a vector-valued vertex VertexPropertyMap, with the similarities to all other vertices.

Notes

According to sim_type, this function computes one of the following similarities:

sim_type == "dice"

The Sørensen–Dice similarity [sorensen-dice] of vertices $$u$$ and $$v$$ is defined as

$\frac{2|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)|+|\Gamma(v)|},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "salton"

The Salton (or cosine) similarity [salton] of vertices $$u$$ and $$v$$ is defined as

$\frac{|\Gamma(u)\cap\Gamma(v)|}{\sqrt{|\Gamma(u)||\Gamma(v)|}},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "hub-promoted"

The “hub promoted” similarity [ravasz_hierarchical_2002] of vertices $$u$$ and $$v$$ is defined as

$\frac{|\Gamma(u)\cap\Gamma(v)|}{\max(|\Gamma(u)|,|\Gamma(v)|)},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "hub-suppressed"

The “hub suppressed” similarity of vertices $$u$$ and $$v$$ is defined as

$\frac{|\Gamma(u)\cap\Gamma(v)|}{\min(|\Gamma(u)|,|\Gamma(v)|)},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "jaccard"

The Jaccard similarity [jaccard] of vertices $$u$$ and $$v$$ is defined as

$\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)\cup\Gamma(v)|},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "inv-log-weight"

The inverse log weighted similarity [adamic-friends-2003] of vertices $$u$$ and $$v$$ is defined as

$\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{\log |\Gamma(w)|},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "resource-allocation"

The resource allocation similarity [zhou-predicting-2009] of vertices $$u$$ and $$v$$ is defined as

$\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{|\Gamma(w)|},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

sim_type == "leicht-holme-newman"

The Leicht-Holme-Newman similarity [leicht_vertex_2006] of vertices $$u$$ and $$v$$ is defined as

$\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)||\Gamma(v)|},$

where $$\Gamma(u)$$ is the set of neighbors of vertex $$u$$.

For directed graphs, only out-neighbors are considered in the above algorthms (for “inv-log-weight”, the in-degrees are used to compute the weights). To use the in-neighbors instead, a GraphView should be used to reverse the graph, e.g. vertex_similarity(GraphView(g, reversed=True)).

For weighted or multigraphs, in the above equations it is assumed the following:

$\begin{split}|\Gamma(u)\cap\Gamma(v)| &= \sum_w \min(A_{wv}, A_{wu}),\\ |\Gamma(u)\cup\Gamma(v)| &= \sum_w \max(A_{wv}, A_{wu}),\\ |\Gamma(u)| &= \sum_w A_{wu},\end{split}$

where $$A_{wu}$$ is the weighted adjacency matrix.

See [liben-nowell-link-prediction-2007] for a review of the above.

The algorithm runs with complexity $$O(\left<k\right>N^2)$$ if vertex_pairs is None, otherwise with $$O(\left<k\right>P)$$ where $$P$$ is the length of vertex_pairs.

Parallel implementation.

If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.

References

[salton]

G. Salton, M. J. McGill, “Introduction to Modern Informa-tion Retrieval”, (MuGraw-Hill, Auckland, 1983).

Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L., “Hierarchical organization of modularity in metabolic networks”, Science, 297(5586), 1551-1555, (2002). DOI: 10.1126/science.1073374 [sci-hub, @tor]

E. A. Leicht, Petter Holme, and M. E. J. Newman, “Vertex similarity in networks”, Phys. Rev. E 73, 026120 (2006), DOI: 10.1103/PhysRevE.73.026120 [sci-hub, @tor], arXiv: physics/0510143

Lada A. Adamic and Eytan Adar, “Friends and neighbors on the Web”, Social Networks Volume 25, Issue 3, Pages 211–230 (2003) DOI: 10.1016/S0378-8733(03)00009-1 [sci-hub, @tor]

Zhou, Tao, Linyuan Lü, and Yi-Cheng Zhang, “Predicting missing links via local information”, The European Physical Journal B 71, no. 4: 623-630 (2009), DOI: 10.1140/epjb/e2009-00335-8 [sci-hub, @tor], arXiv: 0901.0553

Examples

>>> g = gt.collection.data["polbooks"]
>>> s = gt.vertex_similarity(g, "jaccard")
>>> color = g.new_vp("double")
>>> color.a = s[0].a
>>> gt.graph_draw(g, pos=g.vp.pos, vertex_text=g.vertex_index,
...               vertex_color=color, vertex_fill_color=color,
...               vcmap=matplotlib.cm.inferno,
...               output="polbooks-jaccard.svg")
<...>