vertex_similarity#
- graph_tool.topology.vertex_similarity(g, sim_type='jaccard', vertex_pairs=None, eweight=None, sim_map=None)[source]#
Return the similarity between pairs of vertices.
- Parameters:
- g
Graph
The graph to be used.
- sim_type
str
(optional, default:"jaccard"
) Type of similarity to use. This must be one of
"dice"
,"salton"
,"hub-promoted"
,"hub-suppressed"
,"jaccard"
,"inv-log-weight"
,"resource-allocation"
or"leicht-holme-newman"
.- vertex_pairsiterable of pairs of integers (optional, default:
None
) Pairs of vertices to compute the similarity. If omitted, all pairs will be considered.
- eweight
EdgePropertyMap
(optional, default:None
) Edge weights.
- sim_map
VertexPropertyMap
(optional, default:None
) If provided, and
vertex_pairs is None
, the vertex similarities will be stored in this vector-valued property. Otherwise, a new one will be created.
- g
- Returns:
- similarities
numpy.ndarray
orVertexPropertyMap
If
vertex_pairs
was supplied, this will be anumpy.ndarray
with the corresponding similarities, otherwise it will be a vector-valued vertexVertexPropertyMap
, with the similarities to all other vertices.
- similarities
Notes
According to
sim_type
, this function computes one of the following similarities:sim_type == "dice"
The Sørensen–Dice similarity [sorensen-dice] of vertices \(u\) and \(v\) is defined as
\[\frac{2|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)|+|\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "salton"
The Salton (or cosine) similarity [salton] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\sqrt{|\Gamma(u)||\Gamma(v)|}},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-promoted"
The “hub promoted” similarity [ravasz_hierarchical_2002] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\max(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-suppressed"
The “hub suppressed” similarity of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\min(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "jaccard"
The Jaccard similarity [jaccard] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)\cup\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "inv-log-weight"
The inverse log weighted similarity [adamic-friends-2003] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{\log |\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "resource-allocation"
The resource allocation similarity [zhou-predicting-2009] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{|\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "leicht-holme-newman"
The Leicht-Holme-Newman similarity [leicht_vertex_2006] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)||\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
For directed graphs, only out-neighbors are considered in the above algorthms (for “inv-log-weight”, the in-degrees are used to compute the weights). To use the in-neighbors instead, a
GraphView
should be used to reverse the graph, e.g.vertex_similarity(GraphView(g, reversed=True))
.For weighted or multigraphs, in the above equations it is assumed the following:
\[\begin{split}|\Gamma(u)\cap\Gamma(v)| &= \sum_w \min(A_{wv}, A_{wu}),\\ |\Gamma(u)\cup\Gamma(v)| &= \sum_w \max(A_{wv}, A_{wu}),\\ |\Gamma(u)| &= \sum_w A_{wu},\end{split}\]where \(A_{wu}\) is the weighted adjacency matrix.
See [liben-nowell-link-prediction-2007] for a review of the above.
The algorithm runs with complexity \(O(\left<k\right>N^2)\) if
vertex_pairs is None
, otherwise with \(O(\left<k\right>P)\) where \(P\) is the length ofvertex_pairs
.Parallel implementation.
If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.
References
[salton]G. Salton, M. J. McGill, “Introduction to Modern Informa-tion Retrieval”, (MuGraw-Hill, Auckland, 1983).
[ravasz_hierarchical_2002]Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L., “Hierarchical organization of modularity in metabolic networks”, Science, 297(5586), 1551-1555, (2002). DOI: 10.1126/science.1073374 [sci-hub, @tor]
[leicht_vertex_2006]E. A. Leicht, Petter Holme, and M. E. J. Newman, “Vertex similarity in networks”, Phys. Rev. E 73, 026120 (2006), DOI: 10.1103/PhysRevE.73.026120 [sci-hub, @tor], arXiv: physics/0510143
[adamic-friends-2003]Lada A. Adamic and Eytan Adar, “Friends and neighbors on the Web”, Social Networks Volume 25, Issue 3, Pages 211–230 (2003) DOI: 10.1016/S0378-8733(03)00009-1 [sci-hub, @tor]
[liben-nowell-link-prediction-2007]David Liben-Nowell and Jon Kleinberg, “The link-prediction problem for social networks”, Journal of the American Society for Information Science and Technology, Volume 58, Issue 7, pages 1019–1031 (2007), DOI: 10.1002/asi.20591 [sci-hub, @tor]
[zhou-predicting-2009]Zhou, Tao, Linyuan Lü, and Yi-Cheng Zhang, “Predicting missing links via local information”, The European Physical Journal B 71, no. 4: 623-630 (2009), DOI: 10.1140/epjb/e2009-00335-8 [sci-hub, @tor], arXiv: 0901.0553
Examples
>>> g = gt.collection.data["polbooks"] >>> s = gt.vertex_similarity(g, "jaccard") >>> color = g.new_vp("double") >>> color.a = s[0].a >>> gt.graph_draw(g, pos=g.vp.pos, vertex_text=g.vertex_index, ... vertex_color=color, vertex_fill_color=color, ... vcmap=matplotlib.cm.inferno, ... output="polbooks-jaccard.svg") <...>