vertex_similarity#
- graph_tool.topology.vertex_similarity(g, sim_type='jaccard', vertex_pairs=None, eweight=None, sim_map=None)[source]#
Return the similarity between pairs of vertices.
- Parameters:
- g
Graph The graph to be used.
- sim_type
str(optional, default:"jaccard") Type of similarity to use. This must be one of
"dice","salton","hub-promoted","hub-suppressed","jaccard","inv-log-weight","resource-allocation"or"leicht-holme-newman".- vertex_pairsiterable of pairs of integers (optional, default:
None) Pairs of vertices to compute the similarity. If omitted, all pairs will be considered.
- eweight
EdgePropertyMap(optional, default:None) Edge weights.
- sim_map
VertexPropertyMap(optional, default:None) If provided, and
vertex_pairs is None, the vertex similarities will be stored in this vector-valued property. Otherwise, a new one will be created.
- g
- Returns:
- similarities
numpy.ndarrayorVertexPropertyMap If
vertex_pairswas supplied, this will be anumpy.ndarraywith the corresponding similarities, otherwise it will be a vector-valued vertexVertexPropertyMap, with the similarities to all other vertices.
- similarities
Notes
According to
sim_type, this function computes one of the following similarities:sim_type == "dice"The Sørensen–Dice similarity [sorensen-dice] of vertices \(u\) and \(v\) is defined as
\[\frac{2|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)|+|\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "salton"The Salton (or cosine) similarity [salton] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\sqrt{|\Gamma(u)||\Gamma(v)|}},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-promoted"The “hub promoted” similarity [ravasz_hierarchical_2002] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\max(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "hub-suppressed"The “hub suppressed” similarity of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{\min(|\Gamma(u)|,|\Gamma(v)|)},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "jaccard"The Jaccard similarity [jaccard] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)\cup\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "inv-log-weight"The inverse log weighted similarity [adamic-friends-2003] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{\log |\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "resource-allocation"The resource allocation similarity [zhou-predicting-2009] of vertices \(u\) and \(v\) is defined as
\[\sum_{w \in \Gamma(u)\cap\Gamma(v)}\frac{1}{|\Gamma(w)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
sim_type == "leicht-holme-newman"The Leicht-Holme-Newman similarity [leicht_vertex_2006] of vertices \(u\) and \(v\) is defined as
\[\frac{|\Gamma(u)\cap\Gamma(v)|}{|\Gamma(u)||\Gamma(v)|},\]where \(\Gamma(u)\) is the set of neighbors of vertex \(u\).
For directed graphs, only out-neighbors are considered in the above algorthms (for “inv-log-weight”, the in-degrees are used to compute the weights). To use the in-neighbors instead, a
GraphViewshould be used to reverse the graph, e.g.vertex_similarity(GraphView(g, reversed=True)).For weighted or multigraphs, in the above equations it is assumed the following:
\[\begin{split}|\Gamma(u)\cap\Gamma(v)| &= \sum_w \min(A_{wv}, A_{wu}),\\ |\Gamma(u)\cup\Gamma(v)| &= \sum_w \max(A_{wv}, A_{wu}),\\ |\Gamma(u)| &= \sum_w A_{wu},\end{split}\]where \(A_{wu}\) is the weighted adjacency matrix.
See [liben-nowell-link-prediction-2007] for a review of the above.
The algorithm runs with complexity \(O(\left<k\right>N^2)\) if
vertex_pairs is None, otherwise with \(O(\left<k\right>P)\) where \(P\) is the length ofvertex_pairs.Parallel implementation.
If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.
References
[salton]G. Salton, M. J. McGill, “Introduction to Modern Informa-tion Retrieval”, (MuGraw-Hill, Auckland, 1983).
[ravasz_hierarchical_2002]Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L., “Hierarchical organization of modularity in metabolic networks”, Science, 297(5586), 1551-1555, (2002). DOI: 10.1126/science.1073374 [sci-hub, @tor]
[leicht_vertex_2006]E. A. Leicht, Petter Holme, and M. E. J. Newman, “Vertex similarity in networks”, Phys. Rev. E 73, 026120 (2006), DOI: 10.1103/PhysRevE.73.026120 [sci-hub, @tor], arXiv: physics/0510143
[adamic-friends-2003]Lada A. Adamic and Eytan Adar, “Friends and neighbors on the Web”, Social Networks Volume 25, Issue 3, Pages 211–230 (2003) DOI: 10.1016/S0378-8733(03)00009-1 [sci-hub, @tor]
[liben-nowell-link-prediction-2007]David Liben-Nowell and Jon Kleinberg, “The link-prediction problem for social networks”, Journal of the American Society for Information Science and Technology, Volume 58, Issue 7, pages 1019–1031 (2007), DOI: 10.1002/asi.20591 [sci-hub, @tor]
[zhou-predicting-2009]Zhou, Tao, Linyuan Lü, and Yi-Cheng Zhang, “Predicting missing links via local information”, The European Physical Journal B 71, no. 4: 623-630 (2009), DOI: 10.1140/epjb/e2009-00335-8 [sci-hub, @tor], arXiv: 0901.0553
Examples
>>> g = gt.collection.data["polbooks"] >>> s = gt.vertex_similarity(g, "jaccard") >>> color = g.new_vp("double") >>> color.a = s[0].a >>> gt.graph_draw(g, pos=g.vp.pos, vertex_text=g.vertex_index, ... vertex_color=color, vertex_fill_color=color, ... vcmap=matplotlib.cm.inferno, ... output="polbooks-jaccard.svg") <...>
Jaccard similarities to vertex
0in a political books network.#