graph_tool.topology.similarity#
- graph_tool.topology.similarity(g1, g2, eweight1=None, eweight2=None, label1=None, label2=None, norm=True, p=1.0, distance=False, asymmetric=False)[source]#
Return the Jaccard adjacency similarity between two graphs.
- Parameters:
- g1
Graph
First graph to be compared.
- g2
Graph
Second graph to be compared.
- eweight1
EdgePropertyMap
(optional, default:None
) Edge weights for the first graph to be used in comparison.
- eweight2
EdgePropertyMap
(optional, default:None
) Edge weights for the second graph to be used in comparison.
- label1
VertexPropertyMap
(optional, default:None
) Vertex labels for the first graph to be used in comparison. If not supplied, the vertex indices are used.
- label2
VertexPropertyMap
(optional, default:None
) Vertex labels for the second graph to be used in comparison. If not supplied, the vertex indices are used.
- normbool (optional, default:
True
) If
True
, the returned value is normalized by the total number of edges.- pfloat (optional, default:
1.
) Exponent of the \(L^p\) distance function.
- distancebool (optional, default:
False
) If
True
, the complementary value is returned, i.e. the distance between the two graphs.- asymmetricbool (optional, default:
False
) If
True
, the asymmetric similarity ofg1
tog2
will be computed.
- g1
- Returns:
- similarityfloat
Adjacency similarity value.
Notes
In its default parametrization, the Jaccard adjacency similarity is the sum of equal non-zero entries in the adjacency matrices of both graphs, given a vertex ordering determined by the vertex labels. In other words, it counts the number of edges which have the same source and target labels in both graphs.
If
norm == True
(the default) the value returned is the total fraction of edges of both networks that match.This function also allows for generalized similarities according to an \(L^p\) norm, for arbitrary exponent \(p\).
More specifically, the (unormalized) adjacency similarity is defined as:
\[S(\boldsymbol A_1, \boldsymbol A_2) = E - d(\boldsymbol A_1, \boldsymbol A_2)\]where
\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p\right)^{1/p}\]is the corresponding distance between graphs, and \(E=(\sum_{i\le j}|A_{ij}^{(1)}|^p + |A_{ij}^{(2)}|^p)^{1/p}\). Unless otherwise stated via the parameter
p
, the exponent used is \(p=1\). This definition holds for undirected graphs, otherwise the sums go over all directed pairs. If edge weights are provided, the weighted adjacency matrix is used.If a multigraph is passed, the matrix entries \(A^{(1)}_{ij}\) and \(A^{(2)}_{ij}\) correspond to the edge multiplicities between nodes \(i\) and \(j\) in each graph.
If
norm == True
the value returned is \(S(\boldsymbol A_1, \boldsymbol A_2) / E\), which lies in the interval \([0,1]\).If
asymmetric == True
, the above is changed so that the comparison is made only for entries in \(\boldsymbol A_1\) that are larger than in \(\boldsymbol A_2\), i.e.\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p H(A_{ij}^{(1)} - A_{ij}^{(2)})\right)^{1/p},\]where \(H(x) = \{1 \text{ if } x > 0; 0 \text{ otherwise}\}\) is the unit step function, and the total sum is changed accordingly to \(E=\left(\sum_{i\le j}|A_{ij}^{(1)}|^p\right)^{1/p}\).
Relation to set operations
If the graph is unweighted, \(p=1\), and
norm == False
the algorithm is equivalent to the following set operations:If
distance == True
the returned value is equal to\[2 |\boldsymbol A_1 \cup \boldsymbol A_2|\]where \(\boldsymbol A_1 \cup \boldsymbol A_2\) is the union of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).
If
distance == False
the returned value is equal to\[2 |\boldsymbol A_1 \cap \boldsymbol A_2|\]where \(\boldsymbol A_1 \cap \boldsymbol A_2\) is the intersection of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).
If
distance == True
andasymmetric == True
the returned value is equal to\[|\boldsymbol A_1 \setminus \boldsymbol A_2|\]where \(\boldsymbol A_1 \setminus \boldsymbol A_2\) is the set of edges in \(\boldsymbol A_1\) that are not also in \(\boldsymbol A_2\).
The algorithm runs with complexity \(O(E_1 + V_1 + E_2 + V_2)\).
If enabled during compilation, and the vertex labels are integers bounded by the sizes of the graphs, this algorithm runs in parallel.
Examples
>>> g = gt.random_graph(100, lambda: (3,3)) >>> u = g.copy() >>> gt.similarity(u, g) 1.0 >>> gt.random_rewire(u) 17 >>> gt.similarity(u, g) 0.05