graph_tool.topology.similarity#

graph_tool.topology.similarity(g1, g2, eweight1=None, eweight2=None, label1=None, label2=None, norm=True, p=1.0, distance=False, asymmetric=False)[source]#

Return the Jaccard adjacency similarity between two graphs.

Parameters:
g1Graph

First graph to be compared.

g2Graph

Second graph to be compared.

eweight1EdgePropertyMap (optional, default: None)

Edge weights for the first graph to be used in comparison.

eweight2EdgePropertyMap (optional, default: None)

Edge weights for the second graph to be used in comparison.

label1VertexPropertyMap (optional, default: None)

Vertex labels for the first graph to be used in comparison. If not supplied, the vertex indices are used.

label2VertexPropertyMap (optional, default: None)

Vertex labels for the second graph to be used in comparison. If not supplied, the vertex indices are used.

normbool (optional, default: True)

If True, the returned value is normalized by the total number of edges.

pfloat (optional, default: 1.)

Exponent of the \(L^p\) distance function.

distancebool (optional, default: False)

If True, the complementary value is returned, i.e. the distance between the two graphs.

asymmetricbool (optional, default: False)

If True, the asymmetric similarity of g1 to g2 will be computed.

Returns:
similarityfloat

Adjacency similarity value.

Notes

In its default parametrization, the Jaccard adjacency similarity is the sum of equal non-zero entries in the adjacency matrices of both graphs, given a vertex ordering determined by the vertex labels. In other words, it counts the number of edges which have the same source and target labels in both graphs.

If norm == True (the default) the value returned is the total fraction of edges of both networks that match.

This function also allows for generalized similarities according to an \(L^p\) norm, for arbitrary exponent \(p\).

More specifically, the (unormalized) adjacency similarity is defined as:

\[S(\boldsymbol A_1, \boldsymbol A_2) = E - d(\boldsymbol A_1, \boldsymbol A_2)\]

where

\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p\right)^{1/p}\]

is the corresponding distance between graphs, and \(E=(\sum_{i\le j}|A_{ij}^{(1)}|^p + |A_{ij}^{(2)}|^p)^{1/p}\). Unless otherwise stated via the parameter p, the exponent used is \(p=1\). This definition holds for undirected graphs, otherwise the sums go over all directed pairs. If edge weights are provided, the weighted adjacency matrix is used.

If a multigraph is passed, the matrix entries \(A^{(1)}_{ij}\) and \(A^{(2)}_{ij}\) correspond to the edge multiplicities between nodes \(i\) and \(j\) in each graph.

If norm == True the value returned is \(S(\boldsymbol A_1, \boldsymbol A_2) / E\), which lies in the interval \([0,1]\).

If asymmetric == True, the above is changed so that the comparison is made only for entries in \(\boldsymbol A_1\) that are larger than in \(\boldsymbol A_2\), i.e.

\[d(\boldsymbol A_1, \boldsymbol A_2) = \left(\sum_{i\le j} \left|A_{ij}^{(1)} - A_{ij}^{(2)}\right|^p H(A_{ij}^{(1)} - A_{ij}^{(2)})\right)^{1/p},\]

where \(H(x) = \{1 \text{ if } x > 0; 0 \text{ otherwise}\}\) is the unit step function, and the total sum is changed accordingly to \(E=\left(\sum_{i\le j}|A_{ij}^{(1)}|^p\right)^{1/p}\).

Relation to set operations

If the graph is unweighted, \(p=1\), and norm == False the algorithm is equivalent to the following set operations:

  1. If distance == True the returned value is equal to

    \[2 |\boldsymbol A_1 \cup \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \cup \boldsymbol A_2\) is the union of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).

  2. If distance == False the returned value is equal to

    \[2 |\boldsymbol A_1 \cap \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \cap \boldsymbol A_2\) is the intersection of the edges in \(\boldsymbol A_1\) and \(\boldsymbol A_2\).

  3. If distance == True and asymmetric == True the returned value is equal to

    \[|\boldsymbol A_1 \setminus \boldsymbol A_2|\]

    where \(\boldsymbol A_1 \setminus \boldsymbol A_2\) is the set of edges in \(\boldsymbol A_1\) that are not also in \(\boldsymbol A_2\).

The algorithm runs with complexity \(O(E_1 + V_1 + E_2 + V_2)\).

If enabled during compilation, and the vertex labels are integers bounded by the sizes of the graphs, this algorithm runs in parallel.

Examples

>>> g = gt.random_graph(100, lambda: (3,3))
>>> u = g.copy()
>>> gt.similarity(u, g)
1.0
>>> gt.random_rewire(u)
22
>>> gt.similarity(u, g)
0.046666...