# graph_tool.clustering - Clustering coefficients¶

Provides algorithms for calculation of clustering coefficients, aka. transitivity.

## Summary¶

 local_clustering Return the local clustering coefficients for all vertices. global_clustering Return the global clustering coefficient. extended_clustering Return the extended clustering coefficients for all vertices. motifs Count the occurrence of k-size node-induced subgraphs (motifs). motif_significance Obtain the motif significance profile, for subgraphs with k vertices.

## Contents¶

graph_tool.clustering.local_clustering(g, weight=None, prop=None, undirected=True)[source]

Return the local clustering coefficients for all vertices.

Parameters
gGraph

Graph to be used.

weightEdgePropertyMap, optional (default: None)

Edge weights. If omitted, a constant value of 1 will be used.

propVertexPropertyMap or string, optional

Vertex property map where results will be stored. If specified, this parameter will also be the return value.

undirectedbool (default: True)

Calculate the undirected clustering coefficient, if graph is directed (this option has no effect if the graph is undirected).

Returns
propVertexPropertyMap

Vertex property containing the clustering coefficients.

global_clustering

global clustering coefficient

extended_clustering

extended (generalized) clustering coefficient

motifs

motif counting

Notes

The local clustering coefficient [watts-collective-1998] $$c_i$$ is defined as

$c_i = \frac{|\{e_{jk}\}|}{k_i(k_i-1)} :\, v_j,v_k \in N_i,\, e_{jk} \in E$

where $$k_i$$ is the out-degree of vertex $$i$$, and

$N_i = \{v_j : e_{ij} \in E\}$

is the set of out-neighbors of vertex $$i$$. For undirected graphs the value of $$c_i$$ is normalized as

$c'_i = 2c_i.$

The implemented algorithm runs in $$O(|V|\left<k^2\right>)$$ time, where $$\left<k^2\right>$$ is second moment of the degree distribution.

If enabled during compilation, this algorithm runs in parallel.

References

watts-collective-1998

D. J. Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, vol. 393, pp 440-442, 1998. DOI: 10.1038/30918 [sci-hub, @tor]

Examples

>>> g = gt.collection.data["karate"]
>>> clust = gt.local_clustering(g)
>>> print(gt.vertex_average(g, clust))
(0.5706384782..., 0.05869813676...)

graph_tool.clustering.global_clustering(g, weight=None, ret_counts=False)[source]

Return the global clustering coefficient.

Parameters
gGraph

Graph to be used.

weightEdgePropertyMap, optional (default: None)

Edge weights. If omitted, a constant value of 1 will be used.

ret_countsboolean (optional, default: False)

If True the number of triangles and connected triples are also returned.

Returns
ctuple of floats

Global clustering coefficient and standard deviation (jackknife method)

trianglesint (if ret_counts is True)

Number of triangles.

triplesint (if ret_counts is True)

Number of connected triples.

local_clustering

local clustering coefficient

extended_clustering

extended (generalized) clustering coefficient

motifs

motif counting

Notes

The global clustering coefficient [newman-structure-2003] $$c$$ is defined as

$c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}}$

If weights are given, the following definition is used:

$c = \frac{\mathrm{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}},$

where $$\boldsymbol A$$ is the weighted adjacency matrix, and it is assumed that the weights are normalized, i.e. $$A_{ij} \le 1$$.

The implemented algorithm runs in time $$O(|V|\left<k^2\right>)$$, where $$\left< k^2\right>$$ is the second moment of the degree distribution.

If enabled during compilation, this algorithm runs in parallel.

References

newman-structure-2003

M. E. J. Newman, “The structure and function of complex networks”, SIAM Review, vol. 45, pp. 167-256, 2003, DOI: 10.1137/S003614450342480 [sci-hub, @tor]

Examples

>>> g = gt.collection.data["karate"]
>>> print(gt.global_clustering(g))
(0.2556818181..., 0.06314746595...)

graph_tool.clustering.extended_clustering(g, props=None, max_depth=3, undirected=False)[source]

Return the extended clustering coefficients for all vertices.

Parameters
gGraph

Graph to be used.

propslist of VertexPropertyMap objects (optional, default: None)

list of vertex property maps where results will be stored. If specified, this parameter will also be the return value.

max_depthint (optional, default: 3)

Maximum clustering order.

undirectedboolean (optional, default: False)

Calculate the undirected clustering coefficients, if graph is directed (this option has no effect if the graph is undirected).

Returns
proplist of VertexPropertyMap objects

List of vertex properties containing the clustering coefficients.

local_clustering

local clustering coefficient

global_clustering

global clustering coefficient

motifs

motif counting

Notes

The extended clustering coefficient $$c^d_i$$ of order $$d$$ is defined as

$c^d_i = \frac{\left|\right\{ \{u,v\}; u,v \in N_i | d_{G(V\setminus \{i\})}(u,v) = d \left\}\right|}{{\left|N_i\right| \choose 2}},$

where $$d_G(u,v)$$ is the shortest distance from vertex $$u$$ to $$v$$ in graph $$G$$, and

$N_i = \{v_j : e_{ij} \in E\}$

is the set of out-neighbors of $$i$$. According to the above definition, we have that the traditional local clustering coefficient is recovered for $$d=1$$, i.e., $$c^1_i = c_i$$.

The implemented algorithm runs in $$O(|V|\left<k\right>^{2+\text{max-depth}})$$ worst time, where $$\left< k\right>$$ is the average out-degree.

If enabled during compilation, this algorithm runs in parallel.

References

abdo-clustering

A. H. Abdo, A. P. S. de Moura, “Clustering as a measure of the local topology of networks”, arXiv: physics/0605235

Examples

>>> g = gt.collection.data["karate"]
>>> clusts = gt.extended_clustering(g, max_depth=5)
>>> for i in range(0, 5):
...    print(gt.vertex_average(g, clusts[i]))
...
(0.5706384782076..., 0.05869813676256...)
(0.3260389360735..., 0.04810773205917...)
(0.0530678759917..., 0.01513061504691...)
(0.0061658977316..., 0.00310690511463...)
(0.0002162629757..., 0.00021305890271...)

graph_tool.clustering.motifs(g, k, p=1.0, motif_list=None, return_maps=False)[source]

Count the occurrence of k-size node-induced subgraphs (motifs). A tuple with two lists is returned: the list of motifs found, and the list with their respective counts.

Parameters
gGraph

Graph to be used.

kint

number of vertices of the motifs

pfloat or float list (optional, default: 1.0)

uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth $$[1,\dots,k]$$ in the algorithm. See [wernicke-efficient-2006] for more details.

motif_listlist of Graph objects, optional

If supplied, the algorithms will only search for the motifs in this list (or isomorphisms).

return_mapsbool (optional, default False)

If True, a list will be returned, which provides for each motif graph a list of vertex property maps which map the motif to its location in the main graph.

Returns
motifslist of Graph objects

List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order).

countslist of ints

The number of times the respective motif in the motifs list was counted

vertex_mapslist of lists of VertexPropertyMap objects

List for each motif graph containing the locations in the main graph. This is only returned if return_maps == True.

motif_significance

significance profile of motifs

local_clustering

local clustering coefficient

global_clustering

global clustering coefficient

extended_clustering

extended (generalized) clustering coefficient

Notes

This functions implements the ESU and RAND-ESU algorithms described in [wernicke-efficient-2006].

If enabled during compilation, this algorithm runs in parallel.

References

wernicke-efficient-2006(1,2)

S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347-359, 2006. DOI: 10.1109/TCBB.2006.51 [sci-hub, @tor]

induced-subgraph-isomorphism

http://en.wikipedia.org/wiki/Induced_subgraph_isomorphism_problem

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> motifs, counts = gt.motifs(gt.GraphView(g, directed=False), 4)
>>> print(len(motifs))
11
>>> print(counts)
[116386, 392916, 443, 507, 2574, 1124, 741, 5, 5, 8, 2]

graph_tool.clustering.motif_significance(g, k, n_shuffles=100, p=1.0, motif_list=None, threshold=0, self_loops=False, parallel_edges=False, full_output=False, shuffle_model='configuration')[source]

Obtain the motif significance profile, for subgraphs with k vertices. A tuple with two lists is returned: the list of motifs found, and their respective z-scores.

Parameters
gGraph

Graph to be used.

kint

Number of vertices of the motifs

n_shufflesint (optional, default: 100)

Number of shuffled networks to consider for the z-score

pfloat or float list (optional, default: 1.0)

Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth $$[1,\dots,k]$$ in the algorithm. See [wernicke-efficient-2006] for more details.

motif_listlist of Graph objects (optional, default: None)

If supplied, the algorithms will only search for the motifs in this list (isomorphisms)

thresholdint (optional, default: 0)

If a given motif count is below this level, it is not considered.

self_loopsbool (optional, default: False)

Whether or not the shuffled graphs are allowed to contain self-loops

parallel_edgesbool (optional, default: False)

Whether or not the shuffled graphs are allowed to contain parallel edges.

full_outputbool (optional, default: False)

If set to True, three additional lists are returned: the count of each motif, the average count of each motif in the shuffled networks, and the standard deviation of the average count of each motif in the shuffled networks.

shuffle_modelstring (optional, default: “configuration”)

Shuffle model to use. See random_rewire() for details.

Returns
motifslist of Graph objects

List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order).

z-scoreslist of floats

The z-score of the respective motives. See below for the definition of the z-score.

motifs

motif counting or sampling

local_clustering

local clustering coefficient

global_clustering

global clustering coefficient

extended_clustering

extended (generalized) clustering coefficient

Notes

The z-score $$z_i$$ of motif i is defined as

$z_i = \frac{N_i - \left<N^s_i\right>} {\sqrt{\left<(N^s_i)^2\right> - \left<N^s_i\right>^2}},$

where $$N_i$$ is the number of times motif i found, and $$N^s_i$$ is the count of the same motif but on a shuffled network. It measures how many standard deviations is each motif count, in respect to an ensemble of randomly shuffled graphs with the same degree sequence.

The z-scores values are not normalized.

If enabled during compilation, this algorithm runs in parallel.

References

wernicke-efficient-2006

S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347-359, 2006. DOI: 10.1109/TCBB.2006.51 [sci-hub, @tor]

Examples

>>> from numpy import random
>>> random.seed(10)
>>> g = gt.random_graph(100, lambda: (3,3))
>>> motifs, zscores = gt.motif_significance(g, 3)
>>> print(len(motifs))
12
>>> print(zscores)
[2.59252643351441, 2.5966529814390387, 2.3459237708258587, -1.0829180621127024, -1.3368754665984663, -2.33027728409781, -3.055817397993647, -0.1, -0.15, -0.19, -0.4, -0.01]