# graph_tool.clustering - Clustering coefficients¶

Provides algorithms for calculation of clustering coefficients, aka. transitivity.

## Summary¶

 local_clustering Return the local clustering coefficients for all vertices. global_clustering Return the global clustering coefficient. extended_clustering Return the extended clustering coefficients for all vertices. motifs Count the occurrence of k-size subgraphs (motifs). motif_significance Obtain the motif significance profile, for subgraphs with k vertices.

## Contents¶

graph_tool.clustering.local_clustering(g, prop=None, undirected=True)

Return the local clustering coefficients for all vertices.

Parameters: g : Graph Graph to be used. prop : PropertyMap or string, optional Vertex property map where results will be stored. If specified, this parameter will also be the return value. undirected : bool (default: True) Calculate the undirected clustering coefficient, if graph is directed (this option has no effect if the graph is undirected). prop : PropertyMap Vertex property containing the clustering coefficients.

global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting

Notes

The local clustering coefficient [watts-collective-1998] $$c_i$$ is defined as

$c_i = \frac{|\{e_{jk}\}|}{k_i(k_i-1)} :\, v_j,v_k \in N_i,\, e_{jk} \in E$

where $$k_i$$ is the out-degree of vertex $$i$$, and

$N_i = \{v_j : e_{ij} \in E\}$

is the set of out-neighbours of vertex $$i$$. For undirected graphs the value of $$c_i$$ is normalized as

$c'_i = 2c_i.$

The implemented algorithm runs in $$O(|V|\left< k\right>^3)$$ time, where $$\left< k\right>$$ is the average out-degree.

If enabled during compilation, this algorithm runs in parallel.

References

 [watts-collective-1998] (1, 2) D. J. Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, vol. 393, pp 440-442, 1998. DOI: 10.1038/30918

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clust = gt.local_clustering(g)
>>> print(gt.vertex_average(g, clust))
(0.007177777777777778, 0.0003966968553716155)

graph_tool.clustering.global_clustering(g)

Return the global clustering coefficient.

Parameters: g : Graph Graph to be used. c : tuple of floats Global clustering coefficient and standard deviation (jacknife method)

local_clustering
local clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting

Notes

The global clustering coefficient [newman-structure-2003] $$c$$ is defined as

$c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}}$

The implemented algorithm runs in $$O(|V|\left< k\right>^3)$$ time, where $$\left< k\right>$$ is the average (total) degree.

If enabled during compilation, this algorithm runs in parallel.

References

 [newman-structure-2003] (1, 2) M. E. J. Newman, “The structure and function of complex networks”, SIAM Review, vol. 45, pp. 167-256, 2003, DOI: 10.1137/S003614450342480

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> print(gt.global_clustering(g))
(0.007177777777777778, 0.0003970939493209407)

graph_tool.clustering.extended_clustering(g, props=None, max_depth=3, undirected=False)

Return the extended clustering coefficients for all vertices.

Parameters: g : Graph Graph to be used. props : list of PropertyMap objects, optional list of vertex property maps where results will be stored. If specified, this parameter will also be the return value. max_depth : int, optional Maximum clustering order (default: 3). undirected : bool, optional Calculate the undirected clustering coefficients, if graph is directed (this option has no effect if the graph is undirected). prop : list of PropertyMap objects List of vertex properties containing the clustering coefficients.

local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
motifs
motif counting

Notes

The extended clustering coefficient $$c^d_i$$ of order $$d$$ is defined as

$c^d_i = \frac{\left|\right\{ \{u,v\}; u,v \in N_i | d_{G(V\setminus \{i\})}(u,v) = d \left\}\right|}{{\left|N_i\right| \choose 2}},$

where $$d_G(u,v)$$ is the shortest distance from vertex $$u$$ to $$v$$ in graph $$G$$, and

$N_i = \{v_j : e_{ij} \in E\}$

is the set of out-neighbours of $$i$$. According to the above definition, we have that the traditional local clustering coefficient is recovered for $$d=1$$, i.e., $$c^1_i = c_i$$.

The implemented algorithm runs in $$O(|V|\left<k\right>^{2+\text{max-depth}})$$ worst time, where $$\left< k\right>$$ is the average out-degree.

If enabled during compilation, this algorithm runs in parallel.

References

 [abdo-clustering] A. H. Abdo, A. P. S. de Moura, “Clustering as a measure of the local topology of networks”, arXiv: physics/0605235

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clusts = gt.extended_clustering(g, max_depth=5)
>>> for i in range(0, 5):
...    print(gt.vertex_average(g, clusts[i]))
...
(0.005535, 0.00046266726404860573)
(0.02351, 0.0009395843702876762)
(0.11651833333333333, 0.0019837472110181336)
(0.391765, 0.0030221072114043567)
(0.4439983333333333, 0.0030393922085216675)

graph_tool.clustering.motifs(g, k, p=1.0, motif_list=None, return_maps=False)

Count the occurrence of k-size subgraphs (motifs). A tuple with two lists is returned: the list of motifs found, and the list with their respective counts.

Parameters: g : Graph Graph to be used. k : int number of vertices of the motifs p : float or float list (optional, default: 1.0) uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth $$[1,\dots,k]$$ in the algorithm. See [wernicke-efficient-2006] for more details. motif_list : list of Graph objects, optional If supplied, the algorithms will only search for the motifs in this list (or isomorphisms). return_maps : bool (optional, default False) If True, a list will be returned, which provides for each motif graph a list of vertex property maps which map the motif to its location in the main graph. motifs : list of Graph objects List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order). counts : list of ints The number of times the respective motif in the motifs list was counted vertex_maps : list of lists of PropertyMap objects List for each motif graph containing the locations in the main graph. This is only returned if return_maps == True.

motif_significance
significance profile of motifs
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient

Notes

This functions implements the ESU and RAND-ESU algorithms described in [wernicke-efficient-2006].

If enabled during compilation, this algorithm runs in parallel.

References

 [wernicke-efficient-2006] (1, 2, 3, 4) S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347-359, 2006. DOI: 10.1109/TCBB.2006.51

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> motifs, counts = gt.motifs(gt.GraphView(g, directed=False), 4)
>>> print(len(motifs))
63
>>> print(counts)
[28448, 28343, 28929, 29600, 159144, 98111, 98187, 33446, 284, 176, 70, 139, 210, 37, 526, 280, 85, 156, 553, 620, 469, 585, 443, 380, 203, 362, 210, 313, 241, 100, 295, 64, 253, 278, 250, 7, 5, 3, 4, 4, 3, 2, 6, 1, 7, 2, 1, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2, 2, 1, 1, 1, 1]

graph_tool.clustering.motif_significance(g, k, n_shuffles=100, p=1.0, motif_list=None, threshold=0, self_loops=False, parallel_edges=False, full_output=False, shuffle_model='uncorrelated')

Obtain the motif significance profile, for subgraphs with k vertices. A tuple with two lists is returned: the list of motifs found, and their respective z-scores.

Parameters: g : Graph Graph to be used. k : int Number of vertices of the motifs n_shuffles : int (optional, default: 100) Number of shuffled networks to consider for the z-score p : float or float list (optional, default: 1.0) Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth $$[1,\dots,k]$$ in the algorithm. See [wernicke-efficient-2006] for more details. motif_list : list of Graph objects (optional, default: None) If supplied, the algorithms will only search for the motifs in this list (isomorphisms) threshold : int (optional, default: 0) If a given motif count is below this level, it is not considered. self_loops : bool (optional, default: False) Whether or not the shuffled graphs are allowed to contain self-loops parallel_edges : bool (optional, default: False) Whether or not the shuffled graphs are allowed to contain parallel edges. full_output : bool (optional, default: False) If set to True, three additional lists are returned: the count of each motif, the average count of each motif in the shuffled networks, and the standard deviation of the average count of each motif in the shuffled networks. shuffle_model : string (optional, default: “uncorrelated”) Shuffle model to use. See random_rewire() for details. motifs : list of Graph objects List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order). z-scores : list of floats The z-score of the respective motives. See below for the definition of the z-score.

motifs
motif counting or sampling
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient

Notes

The z-score $$z_i$$ of motif i is defined as

$\begin{split}z_i = \frac{N_i - \left<N^s_i\right>} {\sqrt{\left<(N^s_i)^2\right> - \left<N^s_i\right>^2}},\end{split}$

where $$N_i$$ is the number of times motif i found, and $$N^s_i$$ is the count of the same motif but on a shuffled network. It measures how many standard deviations is each motif count, in respect to an ensemble of randomly shuffled graphs with the same degree sequence.

The z-scores values are not normalized.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy import random
>>> random.seed(10)
>>> g = gt.random_graph(100, lambda: (3,3))
>>> motifs, zscores = gt.motif_significance(g, 3)
>>> print(len(motifs))
11
>>> print(zscores)
[0.91152284386957971, 0.79611981715847313, 0.75844561796064847, -0.69370586236292986, -0.83494968780939771, -0.58698299759329431, -0.25891660868480709, 0.89, -0.07, -0.29, -0.17]