graph_tool.clustering
 Clustering coefficients¶
Provides algorithms for calculation of clustering coefficients, aka. transitivity.
Summary¶
Return the local clustering coefficients for all vertices. 

Return the global clustering coefficient. 

Return the extended clustering coefficients for all vertices. 

Count the occurrence of ksize nodeinduced subgraphs (motifs). 

Obtain the motif significance profile, for subgraphs with k vertices. 
Contents¶

graph_tool.clustering.
local_clustering
(g, weight=None, prop=None, undirected=True)[source]¶ Return the local clustering coefficients for all vertices.
 Parameters
 g
Graph
Graph to be used.
 weight
EdgePropertyMap
, optional (default: None) Edge weights. If omitted, a constant value of 1 will be used.
 prop
VertexPropertyMap
or string, optional Vertex property map where results will be stored. If specified, this parameter will also be the return value.
 undirectedbool (default: True)
Calculate the undirected clustering coefficient, if graph is directed (this option has no effect if the graph is undirected).
 g
 Returns
 prop
VertexPropertyMap
Vertex property containing the clustering coefficients.
 prop
See also
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting
Notes
The local clustering coefficient [wattscollective1998] \(c_i\) is defined as
\[c_i = \frac{\{e_{jk}\}}{k_i(k_i1)} :\, v_j,v_k \in N_i,\, e_{jk} \in E\]where \(k_i\) is the outdegree of vertex \(i\), and
\[N_i = \{v_j : e_{ij} \in E\}\]is the set of outneighbors of vertex \(i\). For undirected graphs the value of \(c_i\) is normalized as
\[c'_i = 2c_i.\]The implemented algorithm runs in \(O(V\left<k^2\right>)\) time, where \(\left<k^2\right>\) is second moment of the degree distribution.
If enabled during compilation, this algorithm runs in parallel.
References
 wattscollective1998
D. J. Watts and Steven Strogatz, “Collective dynamics of ‘smallworld’ networks”, Nature, vol. 393, pp 440442, 1998. DOI: 10.1038/30918 [scihub, @tor]
Examples
>>> g = gt.collection.data["karate"] >>> clust = gt.local_clustering(g) >>> print(gt.vertex_average(g, clust)) (0.5706384782..., 0.05869813676...)

graph_tool.clustering.
global_clustering
(g, weight=None, ret_counts=False)[source]¶ Return the global clustering coefficient.
 Parameters
 g
Graph
Graph to be used.
 weight
EdgePropertyMap
, optional (default: None) Edge weights. If omitted, a constant value of 1 will be used.
 ret_counts
boolean
(optional, default:False
) If
True
the number of triangles and connected triples are also returned.
 g
 Returns
 ctuple of floats
Global clustering coefficient and standard deviation (jackknife method)
 trianglesint (if
ret_counts is True
) Number of triangles.
 triplesint (if
ret_counts is True
) Number of connected triples.
See also
local_clustering
local clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting
Notes
The global clustering coefficient [newmanstructure2003] \(c\) is defined as
\[c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}}\]If weights are given, the following definition is used:
\[c = \frac{\mathrm{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}},\]where \(\boldsymbol A\) is the weighted adjacency matrix, and it is assumed that the weights are normalized, i.e. \(A_{ij} \le 1\).
The implemented algorithm runs in time \(O(V\left<k^2\right>)\), where \(\left< k^2\right>\) is the second moment of the degree distribution.
If enabled during compilation, this algorithm runs in parallel.
References
 newmanstructure2003
M. E. J. Newman, “The structure and function of complex networks”, SIAM Review, vol. 45, pp. 167256, 2003, DOI: 10.1137/S003614450342480 [scihub, @tor]
Examples
>>> g = gt.collection.data["karate"] >>> print(gt.global_clustering(g)) (0.2556818181..., 0.06314746595...)

graph_tool.clustering.
extended_clustering
(g, props=None, max_depth=3, undirected=False)[source]¶ Return the extended clustering coefficients for all vertices.
 Parameters
 g
Graph
Graph to be used.
 propslist of
VertexPropertyMap
objects (optional, default:None
) list of vertex property maps where results will be stored. If specified, this parameter will also be the return value.
 max_depthint (optional, default:
3
) Maximum clustering order.
 undirectedboolean (optional, default:
False
) Calculate the undirected clustering coefficients, if graph is directed (this option has no effect if the graph is undirected).
 g
 Returns
 proplist of
VertexPropertyMap
objects List of vertex properties containing the clustering coefficients.
 proplist of
See also
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
motifs
motif counting
Notes
The extended clustering coefficient \(c^d_i\) of order \(d\) is defined as
\[c^d_i = \frac{\left\right\{ \{u,v\}; u,v \in N_i  d_{G(V\setminus \{i\})}(u,v) = d \left\}\right}{{\leftN_i\right \choose 2}},\]where \(d_G(u,v)\) is the shortest distance from vertex \(u\) to \(v\) in graph \(G\), and
\[N_i = \{v_j : e_{ij} \in E\}\]is the set of outneighbors of \(i\). According to the above definition, we have that the traditional local clustering coefficient is recovered for \(d=1\), i.e., \(c^1_i = c_i\).
The implemented algorithm runs in \(O(V\left<k\right>^{2+\text{maxdepth}})\) worst time, where \(\left< k\right>\) is the average outdegree.
If enabled during compilation, this algorithm runs in parallel.
References
 abdoclustering
A. H. Abdo, A. P. S. de Moura, “Clustering as a measure of the local topology of networks”, arXiv: physics/0605235
Examples
>>> g = gt.collection.data["karate"] >>> clusts = gt.extended_clustering(g, max_depth=5) >>> for i in range(0, 5): ... print(gt.vertex_average(g, clusts[i])) ... (0.5706384782076..., 0.05869813676256...) (0.3260389360735..., 0.04810773205917...) (0.0530678759917..., 0.01513061504691...) (0.0061658977316..., 0.00310690511463...) (0.0002162629757..., 0.00021305890271...)

graph_tool.clustering.
motifs
(g, k, p=1.0, motif_list=None, return_maps=False)[source]¶ Count the occurrence of ksize nodeinduced subgraphs (motifs). A tuple with two lists is returned: the list of motifs found, and the list with their respective counts.
 Parameters
 g
Graph
Graph to be used.
 kint
number of vertices of the motifs
 pfloat or float list (optional, default: 1.0)
uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernickeefficient2006] for more details.
 motif_listlist of
Graph
objects, optional If supplied, the algorithms will only search for the motifs in this list (or isomorphisms).
 return_mapsbool (optional, default False)
If
True
, a list will be returned, which provides for each motif graph a list of vertex property maps which map the motif to its location in the main graph.
 g
 Returns
 motifslist of
Graph
objects List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to indegree sequence, outdegreesequence, and number of edges (in this order).
 countslist of ints
The number of times the respective motif in the motifs list was counted
 vertex_mapslist of lists of
VertexPropertyMap
objects List for each motif graph containing the locations in the main graph. This is only returned if return_maps == True.
 motifslist of
See also
motif_significance
significance profile of motifs
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
Notes
This functions implements the ESU and RANDESU algorithms described in [wernickeefficient2006].
If enabled during compilation, this algorithm runs in parallel.
References
 wernickeefficient2006(1,2)
S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347359, 2006. DOI: 10.1109/TCBB.2006.51 [scihub, @tor]
 inducedsubgraphisomorphism
http://en.wikipedia.org/wiki/Induced_subgraph_isomorphism_problem
Examples
>>> g = gt.random_graph(1000, lambda: (5,5)) >>> motifs, counts = gt.motifs(gt.GraphView(g, directed=False), 4) >>> print(len(motifs)) 11 >>> print(counts) [116386, 392916, 443, 507, 2574, 1124, 741, 5, 5, 8, 2]

graph_tool.clustering.
motif_significance
(g, k, n_shuffles=100, p=1.0, motif_list=None, threshold=0, self_loops=False, parallel_edges=False, full_output=False, shuffle_model='configuration')[source]¶ Obtain the motif significance profile, for subgraphs with k vertices. A tuple with two lists is returned: the list of motifs found, and their respective zscores.
 Parameters
 g
Graph
Graph to be used.
 kint
Number of vertices of the motifs
 n_shufflesint (optional, default: 100)
Number of shuffled networks to consider for the zscore
 pfloat or float list (optional, default: 1.0)
Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernickeefficient2006] for more details.
 motif_listlist of
Graph
objects (optional, default: None) If supplied, the algorithms will only search for the motifs in this list (isomorphisms)
 thresholdint (optional, default: 0)
If a given motif count is below this level, it is not considered.
 self_loopsbool (optional, default: False)
Whether or not the shuffled graphs are allowed to contain selfloops
 parallel_edgesbool (optional, default: False)
Whether or not the shuffled graphs are allowed to contain parallel edges.
 full_outputbool (optional, default: False)
If set to True, three additional lists are returned: the count of each motif, the average count of each motif in the shuffled networks, and the standard deviation of the average count of each motif in the shuffled networks.
 shuffle_modelstring (optional, default: “configuration”)
Shuffle model to use. See
random_rewire()
for details.
 g
 Returns
 motifslist of
Graph
objects List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to indegree sequence, outdegreesequence, and number of edges (in this order).
 zscoreslist of floats
The zscore of the respective motives. See below for the definition of the zscore.
 motifslist of
See also
motifs
motif counting or sampling
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
Notes
The zscore \(z_i\) of motif i is defined as
\[z_i = \frac{N_i  \left<N^s_i\right>} {\sqrt{\left<(N^s_i)^2\right>  \left<N^s_i\right>^2}},\]where \(N_i\) is the number of times motif i found, and \(N^s_i\) is the count of the same motif but on a shuffled network. It measures how many standard deviations is each motif count, in respect to an ensemble of randomly shuffled graphs with the same degree sequence.
The zscores values are not normalized.
If enabled during compilation, this algorithm runs in parallel.
References
 wernickeefficient2006
S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347359, 2006. DOI: 10.1109/TCBB.2006.51 [scihub, @tor]
Examples
>>> from numpy import random >>> random.seed(10) >>> g = gt.random_graph(100, lambda: (3,3)) >>> motifs, zscores = gt.motif_significance(g, 3) >>> print(len(motifs)) 12 >>> print(zscores) [2.59252643351441, 2.5966529814390387, 2.3459237708258587, 1.0829180621127024, 1.3368754665984663, 2.33027728409781, 3.055817397993647, 0.1, 0.15, 0.19, 0.4, 0.01]