graph_tool.clustering - Clustering coefficients
Provides algorithms for calculating clustering coefficients, also known as transitivity.
Summary
local_clustering 
Return the local clustering coefficients for all vertices. 
global_clustering 
Return the global clustering coefficient. 
extended_clustering 
Return the extended clustering coefficients for all vertices. 
motifs 
Count the occurrence of k-size node-induced subgraphs (motifs).
motif_significance 
Obtain the motif significance profile, for subgraphs with k vertices. 
Contents

graph_tool.clustering.local_clustering(g, prop=None, undirected=True)
Return the local clustering coefficients for all vertices.
Parameters: g : Graph
Graph to be used.
prop : PropertyMap or string, optional
Vertex property map where results will be stored. If specified, this parameter will also be the return value.
undirected : bool (default: True)
If the graph is directed, calculate the undirected clustering coefficient (this option has no effect if the graph is undirected).
Returns: prop : PropertyMap
Vertex property containing the clustering coefficients.
See also
global_clustering
 global clustering coefficient
extended_clustering
 extended (generalized) clustering coefficient
motifs
 motif counting
Notes
The local clustering coefficient [wattscollective1998] \(c_i\) is defined as
\[c_i = \frac{|\{e_{jk}\}|}{k_i(k_i-1)} :\, v_j,v_k \in N_i,\, e_{jk} \in E\]
where \(k_i\) is the out-degree of vertex \(i\), and
\[N_i = \{v_j : e_{ij} \in E\}\]
is the set of out-neighbors of vertex \(i\). For undirected graphs the value of \(c_i\) is normalized as
\[c'_i = 2c_i.\]
The implemented algorithm runs in \(O(|V|\left< k\right>^2)\) time, where \(\left< k\right>\) is the average out-degree.
If enabled during compilation, this algorithm runs in parallel.
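The definition above can be illustrated with a short pure-Python sketch for the undirected case. It is not graph-tool's implementation: the function name local_clustering_sketch and the dict-of-sets adjacency format are assumptions made for this example, as is the convention of assigning 0 to vertices with fewer than two neighbors.

```python
from itertools import combinations


def local_clustering_sketch(adj):
    """Local clustering coefficient of every vertex of an undirected graph.

    `adj` maps each vertex to the set of its neighbors (hypothetical input
    format chosen for this sketch; no self-loops or parallel edges).
    """
    coeffs = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # Assumption: vertices with fewer than two neighbors get 0.
            coeffs[i] = 0.0
            continue
        # Count the edges e_jk that connect two neighbors of i.
        links = sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
        # Undirected normalization: c'_i = 2 * links / (k_i * (k_i - 1)).
        coeffs[i] = 2.0 * links / (k * (k - 1))
    return coeffs


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coeffs = local_clustering_sketch(example)
```

In this example only one of the three neighbor pairs of vertex 2 is connected, so its coefficient is 1/3, while the two other triangle corners have coefficient 1.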
References
[wattscollective1998] D. J. Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, vol. 393, pp. 440-442, 1998. DOI: 10.1038/30918
Examples
>>> import graph_tool.all as gt
>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clust = gt.local_clustering(g)
>>> print(gt.vertex_average(g, clust))
(0.008177777777777779, 0.00042080229075093...)

graph_tool.clustering.global_clustering(g)
Return the global clustering coefficient.
Parameters: g : Graph
Graph to be used.
Returns: c : tuple of floats
Global clustering coefficient and standard deviation (jackknife method).
See also
local_clustering
 local clustering coefficient
extended_clustering
 extended (generalized) clustering coefficient
motifs
 motif counting
Notes
The global clustering coefficient [newmanstructure2003] \(c\) is defined as
\[c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}}\]
The implemented algorithm runs in \(O(|V|\left< k\right>^2)\) time, where \(\left< k\right>\) is the average (total) degree.
If enabled during compilation, this algorithm runs in parallel.
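The ratio above can be computed directly for a small undirected graph. The following is a minimal sketch under stated assumptions, not the library's algorithm: global_clustering_sketch and the dict-of-sets input are names invented for this example, and the jackknife standard deviation returned by the real function is not reproduced.

```python
from itertools import combinations


def global_clustering_sketch(adj):
    """Global clustering coefficient 3 * triangles / connected triples.

    `adj` maps each vertex to the set of its neighbors (hypothetical input
    format chosen for this sketch; undirected, no self-loops).
    """
    closed = 0   # neighbor pairs that are themselves connected
    triples = 0  # connected triples, counted at their central vertex
    for i, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
    # Every triangle was seen once from each of its three corners.
    triangles = closed // 3
    # Assumption: a graph without any connected triple gets 0.
    return 3.0 * triangles / triples if triples else 0.0


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2:
# one triangle and five connected triples.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c = global_clustering_sketch(example)
```

Counting triples at their central vertex gives \(\sum_i k_i(k_i-1)/2\), which is why a single pass over the vertices is enough here.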
References
[newmanstructure2003] M. E. J. Newman, “The structure and function of complex networks”, SIAM Review, vol. 45, pp. 167-256, 2003. DOI: 10.1137/S003614450342480
Examples
>>> import graph_tool.all as gt
>>> g = gt.random_graph(1000, lambda: (5,5))
>>> print(gt.global_clustering(g))
(0.008177777777777779, 0.0004212235142651...)

graph_tool.clustering.extended_clustering(g, props=None, max_depth=3, undirected=False)
Return the extended clustering coefficients for all vertices.
Parameters: g : Graph
Graph to be used.
props : list of PropertyMap objects, optional
List of vertex property maps where results will be stored. If specified, this parameter will also be the return value.
max_depth : int, optional
Maximum clustering order (default: 3).
undirected : bool, optional
If the graph is directed, calculate the undirected clustering coefficients (this option has no effect if the graph is undirected).
Returns: prop : list of PropertyMap objects
List of vertex properties containing the clustering coefficients.
See also
local_clustering
 local clustering coefficient
global_clustering
 global clustering coefficient
motifs
 motif counting
Notes
The extended clustering coefficient \(c^d_i\) of order \(d\) is defined as
\[c^d_i = \frac{\left|\left\{ \{u,v\} : u,v \in N_i,\; d_{G(V\setminus \{i\})}(u,v) = d \right\}\right|}{\binom{\left|N_i\right|}{2}},\]
where \(d_G(u,v)\) is the shortest distance from vertex \(u\) to \(v\) in graph \(G\), and
\[N_i = \{v_j : e_{ij} \in E\}\]
is the set of out-neighbors of \(i\). According to the above definition, the traditional local clustering coefficient is recovered for \(d=1\), i.e., \(c^1_i = c_i\).
The implemented algorithm runs in \(O(|V|\left<k\right>^{2+\text{max\_depth}})\) worst-case time, where \(\left< k\right>\) is the average out-degree.
If enabled during compilation, this algorithm runs in parallel.
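To make the definition concrete, the sketch below evaluates it literally for an undirected graph: for each pair of neighbors of \(i\) it runs a breadth-first search in the graph with \(i\) removed. This is an illustration only, not the library's algorithm. The names extended_clustering_sketch and bfs_distances and the dict-of-sets input are assumptions of this example, and the result is a per-vertex list of coefficients rather than one property map per order.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, removed, max_depth):
    """Distances from `source` up to `max_depth`, ignoring vertex `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue  # do not explore beyond the requested order
        for w in adj[u]:
            if w != removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def extended_clustering_sketch(adj, max_depth=3):
    """Return {vertex: [c^1, ..., c^max_depth]} for an undirected graph.

    `adj` maps each vertex to the set of its neighbors (hypothetical input
    format chosen for this sketch).
    """
    coeffs = {i: [0.0] * max_depth for i in adj}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: no neighbor pairs means all orders are 0
        n_pairs = k * (k - 1) // 2
        counts = [0] * max_depth
        for u, v in combinations(nbrs, 2):
            # Shortest distance between u and v in the graph without i.
            d = bfs_distances(adj, u, i, max_depth).get(v)
            if d is not None:
                counts[d - 1] += 1
        coeffs[i] = [n / n_pairs for n in counts]
    return coeffs


# In a 4-cycle the two neighbors of each vertex are at distance 2
# once that vertex is removed.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
ext = extended_clustering_sketch(cycle, max_depth=3)
```

For the 4-cycle every vertex has order-1 coefficient 0 and order-2 coefficient 1, which shows how the higher orders capture structure that the traditional coefficient misses.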
References
[abdoclustering] A. H. Abdo, A. P. S. de Moura, “Clustering as a measure of the local topology of networks”, arXiv: physics/0605235
Examples
>>> import graph_tool.all as gt
>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clusts = gt.extended_clustering(g, max_depth=5)
>>> for i in range(0, 5):
...     print(gt.vertex_average(g, clusts[i]))
...
(0.0050483333333333335, 0.0004393940240073...)
(0.024593787878787878, 0.0009963004021144...)
(0.11238924242424242, 0.001909615401971...)
(0.40252272727272725, 0.003113987400030...)
(0.43629378787878786, 0.003144159256565...)

graph_tool.clustering.motifs(g, k, p=1.0, motif_list=None, return_maps=False)
Count the occurrence of k-size node-induced subgraphs (motifs). A tuple with two lists is returned: the list of motifs found, and the list with their respective counts.
Parameters: g : Graph
Graph to be used.
k : int
Number of vertices of the motifs.
p : float or float list (optional, default: 1.0)
Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernickeefficient2006] for more details.
motif_list : list of Graph objects, optional
If supplied, the algorithms will only search for the motifs in this list (or their isomorphisms).
return_maps : bool (optional, default: False)
If True, a list will be returned, which provides for each motif graph a list of vertex property maps which map the motif to its location in the main graph.
Returns: motifs : list of Graph objects
List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree sequence, and number of edges (in this order).
counts : list of ints
The number of times the respective motif in the motifs list was counted.
vertex_maps : list of lists of PropertyMap objects
List for each motif graph containing the locations in the main graph. This is only returned if return_maps == True.
See also
motif_significance
 significance profile of motifs
local_clustering
 local clustering coefficient
global_clustering
 global clustering coefficient
extended_clustering
 extended (generalized) clustering coefficient
Notes
This function implements the ESU and RAND-ESU algorithms described in [wernickeefficient2006].
If enabled during compilation, this algorithm runs in parallel.
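As a rough illustration of the enumeration step, the sketch below follows the ESU scheme for an undirected graph: each connected vertex set of size k is grown from its smallest vertex and extended only with vertices that are larger than that root and not already adjacent to the partial set. It is a simplified sketch, not graph-tool's code. The name esu_subgraphs and the dict-of-sets input are assumptions of this example, vertices are assumed to be comparable integers, and the sampling used by RAND-ESU and the grouping of vertex sets into isomorphism classes are both left out.

```python
def esu_subgraphs(adj, k):
    """Enumerate the vertex sets of all connected induced k-vertex subgraphs.

    `adj` maps each integer vertex to the set of its neighbors
    (hypothetical input format chosen for this sketch; undirected,
    no self-loops).
    """
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)  # work on a copy, the caller keeps its own set
        while ext:
            w = ext.pop()
            # Vertices already in the partial subgraph or adjacent to it.
            covered = set(sub)
            for u in sub:
                covered |= adj[u]
            # Exclusive neighbors of w: larger than the root, not covered.
            new = {u for u in adj[w] if u > root and u not in covered}
            extend(sub | {w}, ext | new, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# Connected 3-vertex subgraphs of a 4-cycle: every vertex triple is a path.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
subgraphs = esu_subgraphs(cycle, 3)
```

Because candidates are removed from the extension set before recursing, each vertex set is produced exactly once, so no duplicate check is needed afterwards.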
References
[wernickeefficient2006] S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 3, no. 4, pp. 347-359, 2006. DOI: 10.1109/TCBB.2006.51
[inducedsubgraphisomorphism] http://en.wikipedia.org/wiki/Induced_subgraph_isomorphism_problem
Examples
>>> import graph_tool.all as gt
>>> g = gt.random_graph(1000, lambda: (5,5))
>>> motifs, counts = gt.motifs(gt.GraphView(g, directed=False), 4)
>>> print(len(motifs))
18
>>> print(counts)
[115557, 390005, 627, 700, 1681, 2815, 820, 12, 27, 44, 15, 7, 12, 4, 6, 1, 2, 1]

graph_tool.clustering.motif_significance(g, k, n_shuffles=100, p=1.0, motif_list=None, threshold=0, self_loops=False, parallel_edges=False, full_output=False, shuffle_model='configuration')
Obtain the motif significance profile, for subgraphs with k vertices. A tuple with two lists is returned: the list of motifs found, and their respective z-scores.
Parameters: g : Graph
Graph to be used.
k : int
Number of vertices of the motifs.
n_shuffles : int (optional, default: 100)
Number of shuffled networks to consider for the z-score.
p : float or float list (optional, default: 1.0)
Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernickeefficient2006] for more details.
motif_list : list of Graph objects (optional, default: None)
If supplied, the algorithms will only search for the motifs in this list (or their isomorphisms).
threshold : int (optional, default: 0)
If a given motif count is below this level, it is not considered.
self_loops : bool (optional, default: False)
Whether or not the shuffled graphs are allowed to contain self-loops.
parallel_edges : bool (optional, default: False)
Whether or not the shuffled graphs are allowed to contain parallel edges.
full_output : bool (optional, default: False)
If set to True, three additional lists are returned: the count of each motif, the average count of each motif in the shuffled networks, and the standard deviation of the average count of each motif in the shuffled networks.
shuffle_model : string (optional, default: “configuration”)
Shuffle model to use. See random_rewire() for details.
Returns: motifs : list of Graph objects
List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree sequence, and number of edges (in this order).
zscores : list of floats
The z-score of the respective motifs. See below for the definition of the z-score.
See also
motifs
 motif counting or sampling
local_clustering
 local clustering coefficient
global_clustering
 global clustering coefficient
extended_clustering
 extended (generalized) clustering coefficient
Notes
The z-score \(z_i\) of motif i is defined as
\[z_i = \frac{N_i - \left<N^s_i\right>} {\sqrt{\left<(N^s_i)^2\right> - \left<N^s_i\right>^2}},\]
where \(N_i\) is the number of times motif i is found, and \(N^s_i\) is the count of the same motif in a shuffled network. It measures how many standard deviations each motif count lies from its mean over an ensemble of randomly shuffled graphs with the same degree sequence.
The z-score values are not normalized.
If enabled during compilation, this algorithm runs in parallel.
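The z-score formula can be checked by hand on a small set of counts. The sketch below only evaluates the formula: the function name motif_zscore and the sample counts are made up for illustration, and raising an error when the shuffled counts have zero spread is a choice made here, not documented library behavior.

```python
import math


def motif_zscore(count, shuffled_counts):
    """z = (N - <N_s>) / sqrt(<N_s^2> - <N_s>^2) for one motif.

    `count` is the motif count in the original graph and `shuffled_counts`
    the counts of the same motif in the shuffled graphs (hypothetical
    inputs for this sketch).
    """
    n = len(shuffled_counts)
    mean = sum(shuffled_counts) / n
    mean_sq = sum(x * x for x in shuffled_counts) / n
    variance = mean_sq - mean * mean
    if variance <= 0:
        # Assumption: treat an ensemble without spread as an error.
        raise ValueError("shuffled counts have zero variance")
    return (count - mean) / math.sqrt(variance)


# Made-up counts: the ensemble mean is 5 and its standard deviation is 2.
shuffled = [2, 4, 4, 4, 5, 5, 7, 9]
z = motif_zscore(9, shuffled)
```

Here a motif seen 9 times lies two standard deviations above the ensemble mean, and a motif seen fewer times than the mean would get a negative z-score.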
Examples
>>> import graph_tool.all as gt
>>> from numpy import random
>>> random.seed(10)
>>> g = gt.random_graph(100, lambda: (3,3))
>>> motifs, zscores = gt.motif_significance(g, 3)
>>> print(len(motifs))
11
>>> print(zscores)
[0.22728646681107012, 0.21409572051644973, 0.0070220407889021114, 0.58721419671233477, 0.37770179603294357, 0.34847335047837341, 0.88618118013255021, 0.08, 0.2, 0.38, 0.2]