graph_tool.clustering - Clustering coefficients

Provides algorithms for calculation of clustering coefficients, aka. transitivity.

Summary

local_clustering Return the local clustering coefficients for all vertices.
global_clustering Return the global clustering coefficient.
extended_clustering Return the extended clustering coefficients for all vertices.
motifs Count the occurrence of k-size subgraphs (motifs).
motif_significance Obtain the motif significance profile, for subgraphs with k vertices.

Contents

graph_tool.clustering.local_clustering(g, prop=None, undirected=True)

Return the local clustering coefficients for all vertices.

Parameters:

g : Graph

Graph to be used.

prop : PropertyMap or string, optional

Vertex property map where results will be stored. If specified, this parameter will also be the return value.

undirected : bool (default: True)

Calculate the undirected clustering coefficient, if graph is directed (this option has no effect if the graph is undirected).

Returns:

prop : PropertyMap

Vertex property containing the clustering coefficients.

See also

global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting

Notes

The local clustering coefficient [watts-collective-1998] \(c_i\) is defined as

\[c_i = \frac{|\{e_{jk}\}|}{k_i(k_i-1)} :\, v_j,v_k \in N_i,\, e_{jk} \in E\]

where \(k_i\) is the out-degree of vertex \(i\), and

\[N_i = \{v_j : e_{ij} \in E\}\]

is the set of out-neighbours of vertex \(i\). For undirected graphs the value of \(c_i\) is normalized as

\[c'_i = 2c_i.\]

The implemented algorithm runs in \(O(|V|\left< k\right>^3)\) time, where \(\left< k\right>\) is the average out-degree.

If enabled during compilation, this algorithm runs in parallel.

References

[watts-collective-1998](1, 2) D. J. Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature, vol. 393, pp 440-442, 1998. DOI: 10.1038/30918

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clust = gt.local_clustering(g)
>>> print(gt.vertex_average(g, clust))
(0.007177777777777778, 0.0003966968553716155)
graph_tool.clustering.global_clustering(g)

Return the global clustering coefficient.

Parameters:

g : Graph

Graph to be used.

Returns:

c : tuple of floats

Global clustering coefficient and standard deviation (jacknife method)

See also

local_clustering
local clustering coefficient
extended_clustering
extended (generalized) clustering coefficient
motifs
motif counting

Notes

The global clustering coefficient [newman-structure-2003] \(c\) is defined as

\[c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}}\]

The implemented algorithm runs in \(O(|V|\left< k\right>^3)\) time, where \(\left< k\right>\) is the average (total) degree.

If enabled during compilation, this algorithm runs in parallel.

References

[newman-structure-2003](1, 2) M. E. J. Newman, “The structure and function of complex networks”, SIAM Review, vol. 45, pp. 167-256, 2003, DOI: 10.1137/S003614450342480

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> print(gt.global_clustering(g))
(0.007177777777777778, 0.0003970939493209407)
graph_tool.clustering.extended_clustering(g, props=None, max_depth=3, undirected=False)

Return the extended clustering coefficients for all vertices.

Parameters:

g : Graph

Graph to be used.

props : list of PropertyMap objects, optional

list of vertex property maps where results will be stored. If specified, this parameter will also be the return value.

max_depth : int, optional

Maximum clustering order (default: 3).

undirected : bool, optional

Calculate the undirected clustering coefficients, if graph is directed (this option has no effect if the graph is undirected).

Returns:

prop : list of PropertyMap objects

List of vertex properties containing the clustering coefficients.

See also

local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
motifs
motif counting

Notes

The extended clustering coefficient \(c^d_i\) of order \(d\) is defined as

\[c^d_i = \frac{\left|\right\{ \{u,v\}; u,v \in N_i | d_{G(V\setminus \{i\})}(u,v) = d \left\}\right|}{{\left|N_i\right| \choose 2}},\]

where \(d_G(u,v)\) is the shortest distance from vertex \(u\) to \(v\) in graph \(G\), and

\[N_i = \{v_j : e_{ij} \in E\}\]

is the set of out-neighbours of \(i\). According to the above definition, we have that the traditional local clustering coefficient is recovered for \(d=1\), i.e., \(c^1_i = c_i\).

The implemented algorithm runs in \(O(|V|\left<k\right>^{2+\text{max-depth}})\) worst time, where \(\left< k\right>\) is the average out-degree.

If enabled during compilation, this algorithm runs in parallel.

References

[abdo-clustering]A. H. Abdo, A. P. S. de Moura, “Clustering as a measure of the local topology of networks”, arXiv: physics/0605235

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> clusts = gt.extended_clustering(g, max_depth=5)
>>> for i in range(0, 5):
...    print(gt.vertex_average(g, clusts[i]))
...
(0.005535, 0.00046266726404860573)
(0.02351, 0.0009395843702876762)
(0.11651833333333333, 0.0019837472110181336)
(0.391765, 0.0030221072114043567)
(0.4439983333333333, 0.0030393922085216675)
graph_tool.clustering.motifs(g, k, p=1.0, motif_list=None, return_maps=False)

Count the occurrence of k-size subgraphs (motifs). A tuple with two lists is returned: the list of motifs found, and the list with their respective counts.

Parameters:

g : Graph

Graph to be used.

k : int

number of vertices of the motifs

p : float or float list (optional, default: 1.0)

uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernicke-efficient-2006] for more details.

motif_list : list of Graph objects, optional

If supplied, the algorithms will only search for the motifs in this list (or isomorphisms).

return_maps : bool (optional, default False)

If True, a list will be returned, which provides for each motif graph a list of vertex property maps which map the motif to its location in the main graph.

Returns:

motifs : list of Graph objects

List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order).

counts : list of ints

The number of times the respective motif in the motifs list was counted

vertex_maps : list of lists of PropertyMap objects

List for each motif graph containing the locations in the main graph. This is only returned if return_maps == True.

See also

motif_significance
significance profile of motifs
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient

Notes

This functions implements the ESU and RAND-ESU algorithms described in [wernicke-efficient-2006].

If enabled during compilation, this algorithm runs in parallel.

References

[wernicke-efficient-2006](1, 2, 3, 4) S. Wernicke, “Efficient detection of network motifs”, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4, Pages 347-359, 2006. DOI: 10.1109/TCBB.2006.51

Examples

>>> g = gt.random_graph(1000, lambda: (5,5))
>>> motifs, counts = gt.motifs(gt.GraphView(g, directed=False), 4)
>>> print(len(motifs))
63
>>> print(counts)
[28448, 28343, 28929, 29600, 159144, 98111, 98187, 33446, 284, 176, 70, 139, 210, 37, 526, 280, 85, 156, 553, 620, 469, 585, 443, 380, 203, 362, 210, 313, 241, 100, 295, 64, 253, 278, 250, 7, 5, 3, 4, 4, 3, 2, 6, 1, 7, 2, 1, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2, 2, 1, 1, 1, 1]
graph_tool.clustering.motif_significance(g, k, n_shuffles=100, p=1.0, motif_list=None, threshold=0, self_loops=False, parallel_edges=False, full_output=False, shuffle_model='uncorrelated')

Obtain the motif significance profile, for subgraphs with k vertices. A tuple with two lists is returned: the list of motifs found, and their respective z-scores.

Parameters:

g : Graph

Graph to be used.

k : int

Number of vertices of the motifs

n_shuffles : int (optional, default: 100)

Number of shuffled networks to consider for the z-score

p : float or float list (optional, default: 1.0)

Uniform fraction of the motifs to be sampled. If a float list is provided, it will be used as the fraction at each depth \([1,\dots,k]\) in the algorithm. See [wernicke-efficient-2006] for more details.

motif_list : list of Graph objects (optional, default: None)

If supplied, the algorithms will only search for the motifs in this list (isomorphisms)

threshold : int (optional, default: 0)

If a given motif count is below this level, it is not considered.

self_loops : bool (optional, default: False)

Whether or not the shuffled graphs are allowed to contain self-loops

parallel_edges : bool (optional, default: False)

Whether or not the shuffled graphs are allowed to contain parallel edges.

full_output : bool (optional, default: False)

If set to True, three additional lists are returned: the count of each motif, the average count of each motif in the shuffled networks, and the standard deviation of the average count of each motif in the shuffled networks.

shuffle_model : string (optional, default: “uncorrelated”)

Shuffle model to use. See random_rewire() for details.

Returns:

motifs : list of Graph objects

List of motifs of size k found in the Graph. Graphs are grouped according to their isomorphism class, and only one of each class appears in this list. The list is sorted according to in-degree sequence, out-degree-sequence, and number of edges (in this order).

z-scores : list of floats

The z-score of the respective motives. See below for the definition of the z-score.

See also

motifs
motif counting or sampling
local_clustering
local clustering coefficient
global_clustering
global clustering coefficient
extended_clustering
extended (generalized) clustering coefficient

Notes

The z-score \(z_i\) of motif i is defined as

\[\begin{split}z_i = \frac{N_i - \left<N^s_i\right>} {\sqrt{\left<(N^s_i)^2\right> - \left<N^s_i\right>^2}},\end{split}\]

where \(N_i\) is the number of times motif i found, and \(N^s_i\) is the count of the same motif but on a shuffled network. It measures how many standard deviations is each motif count, in respect to an ensemble of randomly shuffled graphs with the same degree sequence.

The z-scores values are not normalized.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy import random
>>> random.seed(10)
>>> g = gt.random_graph(100, lambda: (3,3))
>>> motifs, zscores = gt.motif_significance(g, 3)
>>> print(len(motifs))
11
>>> print(zscores)
[0.91152284386957971, 0.79611981715847313, 0.75844561796064847, -0.69370586236292986, -0.83494968780939771, -0.58698299759329431, -0.25891660868480709, 0.89, -0.07, -0.29, -0.17]

Table Of Contents

Previous topic

graph_tool.centrality - Centrality measures

Next topic

graph_tool.collection - Dataset collection

This Page

Latest development version docs are also available.