# graph_tool.stats - Miscellaneous statistics

## Summary

| Function | Description |
| --- | --- |
| vertex_hist | Return the vertex histogram of the given degree type or property. |
| edge_hist | Return the edge histogram of the given property. |
| vertex_average | Return the average of the given degree or vertex property. |
| edge_average | Return the average of the given edge property. |
| label_parallel_edges | Label edges which are parallel, i.e., have the same source and target vertices. |
| remove_parallel_edges | Remove all parallel edges from the graph. |
| label_self_loops | Label edges which are self-loops, i.e., the source and target vertices are the same. |
| remove_self_loops | Remove all self-loop edges from the graph. |
| remove_labeled_edges | Remove every edge e such that label[e] != 0. |
| distance_histogram | Return the shortest-distance histogram for each vertex pair in the graph. |

## Contents

graph_tool.stats.vertex_hist(g, deg, bins=[0, 1], float_count=True)

Return the vertex histogram of the given degree type or property.

Parameters
g : Graph

Graph to be used.

deg : string or VertexPropertyMap

Degree or property to be used for the histogram. It can be either “in”, “out” or “total”, for the in-, out-, or total degree of the vertices. It can also be a vertex property map.

bins : list of bins (optional, default: [0, 1])

List of bins to be used for the histogram. The values given represent the edges of the bins (i.e. lower and upper bounds). If the list contains exactly two values, the bin range is created automatically: the bins start at the first value and have a constant width given by the second value.

float_count : bool (optional, default: True)

If True, the counts in each histogram bin will be returned as floats. If False, they will be returned as integers.

Returns
counts : ndarray

The bin counts.

bins : ndarray

The bin edges.

See also
edge_hist

Edge histograms.

vertex_average

Average of vertex degrees or properties.

edge_average

Average of edge properties.

distance_histogram

Shortest-distance histogram.

Notes

The algorithm runs in $$O(|V|)$$ time.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy.random import poisson
>>> g = gt.random_graph(1000, lambda: (poisson(5), poisson(5)))
>>> print(gt.vertex_hist(g, "out"))
[array([  5.,  32.,  85., 148., 152., 182., 160., 116.,  53.,  25.,  23.,
13.,   3.,   2.,   1.]), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
dtype=uint64)]
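The two-value form of `bins` can be pictured with a small pure-Python sketch. The helper `expand_bins` is hypothetical and not part of graph-tool; it assumes half-open bins and that the last edge lies just above the largest observed value, which is consistent with the output above (largest out-degree 14, edges 0 through 15) but is not stated by the documentation.

```python
def expand_bins(start, width, max_value):
    """Hypothetical helper: constant-width bin edges covering max_value.

    Assumes half-open bins [lower, upper), so one extra edge is added
    above max_value to make it fall inside the final bin.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    edges = [start]
    k = 1
    # Use start + k * width rather than repeated addition to limit
    # floating-point drift for non-integer widths.
    while edges[-1] <= max_value:
        edges.append(start + k * width)
        k += 1
    return edges


# bins=[0, 1] with a largest observed value of 14
print(expand_bins(0, 1, 14))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Passing an explicit list with more than two values skips this expansion and uses the given edges directly.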

graph_tool.stats.edge_hist(g, eprop, bins=[0, 1], float_count=True)

Return the edge histogram of the given property.

Parameters
g : Graph

Graph to be used.

eprop : EdgePropertyMap

Edge property to be used for the histogram.

bins : list of bins (optional, default: [0, 1])

List of bins to be used for the histogram. The values given represent the edges of the bins (i.e. lower and upper bounds). If the list contains exactly two values, the bin range is created automatically: the bins start at the first value and have a constant width given by the second value.

float_count : bool (optional, default: True)

If True, the counts in each histogram bin will be returned as floats. If False, they will be returned as integers.

Returns
counts : ndarray

The bin counts.

bins : ndarray

The bin edges.

See also
vertex_hist

Vertex histograms.

vertex_average

Average of vertex degrees or properties.

edge_average

Average of edge properties.

distance_histogram

Shortest-distance histogram.

Notes

The algorithm runs in $$O(|E|)$$ time.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy import linspace
>>> from numpy.random import random
>>> g = gt.random_graph(1000, lambda: (5, 5))
>>> eprop = g.new_edge_property("double")
>>> eprop.get_array()[:] = random(g.num_edges())
>>> print(gt.edge_hist(g, eprop, linspace(0, 1, 11)))
[array([485., 538., 502., 505., 474., 497., 544., 465., 492., 498.]), array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])]
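As a rough illustration of what counting into explicit bin edges involves, here is a pure-Python sketch. The function `histogram` is hypothetical; the half-open bin convention and the choice to ignore values outside the given edges are assumptions made for the sketch, not documented behaviour of `edge_hist`.

```python
from bisect import bisect_right


def histogram(values, edges, float_count=True):
    """Hypothetical helper: count values into bins [edges[i], edges[i+1]).

    Values below the first edge or at/above the last edge are ignored
    (an assumption of this sketch).
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        if edges[0] <= x < edges[-1]:
            # bisect_right returns the index of the first edge > x,
            # so the bin containing x is one position to the left.
            counts[bisect_right(edges, x) - 1] += 1
    # Mirror the float_count switch described in the parameter list.
    return [float(c) for c in counts] if float_count else counts


print(histogram([0.05, 0.15, 0.15, 0.95], [0.0, 0.5, 1.0]))
# [3.0, 1.0]
```

A single pass over the values with one binary search per value is enough, which fits the linear dependence on the number of edges noted above when the number of bins is small.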

graph_tool.stats.vertex_average(g, deg)

Return the average of the given degree or vertex property.

Parameters
g : Graph

Graph to be used.

deg : string or VertexPropertyMap

Degree or property to be averaged. It can be either “in”, “out” or “total”, for the in-, out-, or total degree of the vertices. It can also be a vertex property map.

Returns
average : float

The average of the given degree or property.

std : float

The standard deviation of the average.

See also
vertex_hist

Vertex histograms.

edge_hist

Edge histograms.

edge_average

Average of edge properties.

distance_histogram

Shortest-distance histogram.

Notes

The algorithm runs in $$O(|V|)$$ time.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy.random import poisson
>>> g = gt.random_graph(1000, lambda: (poisson(5), poisson(5)))
>>> print(gt.vertex_average(g, "in"))
(4.986, 0.07323799560337517)
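The second returned value is the standard deviation of the average, not of the individual degrees. A minimal pure-Python sketch of that quantity follows; the helper `average_with_error` is hypothetical, and dividing the variance by N (rather than N - 1) is an assumption, since the documentation does not state which normalization is used.

```python
import math


def average_with_error(values):
    """Hypothetical helper: mean and standard deviation of the mean.

    Assumes the population variance (divide by N); the standard
    deviation of the mean is then sqrt(variance / N).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance / n)


mean, err = average_with_error([1, 2, 3, 4])
print(mean)  # 2.5
```

For Poisson-distributed degrees with mean 5 on 1000 vertices, this quantity is expected to be near sqrt(5 / 1000), about 0.07, which is the order of magnitude seen in the example output.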

graph_tool.stats.edge_average(g, eprop)

Return the average of the given edge property.

Parameters
g : Graph

Graph to be used.

eprop : EdgePropertyMap

Edge property to be averaged.

Returns
average : float

The average of the given property.

std : float

The standard deviation of the average.

See also
vertex_hist

Vertex histograms.

edge_hist

Edge histograms.

vertex_average

Average of vertex degrees or properties.

distance_histogram

Shortest-distance histogram.

Notes

The algorithm runs in $$O(|E|)$$ time.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> from numpy.random import random
>>> g = gt.random_graph(1000, lambda: (5, 5))
>>> eprop = g.new_edge_property("double")
>>> eprop.get_array()[:] = random(g.num_edges())
>>> print(gt.edge_average(g, eprop))
(0.5027850372071281, 0.004073940886690715)

graph_tool.stats.remove_labeled_edges(g, label)

Remove every edge e such that label[e] != 0.

graph_tool.stats.label_parallel_edges(g, mark_only=False, eprop=None)

Label edges which are parallel, i.e., have the same source and target vertices. For each parallel edge set $$PE$$, the labels run from 0 to $$|PE|-1$$. If mark_only == True, all parallel edges are simply marked with the value 1. If the eprop parameter is given (an EdgePropertyMap), the labelling is stored there.
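The labelling scheme can be sketched in pure Python on a plain edge list. The function `label_parallel` is hypothetical and only mimics the description above for a directed graph; in particular, treating the first edge of each set as the one that keeps label 0 under `mark_only` is an assumption of this sketch.

```python
from collections import defaultdict


def label_parallel(edges, mark_only=False):
    """Hypothetical helper: label parallel edges in a directed edge list.

    Each set of edges sharing the same (source, target) pair is
    labelled 0, 1, ..., |PE| - 1 in order of appearance. With
    mark_only=True, every edge after the first in its set gets 1.
    """
    seen = defaultdict(int)  # (source, target) -> edges seen so far
    labels = []
    for s, t in edges:
        k = seen[(s, t)]
        labels.append(min(k, 1) if mark_only else k)
        seen[(s, t)] += 1
    return labels


edges = [(0, 1), (0, 1), (1, 2), (0, 1), (1, 0)]
print(label_parallel(edges))                  # [0, 1, 0, 2, 0]
print(label_parallel(edges, mark_only=True))  # [0, 1, 0, 1, 0]
```

Note that (1, 0) is not parallel to (0, 1) here because the sketch treats the edge list as directed.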

graph_tool.stats.remove_parallel_edges(g)

Remove all parallel edges from the graph. Only one edge from each parallel edge set is left.
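The effect of keeping one edge per parallel set can be illustrated on a plain edge list. The function `remove_parallel` and its `directed` flag are hypothetical names for this sketch; which edge of a set survives in graph-tool is not specified here, so keeping the first occurrence is an assumption.

```python
def remove_parallel(edges, directed=True):
    """Hypothetical helper: keep the first edge of each parallel set.

    For undirected graphs, (u, v) and (v, u) connect the same pair of
    vertices, so both orientations map to the same key.
    """
    seen = set()
    kept = []
    for s, t in edges:
        key = (s, t) if directed else (min(s, t), max(s, t))
        if key not in seen:
            seen.add(key)
            kept.append((s, t))
    return kept


edges = [(0, 1), (0, 1), (1, 0), (2, 2)]
print(remove_parallel(edges))                  # [(0, 1), (1, 0), (2, 2)]
print(remove_parallel(edges, directed=False))  # [(0, 1), (2, 2)]
```

The self-loop (2, 2) is kept in both cases, since a single self-loop is not parallel to anything.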

graph_tool.stats.label_self_loops(g, mark_only=False, eprop=None)

Label edges which are self-loops, i.e., the source and target vertices are the same. For each self-loop edge set $$SL$$, the labels run from 0 to $$|SL|-1$$. If mark_only == True, self-loops are labeled with 1 and others with 0. If the eprop parameter is given (an EdgePropertyMap), the labelling is stored there.

graph_tool.stats.remove_self_loops(g)

Remove all self-loop edges from the graph.

graph_tool.stats.distance_histogram(g, weight=None, bins=[0, 1], samples=None, float_count=True)

Return the shortest-distance histogram for each vertex pair in the graph.

Parameters
g : Graph

Graph to be used.

weight : EdgePropertyMap (optional, default: None)

Edge weights.

bins : list of bins (optional, default: [0, 1])

List of bins to be used for the histogram. The values given represent the edges of the bins (i.e. lower and upper bounds). If the list contains exactly two values, the bin range is created automatically: the bins start at the first value and have a constant width given by the second value.

samples : int (optional, default: None)

If supplied, the distances will be randomly sampled from a number of source vertices given by this parameter. If samples is None (default), all pairs are used.

float_count : bool (optional, default: True)

If True, the counts in each histogram bin will be returned as floats. If False, they will be returned as integers.

Returns
counts : ndarray

The bin counts.

bins : ndarray

The bin edges.

See also
vertex_hist

Vertex histograms.

edge_hist

Edge histograms.

vertex_average

Average of vertex degrees or properties.

edge_average

Average of edge properties.

Notes

The algorithm runs in $$O(V^2)$$ time, or $$O(V^2\log V)$$ if weight is not None. If samples is supplied, the complexities are $$O(\text{samples}\times V)$$ and $$O(\text{samples}\times V\log V)$$, respectively.

If enabled during compilation, this algorithm runs in parallel.

Examples

>>> g = gt.random_graph(100, lambda: (3, 3))
>>> hist = gt.distance_histogram(g)
>>> print(hist)
[array([   0.,  300.,  862., 2195., 3850., 2518.,  175.]), array([0, 1, 2, 3, 4, 5, 6, 7], dtype=uint64)]
>>> hist = gt.distance_histogram(g, samples=10)
>>> print(hist)
[array([  0.,  30.,  86., 213., 378., 262.,  21.]), array([0, 1, 2, 3, 4, 5, 6, 7], dtype=uint64)]
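For the unweighted case, the idea behind the histogram can be sketched with one breadth-first search per source vertex. The function `distance_counts` and the adjacency-dictionary input are hypothetical choices for this illustration, not graph-tool's implementation. Excluding the zero-distance pair of a vertex with itself is inferred from the first bin being 0 in the output above, and unreachable pairs are simply not counted here, which is an assumption.

```python
from collections import deque


def distance_counts(adjacency):
    """Hypothetical helper: counts[d] = number of ordered vertex pairs
    (source, target), source != target, at shortest distance d.

    adjacency maps each vertex to a list of its out-neighbours.
    """
    counts = {}
    for source in adjacency:
        # Plain BFS from this source; dist doubles as the visited set.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != source:
                counts[d] = counts.get(d, 0) + 1
    max_d = max(counts, default=0)
    return [counts.get(d, 0) for d in range(max_d + 1)]


# Directed 4-cycle: every vertex reaches the other three at distances 1, 2, 3.
cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
print(distance_counts(cycle))  # [0, 4, 4, 4]
```

Restricting the outer loop to a random subset of source vertices corresponds to the `samples` option, which is why the sampled counts in the example are roughly a tenth of the full ones.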