graph_tool.inference - Statistical inference of generative network models

This module contains algorithms for the identification of large-scale network structure via the statistical inference of generative models.

Note

An introduction to the concepts used here, as well as a basic HOWTO is included in the cookbook section: Inferring modular network structure.

Nonparametric stochastic block model inference

High-level functions

minimize_blockmodel_dl

Fit the stochastic block model, by minimizing its description length using an agglomerative heuristic.

minimize_nested_blockmodel_dl

Fit the nested stochastic block model, by minimizing its description length using an agglomerative heuristic.

State classes

BlockState

The stochastic block model state of a given graph.

OverlapBlockState

The overlapping stochastic block model state of a given graph.

LayeredBlockState

The (possibly overlapping) block state of a given graph, where the edges are divided into discrete layers.

NestedBlockState

The nested stochastic block model state of a given graph.

PPBlockState

Obtain the partition of a network according to the Bayesian planted partition model.

ModularityState

Obtain the partition of a network according to Newman’s modularity.

TemperingState

This class aggregates several state classes and corresponding inverse-temperature values to implement parallel tempering MCMC.

Sampling and minimization

mcmc_equilibrate

Equilibrate a MCMC with a given starting state.

mcmc_anneal

Equilibrate a MCMC at a specified target temperature by performing simulated annealing.

mcmc_multilevel

Equilibrate a MCMC from a starting state with a higher order, by performing successive agglomerative initializations and equilibrations until the desired order is reached, such that metastable states are avoided.

multicanonical_equilibrate

Equilibrate a multicanonical Monte Carlo sampling using the Wang-Landau algorithm.

MulticanonicalState

The density of states of a multicanonical Monte Carlo algorithm.

bisection_minimize

Find the best order (number of groups) given an initial set of states by performing a one-dimension minimization, using a Fibonacci (or golden section) search.

hierarchy_minimize

Attempt to find a fit of the nested stochastic block model that minimizes the description length.

Comparing and manipulating partitions

PartitionModeState

The random label model state for a set of labelled partitions, which attempts to align them with a common group labelling.

ModeClusterState

The mixed random label model state for a set of labelled partitions, which attempts to align them inside clusters with a common group labelling.

PartitionCentroidState

Obtain the center of a set of partitions, according to the variation of information metric or reduced mutual information.

partition_overlap

Returns the maximum overlap between partitions, according to an optimal label alignment.

nested_partition_overlap

Returns the hierarchical maximum overlap between nested partitions, according to an optimal recursive label alignment.

variation_information

Returns the variation of information between two partitions.

mutual_information

Returns the mutual information between two partitions.

reduced_mutual_information

Returns the reduced mutual information between two partitions.

contingency_graph

Returns the contingency graph between both partitions.

shuffle_partition_labels

Returns a copy of partition x, with the group labels randomly shuffled.

order_partition_labels

Returns a copy of partition x, with the group labels ordered decreasingly according to group size.

order_nested_partition_labels

Returns a copy of nested partition x, with the group labels ordered decreasingly according to group size at each level.

align_partition_labels

Returns a copy of partition x, with the group labels aligned as to maximize the overlap with y.

align_nested_partition_labels

Returns a copy of nested partition x, with the group labels aligned as to maximize the overlap with y.

partition_overlap_center

Find a partition with a maximal overlap to all items of the list of partitions given.

nested_partition_overlap_center

Find a nested partition with a maximal overlap to all items of the list of nested partitions given.

nested_partition_clear_null

Returns a copy of nested partition x where the null values -1 are replaced with 0.

Auxiliary functions

model_entropy

Computes the amount of information necessary for the parameters of the traditional blockmodel ensemble, for B blocks, N vertices, E edges, and either a directed or undirected graph.

mf_entropy

Compute the “mean field” entropy given the vertex block membership marginals.

bethe_entropy

Compute the Bethe entropy given the edge block membership marginals.

microstate_entropy

Compute microstate entropy given a histogram of partitions.

marginal_multigraph_entropy

Compute the entropy of the marginal latent multigraph distribution.

half_edge_graph

Generate a half-edge graph, where each half-edge is represented by a node, and an edge connects the half-edges like in the original graph.

get_block_edge_gradient

Get edge gradients corresponding to the block membership at the endpoints of the edges given by the be edge property map.

Auxiliary classes

PartitionHist

Histogram of partitions, implemented in C++.

BlockPairHist

Histogram of block pairs, implemented in C++.

Nonparametric network reconstruction

State classes

LatentMultigraphBlockState

Inference state of an erased Poisson multigraph, using the stochastic block model as a prior.

MeasuredBlockState

Inference state of a measured graph, using the stochastic block model as a prior.

MixedMeasuredBlockState

Inference state of a measured graph with heterogeneous errors, using the stochastic block model as a prior.

UncertainBlockState

Inference state of an uncertain graph, using the stochastic block model as a prior.

UncertainBaseState

Base state for uncertain network inference.

DynamicsBlockStateBase

Base state for network reconstruction based on dynamical data, using the stochastic block model as a prior.

EpidemicsBlockState

Inference state for network reconstruction based on epidemic dynamics, using the stochastic block model as a prior.

IsingBaseBlockState

Base state for network reconstruction based on the Ising model, using the stochastic block model as a prior.

IsingGlauberBlockState

State for network reconstruction based on the Glauber dynamics of the Ising model, using the stochastic block model as a prior.

CIsingGlauberBlockState

State for network reconstruction based on the Glauber dynamics of the continuous Ising model, using the stochastic block model as a prior.

PseudoIsingBlockState

State for network reconstruction based on the equilibrium configurations of the Ising model, using the Pseudolikelihood approximation and the stochastic block model as a prior.

PseudoCIsingBlockState

State for network reconstruction based on the equilibrium configurations of the continuous Ising model, using the Pseudolikelihood approximation and the stochastic block model as a prior.

Expectation-maximization Inference

latent_multigraph

Infer latent Poisson multigraph model given an “erased” simple graph.

Semiparametric stochastic block model inference

State classes

EMBlockState

The parametric, undirected stochastic block model state of a given graph.

Expectation-maximization Inference

em_infer

Infer the model parameters and latent variables using the expectation-maximization (EM) algorithm with initial state given by state.

Large-scale descriptors

modularity

Calculate Newman’s (generalized) modularity of a network partition.

Contents

class graph_tool.inference.blockmodel.PartitionHist

Histogram of partitions, implemented in C++. Interface supports querying and setting using Vector_int32_t as keys, and ints as values.

asdict()

Return the histogram’s contents as a dict.

class graph_tool.inference.blockmodel.BlockPairHist

Histogram of block pairs, implemented in C++. Interface supports querying and setting using pairs of ints as keys, and ints as values.

asdict()

Return the histogram’s contents as a dict.

class graph_tool.inference.blockmodel.BlockState(g, b=None, B=None, eweight=None, vweight=None, recs=[], rec_types=[], rec_params=[], clabel=None, pclabel=None, bfield=None, Bfield=None, deg_corr=True, dense_bg=False, **kwargs)[source]

Bases: object

The stochastic block model state of a given graph.

Parameters
gGraph

Graph to be modelled.

bVertexPropertyMap (optional, default: None)

Initial block labels on the vertices. If not supplied, it will be randomly sampled.

Bint (optional, default: None)

Number of blocks (or vertex groups). If not supplied it will be obtained from the parameter b.

eweightEdgePropertyMap (optional, default: None)

Edge multiplicities (for multigraphs or block graphs).

vweightVertexPropertyMap (optional, default: None)

Vertex multiplicities (for block graphs).

recslist of EdgePropertyMap instances (optional, default: [])

List of real or discrete-valued edge covariates.

rec_typeslist of edge covariate types (optional, default: [])

List of types of edge covariates. The possible types are: "real-exponential", "real-normal", "discrete-geometric", "discrete-poisson" or "discrete-binomial".

rec_paramslist of dict (optional, default: [])

Model hyperparameters for edge covariates. This should be a list of dict instances, or the string “microcanonical” (the default if nothing is specified). The keys depend on the type of edge covariate:

"real-exponential" or "discrete-poisson"

The parameter list is ["r", "theta"], corresponding to the parameters of the Gamma prior distribution. If unspecified, the default is the “empirical Bayes” choice: r = 1.0 and theta is the global average of the edge covariate.

"discrete-geometric"

The parameter list is ["alpha", "beta"], corresponding to the parameters of the Beta prior distribution. If unspecified, the default is the noninformative choice: alpha = beta = 1.0

"discrete-binomial"

The parameter list is ["N", "alpha", "beta"], corresponding to the number of trials N and the parameters of the Beta prior distribution. If unspecified, the default is the noninformative choice, alpha = beta = 1.0, and N is taken to be the maximum edge covarite value.

"real-normal"

The parameter list is ["m0", "k0", "v0", "nu0"] corresponding to the normal-inverse-chi-squared prior. If unspecified, the defaults are: m0 = rec.fa.mean(), k0 = 1, v0 = rec.fa.std() ** 2, and nu0 = 3, where rec is the corresponding edge covariate property map.

clabelVertexPropertyMap (optional, default: None)

Constraint labels on the vertices. If supplied, vertices with different label values will not be clustered in the same group.

pclabelVertexPropertyMap (optional, default: None)

Partition constraint labels on the vertices. This has the same interpretation as clabel, but will be used to compute the partition description length.

bfieldVertexPropertyMap (optional, default: None)

Local field acting as a prior for the node partition. This should be a vector property map of type vector<double>, and contain the log-probability for each node to be placed in each group.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the blockmodel ensemble will be assumed, otherwise the traditional variant will be used.

dense_bgbool (optional, default: False)

If True a dense matrix is used for the block graph, otherwise a sparse matrix will be used.

get_rec_params()[source]

Get model hyperparameters for edge covariates.

set_rec_params(params)[source]

Update model hyperparameters for edge covariates.

copy(g=None, eweight=None, vweight=None, b=None, B=None, deg_corr=None, clabel=None, overlap=False, pclabel=None, bfield=None, dense_bg=None, **kwargs)[source]

Copies the block state. The parameters override the state properties, and have the same meaning as in the constructor.

get_block_state(b=None, vweight=False, **kwargs)[source]

Returns a BlockState corresponding to the block graph (i.e. the blocks of the current state become the nodes). The parameters have the same meaning as the in the constructor. If vweight == True the nodes of the block state are weighted with the node counts.

get_E()[source]

Returns the total number of edges.

get_N()[source]

Returns the total number of nodes.

get_B()[source]

Returns the total number of blocks.

get_nonempty_B()[source]

Returns the total number of nonempty blocks.

get_Be()[source]

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

get_bclabel(clabel=None)[source]

Returns a VertexPropertyMap corresponding to constraint labels for the block graph.

get_bpclabel()[source]

Returns a VertexPropertyMap corresponding to partition constraint labels for the block graph.

get_blocks()[source]

Returns the property map which contains the block labels for each vertex.

get_state()[source]

Alias to get_blocks().

set_state(b)[source]

Sets the internal partition of the state.

get_bg()[source]

Returns the block graph.

get_ers()[source]

Returns the edge property map of the block graph which contains the \(e_{rs}\) matrix entries. For undirected graphs, the diagonal values (self-loops) contain \(e_{rr}/2\).

get_er()[source]

Returns the vertex property map of the block graph which contains the number \(e_r\) of half-edges incident on block \(r\). If the graph is directed, a pair of property maps is returned, with the number of out-edges \(e^+_r\) and in-edges \(e^-_r\), respectively.

get_nr()[source]

Returns the vertex property map of the block graph which contains the block sizes \(n_r\).

entropy(adjacency=True, dl=True, partition_dl=True, degree_dl=True, degree_dl_kind='distributed', edges_dl=True, dense=False, multigraph=True, deg_entropy=True, recs=True, recs_dl=True, beta_dl=1.0, Bfield=True, exact=True, **kwargs)[source]

Calculate the entropy (a.k.a. negative log-likelihood) associated with the current block partition.

Parameters
adjacencybool (optional, default: True)

If True, the adjacency term of the description length will be included.

dlbool (optional, default: True)

If True, the description length for the parameters will be included.

partition_dlbool (optional, default: True)

If True, and dl == True the partition description length will be included.

degree_dlbool (optional, default: True)

If True, and dl == True the degree sequence description length will be included (for degree-corrected models).

degree_dl_kindstr (optional, default: "distributed")

This specifies the prior used for the degree sequence. It must be one of: "uniform", "distributed" (default) or "entropy".

edges_dlbool (optional, default: True)

If True, and dl == True the edge matrix description length will be included.

densebool (optional, default: False)

If True, the “dense” variant of the entropy will be computed.

multigraphbool (optional, default: True)

If True, the multigraph entropy will be used.

deg_entropybool (optional, default: True)

If True, the degree entropy term that is independent of the network partition will be included (for degree-corrected models).

recsbool (optional, default: True)

If True, the likelihood for real or discrete-valued edge covariates is computed.

recs_dlbool (optional, default: True)

If True, and dl == True the edge covariate description length will be included.

beta_dldouble (optional, default: 1.)

Prior inverse temperature.

exactbool (optional, default: True)

If True, the exact expressions will be used. Otherwise, Stirling’s factorial approximation will be used for some terms.

Notes

The “entropy” of the state is the negative log-likelihood of the microcanonical SBM, that includes the generated graph \(\boldsymbol{A}\) and the model parameters \(\boldsymbol{\theta}\),

\[\begin{split}\Sigma &= - \ln P(\boldsymbol{A},\boldsymbol{\theta}) \\ &= - \ln P(\boldsymbol{A}|\boldsymbol{\theta}) - \ln P(\boldsymbol{\theta}).\end{split}\]

This value is also called the description length of the data, and it corresponds to the amount of information required to describe it (in nats).

For the traditional blockmodel (deg_corr == False), the model parameters are \(\boldsymbol{\theta} = \{\boldsymbol{e}, \boldsymbol{b}\}\), where \(\boldsymbol{e}\) is the matrix of edge counts between blocks, and \(\boldsymbol{b}\) is the partition of the nodes into blocks. For the degree-corrected blockmodel (deg_corr == True), we have an additional set of parameters, namely the degree sequence \(\boldsymbol{k}\).

For the traditional blockmodel, the model likelihood is

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b}) &= \frac{\prod_{r<s}e_{rs}!\prod_re_{rr}!!}{\prod_rn_r^{e_r}}\times \frac{1}{\prod_{i<j}A_{ij}!\prod_iA_{ii}!!},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b}) &= \frac{\prod_{rs}e_{rs}!}{\prod_rn_r^{e_r}}\times \frac{1}{\prod_{ij}A_{ij}!},\end{split}\]

for undirected and directed graphs, respectively, where \(e_{rs}\) is the number of edges from block \(r\) to \(s\) (or the number of half-edges for the undirected case when \(r=s\)), and \(n_r\) is the number of vertices in block \(r\) .

For the degree-corrected variant the equivalent expressions are

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b},\boldsymbol{k}) &= \frac{\prod_{r<s}e_{rs}!\prod_re_{rr}!!}{\prod_re_r!}\times \frac{\prod_ik_i!}{\prod_{i<j}A_{ij}!\prod_iA_{ii}!!},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b},\boldsymbol{k}) &= \frac{\prod_{rs}e_{rs}!}{\prod_re_r^+!\prod_re_r^-!}\times \frac{\prod_ik_i^+!\prod_ik_i^-!}{\prod_{ij}A_{ij}!},\end{split}\]

where \(e_r = \sum_se_{rs}\) is the number of half-edges incident on block \(r\), and \(e^+_r = \sum_se_{rs}\) and \(e^-_r = \sum_se_{sr}\) are the numbers of out- and in-edges adjacent to block \(r\), respectively.

If exact == False, Stirling’s approximation is used in the above expression.

If dense == True, the likelihood for the non-degree-corrected model becomes instead

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b})^{-1} &= \prod_{r<s}{n_rn_s\choose e_{rs}}\prod_r{{n_r\choose 2}\choose e_{rr}/2},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b})^{-1} &= \prod_{rs}{n_rn_s\choose e_{rs}}\end{split}\]

if multigraph == False, otherwise we replace \({n\choose m}\to\left(\!\!{n\choose m}\!\!\right)\) above, where \(\left(\!\!{n\choose m}\!\!\right) = {n+m-1\choose m}\). A “dense” entropy for the degree-corrected model is not available, and if requested will raise a NotImplementedError.

If dl == True, the description length \(\mathcal{L} = -\ln P(\boldsymbol{\theta})\) of the model will be returned as well. The terms \(P(\boldsymbol{e})\) and \(P(\boldsymbol{b})\) are described in described in model_entropy().

For the degree-corrected model we need to specify the prior \(P(\boldsymbol{k})\) for the degree sequence as well. Here there are three options:

  1. degree_dl_kind == "uniform"

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\left(\!\!{n_r\choose e_r}\!\!\right)^{-1}.\]

    This corresponds to a noninformative prior, where the degrees are sampled from a uniform distribution.

  2. degree_dl_kind == "distributed" (default)

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\frac{\prod_k\eta_k^r!}{n_r!} \prod_r q(e_r, n_r)^{-1}\]

    with \(\eta_k^r\) being the number of nodes with degree \(k\) in group \(r\), and \(q(n,m)\) being the number of partitions of integer \(n\) into at most \(m\) parts.

    This corresponds to a prior for the degree sequence conditioned on the degree frequencies, which are themselves sampled from a uniform hyperprior. This option should be preferred in most cases.

  3. degree_dl_kind == "entropy"

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) \approx \prod_r\exp\left(-n_rH(\boldsymbol{k}_r)\right)\]

    where \(H(\boldsymbol{k}_r) = -\sum_kp_r(k)\ln p_r(k)\) is the entropy of the degree distribution inside block \(r\).

    Note that, differently from the other two choices, this represents only an approximation of the description length. It is meant to be used only for comparison purposes, and should be avoided in practice.

For the directed case, the above expressions are duplicated for the in- and out-degrees.

References

peixoto-nonparametric-2017

Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub, @tor], arXiv: 1610.02703

peixoto-hierarchical-2014

Tiago P. Peixoto, “Hierarchical block structures and high-resolution model selection in large networks “, Phys. Rev. X 4, 011047 (2014), DOI: 10.1103/PhysRevX.4.011047 [sci-hub, @tor], arXiv: 1310.4377.

peixoto-weighted-2017

Tiago P. Peixoto, “Nonparametric weighted stochastic block models”, Phys. Rev. E 97, 012306 (2018), DOI: 10.1103/PhysRevE.97.012306 [sci-hub, @tor], arXiv: 1708.01432

get_matrix()[source]

Returns the block matrix (as a sparse csr_matrix), which contains the number of edges between each block pair.

Warning

This corresponds to the adjacency matrix of the block graph, which by convention includes twice the amount of edges in the diagonal entries if the graph is undirected.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=5, deg_corr=True)
>>> state.mcmc_sweep(niter=1000)
(...)
>>> m = state.get_matrix()
>>> figure()
<...>
>>> matshow(m.todense())
<...>
>>> savefig("bloc_mat.svg")
_images/bloc_mat.svg

A 5x5 block matrix.

virtual_vertex_move(v, s, **kwargs)[source]

Computes the entropy difference if vertex v is moved to block s. The remaining parameters are the same as in graph_tool.inference.blockmodel.BlockState.entropy().

move_vertex(v, s)[source]

Move vertex v to block s.

This optionally accepts a list of vertices and blocks to move simultaneously.

remove_vertex(v)[source]

Remove vertex v from its current group.

This optionally accepts a list of vertices to remove.

Warning

This will leave the state in an inconsistent state before the vertex is returned to some other group, or if the same vertex is removed twice.

add_vertex(v, r)[source]

Add vertex v to block r.

This optionally accepts a list of vertices and blocks to add.

Warning

This can leave the state in an inconsistent state if a vertex is added twice to the same group.

merge_vertices(u, v)[source]

Merge vertex u into v.

Warning

This modifies the underlying graph.

sample_vertex_move(v, c=1.0, d=0.1)[source]

Sample block membership proposal of vertex v according to real-valued sampling parameters c and d: For \(c\to 0\) the blocks are sampled according to the local neighborhood and their connections; for \(c\to\infty\) the blocks are sampled randomly. With a probability d, a new (empty) group is sampled.

get_move_prob(v, s, c=1.0, d=0.1, reverse=False)[source]

Compute the probability of a move proposal for vertex v to block s according to sampling parameters c and d, as obtained with graph_tool.inference.blockmodel.BlockState.sample_vertex_move(). If reverse == True, the reverse probability of moving the node back from block s to its current one is obtained.

get_edges_prob(missing, spurious=[], entropy_args={})[source]

Compute the joint log-probability of the missing and spurious edges given by missing and spurious (a list of (source, target) tuples, or Edge() instances), together with the observed edges.

More precisely, the log-likelihood returned is

\[\ln \frac{P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}\]

where \(\boldsymbol G + \delta \boldsymbol G\) is the modified graph (with missing edges added and spurious edges deleted).

The values in entropy_args are passed to entropy() to calculate the log-probability.

mcmc_sweep(beta=1.0, c=1.0, d=0.01, niter=1, entropy_args={}, allow_vacate=True, sequential=True, deterministic=False, vertices=None, verbose=False, **kwargs)[source]

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC to sample network partitions.

Parameters
betafloat (optional, default: 1.)

Inverse temperature.

cfloat (optional, default: 1.)

Sampling parameter c for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.

dfloat (optional, default: .01)

Probability of selecting a new (i.e. empty) group for a given move.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.blockmodel.BlockState.entropy().

allow_vacatebool (optional, default: True)

Allow groups to be vacated.

sequentialbool (optional, default: True)

If sequential == True each vertex move attempt is made sequentially, where vertices are visited in random order. Otherwise the moves are attempted by sampling vertices randomly, so that the same vertex can be moved more than once, before other vertices had the chance to move.

deterministicbool (optional, default: False)

If sequential == True and deterministic == True the vertices will be visited in deterministic order.

verticeslist of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of blocks).

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

multiflip_mcmc_sweep(beta=1.0, c=1.0, psingle=None, psplit=1, pmerge=1, pmergesplit=1, d=0.01, gibbs_sweeps=10, niter=1, entropy_args={}, accept_stats=None, verbose=False, **kwargs)[source]

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC with multiple simultaneous moves to sample network partitions.

Parameters
betafloat (optional, default: 1.)

Inverse temperature.

cfloat (optional, default: 1.)

Sampling parameter c for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.

psinglefloat (optional, default: None)

Relative probability of proposing a single node move. If None, it will be selected as the number of nodes in the graph.

psplitfloat (optional, default: 1)

Relative probability of proposing a group split.

pmergesplitfloat (optional, default: 1)

Relative probability of proposing a marge-split move.

dfloat (optional, default: 1)

Probability of selecting a new (i.e. empty) group for a given single-node move.

gibbs_sweepsint (optional, default: 10)

Number of sweeps of Gibbs sampling to be performed (i.e. each node is attempted once per sweep) to refine a split proposal.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node, on average.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.blockmodel.BlockState.entropy().

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of blocks).

gibbs_sweep(beta=1.0, niter=1, entropy_args={}, allow_new_group=True, sequential=True, deterministic=False, vertices=None, verbose=False, **kwargs)[source]

Perform niter sweeps of a rejection-free Gibbs sampling MCMC to sample network partitions.

Parameters
betafloat (optional, default: 1.)

Inverse temperature.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.blockmodel.BlockState.entropy().

allow_new_groupbool (optional, default: True)

Allow the number of groups to increase and decrease.

sequentialbool (optional, default: True)

If sequential == True each vertex move attempt is made sequentially, where vertices are visited in random order. Otherwise the moves are attempted by sampling vertices randomly, so that the same vertex can be moved more than once, before other vertices had the chance to move.

deterministicbool (optional, default: False)

If sequential == True and deterministic == True the vertices will be visited in deterministic order.

verticeslist of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E\times B)\) complexity, where \(B\) is the number of blocks, and \(E\) is the number of edges.

multicanonical_sweep(m_state, multiflip=False, **kwargs)[source]

Perform niter sweeps of a non-Markovian multicanonical sampling using the Wang-Landau algorithm.

Parameters
m_stateMulticanonicalState

MulticanonicalState instance containing the current state of the Wang-Landau run.

multiflipbool (optional, default: False)

If True, multiflip_mcmc_sweep() will be used, otherwise mcmc_sweep().

**kwargsKeyword parameter list

The remaining parameters will be passed to multiflip_mcmc_sweep() or mcmc_sweep().

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of blocks).

References

wang-efficient-2001

Fugao Wang, D. P. Landau, “An efficient, multiple range random walk algorithm to calculate the density of states”, Phys. Rev. Lett. 86, 2050 (2001), DOI: 10.1103/PhysRevLett.86.2050 [sci-hub, @tor], arXiv: cond-mat/0011174

multicanonical_B_sweep(m_state, **kwargs)[source]

Perform niter sweeps of a non-Markovian multicanonical sampling using the Wang-Landau algorithm.

Parameters
m_stateMulticanonicalState

MulticanonicalState instance containing the current state of the Wang-Landau run.

multiflipbool (optional, default: False)

If True, multiflip_mcmc_sweep() will be used, otherwise mcmc_sweep().

**kwargsKeyword parameter list

The remaining parameters will be passed to multiflip_mcmc_sweep() or mcmc_sweep().

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of blocks).

References

wang-efficient-2001

Fugao Wang, D. P. Landau, “An efficient, multiple range random walk algorithm to calculate the density of states”, Phys. Rev. Lett. 86, 2050 (2001), DOI: 10.1103/PhysRevLett.86.2050 [sci-hub, @tor], arXiv: cond-mat/0011174

exhaustive_sweep(entropy_args={}, callback=None, density=None, vertices=None, initial_partition=None, max_iter=None)[source]

Perform an exhaustive loop over all possible network partitions.

Parameters
entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.blockmodel.BlockState.entropy().

callbackcallable object (optional, default: None)

Function to be called for each partition, with three arguments (S, S_min, b_min) corresponding to the the current entropy value, the minimum entropy value so far, and the corresponding partition, respectively. If not provided, and hist is None an iterator over the same values will be returned instead.

densitytuple (optional, default: None)

If provided, it should contain a tuple with values (S_min, S_max, n_bins), which will be used to obtain the density of states via a histogram of size n_bins. This parameter is ignored unless callback is None.

verticesiterable of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

initial_partitioniterable of ints (optional, default: None)

If provided, this will provide the initial partition for the iteration.

max_iterint (optional, default: None)

If provided, this will limit the total number of iterations.

Returns
statesiterator over (S, S_min, b_min)

If callback is None and hist is None, the function will return an iterator over (S, S_min, b_min) corresponding to the the current entropy value, the minimum entropy value so far, and the corresponding partition, respectively.

Ss, countspair of numpy.ndarray

If callback is None and hist is not None, the function will return the values of each bin (Ss) and the state count of each bin (counts).

b_minVertexPropertyMap

If callback is not None or hist is not None, the function will also return partition with smallest entropy.

Notes

This algorithm has an \(O(B^N)\) complexity, where \(B\) is the number of blocks, and \(N\) is the number of vertices.

merge_sweep(nmerges=1, niter=10, entropy_args={}, parallel=True, verbose=False, **kwargs)[source]

Perform niter merge sweeps, where block nodes are progressively merged together in a manner that least increases the entropy.

Parameters
nmergesint (optional, default: 1)

Number block nodes to merge.

niterint (optional, default: 1)

Number of merge attempts to perform for each block node, before the best one is selected.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.blockmodel.BlockState.entropy().

parallelbool (optional, default: True)

If parallel == True, the merge candidates are obtained in parallel.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of attempted merges.

nmovesint

Number of vertices merged.

Notes

This function should only be called for block states, obtained from graph_tool.inference.blockmodel.BlockState.get_block_state().

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

shrink(B, **kwargs)[source]

Reduces the order of current state by progressively merging groups, until only B are left. All remaining keyword arguments are passed to graph_tool.inference.blockmodel.BlockState.merge_sweep().

This function leaves the current state untouched and returns instead a copy with the new partition.

collect_edge_marginals(p=None, update=1)[source]

Collect the edge marginal histogram, which counts the number of times the endpoints of each node have been assigned to a given block pair.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.blockmodel.BlockState.mcmc_sweep() function.

Parameters
pEdgePropertyMap (optional, default: None)

Edge property map with edge marginals to be updated. If not provided, an empty histogram will be created.

updatefloat (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

Returns
pEdgePropertyMap

Edge property map with updated edge marginals.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> pe = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     pe = state.collect_edge_marginals(pe)
>>> gt.bethe_entropy(g, pe)[0]
1.7733496...
collect_vertex_marginals(p=None, b=None, unlabel=False, update=1)[source]

Collect the vertex marginal histogram, which counts the number of times a node was assigned to a given block.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.blockmodel.BlockState.mcmc_sweep() function.

Parameters
pVertexPropertyMap (optional, default: None)

Vertex property map with vector-type values, storing the previous block membership counts. If not provided, an empty histogram will be created.

bVertexPropertyMap (optional, default: None)

Vertex property map with group partition. If not provided, the state’s partition will be used.

unlabelbool (optional, default: False)

If True, a canonical labelling of the groups will be used, so that each partition is uniquely represented.

updateint (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

Returns
pVertexPropertyMap

Vertex property map with vector-type values, storing the accumulated block membership counts.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> pv = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     pv = state.collect_vertex_marginals(pv)
>>> gt.mf_entropy(g, pv)
22.237735...
>>> gt.graph_draw(g, pos=g.vp["pos"], vertex_shape="pie",
...               vertex_pie_fractions=pv, output="polbooks_blocks_soft_B4.svg")
<...>
_images/polbooks_blocks_soft_B4.svg

“Soft” block partition of a political books network with \(B=4\).

collect_partition_histogram(h=None, update=1, unlabel=True)[source]

Collect a histogram of partitions.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.blockmodel.BlockState.mcmc_sweep() function.

Parameters
hPartitionHist (optional, default: None)

Partition histogram. If not provided, an empty histogram will be created.

updatefloat (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

unlabelbool (optional, default: True)

If True, a canonical labelling of the groups will be used, so that each partition is uniquely represented.

Returns
hPartitionHist (optional, default: None)

Updated Partition histogram.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> ph = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     ph = state.collect_partition_histogram(ph)
>>> gt.microstate_entropy(ph)
132.254124...
draw(**kwargs)[source]

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

sample_graph(canonical=False, multigraph=True, self_loops=True, sample_params=False, max_ent=False, n_iter=1000)[source]

Sample a new graph from the fitted model.

Parameters
canonicalbool (optional, default: False)

If canonical == True, the graph will be sampled from the maximum-likelihood estimate of the canonical stochastic block model. Otherwise, it will be sampled from the microcanonical model.

multigraphbool (optional, default: True)

If True, parallel edges will be allowed.

self-loopsbool (optional, default: True)

If True, self-loops will be allowed.

sample_paramsbool (optional, default: True)

If True, and canonical == False and max_ent == False, the count parameters (edges between groups and node degrees) will be sampled from their posterior distribution conditioned on the actual state. Otherwise, their maximum-likelihood values will be used.

max_entbool (optional, default: False)

If True, maximum-entropy model variants will be used.

n_iterint (optional, default: 1000)

Number of iterations used (only relevant if canonical == False and max_ent == True).

Returns
gGraph

Generated graph.

Notes

This function is just a convenience wrapper to generate_sbm(). However, if max_ent==True and canonical == False it wraps random_rewire() instead.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.minimize_blockmodel_dl(g, B_max=3)
>>> u = state.sample_graph(canonical=True, self_loops=False, multigraph=False)
>>> ustate = gt.BlockState(u, b=state.b)
>>> state.draw(pos=g.vp.pos, output="polbooks-sbm.svg")
<...>
>>> ustate.draw(pos=u.own_property(g.vp.pos), output="polbooks-sbm-sampled.svg")
<...>
_images/polbooks-sbm.svg _images/polbooks-sbm-sampled.svg

Left: Political books network. Right: Sample from the degree-corrected SBM fitted to the original network.

graph_tool.inference.blockmodel.model_entropy(B, N, E, directed=False, nr=None)[source]

Computes the amount of information necessary for the parameters of the traditional blockmodel ensemble, for B blocks, N vertices, E edges, and either a directed or undirected graph.

This is equivalently defined as minus the log-likelihood of sampling the parameters from a nonparametric generative model.

A traditional blockmodel is defined as a set of \(N\) vertices which can belong to one of \(B\) blocks, and the matrix \(e_{rs}\) describes the number of edges from block \(r\) to \(s\) (or twice that number if \(r=s\) and the graph is undirected).

For an undirected graph, the number of distinct \(e_{rs}\) matrices is given by,

\[\Omega_m = \left(\!\!{B(B+1)/2 \choose E}\!\!\right)\]

and for a directed graph,

\[\Omega_m = \left(\!\!{B^2 \choose E}\!\!\right)\]

where \(\left(\!{n \choose k}\!\right) = {n+k-1\choose k}\) is the number of \(k\) combinations with repetitions from a set of size \(n\). Hence, we have the description length of the edge counts

\[-\ln P(\boldsymbol{e}) = \ln \Omega_m.\]

For the node partition \(\boldsymbol{b}\) we assume a two-level Bayesian hierarchy, where first the group size histogram is generated, and conditioned on it the partition, which leads to a description length:

\[-\ln P(\boldsymbol{b}) = \ln {N - 1 \choose B - 1} + \ln N! - \sum_r \ln n_r!.\]

where \(n_r\) is the number of nodes in block \(r\).

The total information necessary to describe the model is then,

\[-\ln P(\boldsymbol{e}, \boldsymbol{b}) = -\ln P(\boldsymbol{e}) - \ln P(\boldsymbol{b}).\]

If nr is None, it is assumed \(n_r=N/B\). If nr is False, the partition term \(-\ln P(\boldsymbol{b})\) is omitted entirely.

References

peixoto-parsimonious-2013

Tiago P. Peixoto, “Parsimonious module inference in large networks”, Phys. Rev. Lett. 110, 148701 (2013), DOI: 10.1103/PhysRevLett.110.148701 [sci-hub, @tor], arXiv: 1212.4794.

peixoto-nonparametric-2017

Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub, @tor], arXiv: 1610.02703

graph_tool.inference.blockmodel.bethe_entropy(g, p)[source]

Compute the Bethe entropy given the edge block membership marginals.

Parameters
gGraph

The graph.

pEdgePropertyMap

Edge property map with edge marginals.

Returns
Hfloat

The Bethe entropy value (in nats)

Hmffloat

The “mean field” entropy value (in nats), as would be returned by the mf_entropy() function.

pvVertexPropertyMap (optional, default: None)

Vertex property map with vector-type values, storing the accumulated block membership counts. These are the node marginals, as would be returned by the collect_vertex_marginals() method.

Notes

The Bethe entropy is defined as,

\[H = -\sum_{ij}A_{ij}\sum_{rs}\pi_{ij}(r,s)\ln\pi_{ij}(r,s) - \sum_i(1-k_i)\sum_r\pi_i(r)\ln\pi_i(r),\]

where \(\pi_{ij}(r,s)\) is the marginal probability that vertices \(i\) and \(j\) belong to blocks \(r\) and \(s\), respectively, and \(\pi_i(r)\) is the marginal probability that vertex \(i\) belongs to block \(r\), and \(k_i\) is the degree of vertex \(i\) (or total degree for directed graphs).

References

mezard-information-2009

Marc Mézard, Andrea Montanari, “Information, Physics, and Computation”, Oxford Univ Press, 2009. DOI: 10.1093/acprof:oso/9780198570837.001.0001 [sci-hub, @tor]

graph_tool.inference.blockmodel.mf_entropy(g, p)[source]

Compute the “mean field” entropy given the vertex block membership marginals.

Parameters
gGraph

The graph.

pVertexPropertyMap

Vertex property map with vector-type values, storing the accumulated block membership counts.

Returns
Hmffloat

The “mean field” entropy value (in nats).

Notes

The “mean field” entropy is defined as,

\[H = - \sum_{i,r}\pi_i(r)\ln\pi_i(r),\]

where \(\pi_i(r)\) is the marginal probability that vertex \(i\) belongs to block \(r\).

References

mezard-information-2009

Marc Mézard, Andrea Montanari, “Information, Physics, and Computation”, Oxford Univ Press, 2009. DOI: 10.1093/acprof:oso/9780198570837.001.0001 [sci-hub, @tor]

graph_tool.inference.blockmodel.microstate_entropy(h, unlabel=True)[source]

Compute microstate entropy given a histogram of partitions.

Parameters
hPartitionHist (optional, default: None)

Partition histogram.

unlabelbool (optional, default: True)

If True, a canonical labelling of the groups will be used, so that each partition is uniquely represented. However, the entropy computed will still correspond to the full distribution over labelled partitions, where all permutations are assumed to be equally likely.

Returns
Hfloat

The microstate entropy value (in nats).

Notes

The microstate entropy is defined as,

\[H = - \sum_{\boldsymbol b}p({\boldsymbol b})\ln p({\boldsymbol b}),\]

where \(p({\boldsymbol b})\) is observed frequency of labelled partition \({\boldsymbol b}\).

References

mezard-information-2009

Marc Mézard, Andrea Montanari, “Information, Physics, and Computation”, Oxford Univ Press, 2009. DOI: 10.1093/acprof:oso/9780198570837.001.0001 [sci-hub, @tor]

class graph_tool.inference.overlap_blockmodel.OverlapBlockState(g, b=None, B=None, recs=[], rec_types=[], rec_params=[], clabel=None, pclabel=None, deg_corr=True, dense_bg=False, **kwargs)[source]

Bases: graph_tool.inference.blockmodel.BlockState

The overlapping stochastic block model state of a given graph.

Parameters
gGraph

Graph to be modelled.

bVertexPropertyMap or numpy.ndarray (optional, default: None)

Initial block labels on the vertices or half-edges. If not supplied, it will be randomly sampled. If the value passed is a vertex property map, it will be assumed to be a non-overlapping partition of the vertices. If it is an edge property map, it should contain a vector for each edge, with the block labels at each end point (sorted according to their vertex index, in the case of undirected graphs, otherwise from source to target). If the value is an numpy.ndarray, it will be assumed to correspond directly to a partition of the list of half-edges.

Bint (optional, default: None)

Number of blocks (or vertex groups). If not supplied it will be obtained from the parameter b.

recslist of EdgePropertyMap instances (optional, default: [])

List of real or discrete-valued edge covariates.

rec_typeslist of edge covariate types (optional, default: [])

List of types of edge covariates. The possible types are: "real-exponential", "real-normal", "discrete-geometric", "discrete-poisson" or "discrete-binomial".

rec_paramslist of dict (optional, default: [])

Model hyperparameters for edge covariates. This should a list of dict instances. See BlockState for more details.

clabelVertexPropertyMap (optional, default: None)

Constraint labels on the vertices. If supplied, vertices with different label values will not be clustered in the same group.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the blockmodel ensemble will be assumed, otherwise the traditional variant will be used.

dense_bgbool (optional, default: False)

If True a dense matrix is used for the block graph, otherwise a sparse matrix will be used.

copy(g=None, b=None, B=None, deg_corr=None, clabel=None, pclabel=None, **kwargs)[source]

Copies the block state. The parameters override the state properties, and have the same meaning as in the constructor. If overlap=False an instance of BlockState is returned. This is by default a shallow copy.

get_E()[source]

Returns the total number of edges.

get_N()[source]

Returns the total number of nodes.

get_B()[source]

Returns the total number of blocks.

get_nonempty_B()[source]

Returns the total number of nonempty blocks.

get_edge_blocks()[source]

Returns an edge property map which contains the block labels pairs for each edge.

get_overlap_blocks()[source]

Returns the mixed membership of each vertex.

Returns
bvVertexPropertyMap

A vector-valued vertex property map containing the block memberships of each node.

bc_inVertexPropertyMap

The labelled in-degrees of each node, i.e. how many in-edges belong to each group, in the same order as the bv property above.

bc_outVertexPropertyMap

The labelled out-degrees of each node, i.e. how many out-edges belong to each group, in the same order as the bv property above.

bc_totalVertexPropertyMap

The labelled total degrees of each node, i.e. how many incident edges belong to each group, in the same order as the bv property above.

get_nonoverlap_blocks()[source]

Returns a scalar-valued vertex property map with the block mixture represented as a single number.

get_majority_blocks()[source]

Returns a scalar-valued vertex property map with the majority block membership of each node.

entropy(adjacency=True, dl=True, partition_dl=True, degree_dl=True, degree_dl_kind='distributed', edges_dl=True, dense=False, multigraph=True, deg_entropy=True, recs=True, recs_dl=True, beta_dl=1.0, exact=True, **kwargs)[source]

Calculate the entropy associated with the current block partition.

Parameters
adjacencybool (optional, default: True)

If True, the adjacency term of the description length will be included.

dlbool (optional, default: True)

If True, the description length for the parameters will be included.

partition_dlbool (optional, default: True)

If True, and dl == True the partition description length will be included.

degree_dlbool (optional, default: True)

If True, and dl == True the degree sequence description length will be included (for degree-corrected models).

degree_dl_kindstr (optional, default: "distributed")

This specifies the prior used for the degree sequence. It must be one of: "uniform", "distributed" (default) or "entropy".

edges_dlbool (optional, default: True)

If True, and dl == True the edge matrix description length will be included.

densebool (optional, default: False)

If True, the “dense” variant of the entropy will be computed.

multigraphbool (optional, default: True)

If True, the multigraph entropy will be used.

deg_entropybool (optional, default: True)

If True, the degree entropy term that is independent of the network partition will be included (for degree-corrected models).

recsbool (optional, default: True)

If True, the likelihood for real or discrete-valued edge covariates is computed.

recs_dlbool (optional, default: True)

If True, and dl == True the edge covariate description length will be included.

beta_dldouble (optional, default: 1.)

Prior inverse temperature.

exactbool (optional, default: True)

If True, the exact expressions will be used. Otherwise, Stirling’s factorial approximation will be used for some terms.

Notes

The “entropy” of the state is minus the log-likelihood of the microcanonical SBM, that includes the generated graph \(\boldsymbol{A}\) and the model parameters \(\boldsymbol{\theta}\),

\[\begin{split}\mathcal{S} &= - \ln P(\boldsymbol{A},\boldsymbol{\theta}) \\ &= - \ln P(\boldsymbol{A}|\boldsymbol{\theta}) - \ln P(\boldsymbol{\theta}).\end{split}\]

This value is also called the description length of the data, and it corresponds to the amount of information required to describe it (in nats).

For the traditional blockmodel (deg_corr == False), the model parameters are \(\boldsymbol{\theta} = \{\boldsymbol{e}, \boldsymbol{b}\}\), where \(\boldsymbol{e}\) is the matrix of edge counts between blocks, and \(\boldsymbol{b}\) is the overlapping partition of the nodes into blocks. For the degree-corrected blockmodel (deg_corr == True), we have an additional set of parameters, namely the labelled degree sequence \(\boldsymbol{k}\).

The model likelihood \(P(\boldsymbol{A}|\theta)\) is given analogously to the non-overlapping case, as described in graph_tool.inference.blockmodel.BlockState.entropy().

If dl == True, the description length \(\mathcal{L} = -\ln P(\boldsymbol{\theta})\) of the model will be returned as well. The edge-count prior \(P(\boldsymbol{e})\) is described in described in model_entropy(). For the overlapping partition \(P(\boldsymbol{b})\), we have

\[-\ln P(\boldsymbol{b}) = \ln\left(\!\!{D \choose N}\!\!\right) + \sum_d \ln {\left(\!\!{{B\choose d}\choose n_d}\!\!\right)} + \ln N! - \sum_{\vec{b}}\ln n_{\vec{b}}!,\]

where \(d \equiv |\vec{b}|_1 = \sum_rb_r\) is the mixture size, \(n_d\) is the number of nodes in a mixture of size \(d\), \(D\) is the maximum value of \(d\), \(n_{\vec{b}}\) is the number of nodes in mixture \(\vec{b}\).

For the degree-corrected model we need to specify the prior \(P(\boldsymbol{k})\) for the labelled degree sequence as well:

\[-\ln P(\boldsymbol{k}) = \sum_r\ln\left(\!\!{m_r \choose e_r}\!\!\right) - \sum_{\vec{b}}\ln P(\boldsymbol{k}|{\vec{b}}),\]

where \(m_r\) is the number of non-empty mixtures which contain type \(r\), and \(P(\boldsymbol{k}|{\vec{b}})\) is the likelihood of the labelled degree sequence inside mixture \(\vec{b}\). For this term we have three options:

  1. degree_dl_kind == "uniform"

    \[P(\boldsymbol{k}|\vec{b}) = \prod_r\left(\!\!{n_{\vec{b}}\choose e^r_{\vec{b}}}\!\!\right)^{-1}.\]
  2. degree_dl_kind == "distributed"

    \[P(\boldsymbol{k}|\vec{b}) = \prod_{\vec{b}}\frac{\prod_{\vec{k}}\eta_{\vec{k}}^{\vec{b}}!}{n_{\vec{b}}!} \prod_r q(e_{\vec{b}}^r - n_{\vec{b}}, n_{\vec{b}})\]

    where \(n^{\vec{b}}_{\vec{k}}\) is the number of nodes in mixture \(\vec{b}\) with labelled degree \(\vec{k}\), and \(q(n,m)\) is the number of partitions of integer \(n\) into at most \(m\) parts.

  3. degree_dl_kind == "entropy"

    \[P(\boldsymbol{k}|\vec{b}) = \prod_{\vec{b}}\exp\left(-n_{\vec{b}}H(\boldsymbol{k}_{\vec{b}})\right)\]

    where \(H(\boldsymbol{k}_{\vec{b}}) = -\sum_{\vec{k}}p_{\vec{b}}(\vec{k})\ln p_{\vec{b}}(\vec{k})\) is the entropy of the labelled degree distribution inside mixture \(\vec{b}\).

    Note that, differently from the other two choices, this represents only an approximation of the description length. It is meant to be used only for comparison purposes, and should be avoided in practice.

For the directed case, the above expressions are duplicated for the in- and out-degrees.

mcmc_sweep(bundled=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. If bundled == True, the half-edges incident of the same node that belong to the same group are moved together. All remaining parameters are passed to graph_tool.inference.blockmodel.BlockState.mcmc_sweep().

shrink(B, **kwargs)[source]

Reduces the order of current state by progressively merging groups, until only B are left. All remaining keyword arguments are passed to graph_tool.inference.blockmodel.BlockState.merge_sweep().

This function leaves the current state untouched and returns instead a copy with the new partition.

draw(**kwargs)[source]

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

graph_tool.inference.overlap_blockmodel.half_edge_graph(g, b=None, B=None, rec=None)[source]

Generate a half-edge graph, where each half-edge is represented by a node, and an edge connects the half-edges like in the original graph.

graph_tool.inference.overlap_blockmodel.augmented_graph(g, b, node_index, eweight=None)[source]

Generates an augmented graph from the half-edge graph g partitioned according to b, where each half-edge belonging to a different group inside each node forms a new node.

graph_tool.inference.overlap_blockmodel.get_block_edge_gradient(g, be, cmap=None)[source]

Get edge gradients corresponding to the block membership at the endpoints of the edges given by the be edge property map.

Parameters
gGraph

The graph.

beEdgePropertyMap

Vector-valued edge property map with the block membership at each endpoint.

cmapmatplotlib.colors.Colormap (optional, default: default_cm)

Color map used to construct the gradient.

Returns
cpEdgePropertyMap

A vector-valued edge property map containing a color gradient.

class graph_tool.inference.layered_blockmodel.LayeredBlockState(g, ec, eweight=None, vweight=None, recs=[], rec_types=[], rec_params=[], b=None, B=None, clabel=None, pclabel=False, layers=False, deg_corr=True, overlap=False, **kwargs)[source]

Bases: graph_tool.inference.overlap_blockmodel.OverlapBlockState, graph_tool.inference.blockmodel.BlockState

The (possibly overlapping) block state of a given graph, where the edges are divided into discrete layers.

Parameters
gGraph

Graph to be modelled.

ecEdgePropertyMap

Edge property map containing discrete edge covariates that will split the network in discrete layers.

recslist of EdgePropertyMap instances (optional, default: [])

List of real or discrete-valued edge covariates.

rec_typeslist of edge covariate types (optional, default: [])

List of types of edge covariates. The possible types are: "real-exponential", "real-normal", "discrete-geometric", "discrete-poisson" or "discrete-binomial".

rec_paramslist of dict (optional, default: [])

Model hyperparameters for edge covariates. This should a list of dict instances. See BlockState for more details.

eweightEdgePropertyMap (optional, default: None)

Edge multiplicities (for multigraphs or block graphs).

vweightVertexPropertyMap (optional, default: None)

Vertex multiplicities (for block graphs).

bVertexPropertyMap or numpy.ndarray (optional, default: None)

Initial block labels on the vertices or half-edges. If not supplied, it will be randomly sampled.

Bint (optional, default: None)

Number of blocks (or vertex groups). If not supplied it will be obtained from the parameter b.

clabelVertexPropertyMap (optional, default: None)

Constraint labels on the vertices. If supplied, vertices with different label values will not be clustered in the same group.

pclabelVertexPropertyMap (optional, default: None)

Partition constraint labels on the vertices. This has the same interpretation as clabel, but will be used to compute the partition description length.

layersbool (optional, default: False)

If layers == True, the “independent layers” version of the model is used, instead of the “edge covariates” version.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the blockmodel ensemble will be assumed, otherwise the traditional variant will be used.

overlapbool (optional, default: False)

If True, the overlapping version of the model will be used.

get_N()[source]

Returns the total number of edges.

get_E()[source]

Returns the total number of nodes.

get_B()[source]

Returns the total number of blocks.

get_nonempty_B()[source]

Returns the total number of nonempty blocks.

copy(g=None, eweight=None, vweight=None, b=None, B=None, deg_corr=None, clabel=None, pclabel=None, bfield=None, overlap=None, layers=None, ec=None, **kwargs)[source]

Copies the block state. The parameters override the state properties, and have the same meaning as in the constructor.

get_bg()[source]

Returns the block graph.

get_block_state(b=None, vweight=False, deg_corr=False, overlap=False, layers=None, **kwargs)[source]

Returns a LayeredBlockState corresponding to the block graph. The parameters have the same meaning as the in the constructor.

get_edge_blocks()[source]

Returns an edge property map which contains the block labels pairs for each edge.

get_overlap_blocks()[source]

Returns the mixed membership of each vertex.

Returns
bvVertexPropertyMap

A vector-valued vertex property map containing the block memberships of each node.

bc_inVertexPropertyMap

The labelled in-degrees of each node, i.e. how many in-edges belong to each group, in the same order as the bv property above.

bc_outVertexPropertyMap

The labelled out-degrees of each node, i.e. how many out-edges belong to each group, in the same order as the bv property above.

bc_totalVertexPropertyMap

The labelled total degrees of each node, i.e. how many incident edges belong to each group, in the same order as the bv property above.

get_nonoverlap_blocks()[source]

Returns a scalar-valued vertex property map with the block mixture represented as a single number.

get_majority_blocks()[source]

Returns a scalar-valued vertex property map with the majority block membership of each node.

entropy(adjacency=True, dl=True, partition_dl=True, degree_dl=True, degree_dl_kind='distributed', edges_dl=True, dense=False, multigraph=True, deg_entropy=True, exact=True, **kwargs)[source]

Calculate the entropy associated with the current block partition. The meaning of the parameters are the same as in graph_tool.inference.blockmodel.BlockState.entropy().

get_edges_prob(missing, spurious=[], entropy_args={})[source]

Compute the joint log-probability of the missing and spurious edges given by missing and spurious (a list of (source, target, layer) tuples, or Edge() instances), together with the observed edges.

More precisely, the log-likelihood returned is

\[\ln \frac{P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}\]

where \(\boldsymbol G + \delta \boldsymbol G\) is the modified graph (with missing edges added and spurious edges deleted).

The values in entropy_args are passed to graph_tool.inference.blockmodel.BlockState.entropy() to calculate the log-probability.

mcmc_sweep(bundled=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. If bundled == True and the state is an overlapping one, the half-edges incident of the same node that belong to the same group are moved together. All remaining parameters are passed to graph_tool.inference.blockmodel.BlockState.mcmc_sweep().

shrink(B, **kwargs)[source]

Reduces the order of current state by progressively merging groups, until only B are left. All remaining keyword arguments are passed to graph_tool.inference.blockmodel.BlockState.shrink() or graph_tool.inference.overlap_blockmodel.OverlapBlockState.shrink(), as appropriate.

This function leaves the current state untouched and returns instead a copy with the new partition.

draw(**kwargs)[source]

Convenience function to draw the current state. All keyword arguments are passed to graph_tool.inference.blockmodel.BlockState.draw() or graph_tool.inference.overlap_blockmodel.OverlapBlockState.draw(), as appropriate.

class graph_tool.inference.nested_blockmodel.NestedBlockState(g, bs=None, base_type=<class 'graph_tool.inference.blockmodel.BlockState'>, state_args={}, hstate_args={}, hentropy_args={}, sampling=True, **kwargs)[source]

Bases: object

The nested stochastic block model state of a given graph.

Parameters
gGraph

Graph to be modeled.

bslist of VertexPropertyMap or numpy.ndarray (optional, default: None)

Hierarchical node partition. If not provided it will correspond to a single-group hierarchy of length \(\lceil\log_2(N)\rceil\).

base_typetype (optional, default: BlockState)

State type for lowermost level (e.g. BlockState, OverlapBlockState or LayeredBlockState)

hstate_argsdict (optional, default: {})

Keyword arguments to be passed to the constructor of the higher-level states.

hentropy_argsdict (optional, default: {})

Keyword arguments to be passed to the entropy() method of the higher-level states.

samplingbool (optional, default: True)

If True, the state will be properly prepared for MCMC sampling (as opposed to minimization).

state_argsdict (optional, default: {})

Keyword arguments to be passed to base type constructor.

**kwargskeyword arguments

Keyword arguments to be passed to base type constructor. The state_args parameter overrides this.

copy(g=None, bs=None, state_args=None, hstate_args=None, hentropy_args=None, sampling=None, **kwargs)[source]

Copies the block state. The parameters override the state properties, and have the same meaning as in the constructor.

get_bs()[source]

Get hierarchy levels as a list of numpy.ndarray objects with the group memberships at each level.

get_state()[source]

Alias to get_bs().

set_state(bs)[source]

Sets the internal nested partition of the state.

get_levels()[source]

Get hierarchy levels as a list of BlockState instances.

project_partition(j, l)[source]

Project partition of level j onto level l, and return it.

propagate_clabel(l)[source]

Project base clabel to level l.

get_clabel(l)[source]

Get clabel for level l.

replace_level(l, b)[source]

Replace level l given the new partition b

delete_level(l)[source]

Delete level l.

duplicate_level(l)[source]

Duplicate level l.

level_entropy(l, bstate=None, **kwargs)[source]

Compute the entropy of level l.

entropy(**kwargs)[source]

Compute the entropy of whole hierarchy.

The keyword arguments are passed to the entropy() method of the underlying state objects (e.g. graph_tool.inference.blockmodel.BlockState.entropy, graph_tool.inference.overlap_blockmodel.OverlapBlockState.entropy, or graph_tool.inference.layered_blockmodel.LayeredBlockState.entropy).

move_vertex(v, s)[source]

Move vertex v to block s.

remove_vertex(v)[source]

Remove vertex v from its current group.

This optionally accepts a list of vertices to remove.

Warning

This will leave the state in an inconsistent state before the vertex is returned to some other group, or if the same vertex is removed twice.

add_vertex(v, r)[source]

Add vertex v to block r.

This optionally accepts a list of vertices and blocks to add.

Warning

This can leave the state in an inconsistent state if a vertex is added twice to the same group.

get_edges_prob(missing, spurious=[], entropy_args={})[source]

Compute the joint log-probability of the missing and spurious edges given by missing and spurious (a list of (source, target) tuples, or Edge() instances), together with the observed edges.

More precisely, the log-likelihood returned is

\[\ln \frac{P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}\]

where \(\boldsymbol G + \delta \boldsymbol G\) is the modified graph (with missing edges added and spurious edges deleted).

The values in entropy_args are passed to graph_tool.inference.blockmodel.BlockState.entropy() to calculate the log-probability.

get_bstack()[source]

Return the nested levels as individual graphs.

This returns a list of Graph instances representing the inferred hierarchy at each level. Each graph has two internal vertex and edge property maps named “count” which correspond to the vertex and edge counts at the lower level, respectively. Additionally, an internal vertex property map named “b” specifies the block partition.

project_level(l)[source]

Project the partition at level l onto the lowest level, and return the corresponding state.

print_summary()[source]

Print a hierarchy summary.

find_new_level(l, bisection_args={}, B_min=None, B_max=None, b_min=None, b_max=None)[source]

Attempt to find a better network partition at level l, using bisection_minimize() with arguments given by bisection_args.

mcmc_sweep(**kwargs)[source]

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC to sample hierarchical network partitions.

The arguments accepted are the same as in graph_tool.inference.blockmodel.BlockState.mcmc_sweep().

If the parameter c is a scalar, the values used at each level are c * 2 ** l for l in the range [0, L-1]. Optionally, a list of values may be passed instead, which specifies the value of c[l] to be used at each level.

Warning

This function performs niter sweeps at each hierarchical level once. This means that in order for the chain to equilibrate, we need to call this function several times, i.e. it is not enough to call it once with a large value of niter.

multiflip_mcmc_sweep(**kwargs)[source]

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC with multiple moves to sample hierarchical network partitions.

The arguments accepted are the same as in graph_tool.inference.blockmodel.BlockState.multiflip_mcmc_sweep().

If the parameter c is a scalar, the values used at each level are c * 2 ** l for l in the range [0, L-1]. Optionally, a list of values may be passed instead, which specifies the value of c[l] to be used at each level.

Warning

This function performs niter sweeps at each hierarchical level once. This means that in order for the chain to equilibrate, we need to call this function several times, i.e. it is not enough to call it once with a large value of niter.

gibbs_sweep(**kwargs)[source]

Perform niter sweeps of a rejection-free Gibbs sampling MCMC to sample network partitions.

The arguments accepted are the same as in graph_tool.inference.blockmodel.BlockState.gibbs_sweep().

Warning

This function performs niter sweeps at each hierarchical level once. This means that in order for the chain to equilibrate, we need to call this function several times, i.e. it is not enough to call it once with a large value of niter.

multicanonical_sweep(**kwargs)[source]

Perform niter sweeps of a non-Markovian multicanonical sampling using the Wang-Landau algorithm.

The arguments accepted are the same as in graph_tool.inference.blockmodel.BlockState.multicanonical_sweep().

collect_partition_histogram(h=None, update=1)[source]

Collect a histogram of partitions.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.nested_blockmodel.NestedBlockState.mcmc_sweep() function.

Parameters
hPartitionHist (optional, default: None)

Partition histogram. If not provided, an empty histogram will be created.

updatefloat (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

Returns
hPartitionHist (optional, default: None)

Updated Partition histogram.

draw(**kwargs)[source]

Convenience wrapper to draw_hierarchy() that draws the hierarchical state.

graph_tool.inference.nested_blockmodel.hierarchy_minimize(state, B_min=None, B_max=None, b_min=None, b_max=None, frozen_levels=None, bisection_args={}, epsilon=1e-08, verbose=False)[source]

Attempt to find a fit of the nested stochastic block model that minimizes the description length.

Parameters
stateNestedBlockState

The nested block state.

B_minint (optional, default: None)

The minimum number of blocks.

B_maxint (optional, default: None)

The maximum number of blocks.

b_minVertexPropertyMap (optional, default: None)

The partition to be used with the minimum number of blocks.

b_maxVertexPropertyMap (optional, default: None)

The partition to be used with the maximum number of blocks.

frozen_levelssequence of int values (optional, default: None)

List of hierarchy levels that are kept constant during the minimization.

bisection_argsdict (optional, default: {})

Arguments to be passed to bisection_minimize().

epsilon: ``float`` (optional, default: ``1e-8``)

Only replace levels if the description length difference is above this threshold.

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
min_stateNestedBlockState

Nested state with minimal description length.

Notes

This algorithms moves along the hierarchical levels, attempting to replace, delete or insert partitions that minimize the description length, until no further progress is possible.

See [peixoto-hierarchical-2014] for details on the algorithm.

This algorithm has a complexity of \(O(V \ln^2 V)\), where \(V\) is the number of nodes in the network.

References

peixoto-hierarchical-2014

Tiago P. Peixoto, “Hierarchical block structures and high-resolution model selection in large networks “, Phys. Rev. X 4, 011047 (2014), DOI: 10.1103/PhysRevX.4.011047 [sci-hub, @tor], arXiv: 1310.4377.

graph_tool.inference.nested_blockmodel.get_hierarchy_tree(state, empty_branches=True)[source]

Obtain the nested hierarchical levels as a tree.

This transforms a NestedBlockState instance into a single Graph instance containing the hierarchy tree.

Parameters
stateNestedBlockState

Nested block model state.

empty_branchesbool (optional, default: True)

If empty_branches == False, dangling branches at the upper layers will be pruned.

Returns
treeGraph

A directed graph, where vertices are blocks, and a directed edge points to an upper to a lower level in the hierarchy.

labelVertexPropertyMap

A vertex property map containing the block label for each node.

orderVertexPropertyMap

A vertex property map containing the relative ordering of each layer according to the total degree of the groups at the specific levels.

class graph_tool.inference.uncertain_blockmodel.UncertainBaseState(g, nested=True, state_args={}, bstate=None, self_loops=False, init_empty=False)[source]

Bases: object

Base state for uncertain network inference.

get_block_state()[source]

Return the underlying block state, which can be either BlockState or NestedBlockState.

entropy(latent_edges=True, density=True, **kwargs)[source]

Return the entropy, i.e. negative log-likelihood.

mcmc_sweep(r=0.5, multiflip=True, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC to sample network partitions and latent edges. The parameter r controls the probability with which edge move will be attempted, instead of partition moves. The remaining keyword parameters will be passed to mcmc_sweep() or multiflip_mcmc_sweep(), if multiflip=True.

multiflip_mcmc_sweep(**kwargs)[source]

Alias for mcmc_sweep() with multiflip=True.

get_edge_prob(u, v, entropy_args={}, epsilon=1e-08)[source]

Return conditional posterior log-probability of edge \((u,v)\).

get_edges_prob(elist, entropy_args={}, epsilon=1e-08)[source]

Return conditional posterior log-probability of an edge list, with shape \((E,2)\).

get_graph()[source]

Return the current inferred graph.

collect_marginal(g=None)[source]

Collect marginal inferred network during MCMC runs.

Parameters
gGraph (optional, default: None)

Previous marginal graph.

Returns
gGraph

New marginal graph, with internal edge EdgePropertyMap "eprob", containing the marginal probabilities for each edge.

Notes

The posterior marginal probability of an edge \((i,j)\) is defined as

\[\pi_{ij} = \sum_{\boldsymbol A}A_{ij}P(\boldsymbol A|\boldsymbol D)\]

where \(P(\boldsymbol A|\boldsymbol D)\) is the posterior probability given the data.

collect_marginal_multigraph(g=None)[source]

Collect marginal latent multigraph during MCMC runs.

Parameters
gGraph (optional, default: None)

Previous marginal multigraph.

Returns
gGraph

New marginal graph, with internal edge EdgePropertyMap "w" and "wcount", containing the edge multiplicities and their respective counts.

Notes

The mean posterior marginal multiplicity distribution of a multi-edge \((i,j)\) is defined as

\[\pi_{ij}(w) = \sum_{\boldsymbol G}\delta_{w,G_{ij}}P(\boldsymbol G|\boldsymbol D)\]

where \(P(\boldsymbol G|\boldsymbol D)\) is the posterior probability of a multigraph \(\boldsymbol G\) given the data.

class graph_tool.inference.uncertain_blockmodel.UncertainBlockState(g, q, q_default=0.0, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.UncertainBaseState

Inference state of an uncertain graph, using the stochastic block model as a prior.

Parameters
gGraph

Measured graph.

qEdgePropertyMap

Edge probabilities in range \([0,1]\).

q_defaultfloat (optional, default: 0.)

Non-edge probability in range \([0,1]\).

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the uncertain graph can contain self-loops.

References

peixoto-reconstructing-2018

Tiago P. Peixoto, “Reconstructing networks with unknown and heterogeneous errors”, Phys. Rev. X 8 041011 (2018). DOI: 10.1103/PhysRevX.8.041011 [sci-hub, @tor], arXiv: 1806.07956

copy(**kwargs)[source]

Return a copy of the state.

class graph_tool.inference.uncertain_blockmodel.LatentMultigraphBlockState(g, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.UncertainBaseState

Inference state of an erased Poisson multigraph, using the stochastic block model as a prior.

Parameters
gGraph

Measured graph.

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the uncertain graph can contain self-loops.

References

peixoto-latent-2020

Tiago P. Peixoto, “Latent Poisson models for networks with heterogeneous density”, arXiv: 2002.07803

copy(**kwargs)[source]

Return a copy of the state.

class graph_tool.inference.uncertain_blockmodel.MeasuredBlockState(g, n, x, n_default=1, x_default=0, fn_params={'alpha': 1, 'beta': 1}, fp_params={'mu': 1, 'nu': 1}, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.UncertainBaseState

Inference state of a measured graph, using the stochastic block model as a prior.

Parameters
gGraph

Measured graph.

nEdgePropertyMap

Edge property map of type int, containing the total number of measurements for each edge.

xEdgePropertyMap

Edge property map of type int, containing the number of positive measurements for each edge.

n_defaultint (optional, default: 1)

Total number of measurements for each non-edge.

x_defaultint (optional, default: 0)

Total number of positive measurements for each non-edge.

fn_paramsdict (optional, default: dict(alpha=1, beta=1))

Beta distribution hyperparameters for the probability of missing edges (false negatives).

fp_paramsdict (optional, default: dict(mu=1, nu=1))

Beta distribution hyperparameters for the probability of spurious edges (false positives).

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the uncertain graph can contain self-loops.

References

peixoto-reconstructing-2018

Tiago P. Peixoto, “Reconstructing networks with unknown and heterogeneous errors”, Phys. Rev. X 8 041011 (2018). DOI: 10.1103/PhysRevX.8.041011 [sci-hub, @tor], arXiv: 1806.07956

copy(**kwargs)[source]

Return a copy of the state.

set_hparams(alpha, beta, mu, nu)[source]

Set edge and non-edge hyperparameters.

get_p_posterior()[source]

Get beta distribution parameters for the posterior probability of missing edges.

get_q_posterior()[source]

Get beta distribution parameters for the posterior probability of spurious edges.

class graph_tool.inference.uncertain_blockmodel.MixedMeasuredBlockState(g, n, x, n_default=1, x_default=0, fn_params={'alpha': 1, 'beta': 10}, fp_params={'mu': 1, 'nu': 10}, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.UncertainBaseState

Inference state of a measured graph with heterogeneous errors, using the stochastic block model as a prior.

Parameters
gGraph

Measured graph.

nEdgePropertyMap

Edge property map of type int, containing the total number of measurements for each edge.

xEdgePropertyMap

Edge property map of type int, containing the number of positive measurements for each edge.

n_defaultint (optional, default: 1)

Total number of measurements for each non-edge.

x_defaultint (optional, default: 1)

Total number of positive measurements for each non-edge.

fn_paramsdict (optional, default: dict(alpha=1, beta=10))

Beta distribution hyperparameters for the probability of missing edges (false negatives).

fp_paramsdict (optional, default: dict(mu=1, nu=10))

Beta distribution hyperparameters for the probability of spurious edges (false positives).

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the uncertain graph can contain self-loops.

References

peixoto-reconstructing-2018

Tiago P. Peixoto, “Reconstructing networks with unknown and heterogeneous errors”, Phys. Rev. X 8 041011 (2018). DOI: 10.1103/PhysRevX.8.041011 [sci-hub, @tor], arXiv: 1806.07956

set_hparams(alpha, beta, mu, nu)[source]

Set edge and non-edge hyperparameters.

copy(**kwargs)[source]

Return a copy of the state.

mcmc_sweep(r=0.5, h=0.1, hstep=1, multiflip=True, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC to sample network partitions and latent edges. The parameter r controls the probability with which edge move will be attempted, instead of partition moves. The parameter h controls the relative probability with which hyperparamters moves will be attempted, and hstep is the size of the step.

The remaining keyword parameters will be passed to mcmc_sweep() or multiflip_mcmc_sweep(), if multiflip=True.

class graph_tool.inference.uncertain_blockmodel.DynamicsBlockStateBase(g, s, t, x=None, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.UncertainBaseState

Base state for network reconstruction based on dynamical data, using the stochastic block model as a prior. This class is not meant to be instantiated directly, only indirectly via one of its subclasses.

set_params(params)[source]

Sets the model parameters via the dictionary params.

copy(**kwargs)[source]

Return a copy of the state.

get_x()[source]

Return latent edge covariates.

get_edge_prob(u, v, x, entropy_args={}, epsilon=1e-08)[source]

Return conditional posterior log-probability of edge \((u,v)\).

collect_marginal(g=None)[source]

Collect marginal inferred network during MCMC runs.

Parameters
gGraph (optional, default: None)

Previous marginal graph.

Returns
gGraph

New marginal graph, with internal edge EdgePropertyMap "eprob", containing the marginal probabilities for each edge.

Notes

The posterior marginal probability of an edge \((i,j)\) is defined as

\[\pi_{ij} = \sum_{\boldsymbol A}A_{ij}P(\boldsymbol A|\boldsymbol D)\]

where \(P(\boldsymbol A|\boldsymbol D)\) is the posterior probability given the data.

class graph_tool.inference.uncertain_blockmodel.EpidemicsBlockState(g, s, beta, r, r_v=None, global_beta=None, active=None, t=[], exposed=False, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.DynamicsBlockStateBase

Inference state for network reconstruction based on epidemic dynamics, using the stochastic block model as a prior.

Parameters
gGraph

Initial graph state.

slist of VertexPropertyMap

Collection of time-series with node states over time. Each entry in this list must be a VertexPropertyMap with type vector<int> containing the states of each node in each time step. A value of 1 means infected and 0 susceptible. Other values are allowed (e.g. for recovered), but their actual value is unimportant for reconstruction.

If the parameter t below is given, each property map value for a given node should contain only the states for the same points in time given by that parameter.

betafloat or EdgePropertyMap

Initial value of the global or local transmission probability for each edge.

rfloat

Spontaneous infection probability.

r_vVertexPropertyMap (optional, default: None)

If given, this will set the initial spontaneous infection probability for each node, and trigger the use of a model where this quantity is in principle different for each node.

global_betafloat (optional, default: None)

If provided, and beta is None this will trigger the use of a model where all transmission probabilities on edges are the same, and given (initially) by this value.

tlist of VertexPropertyMap (optional, default: [])

If nonempty, this allows for a compressed representation of the time-series parameter s, corresponding only to points in time where the state of each node changes. Each each entry in this list must be a VertexPropertyMap with type vector<int> containing the points in time where the state of each node change. The corresponding state of the nodes at these times are given by parameter s.

activelist of VertexPropertyMap (optional, default: None)

If given, this specifies the points in time where each node is “active”, and prepared to change its state according to the state of its neighbors. Each entry in this list must be a VertexPropertyMap with type vector<int> containing the states of each node in each time step. A value of 1 means active and 0 inactive.

exposedboolean (optional, default: False)

If True, the data is supposed to come from a SEI, SEIR, etc. model, where a susceptible node (valued 0) first transits to an exposed state (valued -1) upon transmission, before transiting to the infective state (valued 1).

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the inferred graph can contain self-loops.

References

peixoto-network-2019

Tiago P. Peixoto, “Network reconstruction and community detection from dynamics”, Phys. Rev. Lett. 123 128301 (2019), DOI: 10.1103/PhysRevLett.123.128301 [sci-hub, @tor], arXiv: 1903.10833

copy(**kwargs)[source]

Return a copy of the state.

set_params(params)[source]

Sets the model parameters via the dictionary params.

get_x()[source]

Return latent edge transmission probabilities.

mcmc_sweep(r=0.5, p=0.1, pstep=0.1, h=0.1, hstep=1, xstep=0.1, multiflip=True, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC to sample network partitions and latent edges. The parameter r controls the probability with which edge move will be attempted, instead of partition moves. The parameter h controls the relative probability with which moves for the parameters r_v will be attempted, and hstep is the size of the step. The parameter p controls the relative probability with which moves for the parameters global_beta and r will be attempted, and pstep is the size of the step. The paramter xstep determines the size of the attempted steps for the edge transmission probabilities.

The remaining keyword parameters will be passed to mcmc_sweep() or multiflip_mcmc_sweep(), if multiflip=True.

class graph_tool.inference.uncertain_blockmodel.IsingBaseBlockState(g, s, beta, x=None, h=None, t=None, aE=nan, nested=True, state_args={}, bstate=None, self_loops=False, has_zero=False, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.DynamicsBlockStateBase

Base state for network reconstruction based on the Ising model, using the stochastic block model as a prior. This class is not supposed to be instantiated directly. Instead one of its specialized subclasses must be used, which have the same signature: IsingGlauberBlockState, PseudoIsingBlockState, CIsingGlauberBlockState, PseudoCIsingBlockState.

Parameters
gGraph

Initial graph state.

slist of VertexPropertyMap or VertexPropertyMap

Collection of time-series with node states over time, or a single time-series. Each time-series must be a VertexPropertyMap with type vector<int> containing the Ising states (-1 or +1) of each node in each time step.

If the parameter t below is given, each property map value for a given node should contain only the states for the same points in time given by that parameter.

betafloat

Initial value of the global inverse temperature.

xEdgePropertyMap (optional, default: None)

Initial value of the local coupling for each edge. If not given, a uniform value of 1 will be used.

hVertexPropertyMap (optional, default: None)

If given, this will set the initial local fields of each node. Otherwise a value of 0 will be used.

tlist of VertexPropertyMap (optional, default: [])

If nonempty, this allows for a compressed representation of the time-series parameter s, corresponding only to points in time where the state of each node changes. Each each entry in this list must be a VertexPropertyMap with type vector<int> containing the points in time where the state of each node change. The corresponding state of the nodes at these times are given by parameter s.

aEfloat (optional, default: NaN)

Expected total number of edges used in prior. If NaN, a flat prior will be used instead.

nestedboolean (optional, default: True)

If True, a NestedBlockState will be used, otherwise BlockState.

state_argsdict (optional, default: {})

Arguments to be passed to NestedBlockState or BlockState.

bstateNestedBlockState or BlockState (optional, default: None)

If passed, this will be used to initialize the block state directly.

self_loopsbool (optional, default: False)

If True, it is assumed that the inferred graph can contain self-loops.

has_zerobool (optional, default: False)

If True, the three-state “Ising” model with values {-1,0,1} is used.

References

peixoto-network-2019

Tiago P. Peixoto, “Network reconstruction and community detection from dynamics”, Phys. Rev. Lett. 123 128301 (2019), DOI: 10.1103/PhysRevLett.123.128301 [sci-hub, @tor], arXiv: 1903.10833

get_x()[source]

Return edge couplings.

mcmc_sweep(r=0.5, p=0.1, pstep=0.1, h=0.1, hstep=1, xstep=0.1, multiflip=True, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings acceptance-rejection sampling MCMC to sample network partitions and latent edges. The parameter r controls the probability with which edge move will be attempted, instead of partition moves. The parameter h controls the relative probability with which moves for the parameters r_v will be attempted, and hstep is the size of the step. The parameter p controls the relative probability with which moves for the parameters global_beta and r will be attempted, and pstep is the size of the step. The paramter xstep determines the size of the attempted steps for the edge coupling parameters.

The remaining keyword parameters will be passed to mcmc_sweep() or multiflip_mcmc_sweep(), if multiflip=True.

class graph_tool.inference.uncertain_blockmodel.IsingGlauberBlockState(*args, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.IsingBaseBlockState

State for network reconstruction based on the Glauber dynamics of the Ising model, using the stochastic block model as a prior.

See documentation for IsingBaseBlockState for details.

class graph_tool.inference.uncertain_blockmodel.CIsingGlauberBlockState(*args, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.IsingBaseBlockState

State for network reconstruction based on the Glauber dynamics of the continuous Ising model, using the stochastic block model as a prior.

See documentation for IsingBaseBlockState for details. Note that in this case the s parameter should contain property maps of type vector<double>, with values in the range \([-1,1]\).

class graph_tool.inference.uncertain_blockmodel.PseudoIsingBlockState(*args, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.IsingBaseBlockState

State for network reconstruction based on the equilibrium configurations of the Ising model, using the Pseudolikelihood approximation and the stochastic block model as a prior.

See documentation for IsingBaseBlockState for details. Note that in this model “time-series” should be interpreted as a set of uncorrelated samples, not a temporal sequence.

class graph_tool.inference.uncertain_blockmodel.PseudoCIsingBlockState(*args, **kwargs)[source]

Bases: graph_tool.inference.uncertain_blockmodel.IsingBaseBlockState

State for network reconstruction based on the equilibrium configurations of the continuous Ising model, using the Pseudolikelihood approximation and the stochastic block model as a prior.

See documentation for IsingBaseBlockState for details. Note that in this model “time-series” should be interpreted as a set of uncorrelated samples, not a temporal sequence. Additionally, the s parameter should contain property maps of type vector<double>, with values in the range \([-1,1]\).

graph_tool.inference.uncertain_blockmodel.marginal_multigraph_entropy(g, ecount)[source]

Compute the entropy of the marginal latent multigraph distribution.

Parameters
gGraph

Marginal multigraph.

ecountEdgePropertyMap

Vector-valued edge property map containing the counts of edge multiplicities.

Returns
ehEdgePropertyMap

Marginal entropy of edge multiplicities.

Notes

The mean posterior marginal multiplicity distribution of a multi-edge \((i,j)\) is defined as

\[\pi_{ij}(w) = \sum_{\boldsymbol G}\delta_{w,G_{ij}}P(\boldsymbol G|\boldsymbol D)\]

where \(P(\boldsymbol G|\boldsymbol D)\) is the posterior probability of a multigraph \(\boldsymbol G\) given the data.

The corresponding entropy is therefore given (in nats) by

\[\mathcal{S}_{ij} = -\sum_w\pi_{ij}(w)\ln \pi_{ij}(w).\]
graph_tool.inference.latent_multigraph.latent_multigraph(g, epsilon=1e-08, max_niter=0, verbose=False)[source]

Infer latent Poisson multigraph model given an “erased” simple graph.

Parameters
gGraph

Graph to be used. This is expected to be a simple graph.

epsilonfloat (optional, default: 1e-8)

Convergence criterion.

max_niterint (optional, default: 0)

Maximum number of iterations allowed (if 0, no maximum is assumed).

verboseboolean (optional, default: False)

If True, display verbose information.

Returns
uGraph

Latent graph.

wEdgePropertyMap

Edge property map with inferred edge multiplicities.

Notes

This implements the expectation maximization algorithm described in [peixoto-latent-2020] which consists in iterating the following steps until convergence:

  1. In the “expectation” step we obtain the marginal mean multiedge multiplicities via:

    \[\begin{split}w_{ij} = \begin{cases} \frac{\theta_i\theta_j}{1-\mathrm{e}^{-\theta_i\theta_j}} & \text{ if } G_{ij} = 1,\\ \theta_i^2 & \text{ if } i = j,\\ 0 & \text{ otherwise.} \end{cases}\end{split}\]
  2. In the “maximization” step we use the current values of \(\boldsymbol w\) to update the values of \(\boldsymbol \theta\):

    \[\theta_i = \frac{d_i}{\sqrt{\sum_jd_j}}, \quad\text{ with } d_i = \sum_jw_{ji}. \]

The equations above are adapted accordingly if the supplied graph is directed, where we have \(\theta_i\theta_j\to\theta_i^-\theta_j^+\), \(\theta_i^2\to\theta_i^-\theta_i^+\), and \(\theta_i^{\pm}=\frac{d_i^{\pm}}{\sqrt{\sum_jd_j^{\pm}}}\), with \(d^+_i = \sum_jw_{ji}\) and \(d^-_i = \sum_jw_{ij}\).

A single EM iteration takes time \(O(V + E)\). If enabled during compilation, this algorithm runs in parallel.

References

peixoto-latent-2020

Tiago P. Peixoto, “Latent Poisson models for networks with heterogeneous density”, arXiv: 2002.07803

Examples

>>> g = gt.collection.data["as-22july06"]
>>> gt.scalar_assortativity(g, "out")
(-0.198384..., 0.001338...)
>>> u, w = gt.latent_multigraph(g)
>>> gt.scalar_assortativity(u, "out", eweight=w)
(-0.048426..., 0.034526...)
graph_tool.inference.mcmc.mcmc_equilibrate(state, wait=1000, nbreaks=2, max_niter=inf, force_niter=None, epsilon=0, gibbs=False, multiflip=True, mcmc_args={}, entropy_args={}, history=False, callback=None, verbose=False)[source]

Equilibrate a MCMC with a given starting state.

Parameters
stateAny state class (e.g. BlockState)

Initial state. This state will be modified during the algorithm.

waitint (optional, default: 1000)

Number of iterations to wait for a record-breaking event.

nbreaksint (optional, default: 2)

Number of iteration intervals (of size wait) without record-breaking events necessary to stop the algorithm.

max_niterint (optional, default: numpy.inf)

Maximum number of iterations.

force_niterint (optional, default: None)

If given, will force the algorithm to run this exact number of iterations.

epsilonfloat (optional, default: 0)

Relative changes in entropy smaller than epsilon will not be considered as record-breaking.

gibbsbool (optional, default: False)

If True, each step will call state.gibbs_sweep instead of state.mcmc_sweep.

multiflipbool (optional, default: True)

If True, each step will call state.multiflip_mcmc_sweep instead of state.mcmc_sweep.

mcmc_argsdict (optional, default: {})

Arguments to be passed to state.mcmc_sweep (or state.gibbs_sweep).

historybool (optional, default: False)

If True, a list of tuples of the form (nattempts, nmoves, entropy) will be kept and returned, where entropy is the current entropy and nmoves is the number of vertices moved.

callbackfunction (optional, default: None)

If given, this function will be called after each iteration. The function must accept the current state as an argument, and its return value must be either None or a (possibly empty) list of values that will be append to the history, if history == True.

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
historylist of tuples of the form (nattempts, nmoves, entropy)

Summary of the MCMC run. This is returned only if history == True.

entropyfloat

Current entropy value after run. This is returned only if history == False.

nattemptsint

Number of node move attempts.

nmovesint

Number of node moves.

Notes

The MCMC equilibration is attempted by keeping track of the maximum and minimum values, and waiting sufficiently long without a record-breaking event.

This function calls state.mcmc_sweep (or state.gibbs_sweep) at each iteration (e.g. graph_tool.inference.blockmodel.BlockState.mcmc_sweep() and graph_tool.inference.blockmodel.BlockState.gibbs_sweep()), and keeps track of the value of state.entropy(**args) with args corresponding to mcmc_args["entropy_args"].

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

graph_tool.inference.mcmc.mcmc_anneal(state, beta_range=1.0, 10.0, niter=100, history=False, mcmc_equilibrate_args={}, verbose=False)[source]

Equilibrate a MCMC at a specified target temperature by performing simulated annealing.

Parameters
stateAny state class (e.g. BlockState)

Initial state. This state will be modified during the algorithm.

beta_rangetuple of two floats (optional, default: (1., 10.))

Inverse temperature range.

niterint (optional, default: 100)

Number of steps (in logspace) from the starting temperature to the final one.

historybool (optional, default: False)

If True, a list of tuples of the form (nattempts, nmoves, beta, entropy)

mcmc_equilibrate_argsdict (optional, default: {})

Arguments to be passed to mcmc_equilibrate().

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
historylist of tuples of the form (nattempts, nmoves, beta, entropy)

Summary of the MCMC run. This is returned only if history == True.

entropyfloat

Current entropy value after run. This is returned only if history == False.

nattemptsint

Number of node move attempts.

nmovesint

Number of node moves.

Notes

This algorithm employs exponential cooling, where the value of beta is multiplied by a constant at each iteration, so that starting from beta_range[0] the value of beta_range[1] is reached after niter iterations.

At each iteration, the function mcmc_equilibrate() is called with the current value of beta (via the mcmc_args parameter).

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

graph_tool.inference.mcmc.mcmc_multilevel(state, B, r=2, b_cache=None, anneal=False, mcmc_equilibrate_args={}, anneal_args={}, shrink_args={}, verbose=False)[source]

Equilibrate a MCMC from a starting state with a higher order, by performing successive agglomerative initializations and equilibrations until the desired order is reached, such that metastable states are avoided.

Parameters
stateAny state class (e.g. BlockState)

Initial state. This state will not be modified during the algorithm.

Bint

Desired model order (i.e. number of groups).

rint (optional, default: 2)

Greediness of agglomeration. At each iteration, the state order will be reduced by a factor r.

b_cachedict (optional, default: None)

If specified, this should be a dictionary with key-value pairs of the form (B, state) that contain pre-computed states of the specified order. This dictionary will be modified during the algorithm.

annealbool (optional, default: False)

If True, the equilibration steps will use simulated annealing, by calling mcmc_anneal(), instead of mcmc_equilibrate().

mcmc_equilibrate_argsdict (optional, default: {})

Arguments to be passed to mcmc_equilibrate().

mcmc_anneal_argsdict (optional, default: {})

Arguments to be passed to mcmc_anneal().

shrink_argsdict (optional, default: {})

Arguments to be passed to state.shrink (e.g. graph_tool.inference.blockmodel.BlockState.shrink()).

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
stateThe same type as parameter state

This is the final state after the MCMC run.

Notes

This algorithm alternates between equilibrating the MCMC state and reducing the state order (via calls to state.shrink, e.g. graph_tool.inference.blockmodel.BlockState.shrink()).

This greatly reduces the changes of getting trapped in metastable states if the starting point if far away from equilibrium, as discussed in [peixoto-efficient-2014].

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

class graph_tool.inference.mcmc.MulticanonicalState(state, S_min, S_max, nbins=1000)[source]

Bases: object

The density of states of a multicanonical Monte Carlo algorithm. It is used by graph_tool.inference.mcmc.multicanonical_equilibrate().

Parameters
stateBlockState or OverlapBlockState or NestedBlockState

Block state to be used.

S_minfloat

Minimum energy.

S_maxfloat

Maximum energy.

nbinsint (optional, default: 1000)

Number of bins.

get_energies()[source]

Get energy bounds.

get_allowed_energies()[source]

Get allowed energy bounds.

get_range()[source]

Get energy range.

get_density(B=None)[source]

Get density of states, normalized so that total sum is \(B^N\), where \(B\) is the number of groups, and \(N\) is the number of nodes. If not supplied \(B=N\) is assumed.

get_hist()[source]

Get energy histogram.

get_perm_hist()[source]

Get permanent energy histogram.

get_flatness(h=None, allow_gaps=True)[source]

Get energy histogram flatness.

get_posterior(N=None)[source]

Get posterior probability.

reset_hist()[source]

Reset energy histogram.

graph_tool.inference.mcmc.multicanonical_equilibrate(m_state, f_range=1.0, 1e-06, r=2, flatness=0.95, allow_gaps=True, callback=None, multicanonical_args={}, verbose=False)[source]

Equilibrate a multicanonical Monte Carlo sampling using the Wang-Landau algorithm.

Parameters
m_stateMulticanonicalState

Initial multicanonical state, where the state density will be stored.

f_rangetuple of two floats (optional, default: (1., 1e-6))

Range of density updates.

rfloat (optional, default: 2.)

Greediness of convergence. At each iteration, the density updates will be reduced by a factor r.

flatnessfloat (optional, default: .95)

Sufficient histogram flatness threshold used to continue the algorithm.

allow_gapsbool (optional, default: True)

If True, gaps in the histogram (regions with zero count) will be ignored when computing the flatness.

callbackfunction (optional, default: None)

If given, this function will be called after each iteration. The function must accept the current state and m_state as arguments.

multicanonical_argsdict (optional, default: {})

Arguments to be passed to state.multicanonical_sweep (e.g. graph_tool.inference.blockmodel.BlockState.multicanonical_sweep()).

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
niterint

Number of iterations required for convergence.

References

wang-efficient-2001

Fugao Wang, D. P. Landau, “An efficient, multiple range random walk algorithm to calculate the density of states”, Phys. Rev. Lett. 86, 2050 (2001), DOI: 10.1103/PhysRevLett.86.2050 [sci-hub, @tor], arXiv: cond-mat/0011174

belardinelli-wang-2007

R. E. Belardinelli, V. D. Pereyra, “Wang-Landau algorithm: A theoretical analysis of the saturation of the error”, J. Chem. Phys. 127, 184105 (2007), DOI: 10.1063/1.2803061 [sci-hub, @tor], arXiv: cond-mat/0702414

class graph_tool.inference.mcmc.TemperingState(states, betas, idx=None, beta_dl=False)[source]

Bases: object

This class aggregates several state classes and corresponding inverse-temperature values to implement parallel tempering MCMC.

This is meant to be used with mcmc_equilibrate().

Parameters
stateslist of state objects (e.g. BlockState)

Initial parallel states.

betaslist of floats

Inverse temperature values.

entropy(**kwargs)[source]

Returns the sum of the entropy of the parallel states. All keyword arguments are propagated to the individual states’ entropy() method.

entropies(**kwargs)[source]

Returns the entropies of the parallel states. All keyword arguments are propagated to the individual states’ entropy() method.

states_swap(**kwargs)[source]

Perform a full sweep of the parallel states, where swaps are attempted. All relevant keyword arguments are propagated to the individual states’ entropy() method.

states_move(sweep_algo, **kwargs)[source]

Perform a full sweep of the parallel states, where state moves are attempted by calling sweep_algo(state, beta=beta, **kwargs).

mcmc_sweep(**kwargs)[source]

Perform a full mcmc sweep of the parallel states, where swap or moves are chosen randomly. It accepts an keyword argument r (default: 0.1) specifying the relative probability with which state swaps are performed with respect to node moves. All remaining keyword arguments are propagated to the individual states’ mcmc_sweep() method.

multiflip_mcmc_sweep(**kwargs)[source]

Perform a full mcmc sweep of the parallel states, where swap or moves are chosen randomly. It accepts an keyword argument r (default: 0.1) specifying the relative probability with which state swaps are performed with respect to node moves. All remaining keyword arguments are propagated to the individual states’ mcmc_sweep() method.

gibbs_sweep(**kwargs)[source]

Perform a full Gibbs mcmc sweep of the parallel states, where swap or moves are chosen randomly. It accepts an keyword argument r (default: 0.1) specifying the relative probability with which state swaps are performed with respect to node moves. All remaining keyword arguments are propagated to the individual states’ gibbs_sweep() method.

graph_tool.inference.bisection.bisection_minimize(init_states, random_bisection=False, mcmc_multilevel_args={}, verbose=False)[source]

Find the best order (number of groups) given an initial set of states by performing a one-dimension minimization, using a Fibonacci (or golden section) search.

Parameters
init_statesAny state class (e.g. BlockState)

List with two or more states that will be used to bracket the search.

random_bisectionbool (optional, default: False)

If True, the bisection will be done randomly in the interval, instead of using the golden rule.

mcmc_multilevel_argsdict (optional, default: {})

Arguments to be passed to mcmc_multilevel().

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
min_stateAny state class (e.g. BlockState)

State with minimal entropy in the interval.

Notes

This function calls mcmc_multilevel() to reduce the order of a given state, and uses the value of state.entropy(**args) for the minimization, with args obtained from mcmc_multilevel_args.

References

“Golden section search”, https://en.wikipedia.org/wiki/Golden_section_search

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

graph_tool.inference.minimize.minimize_blockmodel_dl(g, B_min=None, B_max=None, b_min=None, b_max=None, deg_corr=True, overlap=False, nonoverlap_init=True, layers=False, state_args={}, bisection_args={}, mcmc_args={}, anneal_args={}, mcmc_equilibrate_args={}, shrink_args={}, mcmc_multilevel_args={}, verbose=False)[source]

Fit the stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters
gGraph

The graph.

B_minint (optional, default: None)

The minimum number of blocks.

B_maxint (optional, default: None)

The maximum number of blocks.

b_minVertexPropertyMap (optional, default: None)

The partition to be used with the minimum number of blocks.

b_maxVertexPropertyMap (optional, default: None)

The partition to be used with the maximum number of blocks.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the model will be used.

overlapbool (optional, default: False)

If True, the overlapping version of the model will be used.

nonoverlap_initbool (optional, default: True)

If True, and overlap == True a non-overlapping initial state will be used.

layersbool (optional, default: False)

If True, the layered version of the model will be used.

state_argsdict (optional, default: {})

Arguments to be passed to appropriate state constructor (e.g. BlockState, OverlapBlockState or LayeredBlockState)

bisection_argsdict (optional, default: {})

Arguments to be passed to bisection_minimize().

mcmc_argsdict (optional, default: {})

Arguments to be passed to graph_tool.inference.blockmodel.BlockState.mcmc_sweep(), graph_tool.inference.overlap_blockmodel.OverlapBlockState.mcmc_sweep() or graph_tool.inference.layered_blockmodel.LayeredBlockState.mcmc_sweep().

mcmc_equilibrate_argsdict (optional, default: {})

Arguments to be passed to mcmc_equilibrate().

shrink_argsdict (optional, default: {})

Arguments to be passed to graph_tool.inference.blockmodel.BlockState.shrink(), graph_tool.inference.overlap_blockmodel.OverlapBlockState.shrink() or graph_tool.inference.layered_blockmodel.LayeredBlockState.shrink().

mcmc_multilevel_argsdict (optional, default: {})

Arguments to be passed to mcmc_multilevel().

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
min_stateBlockState or OverlapBlockState or LayeredBlockState

State with minimal description length.

Notes

This function is a convenience wrapper around bisection_minimize().

See [peixoto-efficient-2014] for details on the algorithm.

This algorithm has a complexity of \(O(V \ln^2 V)\), where \(V\) is the number of nodes in the network.

References

peixoto-efficient-2014

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.minimize_blockmodel_dl(g)
>>> state.draw(pos=g.vp["pos"], vertex_shape=state.get_blocks(),
...            output="polbooks_blocks_mdl.svg")
<...>
_images/polbooks_blocks_mdl.svg

Block partition of a political books network, which minimizes the description length of the network according to the degree-corrected stochastic blockmodel.

>>> g = gt.collection.data["polbooks"]
>>> state = gt.minimize_blockmodel_dl(g, overlap=True)
>>> state.draw(pos=g.vp["pos"], output="polbooks_overlap_blocks_mdl.svg")
<...>
_images/polbooks_overlap_blocks_mdl.svg

Overlapping partition of a political books network, which minimizes the description length of the network according to the overlapping degree-corrected stochastic blockmodel.

graph_tool.inference.minimize.minimize_nested_blockmodel_dl(g, B_min=None, B_max=None, b_min=None, b_max=None, Bs=None, bs=None, deg_corr=True, overlap=False, nonoverlap_init=True, layers=False, hierarchy_minimize_args={}, state_args={}, bisection_args={}, mcmc_args={}, anneal_args={}, mcmc_equilibrate_args={}, shrink_args={}, mcmc_multilevel_args={}, verbose=False)[source]

Fit the nested stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters
gGraph

The graph.

B_minint (optional, default: None)

The minimum number of blocks.

B_maxint (optional, default: None)

The maximum number of blocks.

b_minVertexPropertyMap (optional, default: None)

The partition to be used with the minimum number of blocks.

b_maxVertexPropertyMap (optional, default: None)

The partition to be used with the maximum number of blocks.

Bslist of ints (optional, default: None)

If provided, it will correspond to the sizes of the initial hierarchy.

bslist of integer-valued numpy.ndarray objects (optional, default: None)

If provided, it will correspond to the initial hierarchical partition.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the model will be used.

overlapbool (optional, default: False)

If True, the overlapping version of the model will be used.

nonoverlap_initbool (optional, default: True)

If True, and overlap == True a non-overlapping initial state will be used.

layersbool (optional, default: False)

If True, the layered version of the model will be used.

hierarchy_minimize_argsdict (optional, default: {})

Arguments to be passed to hierarchy_minimize().

state_argsdict (optional, default: {})

Arguments to be passed to appropriate state constructor (e.g. BlockState, OverlapBlockState or LayeredBlockState)

bisection_argsdict (optional, default: {})

Arguments to be passed to bisection_minimize().

mcmc_argsdict (optional, default: {})

Arguments to be passed to graph_tool.inference.blockmodel.BlockState.mcmc_sweep(), graph_tool.inference.overlap_blockmodel.OverlapBlockState.mcmc_sweep() or graph_tool.inference.layered_blockmodel.LayeredBlockState.mcmc_sweep().

mcmc_equilibrate_argsdict (optional, default: {})

Arguments to be passed to mcmc_equilibrate().

shrink_argsdict (optional, default: {})

Arguments to be passed to graph_tool.inference.blockmodel.BlockState.shrink(), graph_tool.inference.overlap_blockmodel.OverlapBlockState.shrink() or graph_tool.inference.layered_blockmodel.LayeredBlockState.shrink().

mcmc_multilevel_argsdict (optional, default: {})

Arguments to be passed to mcmc_multilevel().

verbosebool or tuple (optional, default: False)

If True, progress information will be shown. Optionally, this accepts arguments of the type tuple of the form (level, prefix) where level is a positive integer that specifies the level of detail, and prefix is a string that is prepended to the all output messages.

Returns
min_stateNestedBlockState

Nested state with minimal description length.

Notes

This function is a convenience wrapper around hierarchy_minimize().

See [peixoto-hierarchical-2014] for details on the algorithm.

This algorithm has a complexity of \(O(V \ln^2 V)\), where \(V\) is the number of nodes in the network.

References

peixoto-hierarchical-2014

Tiago P. Peixoto, “Hierarchical block structures and high-resolution model selection in large networks “, Phys. Rev. X 4, 011047 (2014), DOI: 10.1103/PhysRevX.4.011047 [sci-hub, @tor], arXiv: 1310.4377.

Examples

>>> g = gt.collection.data["power"]
>>> state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
>>> state.draw(output="power_nested_mdl.pdf")
(...)
_images/power_nested_mdl.png

Hierarchical Block partition of a power-grid network, which minimizes the description length of the network according to the nested (degree-corrected) stochastic blockmodel.

>>> g = gt.collection.data["celegansneural"]
>>> state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True, overlap=True)
>>> state.draw(output="celegans_nested_mdl_overlap.pdf")
(...)
_images/celegans_nested_mdl_overlap.png

Overlapping block partition of the C. elegans neural network, which minimizes the description length of the network according to the nested overlapping degree-corrected stochastic blockmodel.

class graph_tool.inference.partition_modes.PartitionModeState(bs, relabel=True, nested=False, converge=False, **kwargs)[source]

Bases: object

The random label model state for a set of labelled partitions, which attempts to align them with a common group labelling.

Parameters
bslist of iterables

List of partitions to be aligned. If nested=True, these should be hierarchical partitions, composed each as a list of partitions.

relabelbool (optional, default: True)

If True, an initial alignment of the partitions will be attempted during instantiation, otherwise they will be incorporated as they are.

nestedbool (optional, default: False)

If True, the partitions will be assumed to be hierarchical.

convergebool (optional, default: False)

If True, the label alignment will be iterated until convergence upon initialization (otherwise replace_partitions() needs to be called repeatedly).

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

copy(bs=None)[source]

Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.

add_partition(b, relabel=True)[source]

Adds partition b to the ensemble, after relabelling it if relabel=True, and returns its index in the population.

remove_partition(i)[source]

Removes partition with index i from the ensemble.

virtual_add_partition(b, relabel=True)[source]

Computes the entropy difference (negative log probability) if partition b were inserted the ensemble, after relabelling it if relabel=True.

virtual_remove_partition(b, relabel=True)[source]

Computes the entropy difference (negative log probability) if partition b were removed from the ensemble, after relabelling it if relabel=True.

replace_partitions()[source]

Removes and re-adds every partition, after relabelling, an returns the entropy difference (negative log probability).

relabel_partition(b)[source]

Returns a relabelled copy of partition b, according to its alignment with the ensemble.

align_mode(mode)[source]

Relabel entire ensemble to align with another ensemble given by mode, which should be an instance of PartitionModeState.

get_partition(i)[source]

Returns partition with index i.

get_nested_partition(i)[source]

Returns nested partition with index i.

get_partitions()[source]

Returns all partitions.

get_nested_partitions()[source]

Returns all nested partitions.

relabel()[source]

Re-order group labels according to group sizes.

entropy()[source]

Return the model entropy (negative log-likelihood).

posterior_entropy(MLE=True)[source]

Return the entropy of the random label model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

posterior_cdev(MLE=True)[source]

Return the uncertainty of the mode in the range \([0,1]\), using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

posterior_lprob(b, MLE=True)[source]

Return the log-probability of partition b, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

get_coupled_state()[source]

Return the instance of PartitionModeState representing the model at the upper hierarchical level.

get_marginal(g)[source]

Return a VertexPropertyMap for Graph g, with vector<int> values containing the marginal group membership counts for each node.

get_max(g)[source]

Return a VertexPropertyMap for Graph g, with int values containing the maximum marginal group membership for each node.

get_max_nested()[source]

Return a hierarchical partition as a list of numpy.ndarray objects, containing the maximum marginal group membership for each node in every level.

get_B()[source]

Return the total number of labels used.

get_M()[source]

Return the number of partitions

sample_partition(MLE=True)[source]

Sampled a partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

sample_nested_partition(MLE=True, fix_empty=True)[source]

Sampled a nested partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

class graph_tool.inference.partition_modes.ModeClusterState(bs, b=None, B=1, nested=False, relabel=True)[source]

Bases: object

The mixed random label model state for a set of labelled partitions, which attempts to align them inside clusters with a common group labelling.

Parameters
bslist of iterables

List of partitions to be aligned. If nested=True, these should be hierarchical partitions, composed each as a list of partitions.

biterable (optional, default: None)

Initial cluster membership for every partition. If None a random division into B groups will be used.

Bint (optional, default: 1)

Number of groups for initial division.

relabelbool (optional, default: True)

If True, an initial alignment of the partitions will be attempted during instantiation, otherwise they will be incorporated as they are.

nestedbool (optional, default: False)

If True, the partitions will be assumed to be hierarchical.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

copy(bs=None, b=None)[source]

Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.

get_mode(r)[source]

Return the mode in cluster r as an instance of PartitionModeState.

get_modes(sort=True)[source]

Return the list of nonempty modes, as instances of PartitionModeState. If sorted == True, the modes are retured in decreasing order with respect to their size.

get_wr()[source]

Return cluster sizes.

get_B()[source]

Returns the total number of clusters.

get_Be()[source]

Returns the effective number of clusters, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of partitions in cluster \(r\).

virtual_add_partition(b, r, relabel=True)[source]

Computes the entropy difference (negative log probability) if partition b were inserted in cluster r, after relabelling it if relabel=True.

add_partition(b, r, relabel=True)[source]

Add partition b in cluster r, after relabelling it if relabel=True.

classify_partition(b, relabel=True, new_group=True, sample=False)[source]

Returns the cluster r to which partition b would belong, after relabelling it if relabel=True, according to the most probable assignment, or randomly sampled according to the relative probabilities if sample==True. If new_group==True, a new previously unoccupied group is also considered for the classification.

relabel(epsilon=1e-06, maxiter=100)[source]

Attempt to align group labels between clusters via a greedy algorithm. The algorithm stops after maxiter iterations or when the entropy improvement lies below epsilon.

entropy()[source]

Return the model entropy (negative log-likelihood).

posterior_entropy(MLE=True)[source]

Return the entropy of the random label model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

posterior_lprob(r, b, MLE=True)[source]

Return the log-probability of partition b belonging to mode r, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

replace_partitions()[source]

For every cluster, removes and re-adds every partition, after relabelling, and returns the entropy difference (negative log probability).

sample_partition(MLE=True)[source]

Sampled a cluster label and partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

sample_nested_partition(MLE=True, fix_empty=True)[source]

Sampled a cluster label and nested partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if `MLE=True`, otherwise using posterior mean estimates.

mcmc_sweep(beta=inf, d=0.01, niter=1, allow_vacate=True, sequential=True, deterministic=False, verbose=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

multiflip_mcmc_sweep(beta=inf, psingle=None, psplit=1, pmerge=1, pmergesplit=1, d=0.01, gibbs_sweeps=10, niter=1, accept_stats=None, verbose=False, **kwargs)[source]

Perform sweeps of a merge-split Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

graph_tool.inference.partition_modes.partition_overlap(x, y, norm=True)[source]

Returns the maximum overlap between partitions, according to an optimal label alignment.

Parameters
xiterable of int values

First partition.

yiterable of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns
wfloat or int

Maximum overlap value.

Notes

The maximum overlap between partitions \(\boldsymbol x\) and \(\boldsymbol y\) is defined as

\[\omega(\boldsymbol x,\boldsymbol y) = \underset{\boldsymbol\mu}{\max}\sum_i\delta_{x_i,\mu(y_i)},\]

where \(\boldsymbol\mu\) is a bijective mapping between group labels. It corresponds to solving an instance of the maximum weighted bipartite matching problem, which is done with the Kuhn-Munkres algorithm [kuhn_hungarian_1955] [munkres_algorithms_1957].

If norm == True, the normalized value is returned:

\[\frac{\omega(\boldsymbol x,\boldsymbol y)}{N}\]

which lies in the unit interval \([0,1]\).

This algorithm runs in time \(O[N + (B_x+B_y)E_m]\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\), \(B_x\) and \(B_y\) are the number of labels in partitions \(\boldsymbol x\) and \(\boldsymbol y\), respectively, and \(E_m \le B_xB_y\) is the number of nonzero entries in the contingency table between both partitions.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

kuhn_hungarian_1955

H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly 2, 83–97 (1955) DOI: 10.1002/nav.3800020109 [sci-hub, @tor]

munkres_algorithms_1957

James Munkres, “Algorithms for the Assignment and Transportation Problems,” Journal of the Society for Industrial and Applied Mathematics 5, 32–38 (1957). DOI: 10.1137/0105003 [sci-hub, @tor]

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.partition_overlap(x, y)
0.143
graph_tool.inference.partition_modes.nested_partition_overlap(x, y, norm=True)[source]

Returns the hierarchical maximum overlap between nested partitions, according to an optimal recursive label alignment.

Parameters
xiterable of iterables of int values

First partition.

yiterable of iterables of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns
wfloat or int

Maximum hierarchical overlap value.

Notes

The maximum overlap between partitions \(\bar{\boldsymbol x}\) and \(\bar{\boldsymbol y}\) is defined as

\[\omega(\bar{\boldsymbol x},\bar{\boldsymbol y}) = \sum_l\underset{\boldsymbol\mu_l}{\max}\sum_i\delta_{x_i^l,\mu_l(\tilde y_i^l)},\]

where \(\boldsymbol\mu_l\) is a bijective mapping between group labels at level \(l\), and \(\tilde y_i^l = y^i_{\mu_{l-1}(i)}\) are the nodes reordered according to the lower level. It corresponds to solving an instance of the maximum weighted bipartite matching problem for every hierarchical level, which is done with the Kuhn-Munkres algorithm [kuhn_hungarian_1955] [munkres_algorithms_1957].

If norm == True, the normalized value is returned:

\[1 - \frac{\left(\sum_lN_l\right) - \omega(\bar{\boldsymbol x}, \bar{\boldsymbol y})}{\sum_l\left(N_l - 1\right)}\]

which lies in the unit interval \([0,1]\), where \(N_l=\max(N_{{\boldsymbol x}^l}, N_{{\boldsymbol y}^l})\) is the number of nodes in level l.

This algorithm runs in time \(O[\sum_l N_l + (B_x^l+B_y^l)E_m^l]\) where \(B_x^l\) and \(B_y^l\) are the number of labels in partitions \(\bar{\boldsymbol x}\) and \(\bar{\boldsymbol y}\) at level \(l\), respectively, and \(E_m^l \le B_x^lB_y^l\) is the number of nonzero entries in the contingency table between both partitions.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

kuhn_hungarian_1955

H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly 2, 83–97 (1955) DOI: 10.1002/nav.3800020109 [sci-hub, @tor]

munkres_algorithms_1957

James Munkres, “Algorithms for the Assignment and Transportation Problems,” Journal of the Society for Industrial and Applied Mathematics 5, 32–38 (1957). DOI: 10.1137/0105003 [sci-hub, @tor]

Examples

>>> x = [np.random.randint(0, 100, 1000), np.random.randint(0, 10, 100), np.random.randint(0, 3, 10)]
>>> y = [np.random.randint(0, 100, 1000), np.random.randint(0, 10, 100), np.random.randint(0, 3, 10)]
>>> gt.nested_partition_overlap(x, y)
0.150858...
graph_tool.inference.partition_modes.contingency_graph(x, y)[source]

Returns the contingency graph between both partitions.

Parameters
xiterable of int values

First partition.

yiterable of int values

Second partition.

Returns
gGraph

Contingency graph, containing an internal edge property map mrs with the weights, an internal vertex property map label with the label values, and an internal boolean vertex property map partition indicating the partition membership.

Notes

The contingency graph is a bipartite graph with the labels of \(\boldsymbol x\) and \(\boldsymbol y\) as vertices, and edge weights given by

\[m_{rs} = \sum_i\delta_{x_i,r}\delta_{y_i,s}.\]

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> g = gt.contingency_graph(x, y)
>>> g.ep.mrs.a
PropertyArray([ 8,  6,  8, 15, 15, 14, 11, 13,  8,  9, 16,  6,  5, 11,  8,
               15,  6,  8,  9, 12, 11,  8, 13,  6, 10, 14, 12, 14, 15, 18,
               13, 15, 10, 12, 13,  6, 12, 13, 15,  9, 11, 11,  5,  7, 11,
                6,  8, 15, 15, 14,  8,  8,  7, 13, 11, 11,  8, 11,  9, 11,
                9, 16, 13, 12,  8, 16,  6, 10, 15, 14,  4,  4,  7, 12, 11,
                8,  6, 16, 11, 13,  3,  5, 13,  9, 11,  4,  4, 12,  7,  5,
                7, 10,  6,  8,  6,  7, 10,  7, 11,  2], dtype=int32)
graph_tool.inference.partition_modes.shuffle_partition_labels(x)[source]

Returns a copy of partition x, with the group labels randomly shuffled.

Parameters
xiterable of int values

Partition.

Returns
ynumpy.ndarray

Partition with shuffled labels.

Examples

>>> x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
>>> gt.shuffle_partition_labels(x)
array([0, 0, 0, 2, 2, 2, 1, 1, 1], dtype=int32)
graph_tool.inference.partition_modes.shuffle_nested_partition_labels(x)[source]

Returns a copy of nested partition x, with the group labels randomly shuffled.

Parameters
xiterable iterable of int values

Partition.

Returns
ylist of numpy.ndarray

Nested partition with shuffled labels.

Examples

>>> x = [[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 0, 1], [1, 0]]
>>> gt.shuffle_nested_partition_labels(x)
[array([1, 1, 1, 0, 0, 0, 2, 2, 2], dtype=int32), array([0, 0, 1], dtype=int32), array([0, 1], dtype=int32)]
graph_tool.inference.partition_modes.order_partition_labels(x)[source]

Returns a copy of partition x, with the group labels ordered decreasingly according to group size.

Parameters
xiterable of int values

Partition.

Returns
ynumpy.ndarray

Partition with ordered labels.

Examples

>>> x = [0, 2, 2, 1, 1, 1, 2, 2, 2]
>>> gt.order_partition_labels(x)
array([2, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int32)
graph_tool.inference.partition_modes.order_nested_partition_labels(x)[source]

Returns a copy of nested partition x, with the group labels ordered decreasingly according to group size at each level.

Parameters
xiterable of iterables of int values

Partition.

Returns
ylist of numpy.ndarray

Nested partition with ordered labels.

Examples

>>> x = [[0, 2, 2, 1, 1, 1, 2, 2, 2], [1, 1, 0], [1, 1]]
>>> gt.order_nested_partition_labels(x)
[array([2, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int32), array([1, 0, 0], dtype=int32), array([0, 0], dtype=int32)]
graph_tool.inference.partition_modes.align_partition_labels(x, y)[source]

Returns a copy of partition x, with the group labels aligned as to maximize the overlap with y.

Parameters
xiterable of int values

Partition.

Returns
ynumpy.ndarray

Partition with aligned labels.

Notes

This algorithm runs in time \(O[N + (B_x+B_y)E_m]\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\), \(B_x\) and \(B_y\) are the number of labels in partitions \(\boldsymbol x\) and \(\boldsymbol y\), respectively, and \(E_m \le B_xB_y\) is the number of nonzero entries in the contingency table between both partitions.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

Examples

>>> x = [0, 2, 2, 1, 1, 1, 2, 3, 2]
>>> y = gt.shuffle_partition_labels(x)
>>> print(y)
[3 0 0 1 1 1 0 2 0]
>>> gt.align_partition_labels(y, x)
array([0, 2, 2, 1, 1, 1, 2, 3, 2], dtype=int32)
graph_tool.inference.partition_modes.align_nested_partition_labels(x, y)[source]

Returns a copy of nested partition x, with the group labels aligned as to maximize the overlap with y.

Parameters
xiterable of iterables of int values

Partition.

Returns
ylist of numpy.ndarray

Nested partition with aligned labels.

Notes

This algorithm runs in time \(O[\sum_l N_l + (B_x^l+B_y^l)E_m^l]\) where \(B_x^l\) and \(B_y^l\) are the number of labels in partitions \(\bar{\boldsymbol x}\) and \(\bar{\boldsymbol y}\) at level \(l\), respectively, and \(E_m^l \le B_x^lB_y^l\) is the number of nonzero entries in the contingency table between both partitions.

Examples

>>> x = [[0, 2, 2, 1, 1, 1, 2, 3, 2], [1, 0, 1, 0], [0,0]]
>>> y = gt.shuffle_nested_partition_labels(x)
>>> print(y)
[array([1, 3, 3, 2, 2, 2, 3, 0, 3], dtype=int32), array([1, 0, 1, 0], dtype=int32), array([0, 0], dtype=int32)]
>>> gt.align_nested_partition_labels(y, x)
[array([0, 2, 2, 1, 1, 1, 2, 3, 2], dtype=int32), array([1, 0, 1, 0], dtype=int32), array([0, 0], dtype=int32)]
graph_tool.inference.partition_modes.partition_overlap_center(bs, init=None, relabel_bs=False)[source]

Find a partition with a maximal overlap to all items of the list of partitions given.

Parameters
bslist of iterables of int values

List of partitions.

inititerable of int values (optional, default: None)

If given, it will determine the initial partition.

relabel_bsbool (optional, default: False)

If True the given list of partitions will be updated with relabelled values.

Returns
cnumpy.ndarray

Partition containing the overlap consensus.

rfloat

Uncertainty in range \([0,1]\).

Notes

This algorithm obtains a partition \(\hat{\boldsymbol b}\) that has a maximal sum of overlaps with all partitions given in bs. It is obtained by performing the double maximization:

\[\begin{split}\begin{aligned} \hat b_i &= \underset{r}{\operatorname{argmax}}\;\sum_m \delta_{\mu_m(b^m_i), r}\\ \boldsymbol\mu_m &= \underset{\boldsymbol\mu}{\operatorname{argmax}} \sum_rm_{r,\mu(r)}^{(m)}, \end{aligned}\end{split}\]

where \(\boldsymbol\mu\) is a bijective mapping between group labels, and \(m_{rs}^{(m)}\) is the contingency table between \(\hat{\boldsymbol b}\) and \(\boldsymbol b ^{(m)}\). This algorithm simply iterates the above equations, until no further improvement is possible.

The uncertainty is given by:

\[r = 1 - \frac{1}{NM} \sum_i \sum_m \delta_{\mu_m(b^m_i), \hat b_i}\]

This algorithm runs in time \(O[M(N + B^3)]\) where \(M\) is the number of partitions, \(N\) is the length of the partitions and \(B\) is the number of labels used.

If enabled during compilation, this algorithm runs in parallel.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

Examples

>>> x = [5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0]
>>> bs = []
>>> for m in range(100):
...     y = np.array(x)
...     y[np.random.randint(len(y))] = np.random.randint(5)
...     bs.append(y)
>>> bs[:3]
[array([5, 5, 2, 0, 1, 2, 1, 0, 0, 0, 0]), array([1, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0]), array([5, 5, 2, 0, 1, 0, 1, 0, 4, 0, 0])]
>>> c, r = gt.partition_overlap_center(bs)
>>> print(c, r)
[1 1 2 0 3 0 3 0 0 0 0] 0.07454545...
>>> gt.align_partition_labels(c, x)
array([5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0], dtype=int32)
graph_tool.inference.partition_modes.nested_partition_overlap_center(bs, init=None, return_bs=False)[source]

Find a nested partition with a maximal overlap to all items of the list of nested partitions given.

Parameters
bslist of list of iterables of int values

List of nested partitions.

inititerable of iterables of int values (optional, default: None)

If given, it will determine the initial nested partition.

return_bsbool (optional, default: False)

If True the an update list of nested partitions will be return with relabelled values.

Returns
cList of numpy.ndarray

Nested partition containing the overlap consensus.

rfloat

Uncertainty in range \([0,1]\).

bsList of lists of numpy.ndarray

List of relabelled nested partitions.

Notes

This algorithm obtains a nested partition \(\hat{\bar{\boldsymbol b}}\) that has a maximal sum of overlaps with all nested partitions given in bs. It is obtained by performing the double maximization:

\[\begin{split}\begin{aligned} \hat b_i^l &= \underset{r}{\operatorname{argmax}}\;\sum_m \delta_{\mu_m^l(b^{l,m}_i), r}\\ \boldsymbol\mu_m^l &= \underset{\boldsymbol\mu}{\operatorname{argmax}} \sum_rm_{r,\mu(r)}^{(l,m)}, \end{aligned}\end{split}\]

where \(\boldsymbol\mu\) is a bijective mapping between group labels, and \(m_{rs}^{(l,m)}\) is the contingency table between \(\hat{\boldsymbol b}_l\) and \(\boldsymbol b ^{(m)}_l\). This algorithm simply iterates the above equations, until no further improvement is possible.

The uncertainty is given by:

\[r = 1 - \frac{1}{N-L}\sum_l\frac{N_l-1}{N_l}\sum_i\frac{1}{M}\sum_m \delta_{\mu_m(b^{l,m}_i), \hat b_i^l}.\]

This algorithm runs in time \(O[M\sum_l(N_l + B_l^3)]\) where \(M\) is the number of partitions, \(N_l\) is the length of the partitions and \(B_l\) is the number of labels used, in level \(l\).

If enabled during compilation, this algorithm runs in parallel.

References

peixoto-revealing-2020

Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, arXiv: 2005.13977

Examples

>>> x = [[5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0], [0, 1, 0, 1, 1, 1]]
>>> bs = []
>>> for m in range(100):
...     y = [np.array(xl) for xl in x]
...     y[0][np.random.randint(len(y[0]))] = np.random.randint(5)
...     y[1][np.random.randint(len(y[1]))] = np.random.randint(2)
...     bs.append(y)
>>> bs[:3]
[[array([5, 5, 2, 0, 1, 0, 3, 0, 0, 0, 0]), array([0, 1, 1, 1, 1, 1])], [array([5, 5, 2, 0, 0, 0, 1, 0, 0, 0, 0]), array([0, 0, 0, 1, 1, 1])], [array([1, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0]), array([0, 1, 0, 1, 1, 1])]]
>>> c, r = gt.nested_partition_overlap_center(bs)
>>> print(c, r)
[array([1, 1, 2, 0, 3, 0, 3, 0, 0, 0, 0], dtype=int32), array([0, 1, 0, 1, 1], dtype=int32)] 0.084492...
>>> gt.align_nested_partition_labels(c, x)
[array([5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0], dtype=int32), array([ 0,  1,  0, -1, -1,  1], dtype=int32)]
graph_tool.inference.partition_modes.nested_partition_clear_null(x)[source]

Returns a copy of nested partition x where the null values -1 are replaced with 0.

Parameters
xiterable of iterables of int values

Partition.

Returns
ylist of numpy.ndarray

Nested partition with null values removed.

Notes

This is useful to pass hierarchical partitions to NestedBlockState.

Examples

>>> x = [[5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0], [0, 1, 0, -1, -1, 1]]
>>> gt.nested_partition_clear_null(x)
[array([5, 5, 2, 0, 1, 0, 1, 0, 0, 0, 0], dtype=int32), array([0, 1, 0, 0, 0, 1], dtype=int32)]
class graph_tool.inference.partition_centroid.PartitionCentroidState(bs, b=None, RMI=False)[source]

Bases: object

Obtain the center of a set of partitions, according to the variation of information metric or reduced mutual information.

Parameters
bsiterable of iterable of int

List of partitions.

blist or numpy.ndarray (optional, default: None)

Initial partition. If not supplied, a partition into a single group will be used.

RMIbool (optional, default: False)

If True, the reduced mutual information will be used, otherwise the variation of information metric will be used instead.

copy(bs=None, b=None, RMI=None)[source]

Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.

get_B()[source]

Returns the total number of blocks.

get_Be()[source]

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

mcmc_sweep(beta=1.0, d=0.01, niter=1, allow_vacate=True, sequential=True, deterministic=False, verbose=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

multiflip_mcmc_sweep(beta=1.0, psingle=None, psplit=1, pmerge=1, pmergesplit=1, d=0.01, gibbs_sweeps=10, niter=1, accept_stats=None, verbose=False, **kwargs)[source]

Perform sweeps of a merge-split Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

graph_tool.inference.partition_centroid.variation_information(x, y, norm=False)[source]

Returns the variation of information between two partitions.

Parameters
xiterable of int values

First partition.

yiterable of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns
VIfloat

Variation of information value.

Notes

The variation of information [meila_comparing_2003] is defined as

\[\text{VI}(\boldsymbol x,\boldsymbol y) = -\frac{1}{N}\sum_{rs}m_{rs}\left[\ln\frac{m_{rs}}{n_r} + \ln\frac{m_{rs}}{n_s'}\right],\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in both partitions.

If norm == True, the normalized value is returned:

\[\frac{\text{VI}(\boldsymbol x,\boldsymbol y)}{\ln N}\]

which lies in the unit interval \([0,1]\).

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

meila_comparing_2003

Marina Meilă, “Comparing Clusterings by the Variation of Information,” in Learning Theory and Kernel Machines, Lecture Notes in Computer Science No. 2777, edited by Bernhard Schölkopf and Manfred K. Warmuth (Springer Berlin Heidelberg, 2003) pp. 173–187. DOI: 10.1007/978-3-540-45167-9_14 [sci-hub, @tor]

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.variation_information(x, y)
4.5346824...
graph_tool.inference.partition_centroid.mutual_information(x, y, norm=False)[source]

Returns the mutual information between two partitions.

Parameters
xiterable of int values

First partition.

yiterable of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns
MIfloat

Mutual information value

Notes

The mutual information is defined as

\[\text{MI}(\boldsymbol x,\boldsymbol y) = \frac{1}{N}\sum_{rs}m_{rs}\ln\frac{N m_{rs}}{n_rn'_s},\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in both partitions.

If norm == True, the normalized value is returned:

\[2\frac{\text{MI}(\boldsymbol x,\boldsymbol y)}{H_x + H_y}\]

which lies in the unit interval \([0,1]\), and where \(H_x = -\frac{1}{N}\sum_rn_r\ln\frac{n_r}{N}\) and \(H_x = -\frac{1}{N}\sum_rn'_r\ln\frac{n'_r}{N}\).

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.mutual_information(x, y)
0.050321...
graph_tool.inference.partition_centroid.reduced_mutual_information(x, y, norm=False)[source]

Returns the reduced mutual information between two partitions.

Parameters
xiterable of int values

First partition.

yiterable of int values

Second partition.

norm(optional, default: True)

If True, the result will be normalized in the range \([0,1]\).

Returns
RMIfloat

Reduced mutual information value.

Notes

The reduced mutual information [newman_improved_2020] is defined as

\[\text{RMI}(\boldsymbol x,\boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -\ln\Omega(\boldsymbol n, \boldsymbol n')\right],\]

with \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) being the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), and \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in both partitions, and \(\Omega(\boldsymbol n, \boldsymbol n')\) is the total number of contingency tables with fixed row and column sums.

If norm == True, the normalized value is returned:

\[\frac{2\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} -2\ln\Omega(\boldsymbol n, \boldsymbol n')} {\ln\frac{N!}{\prod_rn_r!} + \ln\frac{N!}{\prod_rn'_r!} -\ln\Omega(\boldsymbol n, \boldsymbol n) -\ln\Omega(\boldsymbol n', \boldsymbol n')}\]

which can take a maximum value of one.

This algorithm runs in time \(O(N)\) where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

newman_improved_2020

M. E. J. Newman, G. T. Cantwell and J.-G. Young, “Improved mutual information measure for classification and community detection”, Phys. Rev. E, 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304 [sci-hub, @tor], arXiv: 1907.12581

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.065562...
class graph_tool.inference.planted_partition.PPBlockState(g, b=None)[source]

Bases: object

Obtain the partition of a network according to the Bayesian planted partition model.

Parameters
gGraph

Graph to be modelled.

bPropertyMap (optional, default: None)

Initial partition. If not supplied, a partition into a single group will be used.

References

lizhi-statistical-2020

Lizhi Zhang, Tiago P. Peixoto, “Statistical inference of assortative community structures”, arXiv: 2006.14493

copy(g=None, b=None)[source]

Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.

get_blocks()[source]

Returns the property map which contains the block labels for each vertex.

get_state()[source]

Alias to get_blocks().

get_B()[source]

Returns the total number of blocks.

get_nonempty_B()[source]

Alias to get_B().

get_Be()[source]

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

virtual_vertex_move(v, s, **kwargs)[source]

Computes the entropy difference if vertex v is moved to block s. The remaining parameters are the same as in entropy().

move_vertex(v, s)[source]

Move vertex v to block s.

entropy(uniform=False, degree_dl_kind='distributed', **kwargs)[source]

Return the model entropy (negative log-likelihood).

Parameters
uniformbool (optional, default: False)

If True, the uniform planted partition model is used, otherwise a non-uniform version is used.

degree_dl_kindstr (optional, default: "distributed")

This specifies the prior used for the degree sequence. It must be one of: "uniform" or "distributed" (default).

Notes

The “entropy” of the state is the negative log-likelihood of the microcanonical SBM, that includes the generated graph \(\boldsymbol{A}\) and the model parameters \(e_{\text{in}}\), \(e_{\text{out}}\), \(\boldsymbol{k}\) and \(\boldsymbol{b}\),

\[\begin{split}\Sigma &= - \ln P(\boldsymbol{A},e_{\text{in}},e_{\text{out}},\boldsymbol{k},\boldsymbol{b}) \\ &= - \ln P(\boldsymbol{A}|e_{\text{in}},e_{\text{out}},\boldsymbol{k},\boldsymbol{b}) - \ln P(e_{\text{in}},e_{\text{out}},\boldsymbol{k},\boldsymbol{b}).\end{split}\]

This value is also called the description length of the data, and it corresponds to the amount of information required to describe it (in nats).

For the uniform version of the model, the likelihood is

\[P(\boldsymbol{A}|\boldsymbol{k},\boldsymbol{b}) = \frac{e_{\text{in}}!e_{\text{out}}!} {\left(\frac{B}{2}\right)^{e_{\text{in}}}{B\choose 2}^{e_{\text{out}}}(E+1)^{1-\delta_{B,1}}\prod_re_r!}\times \frac{\prod_ik_i!}{\prod_{i<j}A_{ij}!\prod_i A_{ii}!!}.\]

where \(e_{\text{in}}\) and \(e_{\text{out}}\) are the number of edges inside and outside communities, respectively, and \(e_r\) is the sum of degrees in group \(r\).

For the non-uniform model we have instead:

\[P(\boldsymbol{A}|\boldsymbol{k},\boldsymbol{b}) = \frac{e_{\text{out}}!\prod_re_{rr}!!} {{B\choose 2}^{e_{\text{out}}}(E+1)^{1-\delta_{B,1}}\prod_re_r!}\times{B + e_{\text{in}} - 1 \choose e_{\text{in}}}^{-1}\times \frac{\prod_ik_i!}{\prod_{i<j}A_{ij}!\prod_i A_{ii}!!}.\]

Here there are two options for the prior on the degrees:

  1. degree_dl_kind == "uniform"

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\left(\!\!{n_r\choose e_r}\!\!\right)^{-1}.\]

    This corresponds to a noninformative prior, where the degrees are sampled from a uniform distribution.

  2. degree_dl_kind == "distributed" (default)

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\frac{\prod_k\eta_k^r!}{n_r!} \prod_r q(e_r, n_r)^{-1}\]

    with \(\eta_k^r\) being the number of nodes with degree \(k\) in group \(r\), and \(q(n,m)\) being the number of partitions of integer \(n\) into at most \(m\) parts.

    This corresponds to a prior for the degree sequence conditioned on the degree frequencies, which are themselves sampled from a uniform hyperprior. This option should be preferred in most cases.

For the partition prior \(P(\boldsymbol{b})\) please refer to model_entropy().

References

lizhi-statistical-2020

Lizhi Zhang, Tiago P. Peixoto, “Statistical inference of assortative community structures”, arXiv: 2006.14493

draw(**kwargs)[source]

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

mcmc_sweep(beta=1.0, c=0.5, d=0.01, niter=1, entropy_args={}, allow_vacate=True, sequential=True, deterministic=False, verbose=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

multiflip_mcmc_sweep(beta=1.0, c=0.5, psingle=None, psplit=1, pmerge=1, pmergesplit=1, d=0.01, gibbs_sweeps=10, niter=1, entropy_args={}, accept_stats=None, verbose=False, **kwargs)[source]

Perform sweeps of a merge-split Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

class graph_tool.inference.blockmodel_em.EMBlockState(g, B, init_state=None)[source]

Bases: object

The parametric, undirected stochastic block model state of a given graph.

Parameters
gGraph

Graph to be modelled.

Bint

Number of blocks (or vertex groups).

init_stateBlockState (optional, default: None)

Optional block state used for initialization.

Notes

This class is intended to be used with em_infer() to perform expectation maximization with belief propagation. See [decelle_asymptotic_2011] for more details.

References

decelle_asymptotic_2011

Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová, “Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications”, Phys. Rev. E 84, 066106 (2011), DOI: 10.1103/PhysRevE.84.066106 [sci-hub, @tor], arXiv: 1109.3041

get_vertex_marginals()[source]

Return the vertex marginals.

get_group_sizes()[source]

Return the group sizes.

get_matrix()[source]

Return probability matrix.

get_MAP()[source]

Return the maximum a posteriori (MAP) estimate of the node partition.

get_fe()[source]

Return the Bethe free energy.

get_ak()[source]

Return the model’s average degree.

e_iter(max_iter=1000, epsilon=0.001, verbose=False)[source]

Perform ‘expectation’ iterations, using belief propagation, where the vertex marginals and edge messages are updated, until convergence according to epsilon or the maximum number of iterations given by max_iter. If verbose == True, convergence information is displayed.

The last update delta is returned.

m_iter()[source]

Perform a single ‘maximization’ iteration, where the group sizes and connection probability matrix are updated.

The update delta is returned.

learn(epsilon=0.001)[source]

Perform ‘maximization’ iterations until convergence according to epsilon.

The last update delta is returned.

draw(**kwargs)[source]

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

graph_tool.inference.blockmodel_em.em_infer(state, max_iter=1000, max_e_iter=1, epsilon=0.001, learn_first=False, verbose=False)[source]

Infer the model parameters and latent variables using the expectation-maximization (EM) algorithm with initial state given by state.

Parameters
statemodel state

State object, e.g. of type graph_tool.inference.blockmodel_em.EMBlockState.

max_iterint (optional, default: 1000)

Maximum number of iterations.

max_e_iterint (optional, default: 1)

Maximum number of ‘expectation’ iterations inside the main loop.

epsilonfloat (optional, default: 1e-3)

Convergence criterion.

learn_firstbool (optional, default: False)

If True, the maximization (a.k.a parameter learning) is converged before the main loop is run.

verbosebool (optional, default: True)

If True, convergence information is displayed.

Returns
deltafloat

The last update delta.

niterint

The total number of iterations.

References

wiki-EM

“Expectation–maximization algorithm”, https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.EMBlockState(g, B=3)
>>> delta, niter = gt.em_infer(state)
>>> state.draw(pos=g.vp["pos"], output="polbooks_EM_B3.svg")
<...>
_images/polbooks_EM_B3.svg

“Soft” block partition of a political books network with \(B=3\).

class graph_tool.inference.util.DictState(d)[source]

Bases: dict

Dictionary with (key,value) pairs accessible via attributes.

graph_tool.inference.util.dmask(d, ks)[source]

Copy dictionary d and remove key list ks.

graph_tool.inference.util.lbinom(n, k)[source]

Return log of binom(n, k).

graph_tool.inference.util.pmap(prop, value_map)[source]

Maps all the values of prop to the values given by value_map in-place according to: prop[i] = value_map[prop[i]].

graph_tool.inference.util.reverse_map(prop, value_map)[source]

Modify value_map such that the positions indexed by the values in prop correspond to their index in prop.

graph_tool.inference.util.continuous_map(prop)[source]

Remap the values of prop in the continuous range \([0, N-1]\).

graph_tool.inference.modularity.modularity(g, b, gamma=1.0, weight=None)[source]

Calculate Newman’s (generalized) modularity of a network partition.

Parameters
gGraph

Graph to be used.

bVertexPropertyMap

Vertex property map with the community partition.

gammafloat (optional, default: 1.)

Resolution parameter.

weightEdgePropertyMap (optional, default: None)

Edge property map with the optional edge weights.

Returns
Qfloat

Newman’s modularity.

Notes

Given a specific graph partition specified by prop, Newman’s modularity [newman-modularity-2006] is defined as:

\[Q = \frac{1}{2E} \sum_r e_{rr}- \frac{e_r^2}{2E}\]

where \(e_{rs}\) is the number of edges which fall between vertices in communities s and r, or twice that number if \(r = s\), and \(e_r = \sum_s e_{rs}\).

If weights are provided, the matrix \(e_{rs}\) corresponds to the sum of edge weights instead of number of edges, and the value of \(E\) becomes the total sum of edge weights.

References

newman-modularity-2006

M. E. J. Newman, “Modularity and community structure in networks”, Proc. Natl. Acad. Sci. USA 103, 8577-8582 (2006), DOI: 10.1073/pnas.0601602103 [sci-hub, @tor], arXiv: physics/0602124

Examples

>>> g = gt.collection.data["football"]
>>> gt.modularity(g, g.vp.value_tsevans)
0.5744393497...
class graph_tool.inference.modularity.ModularityState(g, b=None)[source]

Bases: object

Obtain the partition of a network according to Newman’s modularity.

Warning

Do not use this approach in the analysis of networks without understanding the consequences. This algorithm is included only for comparison purposes. In general, the inference-based approaches based on BlockState, NestedBlockState, and PPBlockState should be universally preferred.

Parameters
gGraph

Graph to be partitioned.

bPropertyMap (optional, default: None)

Initial partition. If not supplied, a partition into a single group will be used.

copy(g=None, b=None)[source]

Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.

get_B()[source]

Returns the total number of blocks.

get_Be()[source]

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

entropy(gamma=1.0, **kwargs)[source]

Return the unnormalized negative generalized modularity.

Notes

The unnormalized negative generalized modularity is defined as

\[-\sum_{ij}\left(A_{ij}-\gamma \frac{k_ik_j}{2E}\right)\]

Where \(A_{ij}\) is the adjacency matrix, \(k_i\) is the degree of node \(i\), and \(E\) is the total number of edges.

modularity(gamma=1)[source]

Return the generalized modularity.

Notes

The generalized modularity is defined as

\[\frac{1}{2E}\sum_{ij}\left(A_{ij}-\gamma \frac{k_ik_j}{2E}\right)\]

Where \(A_{ij}\) is the adjacency matrix, \(k_i\) is the degree of node \(i\), and \(E\) is the total number of edges.

draw(**kwargs)[source]

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

mcmc_sweep(beta=1.0, c=0.5, d=0.01, niter=1, entropy_args={}, allow_vacate=True, sequential=True, deterministic=False, verbose=False, **kwargs)[source]

Perform sweeps of a Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.

multiflip_mcmc_sweep(beta=1.0, c=0.5, psingle=None, psplit=1, pmerge=1, pmergesplit=1, d=0.01, gibbs_sweeps=10, niter=1, entropy_args={}, accept_stats=None, verbose=False, **kwargs)[source]

Perform sweeps of a merge-split Metropolis-Hastings rejection sampling MCMC to sample network partitions. See graph_tool.inference.blockmodel.BlockState.mcmc_sweep() for the parameter documentation.