BlockState#

class graph_tool.inference.BlockState(g, b=None, B=None, eweight=None, vweight=None, recs=[], rec_types=[], rec_params=[], clabel=None, pclabel=None, bfield=None, Bfield=None, deg_corr=True, dense_bg=False, entropy_args={}, **kwargs)[source]#

Bases: MCMCState, MultiflipMCMCState, MultilevelMCMCState, GibbsMCMCState, MulticanonicalMCMCState, ExhaustiveSweepState, DrawBlockState

The stochastic block model state of a given graph.

Parameters:
gGraph

Graph to be modelled.

bVertexPropertyMap (optional, default: None)

Initial block labels on the vertices. If not supplied, it will be randomly sampled.

Bint (optional, default: None)

Number of blocks (or vertex groups). If not supplied it will be obtained from the parameter b.

eweightEdgePropertyMap (optional, default: None)

Edge multiplicities (for multigraphs or block graphs).

vweightVertexPropertyMap (optional, default: None)

Vertex multiplicities (for block graphs).

recslist of EdgePropertyMap instances (optional, default: [])

List of real or discrete-valued edge covariates.

rec_typeslist of edge covariate types (optional, default: [])

List of types of edge covariates. The possible types are: "real-exponential", "real-normal", "discrete-geometric", "discrete-poisson" or "discrete-binomial".

rec_paramslist of dict (optional, default: [])

Model hyperparameters for edge covariates. This should be a list of dict instances, or the string “microcanonical” (the default if nothing is specified). The keys depend on the type of edge covariate:

"real-exponential" or "discrete-poisson"

The parameter list is ["r", "theta"], corresponding to the parameters of the Gamma prior distribution. If unspecified, the default is the “empirical Bayes” choice: r = 1.0 and theta is the global average of the edge covariate.

"discrete-geometric"

The parameter list is ["alpha", "beta"], corresponding to the parameters of the Beta prior distribution. If unspecified, the default is the noninformative choice: alpha = beta = 1.0

"discrete-binomial"

The parameter list is ["N", "alpha", "beta"], corresponding to the number of trials N and the parameters of the Beta prior distribution. If unspecified, the default is the noninformative choice, alpha = beta = 1.0, and N is taken to be the maximum edge covarite value.

"real-normal"

The parameter list is ["m0", "k0", "v0", "nu0"] corresponding to the normal-inverse-chi-squared prior. If unspecified, the defaults are: m0 = rec.fa.mean(), k0 = 1, v0 = rec.fa.std() ** 2, and nu0 = 3, where rec is the corresponding edge covariate property map.

clabelVertexPropertyMap (optional, default: None)

Constraint labels on the vertices. If supplied, vertices with different label values will not be clustered in the same group.

pclabelVertexPropertyMap (optional, default: None)

Partition constraint labels on the vertices. This has the same interpretation as clabel, but will be used to compute the partition description length.

bfieldVertexPropertyMap (optional, default: None)

Local field acting as a prior for the node partition. This should be a vector property map of type vector<double>, and contain the log-probability for each node to be placed in each group.

deg_corrbool (optional, default: True)

If True, the degree-corrected version of the blockmodel ensemble will be assumed, otherwise the traditional variant will be used.

dense_bgbool (optional, default: False)

If True a dense matrix is used for the block graph, otherwise a sparse matrix will be used.

entropy_args: ``dict`` (optional, default: ``{}``)

Override default arguments for entropy() method and releated operations.

Methods

add_vertex(v, r)

Add vertex v to block r.

collect_edge_marginals([p, update])

Collect the edge marginal histogram, which counts the number of times the endpoints of each node have been assigned to a given block pair.

collect_partition_histogram([h, update, unlabel])

Collect a histogram of partitions.

collect_vertex_marginals([p, b, unlabel, update])

Collect the vertex marginal histogram, which counts the number of times a node was assigned to a given block.

copy([g, eweight, vweight, b, B, deg_corr, ...])

Copies the block state.

draw(**kwargs)

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

entropy([adjacency, dl, partition_dl, ...])

Calculate the description length (a.k.a.

exhaustive_sweep([entropy_args, callback, ...])

Perform an exhaustive loop over all possible network partitions.

get_B()

Returns the total number of blocks.

get_Be()

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

get_E()

Returns the total number of edges.

get_N()

Returns the total number of nodes.

get_bclabel([clabel])

Returns a VertexPropertyMap corresponding to constraint labels for the block graph.

get_bg()

Returns the block graph.

get_block_state([b, vweight])

Returns a BlockState corresponding to the block graph (i.e. the blocks of the current state become the nodes).

get_blocks()

Returns the property map which contains the block labels for each vertex.

get_bpclabel()

Returns a VertexPropertyMap corresponding to partition constraint labels for the block graph.

get_edges_prob(missing[, spurious, entropy_args])

Compute the joint log-probability of the missing and spurious edges given by missing and spurious (a list of (source, target) tuples, or Edge() instances), together with the observed edges.

get_entropy_args()

Return the current default values for the parameters of the function entropy(), together with other operations that depend on them.

get_er()

Returns the vertex property map of the block graph which contains the number \(e_r\) of half-edges incident on block \(r\).

get_ers()

Returns the edge property map of the block graph which contains the \(e_{rs}\) matrix entries.

get_matrix()

Returns the block matrix (as a sparse csr_matrix), which contains the number of edges between each block pair.

get_move_prob(v, s[, c, d, reverse])

Compute the log-probability of a move proposal for vertex v to block s according to sampling parameters c and d, as obtained with graph_tool.inference.BlockState.sample_vertex_move().

get_nonempty_B()

Returns the total number of nonempty blocks.

get_nr()

Returns the vertex property map of the block graph which contains the block sizes \(n_r\).

get_rec_params()

Get model hyperparameters for edge covariates.

get_state()

Alias to get_blocks().

gibbs_sweep([beta, niter, entropy_args, ...])

Perform niter sweeps of a rejection-free Gibbs MCMC to sample network partitions.

mcmc_sweep([beta, c, d, niter, ...])

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC to sample network partitions.

move_vertex(v, s)

Move vertex v to block s.

multicanonical_sweep(m_state[, multiflip])

Perform niter sweeps of a non-Markovian multicanonical sampling using the Wang-Landau algorithm.

multiflip_mcmc_sweep([beta, c, psingle, ...])

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC with multiple simultaneous moves (i.e. merges and splits) to sample network partitions.

multilevel_mcmc_sweep([niter, beta, c, d, ...])

Perform niter sweeps of a multilevel agglomerative acceptance-rejection pseudo-MCMC (i.e. detailed balance is not preserved) to sample network partitions, that uses a bisection search on the number of groups, together with group merges and singe-node moves.

remove_vertex(v)

Remove vertex v from its current group.

reset_entropy_args()

Reset the current default values for the parameters of the function entropy(), together with other operations that depend on them.

sample_graph([canonical, multigraph, ...])

Sample a new graph from the fitted model.

sample_vertex_move(v[, c, d])

Sample block membership proposal of vertex v according to real-valued sampling parameters c and d: For \(c\to 0\) the blocks are sampled according to the local neighborhood and their connections; for \(c\to\infty\) the blocks are sampled randomly.

set_rec_params(params)

Update model hyperparameters for edge covariates.

set_state(b)

Sets the internal partition of the state.

update_entropy_args(**kwargs)

Update the default values for the parameters of the function entropy() from the keyword arguments, in a stateful way, together with other operations that depend on them.

virtual_vertex_move(v, s, **kwargs)

Computes the entropy difference if vertex v is moved to block s.

add_vertex(v, r)[source]#

Add vertex v to block r.

This optionally accepts a list of vertices and blocks to add.

Warning

This can leave the state in an inconsistent state if a vertex is added twice to the same group.

collect_edge_marginals(p=None, update=1)[source]#

Collect the edge marginal histogram, which counts the number of times the endpoints of each node have been assigned to a given block pair.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.BlockState.mcmc_sweep() function.

Parameters:
pEdgePropertyMap (optional, default: None)

Edge property map with edge marginals to be updated. If not provided, an empty histogram will be created.

updatefloat (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

Returns:
pEdgePropertyMap

Edge property map with updated edge marginals.

Examples

>>> np.random.seed(42)
>>> gt.seed_rng(42)
>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> pe = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     pe = state.collect_edge_marginals(pe)
>>> gt.bethe_entropy(g, pe)[0]
-20.424228...
collect_partition_histogram(h=None, update=1, unlabel=True)[source]#

Collect a histogram of partitions.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.BlockState.mcmc_sweep() function.

Parameters:
hPartitionHist (optional, default: None)

Partition histogram. If not provided, an empty histogram will be created.

updatefloat (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

unlabelbool (optional, default: True)

If True, a canonical labelling of the groups will be used, so that each partition is uniquely represented.

Returns:
hPartitionHist (optional, default: None)

Updated Partition histogram.

Examples

>>> np.random.seed(42)
>>> gt.seed_rng(42)
>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> ph = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     ph = state.collect_partition_histogram(ph)
>>> gt.microstate_entropy(ph)
137.247829...
collect_vertex_marginals(p=None, b=None, unlabel=False, update=1)[source]#

Collect the vertex marginal histogram, which counts the number of times a node was assigned to a given block.

This should be called multiple times, e.g. after repeated runs of the graph_tool.inference.BlockState.mcmc_sweep() function.

Parameters:
pVertexPropertyMap (optional, default: None)

Vertex property map with vector-type values, storing the previous block membership counts. If not provided, an empty histogram will be created.

bVertexPropertyMap (optional, default: None)

Vertex property map with group partition. If not provided, the state’s partition will be used.

unlabelbool (optional, default: False)

If True, a canonical labelling of the groups will be used, so that each partition is uniquely represented.

updateint (optional, default: 1)

Each call increases the current count by the amount given by this parameter.

Returns:
pVertexPropertyMap

Vertex property map with vector-type values, storing the accumulated block membership counts.

Examples

>>> np.random.seed(42)
>>> gt.seed_rng(42)
>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=4, deg_corr=True)
>>> pv = None
>>> state.mcmc_sweep(niter=1000)   # remove part of the transient
(...)
>>> for i in range(1000):
...     ret = state.mcmc_sweep(niter=10)
...     pv = state.collect_vertex_marginals(pv)
>>> gt.mf_entropy(g, pv)
33.560691...
>>> gt.graph_draw(g, pos=g.vp["pos"], vertex_shape="pie",
...               vertex_pie_fractions=pv, output="polbooks_blocks_soft_B4.svg")
<...>
../_images/polbooks_blocks_soft_B4.svg

“Soft” block partition of a political books network with \(B=4\).#

copy(g=None, eweight=None, vweight=None, b=None, B=None, deg_corr=None, clabel=None, overlap=False, pclabel=None, bfield=None, dense_bg=None, **kwargs)[source]#

Copies the block state. The parameters override the state properties, and have the same meaning as in the constructor.

draw(**kwargs)#

Convenience wrapper to graph_draw() that draws the state of the graph as colors on the vertices and edges.

entropy(adjacency=True, dl=True, partition_dl=True, degree_dl=True, degree_dl_kind='distributed', edges_dl=True, dense=False, multigraph=True, deg_entropy=True, recs=True, recs_dl=True, beta_dl=1.0, Bfield=True, exact=True, **kwargs)#

Calculate the description length (a.k.a. negative joint log-likelihood) associated with the current block partition.

Warning

The default arguments of this function are overriden by those obtained from get_entropy_args(). To update the defaults in a stateful way, update_entropy_args() should be called.

Parameters:
adjacencybool (optional, default: True)

If True, the adjacency term of the description length will be included.

dlbool (optional, default: True)

If True, the description length for the parameters will be included.

partition_dlbool (optional, default: True)

If True, and dl == True the partition description length will be included.

degree_dlbool (optional, default: True)

If True, and dl == True the degree sequence description length will be included (for degree-corrected models).

degree_dl_kindstr (optional, default: "distributed")

This specifies the prior used for the degree sequence. It must be one of: "uniform", "distributed" (default) or "entropy".

edges_dlbool (optional, default: True)

If True, and dl == True the edge matrix description length will be included.

densebool (optional, default: False)

If True, the “dense” variant of the entropy will be computed.

multigraphbool (optional, default: True)

If True, the multigraph entropy will be used.

deg_entropybool (optional, default: True)

If True, the degree entropy term that is independent of the network partition will be included (for degree-corrected models).

recsbool (optional, default: True)

If True, the likelihood for real or discrete-valued edge covariates is computed.

recs_dlbool (optional, default: True)

If True, and dl == True the edge covariate description length will be included.

beta_dldouble (optional, default: 1.)

Prior inverse temperature.

Bfieldbool (optional, default: False)

If True, the Bfield parameter passed to the construtor will be taken into account.

exactbool (optional, default: True)

If True, the exact expressions will be used. Otherwise, Stirling’s factorial approximation will be used for some terms.

Notes

The “entropy” of the state is the negative log-likelihood of the microcanonical SBM, that includes the generated graph \(\boldsymbol{A}\) and the model parameters \(\boldsymbol{\theta}\),

\[\begin{split}\Sigma &= - \ln P(\boldsymbol{A},\boldsymbol{\theta}) \\ &= - \ln P(\boldsymbol{A}|\boldsymbol{\theta}) - \ln P(\boldsymbol{\theta}).\end{split}\]

This value is also called the description length of the data, and it corresponds to the amount of information required to describe it (in nats).

For the traditional blockmodel (deg_corr == False), the model parameters are \(\boldsymbol{\theta} = \{\boldsymbol{e}, \boldsymbol{b}\}\), where \(\boldsymbol{e}\) is the matrix of edge counts between blocks, and \(\boldsymbol{b}\) is the partition of the nodes into blocks. For the degree-corrected blockmodel (deg_corr == True), we have an additional set of parameters, namely the degree sequence \(\boldsymbol{k}\).

For the traditional blockmodel, the model likelihood is

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b}) &= \frac{\prod_{r<s}e_{rs}!\prod_re_{rr}!!}{\prod_rn_r^{e_r}}\times \frac{1}{\prod_{i<j}A_{ij}!\prod_iA_{ii}!!},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b}) &= \frac{\prod_{rs}e_{rs}!}{\prod_rn_r^{e_r}}\times \frac{1}{\prod_{ij}A_{ij}!},\end{split}\]

for undirected and directed graphs, respectively, where \(e_{rs}\) is the number of edges from block \(r\) to \(s\) (or the number of half-edges for the undirected case when \(r=s\)), and \(n_r\) is the number of vertices in block \(r\) .

For the degree-corrected variant the equivalent expressions are

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b},\boldsymbol{k}) &= \frac{\prod_{r<s}e_{rs}!\prod_re_{rr}!!}{\prod_re_r!}\times \frac{\prod_ik_i!}{\prod_{i<j}A_{ij}!\prod_iA_{ii}!!},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b},\boldsymbol{k}) &= \frac{\prod_{rs}e_{rs}!}{\prod_re_r^+!\prod_re_r^-!}\times \frac{\prod_ik_i^+!\prod_ik_i^-!}{\prod_{ij}A_{ij}!},\end{split}\]

where \(e_r = \sum_se_{rs}\) is the number of half-edges incident on block \(r\), and \(e^+_r = \sum_se_{rs}\) and \(e^-_r = \sum_se_{sr}\) are the numbers of out- and in-edges adjacent to block \(r\), respectively.

If exact == False, Stirling’s approximation is used in the above expression.

If dense == True, the likelihood for the non-degree-corrected model becomes instead

\[\begin{split}P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b})^{-1} &= \prod_{r<s}{n_rn_s\choose e_{rs}}\prod_r{{n_r\choose 2}\choose e_{rr}/2},\\ P(\boldsymbol{A}|\boldsymbol{e},\boldsymbol{b})^{-1} &= \prod_{rs}{n_rn_s\choose e_{rs}}\end{split}\]

if multigraph == False, otherwise we replace \({n\choose m}\to\left(\!\!{n\choose m}\!\!\right)\) above, where \(\left(\!\!{n\choose m}\!\!\right) = {n+m-1\choose m}\). A “dense” entropy for the degree-corrected model is not available, and if requested will raise a NotImplementedError.

If dl == True, the description length \(\mathcal{L} = -\ln P(\boldsymbol{\theta})\) of the model will be returned as well. The terms \(P(\boldsymbol{e})\) and \(P(\boldsymbol{b})\) are described in described as follows.

For an undirected graph, the number of distinct \(e_{rs}\) matrices is given by,

\[\Omega_m = \left(\!\!{B(B+1)/2 \choose E}\!\!\right)\]

and for a directed graph,

\[\Omega_m = \left(\!\!{B^2 \choose E}\!\!\right)\]

where \(\left(\!{n \choose k}\!\right) = {n+k-1\choose k}\) is the number of \(k\) combinations with repetitions from a set of size \(n\). Hence, we have the description length of the edge counts

\[-\ln P(\boldsymbol{e}) = \ln \Omega_m.\]

For the node partition \(\boldsymbol{b}\) we assume a two-level Bayesian hierarchy, where first the group size histogram is generated, and conditioned on it the partition, which leads to a description length:

\[-\ln P(\boldsymbol{b}) = \ln {N - 1 \choose B - 1} + \ln N! - \sum_r \ln n_r!.\]

where \(n_r\) is the number of nodes in block \(r\).

The total information necessary to describe the model is then,

\[-\ln P(\boldsymbol{e}, \boldsymbol{b}) = -\ln P(\boldsymbol{e}) - \ln P(\boldsymbol{b}).\]

If nr is None, it is assumed \(n_r=N/B\). If nr is False, the partition term \(-\ln P(\boldsymbol{b})\) is omitted entirely.

For the degree-corrected model we need to specify the prior \(P(\boldsymbol{k})\) for the degree sequence as well. Here there are three options:

  1. degree_dl_kind == "uniform"

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\left(\!\!{n_r\choose e_r}\!\!\right)^{-1}.\]

    This corresponds to a noninformative prior, where the degrees are sampled from a uniform distribution.

  2. degree_dl_kind == "distributed" (default)

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) = \prod_r\frac{\prod_k\eta_k^r!}{n_r!} \prod_r q(e_r, n_r)^{-1}\]

    with \(\eta_k^r\) being the number of nodes with degree \(k\) in group \(r\), and \(q(m,n)\) being the number of partitions of integer \(m\) into at most \(n\) parts. This is given by the recurrence

    \[q(m, n) = q(m, n-1) + q(m-n, n),\]

    with boundary conditions \(q(m,n) = q(m,m)\) for \(n>m\), \(q(m,0) = 0\) for \(m>0\), and \(q(0,0) = 1\).

    This corresponds to a prior for the degree sequence conditioned on the degree counts, which are themselves sampled from a uniform hyperprior. This option should be preferred in most cases.

  3. degree_dl_kind == "entropy"

    \[P(\boldsymbol{k}|\boldsymbol{e},\boldsymbol{b}) \approx \prod_r\exp\left(-n_rH(\boldsymbol{k}_r)\right)\]

    where \(H(\boldsymbol{k}_r) = -\sum_kp_r(k)\ln p_r(k)\) is the entropy of the degree distribution inside block \(r\).

    Note that, differently from the other two choices, this represents only an approximation of the description length. It is meant to be used only for comparison purposes, and should be avoided in practice.

For the directed case, the above expressions are duplicated for the in- and out-degrees.

References

[peixoto-nonparametric-2017]

Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub, @tor], arXiv: 1610.02703

[peixoto-hierarchical-2014]

Tiago P. Peixoto, “Hierarchical block structures and high-resolution model selection in large networks “, Phys. Rev. X 4, 011047 (2014), DOI: 10.1103/PhysRevX.4.011047 [sci-hub, @tor], arXiv: 1310.4377.

[peixoto-weighted-2017]

Tiago P. Peixoto, “Nonparametric weighted stochastic block models”, Phys. Rev. E 97, 012306 (2018), DOI: 10.1103/PhysRevE.97.012306 [sci-hub, @tor], arXiv: 1708.01432

exhaustive_sweep(entropy_args={}, callback=None, density=None, vertices=None, initial_partition=None, max_iter=None)#

Perform an exhaustive loop over all possible network partitions.

Parameters:
entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.BlockState.entropy().

callbackcallable object (optional, default: None)

Function to be called for each partition, with three arguments (S, S_min, b_min) corresponding to the the current entropy value, the minimum entropy value so far, and the corresponding partition, respectively. If not provided, and hist is None an iterator over the same values will be returned instead.

densitytuple (optional, default: None)

If provided, it should contain a tuple with values (S_min, S_max, n_bins), which will be used to obtain the density of states via a histogram of size n_bins. This parameter is ignored unless callback is None.

verticesiterable of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

initial_partitioniterable of ints (optional, default: None)

If provided, this will provide the initial partition for the iteration.

max_iterint (optional, default: None)

If provided, this will limit the total number of iterations.

Returns:
statesiterator over (S, S_min, b_min)

If callback is None and hist is None, the function will return an iterator over (S, S_min, b_min) corresponding to the the current entropy value, the minimum entropy value so far, and the corresponding partition, respectively.

Ss, countspair of numpy.ndarray

If callback is None and hist is not None, the function will return the values of each bin (Ss) and the state count of each bin (counts).

b_minVertexPropertyMap

If callback is not None or hist is not None, the function will also return partition with smallest entropy.

Notes

This algorithm has an \(O(B^N)\) complexity, where \(B\) is the number of groups, and \(N\) is the number of vertices.

get_B()[source]#

Returns the total number of blocks.

get_Be()[source]#

Returns the effective number of blocks, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of nodes in group r.

get_E()[source]#

Returns the total number of edges.

get_N()[source]#

Returns the total number of nodes.

get_bclabel(clabel=None)[source]#

Returns a VertexPropertyMap corresponding to constraint labels for the block graph.

get_bg()[source]#

Returns the block graph.

get_block_state(b=None, vweight=False, **kwargs)[source]#

Returns a BlockState corresponding to the block graph (i.e. the blocks of the current state become the nodes). The parameters have the same meaning as the in the constructor. If vweight == True the nodes of the block state are weighted with the node counts.

get_blocks()[source]#

Returns the property map which contains the block labels for each vertex.

get_bpclabel()[source]#

Returns a VertexPropertyMap corresponding to partition constraint labels for the block graph.

get_edges_prob(missing, spurious=[], entropy_args={})[source]#

Compute the joint log-probability of the missing and spurious edges given by missing and spurious (a list of (source, target) tuples, or Edge() instances), together with the observed edges.

More precisely, the log-likelihood returned is

\[\ln \frac{P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}\]

where \(\boldsymbol G + \delta \boldsymbol G\) is the modified graph (with missing edges added and spurious edges deleted).

The values in entropy_args are passed to entropy() to calculate the log-probability.

get_entropy_args()#

Return the current default values for the parameters of the function entropy(), together with other operations that depend on them.

get_er()[source]#

Returns the vertex property map of the block graph which contains the number \(e_r\) of half-edges incident on block \(r\). If the graph is directed, a pair of property maps is returned, with the number of out-edges \(e^+_r\) and in-edges \(e^-_r\), respectively.

get_ers()[source]#

Returns the edge property map of the block graph which contains the \(e_{rs}\) matrix entries. For undirected graphs, the diagonal values (self-loops) contain \(e_{rr}/2\).

get_matrix()[source]#

Returns the block matrix (as a sparse csr_matrix), which contains the number of edges between each block pair.

Warning

This corresponds to the adjacency matrix of the block graph, which by convention includes twice the amount of edges in the diagonal entries if the graph is undirected.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.BlockState(g, B=5, deg_corr=True)
>>> state.mcmc_sweep(niter=1000)
(...)
>>> m = state.get_matrix()
>>> figure()
<...>
>>> matshow(m.todense())
<...>
>>> savefig("bloc_mat.svg")
../_images/bloc_mat.svg

A 5x5 block matrix.#

get_move_prob(v, s, c=1.0, d=0.1, reverse=False)[source]#

Compute the log-probability of a move proposal for vertex v to block s according to sampling parameters c and d, as obtained with graph_tool.inference.BlockState.sample_vertex_move(). If reverse == True, the reverse probability of moving the node back from block s to its current one is obtained.

get_nonempty_B()[source]#

Returns the total number of nonempty blocks.

get_nr()[source]#

Returns the vertex property map of the block graph which contains the block sizes \(n_r\).

get_rec_params()[source]#

Get model hyperparameters for edge covariates.

get_state()[source]#

Alias to get_blocks().

gibbs_sweep(beta=1.0, niter=1, entropy_args={}, allow_new_group=True, sequential=True, deterministic=False, vertices=None, verbose=False, **kwargs)#

Perform niter sweeps of a rejection-free Gibbs MCMC to sample network partitions.

Parameters:
betafloat (optional, default: 1.)

Inverse temperature.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.BlockState.entropy().

allow_new_groupbool (optional, default: True)

Allow the number of groups to increase and decrease.

sequentialbool (optional, default: True)

If sequential == True each vertex move attempt is made sequentially, where vertices are visited in random order. Otherwise the moves are attempted by sampling vertices randomly, so that the same vertex can be moved more than once, before other vertices had the chance to move.

deterministicbool (optional, default: False)

If sequential == True and deterministic == True the vertices will be visited in deterministic order.

verticeslist of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns:
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E\times B)\) complexity, where \(B\) is the number of groups, and \(E\) is the number of edges.

mcmc_sweep(beta=1.0, c=0.5, d=0.01, niter=1, entropy_args={}, allow_vacate=True, sequential=True, deterministic=False, vertices=None, verbose=False, **kwargs)#

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC to sample network partitions.

Parameters:
betafloat (optional, default: 1.)

Inverse temperature.

cfloat (optional, default: .5)

Sampling parameter c for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.

dfloat (optional, default: .01)

Probability of selecting a new (i.e. empty) group for a given move.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.BlockState.entropy().

allow_vacatebool (optional, default: True)

Allow groups to be vacated.

sequentialbool (optional, default: True)

If sequential == True each vertex move attempt is made sequentially, where vertices are visited in random order. Otherwise the moves are attempted by sampling vertices randomly, so that the same vertex can be moved more than once, before other vertices had the chance to move.

deterministicbool (optional, default: False)

If sequential == True and deterministic == True the vertices will be visited in deterministic order.

verticeslist of ints (optional, default: None)

If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns:
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of groups).

References

[peixoto-efficient-2014]

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

move_vertex(v, s)[source]#

Move vertex v to block s.

This optionally accepts a list of vertices and blocks to move simultaneously.

multicanonical_sweep(m_state, multiflip=False, **kwargs)#

Perform niter sweeps of a non-Markovian multicanonical sampling using the Wang-Landau algorithm.

Parameters:
m_stateMulticanonicalState

MulticanonicalState instance containing the current state of the Wang-Landau run.

multiflipbool (optional, default: False)

If True, multiflip_mcmc_sweep() will be used, otherwise mcmc_sweep().

**kwargsKeyword parameter list

The remaining parameters will be passed to multiflip_mcmc_sweep() or mcmc_sweep().

Returns:
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of groups).

References

[wang-efficient-2001]

Fugao Wang, D. P. Landau, “An efficient, multiple range random walk algorithm to calculate the density of states”, Phys. Rev. Lett. 86, 2050 (2001), DOI: 10.1103/PhysRevLett.86.2050 [sci-hub, @tor], arXiv: cond-mat/0011174

multiflip_mcmc_sweep(beta=1.0, c=0.5, psingle=None, psplit=1, pmerge=1, pmergesplit=1, pmovelabel=0, d=0.01, gibbs_sweeps=10, niter=1, entropy_args={}, accept_stats=None, verbose=False, **kwargs)#

Perform niter sweeps of a Metropolis-Hastings acceptance-rejection MCMC with multiple simultaneous moves (i.e. merges and splits) to sample network partitions.

Parameters:
betafloat (optional, default: 1.)

Inverse temperature.

cfloat (optional, default: .5)

Sampling parameter c for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.

psinglefloat (optional, default: None)

Relative probability of proposing a single node move. If None, it will be selected as the number of nodes in the graph.

psplitfloat (optional, default: 1)

Relative probability of proposing a group split.

pmergesplitfloat (optional, default: 1)

Relative probability of proposing a marge-split move.

pmovelabelfloat (optional, default: 0)

Relative probability of proposing a group label move.

dfloat (optional, default: 1)

Probability of selecting a new (i.e. empty) group for a given single-node move.

gibbs_sweepsint (optional, default: 10)

Number of sweeps of Gibbs sampling to be performed (i.e. each node is attempted once per sweep) to refine a split proposal.

niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node, on average.

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.BlockState.entropy().

accept_statsdict (optional, default: None)

If provided, this dictionary will be updated with the proposal and acceptance counts for each kind of move.

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns:
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of groups).

References

[peixoto-merge-split-2020]

Tiago P. Peixoto, “Merge-split Markov chain Monte Carlo for community detection”, Phys. Rev. E 102, 012305 (2020), DOI: 10.1103/PhysRevE.102.012305 [sci-hub, @tor], arXiv: 2003.07070

multilevel_mcmc_sweep(niter=1, beta=inf, c=0.5, d=0.01, r=0.9, random_bisect=True, merge_sweeps=10, mh_sweeps=10, init_r=0.99, init_min_iter=5, init_beta=1.0, gibbs=False, B_min=1, B_max=18446744073709551615, b_min=None, b_max=None, M=None, cache_states=True, force_accept=False, parallel=False, entropy_args={}, verbose=False, **kwargs)#

Perform niter sweeps of a multilevel agglomerative acceptance-rejection pseudo-MCMC (i.e. detailed balance is not preserved) to sample network partitions, that uses a bisection search on the number of groups, together with group merges and singe-node moves.

Parameters:
niterint (optional, default: 1)

Number of sweeps to perform. During each sweep, a move attempt is made for each node, on average.

betafloat (optional, default: numpy.inf)

Inverse temperature.

cfloat (optional, default: .5)

Sampling parameter c for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.

dfloat (optional, default: .01)

Probability of selecting a new (i.e. empty) group for a given single-node move.

rfloat (optional, default: 0.9)

Group shrink ratio. The number of groups is reduced by this fraction at each merge sweep.

random_bisectbool (optional, default: True)

If True, bisections are done at randomly chosen intervals. Otherwise a Fibonacci sequence is used.

merge_sweepsint (optional, default: 10)

Number of sweeps spent to find good merge proposals.

mh_sweepsint (optional, default: 10)

Number of single-node Metropolis-Hastings sweeps between merge splits.

init_rdouble (optional, default: 0.99)

Stopping criterion for the intialization phase, after each node is put in their own group, to set the initial upper bound of the bisection search. A number of single-node Metropolis-Hastings sweeps is done until the number of groups is shrunk by a factor that is larger than this parameter.

init_min_iterint (optional, default: 5)

Minimum number of iterations at the intialization phase.

init_betafloat (optional, default: 1.)

Inverse temperature to be used for the very first sweep of the initialization phase.

gibbsbool (optional, default: False)

If True, the single node moves use (slower) Gibbs sampling, rather than Metropolis-Hastings.

B_minint (optional, default: 1)

Minimum number of groups to be considered in the search.

b_minVertexPropertyMap (optional, default: None)

If provided, this will be used for the partition corresponding to B_min.

B_maxint (optional, default: 1)

Maximum number of groups to be considered in the search.

b_maxVertexPropertyMap (optional, default: None)

If provided, this will be used for the partition corresponding to B_max.

Mint (optional, default: None)

Maximum number of groups to select for the multilevel move. If None is provided, then all groups are always elected.

cache_statesbool (optional, default: True)

If True, intermediary states will be cached during the bisection search.

force_acceptbool (optional, default: False)

If True, new state will be accepted even if it does not improve the objective function.

parallelbool (optional, default: False)

If True, the algorithm will run in parallel (if enabled during compilation).

entropy_argsdict (optional, default: {})

Entropy arguments, with the same meaning and defaults as in graph_tool.inference.BlockState.entropy().

verbosebool (optional, default: False)

If verbose == True, detailed information will be displayed.

Returns:
dSfloat

Entropy difference after the sweeps.

nattemptsint

Number of vertex moves attempted.

nmovesint

Number of vertices moved.

Notes

This algorithm has an \(O(E\ln^2 N)\) complexity, where \(E\) is the number of edges and \(N\) is the number of nodes (independently of the number of groups).

References

[peixoto-efficient-2014]

Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378

remove_vertex(v)[source]#

Remove vertex v from its current group.

This optionally accepts a list of vertices to remove.

Warning

This will leave the state in an inconsistent state before the vertex is returned to some other group, or if the same vertex is removed twice.

reset_entropy_args()#

Reset the current default values for the parameters of the function entropy(), together with other operations that depend on them.

sample_graph(canonical=False, multigraph=True, self_loops=True, sample_params=False, max_ent=False, n_iter=1000)[source]#

Sample a new graph from the fitted model.

Parameters:
canonicalbool (optional, default: False)

If canonical == True, the graph will be sampled from the maximum-likelihood estimate of the canonical stochastic block model. Otherwise, it will be sampled from the microcanonical model.

multigraphbool (optional, default: True)

If True, parallel edges will be allowed.

self-loopsbool (optional, default: True)

If True, self-loops will be allowed.

sample_paramsbool (optional, default: True)

If True, and canonical == True and max_ent == False, the count parameters (edges between groups and node degrees) will be sampled from their posterior distribution conditioned on the actual state. Otherwise, their maximum-likelihood values will be used.

max_entbool (optional, default: False)

If True, maximum-entropy model variants will be used.

n_iterint (optional, default: 1000)

Number of iterations used (only relevant if canonical == False and max_ent == True).

Returns:
gGraph

Generated graph.

Notes

This function is just a convenience wrapper to generate_sbm(). However, if max_ent==True and canonical == False it wraps random_rewire() instead.

Examples

>>> g = gt.collection.data["polbooks"]
>>> state = gt.minimize_blockmodel_dl(g, multilevel_mcmc_args=dict(B_max=3))
>>> u = state.sample_graph(canonical=True, self_loops=False, multigraph=False)
>>> ustate = gt.BlockState(u, b=state.b)
>>> state.draw(pos=g.vp.pos, output="polbooks-sbm.svg")
<...>
>>> ustate.draw(pos=u.own_property(g.vp.pos), output="polbooks-sbm-sampled.svg")
<...>
../_images/polbooks-sbm.svg ../_images/polbooks-sbm-sampled.svg

Left: Political books network. Right: Sample from the degree-corrected SBM fitted to the original network.

sample_vertex_move(v, c=1.0, d=0.1)[source]#

Sample block membership proposal of vertex v according to real-valued sampling parameters c and d: For \(c\to 0\) the blocks are sampled according to the local neighborhood and their connections; for \(c\to\infty\) the blocks are sampled randomly. With a probability d, a new (empty) group is sampled.

set_rec_params(params)[source]#

Update model hyperparameters for edge covariates.

set_state(b)[source]#

Sets the internal partition of the state.

update_entropy_args(**kwargs)#

Update the default values for the parameters of the function entropy() from the keyword arguments, in a stateful way, together with other operations that depend on them.

Values updated in this manner are preserved by the copying or pickling of the state.

virtual_vertex_move(v, s, **kwargs)[source]#

Computes the entropy difference if vertex v is moved to block s. The remaining parameters are the same as in graph_tool.inference.BlockState.entropy().