ModeClusterState#
- class graph_tool.inference.ModeClusterState(bs, b=None, B=1, nested=False, relabel=True, **kwargs)[source]#
Bases:
MCMCState
,MultiflipMCMCState
,MultilevelMCMCState
The mixed random label model state for a set of labelled partitions, which attempts to align them inside clusters with a common group labelling.
- Parameters:
- bslist of iterables
List of partitions to be aligned. If
nested=True
, these should be hierarchical partitions, composed each as a list of partitions.- biterable (optional, default:
None
) Initial cluster membership for every partition. If
None
a random division intoB
groups will be used.- B
int
(optional, default:1
) Number of groups for initial division.
- relabel
bool
(optional, default:True
) If
True
, an initial alignment of the partitions will be attempted during instantiation, otherwise they will be incorporated as they are.- nested
bool
(optional, default:False
) If
True
, the partitions will be assumed to be hierarchical.
References
[peixoto-revealing-2021]Tiago P. Peixoto, “Revealing consensus and dissensus between network partitions”, Phys. Rev. X 11 021003 (2021) DOI: 10.1103/PhysRevX.11.021003 [sci-hub, @tor], arXiv: 2005.13977
Methods
add_partition
(b, r[, relabel])Add partition
b
in clusterr
, after relabelling it ifrelabel=True
.classify_partition
(b[, relabel, new_group, ...])Returns the cluster
r
to which partitionb
would belong, after relabelling it ifrelabel=True
, according to the most probable assignment, or randomly sampled according to the relative probabilities ifsample==True
.copy
([bs, b])Copies the state.
entropy
()Return the description length (negative joint log-likelihood).
get_B
()Returns the total number of clusters.
get_Be
()Returns the effective number of clusters, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of partitions in cluster \(r\).
Return the current default values for the parameters of the function
entropy()
, together with other operations that depend on them.get_mode
(r)Return the mode in cluster
r
as an instance ofPartitionModeState
.get_modes
([sort])Return the list of nonempty modes, as instances of
PartitionModeState
.get_wr
()Return cluster sizes.
mcmc_sweep
([beta, c, d, niter, ...])Perform
niter
sweeps of a Metropolis-Hastings acceptance-rejection MCMC to sample network partitions.multiflip_mcmc_sweep
([beta, c, psingle, ...])Perform
niter
sweeps of a Metropolis-Hastings acceptance-rejection MCMC with multiple simultaneous moves (i.e. merges and splits) to sample network partitions.multilevel_mcmc_sweep
([niter, beta, c, d, ...])Perform
niter
sweeps of a multilevel agglomerative acceptance-rejection pseudo-MCMC (i.e. detailed balance is not preserved) to sample network partitions, that uses a bisection search on the number of groups, together with group merges and singe-node moves.posterior_entropy
([MLE])Return the entropy of the random label model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.posterior_lprob
(r, b[, MLE])Return the log-probability of partition
b
belonging to moder
, using maximum likelihood estimates for the marginal node probabilities if`MLE=True`
, otherwise using posterior mean estimates.relabel
([epsilon, maxiter])Attempt to align group labels between clusters via a greedy algorithm.
For every cluster, removes and re-adds every partition, after relabelling, and returns the entropy difference (negative log probability).
Reset the current default values for the parameters of the function
entropy()
, together with other operations that depend on them.sample_nested_partition
([MLE, fix_empty])Sampled a cluster label and nested partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.sample_partition
([MLE])Sampled a cluster label and partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.update_entropy_args
(**kwargs)Update the default values for the parameters of the function
entropy()
from the keyword arguments, in a stateful way, together with other operations that depend on them.virtual_add_partition
(b, r[, relabel])Computes the entropy difference (negative log probability) if partition
b
were inserted in clusterr
, after relabelling it ifrelabel=True
.- add_partition(b, r, relabel=True)[source]#
Add partition
b
in clusterr
, after relabelling it ifrelabel=True
.
- classify_partition(b, relabel=True, new_group=True, sample=False)[source]#
Returns the cluster
r
to which partitionb
would belong, after relabelling it ifrelabel=True
, according to the most probable assignment, or randomly sampled according to the relative probabilities ifsample==True
. Ifnew_group==True
, a new previously unoccupied group is also considered for the classification.
- copy(bs=None, b=None)[source]#
Copies the state. The parameters override the state properties, and have the same meaning as in the constructor.
- entropy()#
Return the description length (negative joint log-likelihood).
Warning
The default arguments of this function are overriden by those obtained from
get_entropy_args()
. To update the defaults in a stateful way,update_entropy_args()
should be called.
- get_Be()[source]#
Returns the effective number of clusters, defined as \(e^{H}\), with \(H=-\sum_r\frac{n_r}{N}\ln \frac{n_r}{N}\), where \(n_r\) is the number of partitions in cluster \(r\).
- get_entropy_args()#
Return the current default values for the parameters of the function
entropy()
, together with other operations that depend on them.
- get_mode(r)[source]#
Return the mode in cluster
r
as an instance ofPartitionModeState
.
- get_modes(sort=True)[source]#
Return the list of nonempty modes, as instances of
PartitionModeState
. If sorted == True, the modes are retured in decreasing order with respect to their size.
- mcmc_sweep(beta=1.0, c=0.5, d=0.01, niter=1, entropy_args={}, allow_vacate=True, sequential=True, deterministic=False, vertices=None, verbose=False, **kwargs)#
Perform
niter
sweeps of a Metropolis-Hastings acceptance-rejection MCMC to sample network partitions.- Parameters:
- beta
float
(optional, default:1.
) Inverse temperature.
- c
float
(optional, default:.5
) Sampling parameter
c
for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.- d
float
(optional, default:.01
) Probability of selecting a new (i.e. empty) group for a given move.
- niter
int
(optional, default:1
) Number of sweeps to perform. During each sweep, a move attempt is made for each node.
- entropy_args
dict
(optional, default:{}
) Entropy arguments, with the same meaning and defaults as in
graph_tool.inference.BlockState.entropy()
.- allow_vacate
bool
(optional, default:True
) Allow groups to be vacated.
- sequential
bool
(optional, default:True
) If
sequential == True
each vertex move attempt is made sequentially, where vertices are visited in random order. Otherwise the moves are attempted by sampling vertices randomly, so that the same vertex can be moved more than once, before other vertices had the chance to move.- deterministic
bool
(optional, default:False
) If
sequential == True
anddeterministic == True
the vertices will be visited in deterministic order.- vertices
list
of ints (optional, default:None
) If provided, this should be a list of vertices which will be moved. Otherwise, all vertices will.
- verbose
bool
(optional, default:False
) If
verbose == True
, detailed information will be displayed.
- beta
- Returns:
- dS
float
Entropy difference after the sweeps.
- nattempts
int
Number of vertex moves attempted.
- nmoves
int
Number of vertices moved.
- dS
Notes
This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of groups).
References
[peixoto-efficient-2014]Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378
- multiflip_mcmc_sweep(beta=1.0, c=0.5, psingle=None, psplit=1, pmerge=1, pmergesplit=1, pmovelabel=0, d=0.01, gibbs_sweeps=10, niter=1, entropy_args={}, accept_stats=None, verbose=False, **kwargs)#
Perform
niter
sweeps of a Metropolis-Hastings acceptance-rejection MCMC with multiple simultaneous moves (i.e. merges and splits) to sample network partitions.- Parameters:
- beta
float
(optional, default:1.
) Inverse temperature.
- c
float
(optional, default:.5
) Sampling parameter
c
for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.- psingle
float
(optional, default:None
) Relative probability of proposing a single node move. If
None
, it will be selected as the number of nodes in the graph.- psplit
float
(optional, default:1
) Relative probability of proposing a group split.
- pmergesplit
float
(optional, default:1
) Relative probability of proposing a marge-split move.
- pmovelabel
float
(optional, default:0
) Relative probability of proposing a group label move.
- d
float
(optional, default:1
) Probability of selecting a new (i.e. empty) group for a given single-node move.
- gibbs_sweeps
int
(optional, default:10
) Number of sweeps of Gibbs sampling to be performed (i.e. each node is attempted once per sweep) to refine a split proposal.
- niter
int
(optional, default:1
) Number of sweeps to perform. During each sweep, a move attempt is made for each node, on average.
- entropy_args
dict
(optional, default:{}
) Entropy arguments, with the same meaning and defaults as in
graph_tool.inference.BlockState.entropy()
.- accept_stats
dict
(optional, default:None
) If provided, this dictionary will be updated with the proposal and acceptance counts for each kind of move.
- verbose
bool
(optional, default:False
) If
verbose == True
, detailed information will be displayed.
- beta
- Returns:
- dS
float
Entropy difference after the sweeps.
- nattempts
int
Number of vertex moves attempted.
- nmoves
int
Number of vertices moved.
- dS
Notes
This algorithm has an \(O(E)\) complexity, where \(E\) is the number of edges (independent of the number of groups).
References
[peixoto-merge-split-2020]Tiago P. Peixoto, “Merge-split Markov chain Monte Carlo for community detection”, Phys. Rev. E 102, 012305 (2020), DOI: 10.1103/PhysRevE.102.012305 [sci-hub, @tor], arXiv: 2003.07070
- multilevel_mcmc_sweep(niter=1, beta=inf, c=0.5, d=0.01, r=0.9, random_bisect=True, merge_sweeps=10, mh_sweeps=10, init_r=0.99, init_min_iter=5, init_beta=1.0, gibbs=False, B_min=1, B_max=18446744073709551615, b_min=None, b_max=None, M=None, cache_states=True, force_accept=False, parallel=False, entropy_args={}, verbose=False, **kwargs)#
Perform
niter
sweeps of a multilevel agglomerative acceptance-rejection pseudo-MCMC (i.e. detailed balance is not preserved) to sample network partitions, that uses a bisection search on the number of groups, together with group merges and singe-node moves.- Parameters:
- niter
int
(optional, default:1
) Number of sweeps to perform. During each sweep, a move attempt is made for each node, on average.
- beta
float
(optional, default:numpy.inf
) Inverse temperature.
- c
float
(optional, default:.5
) Sampling parameter
c
for move proposals: For \(c\to 0\) the blocks are sampled according to the local neighborhood of a given node and their block connections; for \(c\to\infty\) the blocks are sampled randomly. Note that only for \(c > 0\) the MCMC is guaranteed to be ergodic.- d
float
(optional, default:.01
) Probability of selecting a new (i.e. empty) group for a given single-node move.
- r
float
(optional, default:0.9
) Group shrink ratio. The number of groups is reduced by this fraction at each merge sweep.
- random_bisect
bool
(optional, default:True
) If
True
, bisections are done at randomly chosen intervals. Otherwise a Fibonacci sequence is used.- merge_sweeps
int
(optional, default:10
) Number of sweeps spent to find good merge proposals.
- mh_sweeps
int
(optional, default:10
) Number of single-node Metropolis-Hastings sweeps between merge splits.
- init_r
double
(optional, default:0.99
) Stopping criterion for the intialization phase, after each node is put in their own group, to set the initial upper bound of the bisection search. A number of single-node Metropolis-Hastings sweeps is done until the number of groups is shrunk by a factor that is larger than this parameter.
- init_min_iter
int
(optional, default:5
) Minimum number of iterations at the intialization phase.
- init_beta
float
(optional, default:1.
) Inverse temperature to be used for the very first sweep of the initialization phase.
- gibbs
bool
(optional, default:False
) If
True
, the single node moves use (slower) Gibbs sampling, rather than Metropolis-Hastings.- B_min
int
(optional, default:1
) Minimum number of groups to be considered in the search.
- b_min
VertexPropertyMap
(optional, default:None
) If provided, this will be used for the partition corresponding to
B_min
.- B_max
int
(optional, default:1
) Maximum number of groups to be considered in the search.
- b_max
VertexPropertyMap
(optional, default:None
) If provided, this will be used for the partition corresponding to
B_max
.- M
int
(optional, default:None
) Maximum number of groups to select for the multilevel move. If
None
is provided, then all groups are always elected.- cache_states
bool
(optional, default:True
) If
True
, intermediary states will be cached during the bisection search.- force_accept
bool
(optional, default:False
) If
True
, new state will be accepted even if it does not improve the objective function.- parallel
bool
(optional, default:False
) If
True
, the algorithm will run in parallel (if enabled during compilation).- entropy_args
dict
(optional, default:{}
) Entropy arguments, with the same meaning and defaults as in
graph_tool.inference.BlockState.entropy()
.- verbose
bool
(optional, default:False
) If
verbose == True
, detailed information will be displayed.
- niter
- Returns:
- dS
float
Entropy difference after the sweeps.
- nattempts
int
Number of vertex moves attempted.
- nmoves
int
Number of vertices moved.
- dS
Notes
This algorithm has an \(O(E\ln^2 N)\) complexity, where \(E\) is the number of edges and \(N\) is the number of nodes (independently of the number of groups).
References
[peixoto-efficient-2014]Tiago P. Peixoto, “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models”, Phys. Rev. E 89, 012804 (2014), DOI: 10.1103/PhysRevE.89.012804 [sci-hub, @tor], arXiv: 1310.4378
- posterior_entropy(MLE=True)[source]#
Return the entropy of the random label model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.
- posterior_lprob(r, b, MLE=True)[source]#
Return the log-probability of partition
b
belonging to moder
, using maximum likelihood estimates for the marginal node probabilities if`MLE=True`
, otherwise using posterior mean estimates.
- relabel(epsilon=1e-06, maxiter=100)[source]#
Attempt to align group labels between clusters via a greedy algorithm. The algorithm stops after
maxiter
iterations or when the entropy improvement lies belowepsilon
.
- replace_partitions()[source]#
For every cluster, removes and re-adds every partition, after relabelling, and returns the entropy difference (negative log probability).
- reset_entropy_args()#
Reset the current default values for the parameters of the function
entropy()
, together with other operations that depend on them.
- sample_nested_partition(MLE=True, fix_empty=True)[source]#
Sampled a cluster label and nested partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.
- sample_partition(MLE=True)[source]#
Sampled a cluster label and partition from the inferred model, using maximum likelihood estimates for the marginal node probabilities if
`MLE=True`
, otherwise using posterior mean estimates.