Network reconstruction from dynamics and behavior#
In some cases, direct measurements of the edges of a network are either
impossible, or can be done only at significant experimental cost. In such
situations, we need to infer the network of interactions from the observed
functional behavior [peixoto-network-2024]. In graph-tool
this can be done
for a variety of static and dynamical models, where the data \(\boldsymbol
X\) is a \(N\times M\) matrix containing \(M\) samples (or “features”)
for each node.
For static models we have that the likelihood is given by
where each sample is assumed to be i.i.d., whereas for dynamic models \(\boldsymbol X\) is assumed to contain the states of a Markov chain with likelihood
In both cases above, \(\boldsymbol W\) represents the weighted adjacency matrix and \(\boldsymbol \theta\) are additional node parameters.
The inference procedure consists in the maximum a posteriori (MAP) estimation:
where the prior \(P(\boldsymbol W,\boldsymbol\theta)\) implements regularization.
In graph-tool
, a variety of models are implemented, as given by the following table:
Name |
State |
Data domain |
Kind |
---|---|---|---|
\(\mathbb{R}\) |
Static |
||
\(\mathbb{R}\) |
Dynamic |
||
\(\mathbb{R}^+\) |
Dynamic |
||
\(\{0, 1\}\) |
Dynamic |
||
\(\{-1, +1\}\) |
Static |
||
\(\{-1, +1\}\) |
Dynamic |
||
Continuous Ising model |
\([-1,1]\) |
Static |
|
Continuous kinetic Ising model |
\([-1,1]\) |
Dynamic |
Reconstruction with synthetic data#
Here we demonstrate how reconstruction is performed with a simple example, based
on the simulation of the Kinetic Ising model on the dolphins network (we can use the
IsingGlauberState
class for this purpose):
Note
g = gt.collection.data["dolphins"]
E = g.num_edges()
N = g.num_vertices()
w = g.new_ep("double", vals=np.random.normal(N/(2*E), .05, E)) # True edge weights
istate = gt.IsingGlauberState(g, w=w)
M = 1000
X = []
for m in range(M):
istate.iterate_sync()
X.append(istate.get_state().a.copy())
X = np.array(X).T
At this point the array X
contains our simulated data. We will now
reconstruct the original network from it using a nonparametric approach, which
consists of initializing an IsingGlauberBlockState
object, and iterating over
mcmc_sweep()
until sufficient
convergence is reached:
Note
When doing reconstruction, the default for
mcmc_sweep()
is
beta=numpy.inf
, meaning that this function performs a greedy optimization,
not actual samples from a reversible Markov chain.
state = gt.IsingGlauberBlockState(X)
for i in range(10):
delta, *_ = state.mcmc_sweep(niter=10)
print(delta)
This outputs the following deltas, indicating that convergence was achieved.
-4374.171741316452
-3430.442095268032
-904.7176098519118
-190.28064153057187
-26.28956787113603
-1.9477590574467705
-0.4964012530509585
-0.0012287004910120913
-1.295718874416707e-07
-8.93862761586206e-12
We can compare the reconstructed network using the similarity()
function, as follows
u = state.get_graph() # reconstructed network
w_r = state.get_x() # reconstructed weights
t_r = state.get_theta() # reconstructed thetas
print(gt.similarity(g, u, w, w_r))
which gives:
0.907831...
We can visualize the reconstruction with:
cnorm = matplotlib.colors.Normalize(vmin=-abs(w.fa).max(), vmax=abs(w.fa).max())
# Original graph
gt.graph_draw(g, g.vp.pos, edge_pen_width=gt.prop_to_size(w.t(abs), 2, 8, power=1), edge_color=w,
ecmap=matplotlib.cm.coolwarm_r, ecnorm=cnorm, output="dolphins-orig.svg")
# Inferred graph
gt.graph_draw(u, g.vp.pos, edge_pen_width=gt.prop_to_size(w_r.t(abs), 2, 8, power=1), edge_color=w_r,
ecmap=matplotlib.cm.coolwarm_r, ecnorm=cnorm, output="dolphins-inf.svg")
# Inferred graph with SBM partition
state.bstate.draw(pos=g.vp.pos, edge_gradient=[],
edge_pen_width=gt.prop_to_size(w_r.t(abs), 2, 8, power=1),
edge_color=w_r, ecmap=matplotlib.cm.coolwarm_r, ecnorm=cnorm,
output="dolphins-inf-sbm.svg")
which outputs the networks below:
True network |
Reconstructed |
Reconstructed with SBM |
Note
The algorithm implemented is the one described in [peixoto-scalable-2024] combined with the MDL regularization based on weight quantization from [peixoto-network-2024], which includes also a SBM as part of the prior for the latent edges.
The algorithm runs in time \(O(N\log^2 N)\) in the typical case, where \(N\) is the number of nodes, and if enabled during compilation will run in parallel using OpenMP, and therefore can be employed to reconstruct large networks.
\(L_1\) regularization#
The default reconstruction methods employs a nonparametric regularization scheme [peixoto-network-2024], but optionally \(L_1\) can be used instead. This is done by initializing the reconstruction state as
lstate = gt.IsingGlauberBlockState(X, disable_xdist=True, disable_tdist=True)
This method requires a sparsity parameter \(\lambda\) to be set, e.g. the following sets \(\lambda=200\),
lstate.update_entropy_args(xl1=200)
and then we can proceed in the same way,
for i in range(10):
delta, *_ = lstate.mcmc_sweep(niter=10)
which produces the network:
u = lstate.get_graph()
w_r = lstate.get_x()
gt.graph_draw(u, g.vp.pos, edge_pen_width=gt.prop_to_size(w_r.t(abs), 2, 8, power=1), edge_color=w_r,
ecmap=matplotlib.cm.coolwarm_r, ecnorm=cnorm, output="dolphins-inf-l1.svg")
Caution
The \(L_1\) regularization method is faster than MDL, but the most appropriate value of \(\lambda\) is tricky to find, usually requiring cross-validation, which in the end makes it slower than MDL. It also introduces a shrinkage bias and tends to overfit when compared to MDL, which in general should be preferred. See [peixoto-network-2024] for details.
Reconstruction with empirical data#
As an example of how to perform reconstructions on empirical data, we will consider the log-returns of the stock prices of the S&P 500 companies containing the 500 largest US companies.
We will begin by fetching the data and computing the log-returns:
import requests
from collections import defaultdict
import csv
r = requests.get('https://raw.githubusercontent.com/plotly/datasets/master/all_stocks_5yr.csv')
lines = r.iter_lines(decode_unicode=True)
next(lines) # remove header
prices = defaultdict(list)
for vals in csv.reader(lines, delimiter=','):
prices[vals[-1]].append(float(vals[4])) # use closing price
stocks, s = zip(*[(stock, vals) for stock, vals in prices.items() if len(vals) == 1259])
# compute log-returns
s = [[log(p[i+1]) - log(p[i]) for i in range(len(p)-1)] for p in s]
s = array(s)
We will use a multivariate normal model via PseudoNormalBlockState
:
Tip
For some models/data it may be better to first do a sweep over the edges before any other latent variable. This happens to be true for this case!
state = gt.PseudoNormalBlockState(s)
# Do an initial sweep over the edges before any other latent variable.
ret = state.edge_mcmc_sweep(niter=10)
delta = abs(ret[0])
while delta > 20:
ret = state.mcmc_sweep(niter=10)
delta = abs(ret[0])
We can now visualize the reconstructed network:
bstate = state.get_block_state().levels[0]
pos = gt.sfdp_layout(state.get_graph(), groups=bstate.get_blocks(), gamma=.02)
bstate.draw(pos=pos, vertex_color="white", edge_gradient=[],
edge_pen_width=gt.prop_to_size(state.get_x().t(abs), .1, 3, power=1),
output="SP500.svg")
References#
Tiago P. Peixoto, “Network reconstruction and community detection from dynamics”, Phys. Rev. Lett. 123 128301 (2019), DOI: 10.1103/PhysRevLett.123.128301 [sci-hub, @tor], arXiv: 1903.10833
Tiago P. Peixoto, “Scalable network reconstruction in subquadratic time”, arXiv: 2401.01404
Tiago P. Peixoto, “Network reconstruction via the minimum description length principle”, arXiv: 2405.01015