generate_knn#
- graph_tool.generation.generate_knn(points, k, dist=None, pairs=False, exact=False, r=0.5, max_rk=None, epsilon=0.001, c_stop=False, max_iter=0, directed=False, verbose=False)[source]#
Generate a graph of k-nearest neighbors (or pairs) from a set of multidimensional points.
- Parameters:
- pointsiterable of lists (or
numpy.ndarray
) of dimension \(N\times D\) orint
Points of dimension \(D\) to be considered. If the parameter dist is passed, this should be just an int containing the number of points.
- k
int
Number of nearest neighbors per node (or number of pairs if
pairs is True
).- distfunction (optional, default:
None
) If given, this should be a function that returns the distance between two points. The arguments of this function should just be two integers, corresponding to the vertex index. In this case the value of
points
should just be the total number of points. Ifdist is None
, then the L2-norm (Euclidean distance) is used.- pairs
bool
(optional, default:False
) If
True
, thek
closest pairs of nodes will be returned, otherwise thek
nearest neighbors for every edge is returned.- exact
bool
(optional, default:False
) If
False
, an fast approximation will be used, otherwise an exact but slow algorithm will be used.- r
float
(optional, default:.5
) If
exact is False
, this specifies the fraction of randomly chosen neighbors that are used for the search.- max_rk
int
(optional, default:None
) If provided and
exact is False
, this specifies the maximum number of randomly chosen out neighbors to consider during each iteration. A value ofNone
implies that all out neighbors are considered.- epsilon
float
(optional, default:.001
) If
exact is False
andc_stop is False
, this determines the convergence criterion used by the algorithm. When the fraction of updated neighbors drops below this value, the algorithm stops.- c_stop
bool
(optional, default:False
) If
True
, an alternative stopping criterion will be used: The iteration ends when the global clustering coefficient of the undirected KNN graph stopped increasing. In this case, the paramterepsilon
is ignored.- max_iter
int
(optional, default:0
) If
exact is False
, this determines the maximum number of iterations allowed. A value of0
means that no limit is imposed.- directed
bool
(optional, default:False
) If
True
a directed version of the graph will be returned, otherwise the graph is undirected.
- pointsiterable of lists (or
- Returns:
- g
Graph
The k-nearest neighbors graph.
- w
EdgePropertyMap
Edge property map with the computed distances.
- g
Notes
The approximate version of this algorithm is based on [dong-efficient-2011], and has a (conjectured) run-time of \(O(k^2N\log N)\), where \(N\) is the number of points. The exact version has a complexity of \(O(N^2)\).
If
pairs is True
, the \(k\) closest pairs are found from the nearest neighbors problem as described in [lenhof-k-closest], which has a complexity upper bounded byParallel implementation.
If enabled during compilation, this algorithm will run in parallel using OpenMP. See the parallel algorithms section for information about how to control several aspects of parallelization.
References
[dong-efficient-2011]Wei Dong, Charikar Moses, and Kai Li, “Efficient k-nearest neighbor graph construction for generic similarity measures”, In Proceedings of the 20th international conference on World wide web (WWW ‘11). Association for Computing Machinery, New York, NY, USA, 577–586, (2011) DOI: https://doi.org/10.1145/1963405.1963487 [sci-hub, @tor]
[lenhof-k-closest]HP Lenhof, M Smid, “The k closest pairs problem”, https://people.scs.carleton.ca/~michiel/k-closestnote.pdf
Examples
>>> points = np.random.random((1000, 10)) >>> g, w = gt.generate_knn(points, k=5)