# graph_tool.generation.generate_knn#

graph_tool.generation.generate_knn(points, k, dist=None, pairs=False, exact=False, local=True, r=0.5, max_rk=None, epsilon=0.001, directed=False, verbose=False)[source]#

Generate a graph of k-nearest neighbors (or pairs) from a set of multidimensional points.

Parameters:
pointsiterable of lists (or numpy.ndarray) of dimension $$N\times D$$ or int

Points of dimension $$D$$ to be considered. If the parameter dist is passed, this should be just an int containing the number of points.

kint

Number of nearest neighbors per node (or number of pairs if pairs is True).

distfunction (optional, default: None)

If given, this should be a function that returns the distance between two points. The arguments of this function should just be two integers, corresponding to the vertex index. In this case the value of points should just be the total number of points. If dist is None, then the L2-norm (Euclidean distance) is used.

pairsbool (optional, default: False)

If True, the k closest pairs of nodes will be returned, otherwise the k nearest neighbors for every edge is returned.

exactbool (optional, default: False)

If False, an fast approximation will be used, otherwise an exact but slow algorithm will be used.

localbool (optional, default: True)

If True, the local version of the algorithm will be used.

rfloat (optional, default: .5)

If exact is False, this specifies the fraction of randomly chosen neighbors that are used for the search.

max_rkint (optional, default: None)

If provided and exact is False and local is false, this specifies the maximum number of randomly chosen out neighbors to consider during each iteration. A value of None implies that all out neighbors are considered.

epsilonfloat (optional, default: .001)

If exact is False, this determines the convergence criterion used by the algorithm. When the fraction of updated neighbors drops below this value, the algorithm stops.

directedbool (optional, default: False)

If True a directed version of the graph will be returned, otherwise the graph is undirected.

Returns:
gGraph

The k-nearest neighbors graph.

wEdgePropertyMap

Edge property map with the computed distances.

Notes

The approximate version of this algorithm is based on [dong-efficient-2011], and has an (empirical) run-time of $$O(N^{1.14})$$. The exact version has a complexity of $$O(N^2)$$.

If pairs is True, the $$k$$ closest pairs are found from the nearest neighbors problem as described in [lenhof-k-closest].

If enabled during compilation, this algorithm runs in parallel.

References

Wei Dong, Charikar Moses, and Kai Li, “Efficient k-nearest neighbor graph construction for generic similarity measures”, In Proceedings of the 20th international conference on World wide web (WWW ‘11). Association for Computing Machinery, New York, NY, USA, 577–586, (2011) DOI: https://doi.org/10.1145/1963405.1963487 [sci-hub, @tor]

HP Lenhof, M Smid, “The k closest pairs problem”, https://people.scs.carleton.ca/~michiel/k-closestnote.pdf

Examples

>>> points = np.random.random((1000, 10))
>>> g, w = gt.generate_knn(points, k=5)