graph_tool.generation.generate_knn#
- graph_tool.generation.generate_knn(points, k, dist=None, pairs=False, exact=False, r=0.5, epsilon=0.001, directed=False, verbose=False, cache_dist=True)[source]#
Generate a graph of k-nearest neighbors (or pairs) from a set of multidimensional points.
- Parameters:
- pointsiterable of lists (or
numpy.ndarray
) of dimension \(N\times D\) orint
Points of dimension \(D\) to be considered. If the parameter dist is passed, this should be just an int containing the number of points.
- k
int
Number of nearest neighbors per node (or number of pairs if
pairs is True
).- distfunction (optional, default:
None
) If given, this should be a function that returns the distance between two points. The arguments of this function should just be two integers, corresponding to the vertex index. In this case the value of
points
should just be the total number of points. Ifdist is None
, then the L2-norm (Euclidean distance) is used.- pairs
bool
(optional, default:False
) If
True
, thek
closest pairs of nodes will be returned, otherwise thek
nearest neighbors for every edge is returned.- exact
bool
(optional, default:False
) If
False
, an fast approximation will be used, otherwise an exact but slow algorithm will be used.- r
float
(optional, default:.5
) If
exact is False
, this specifies the fraction of randomly chosen neighbors that are used for the search.- epsilon
float
(optional, default:.001
) If
exact is False
, this determines the convergence criterion used by the algorithm. When the fraction of updated neighbors drops below this value, the algorithm stops.- directed
bool
(optional, default:False
) If
True
a directed version of the graph will be returned, otherwise the graph is undirected.- cache_dist
bool
(optional, default:True
) If
True
, an internal cache of the distance values are kept, implemented as a hash table.
- pointsiterable of lists (or
- Returns:
- g
Graph
The k-nearest neighbors graph.
- w
EdgePropertyMap
Edge property map with the computed distances.
- g
Notes
The approximate version of this algorithm is based on [dong-efficient-2011], and has an (empirical) run-time of \(O(N^{1.14})\). The exact version has a complexity of \(O(N^2)\).
If
pairs is True
, the \(k\) closest pairs are found from the nearest neighbors problem as described in [lenhof-k-closest].If enabled during compilation, this algorithm runs in parallel.
References
[dong-efficient-2011]Wei Dong, Charikar Moses, and Kai Li, “Efficient k-nearest neighbor graph construction for generic similarity measures”, In Proceedings of the 20th international conference on World wide web (WWW ‘11). Association for Computing Machinery, New York, NY, USA, 577–586, (2011) DOI: https://doi.org/10.1145/1963405.1963487 [sci-hub, @tor]
[lenhof-k-closest]HP Lenhof, M Smid, “The k closest pairs problem”, https://people.scs.carleton.ca/~michiel/k-closestnote.pdf
Examples
>>> points = np.random.random((1000, 10)) >>> g, w = gt.generate_knn(points, k=5)