K-Nearest Points to Origin¶
Finding the k closest points to a reference location is the geometric core of k-nearest neighbors, one of the most intuitive ML algorithms. This problem strips it down to Euclidean distance and a single reference point (the origin).
Problem¶
Given a list of 2D points, return the k points closest to the origin (0, 0) using Euclidean distance.
Input: a list of (x, y) coordinates and integer k. Output: the k closest points.
Approach¶
Sort-based, O(n log n). Compute each point's distance, sort, slice the first k. Simple, but you're paying to fully order all n points when you only need the top k.
Max-heap, O(n log k). Maintain a max-heap of size k. For each point, push it onto the heap; if the heap exceeds k, pop the farthest point. Since k is typically much smaller than n, this is significantly faster on large datasets, exactly the scenario you'd face when doing nearest-neighbor lookups over embeddings.
Implementation¶
Sort-based¶
def k_closest_sorted(points, k):
return sorted(points, key=lambda p: p[0] ** 2 + p[1] ** 2)[:k]
points = [[1, 3], [-2, 2], [4, 1], [0, 1], [-1, 3]]
print(k_closest_sorted(points, 3))
# [[0, 1], [-2, 2], [1, 3]]
Max-heap¶
import heapq
def k_closest_heap(points, k):
max_heap = []
for x, y in points:
dist = x ** 2 + y ** 2
heapq.heappush(max_heap, (-dist, [x, y]))
if len(max_heap) > k:
heapq.heappop(max_heap)
return [p for _, p in sorted(max_heap, key=lambda t: t[0], reverse=True)]
points = [[1, 3], [-2, 2], [4, 1], [0, 1], [-1, 3]]
print(k_closest_heap(points, 3))
# [[0, 1], [-2, 2], [1, 3]]
Complexity¶
| Approach | Time | Space |
|---|---|---|
| Sort | O(n log n) | O(1) in-place sort |
| Max-heap | O(n log k) | O(k) |
Takeaway¶
The heap approach is the right default when k << n. Libraries like FAISS go further with approximate methods, but the heap-based mental model is the foundation for any top-k retrieval.