Implementing KD-Tree Nearest-Neighbor Search in Python

Here is a KDTree class that implements only nearest-neighbor search; k-nearest-neighbor search may be added later if time permits:

from collections import namedtuple
from operator import itemgetter
from pprint import pformat
import numpy as np


class Node(namedtuple('Node', 'location left_child right_child')):
    def __repr__(self):
        return pformat(tuple(self))


class KDTree:
    def __init__(self, points):
        self.tree = self._make_kdtree(points)
        # Dimensionality of the points (None for an empty tree)
        self.k = len(points[0]) if points else None

    def _make_kdtree(self, points, depth=0):
        if not points:
            return None

        k = len(points[0])
        axis = depth % k

        points.sort(key=itemgetter(axis))
        median = len(points) // 2

        return Node(
            location=points[median],
            left_child=self._make_kdtree(points[:median], depth + 1),
            right_child=self._make_kdtree(points[median + 1:], depth + 1))

    def find_nearest(self,
                     point,
                     root=None,
                     axis=0,
                     dist_func=lambda x, y: np.linalg.norm(np.asarray(x) - np.asarray(y))):

        if root is None:
            root = self.tree
            self._best = None

        # If this is not a leaf node, keep descending toward the query point
        if root.left_child or root.right_child:
            new_axis = (axis + 1) % self.k
            if point[axis] < root.location[axis] and root.left_child:
                self.find_nearest(point, root.left_child, new_axis, dist_func)
            elif root.right_child:
                self.find_nearest(point, root.right_child, new_axis, dist_func)

        # Backtrack: try to update the current best (distance, point) pair
        dist = dist_func(root.location, point)
        if self._best is None or dist < self._best[0]:
            self._best = (dist, root.location)

        # If the hypersphere around the query point crosses the splitting
        # hyperplane, the subtree on the other side may still hold a closer point
        if abs(point[axis] - root.location[axis]) < self._best[0]:
            new_axis = (axis + 1) % self.k
            if root.left_child and point[axis] >= root.location[axis]:
                self.find_nearest(point, root.left_child, new_axis, dist_func)
            elif root.right_child and point[axis] < root.location[axis]:
                self.find_nearest(point, root.right_child, new_axis, dist_func)

        return self._best

Test:

point_list = [(2, 3, 3), (5, 4, 4), (9, 6, 7), (4, 7, 7), (8, 1, 1), (7, 2, 2)]
kdtree = KDTree(point_list)

point = np.array([5, 5, 5])
print(kdtree.find_nearest(point))

Output:

(1.4142135623730951, (5, 4, 4))
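
As a quick sanity check, the same query can be answered by a brute-force scan over the points; this minimal sketch reuses the point_list and point variables defined above:

# Brute-force nearest neighbour over the same points, for verification
brute_best = min(
    ((np.linalg.norm(np.asarray(p) - point), p) for p in point_list),
    key=lambda t: t[0])
print(brute_best)  # expected to match: (1.4142135623730951, (5, 4, 4))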

Comparing performance against Scikit-Learn on a nearest-neighbor query (my implementation versus Scikit-Learn's), the two differ by only about 1 millisecond, so the performance is acceptable.
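
A comparison along the same lines could be reproduced roughly as follows, assuming scikit-learn is installed and using randomly generated data; the dataset size, repeat count, and resulting timings are illustrative only:

import timeit
import numpy as np
from sklearn.neighbors import KDTree as SkKDTree

data = np.random.rand(5000, 3)
query = np.random.rand(3)

my_tree = KDTree([tuple(row) for row in data])
sk_tree = SkKDTree(data)

# Time 100 repetitions of a single nearest-neighbour query on each implementation
t_mine = timeit.timeit(lambda: my_tree.find_nearest(query), number=100)
t_sk = timeit.timeit(lambda: sk_tree.query(query.reshape(1, -1), k=1), number=100)
print('mine: %.4f s, scikit-learn: %.4f s' % (t_mine, t_sk))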

(End of article)
