kNN Join for Dynamic High-Dimensional Data: A Parallel Approach
Nimish Ukey, Zhengyi Yang*, Wenke Yang, Binghao Li, Runze Li
RAIDS Lab Authors
Details
Research Area
Tags
Resources
Abstract
The k nearest neighbor (kNN) join operation is a fundamental task that combines two high-dimensional databases, enabling data points in the User dataset U to identify their k nearest neighbor points from the Item dataset I. This operation plays a crucial role in various domains, including knowledge discovery, data mining, similarity search applications, and scientific research. However, exact kNN search in high-dimensional spaces is computationally demanding, and existing sequential methods face challenges in handling large datasets. In this paper, we propose an efficient parallel solution for dynamic kNN join over high-dimensional data, leveraging the high-dimensional R tree (HDR Tree) for improved efficiency. Our solution harnesses the power of Simultaneous Multi-Threading (SMT) technologies and Single-Instruction-Multiple-Data (SIMD) instructions in modern CPUs for parallelisation. Importantly, our research is the first to introduce parallel computation for exact kNN join over high-dimensional data. Experimental results demonstrate that our proposed approach outperforms the sequential HDR Tree method by up to 1.2 times with a single thread. Moreover, our solution provides near-linear scalability as the number of threads increases.
Author Affiliations
BibTeX
@inproceedings{ukey2023knn,
title = {kNN Join for Dynamic High-Dimensional Data: A Parallel Approach},
author = {Ukey, Nimish and Yang, Zhengyi and Yang, Wenke and Li, Binghao and Li, Runze},
editor = {Bao, Zhifeng and Borovica-Gajic, Renata and Qiu, Ruihong and Choudhury, Farhana and Yang, Zhengyi},
booktitle = {Databases Theory and Applications},
year = {2024},
publisher = {Springer Nature Switzerland},
address = {Cham},
pages = {3--16},
isbn = {978-3-031-47843-7}
}
