← Publications
journal2023CCF BSJR Q1

Efficient continuous kNN join over dynamic high-dimensional data

Nimish Ukey#, Guangjian Zhang#, Zhengyi Yang*, Binghao Li, Wei Li, Wenjie Zhang

World Wide Web (WWWJ)

RAIDS Lab Authors

Details

Year
2023
Publisher
Springer
Rankings
CCF B · SJR Q1

Research Area

Scalable Data Systems

Tags

Resources

Abstract

Given a user dataset U and an object dataset I, a kNN join query in high-dimensional space returns the k nearest neighbors of each object in dataset U from the object dataset I. The kNN join is a basic and necessary operation in many applications, such as databases, data mining, computer vision, multi-media, machine learning, recommendation systems, and many more. In the real world, datasets frequently update dynamically as objects are added or removed. In this paper, we propose novel methods of continuous kNN join over dynamic high-dimensional data. We firstly propose the HDR+ Tree, which supports more efficient insertion, deletion, and batch update. Further observed that the existing methods rely on globally correlated datasets for effective dimensionality reduction, we then propose the HDR Forest. It clusters the dataset and constructs multiple HDR Trees to capture local correlations among the data. As a result, our HDR Forest is able to process non-globally correlated datasets efficiently. Two novel optimisations are applied to the proposed HDR Forest, including the precomputation of the PCA states of data items and pruning-based kNN recomputation during item deletion. For the completeness of the work, we also present the proof of computing distances in reduced dimensions of PCA in HDR Tree. Extensive experiments on real-world datasets show that the proposed methods and optimisations outperform the baseline algorithms of naive RkNN join and HDR Tree.

Author Affiliations

Nimish Ukey
University of New South Wales
Guangjian Zhang
University of New South Wales
Zhengyi Yang
University of New South Wales
Binghao Li
University of New South Wales
Wei Li
Harbin Engineering University
Wenjie Zhang
University of New South Wales

BibTeX

@article{ukey2023efficient,
  title = {Efficient continuous kNN join over dynamic high-dimensional data},
  author = {Ukey, Nimish and Zhang, Guangjian and Yang, Zhengyi and Li, Binghao and Li, Wei and Zhang, Wenjie},
  journal = {World Wide Web},
  year = {2023},
  month = {Nov},
  day = {01},
  volume = {26},
  number = {6},
  pages = {3759--3794},
  issn = {1573-1413},
  doi = {10.1007/s11280-023-01204-9},
  url = {https://doi.org/10.1007/s11280-023-01204-9}
}