Conference · 2026 · ICORE 2026 A* · CORE 2023 A* · CCF A

CMANNS: GPU-Accelerated Graph Index Construction for ANNS via Compute-Memory Disaggregation

Chengying Huan, Renjie Yao, Shaonan Ma, Rong Gu*, Zhengyi Yang, Lizheng Chen, Zhibin Wang, Mingxing Zhang, Fang Xi, Guihai Chen, Chen Tian

ACM SIGMOD International Conference on Management of Data (SIGMOD)

Details

Year
2026
Publisher
Association for Computing Machinery (ACM)
Rankings
ICORE 2026 A* · CORE 2023 A* · CCF A

Research Area

Scalable Data Systems

Abstract

Graph-based approximate nearest neighbor search (ANNS) delivers state-of-the-art accuracy-latency tradeoffs, yet index construction remains the bottleneck: fusing dense distance evaluation with irregular traversal or pruning collapses GPU throughput, and limited device memory forces costly data movement at scale. To address these problems, we present CMANNS, a GPU-accelerated graph index construction framework that preserves the algorithmic rules of the target graph (e.g., NSG and HNSW) and its query procedure. The core idea is compute-memory (CM) disaggregation: distance evaluation is reformulated as high-arithmetic-intensity GEMMs on Tensor Cores with fused epilogues, while memory-intensive phases employ hot-set-aware on-chip locality (e.g., shared-memory staging, warp-cooperative gathers and scatters) to maximize effective bandwidth. To scale beyond HBM capacity, we stream device-sized shards through a double-buffered pipeline and write back only compact adjacency. Data transfers and kernel execution overlap, so each shard completes in roughly the time of the slower of the two steps, keeping the GPU highly utilized even with irregular access. Across seven benchmarks, CMANNS reduces end-to-end index build time by up to 13.05x (vs. FAISS) and 2.20x (vs. FLASH), increases the cache hit rate by up to 58.7%, and preserves vector query latency and recall.
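
To make the GEMM reformulation of distance evaluation concrete, the following is a minimal sketch and not the paper's kernels. It assumes squared L2 distance (the abstract does not name a metric) and uses the standard expansion ||q - p||^2 = ||q||^2 + ||p||^2 - 2 q·p, so that the dominant cost is a single matrix product Q·Pᵀ. The function names `pairwise_sq_l2` and `top_k` are hypothetical, and pure Python loops stand in for a Tensor Core GEMM; the "epilogue" step only illustrates the kind of work that could be fused after the product.

```python
def pairwise_sq_l2(queries, points):
    """Squared L2 distances between every query and every point.

    Uses ||q - p||^2 = ||q||^2 + ||p||^2 - 2 * (q . p), so the bulk of
    the arithmetic is the inner-product matrix Q @ P^T (the GEMM part).
    """
    q_norms = [sum(x * x for x in q) for q in queries]
    p_norms = [sum(x * x for x in p) for p in points]

    # GEMM stand-in: dots[i][j] = queries[i] . points[j]
    dots = [[sum(a * b for a, b in zip(q, p)) for p in points]
            for q in queries]

    # Epilogue stand-in: add the norms and clamp at zero, since
    # floating-point cancellation can produce tiny negative values.
    return [[max(q_norms[i] + p_norms[j] - 2.0 * dots[i][j], 0.0)
             for j in range(len(points))]
            for i in range(len(queries))]


def top_k(dist_row, k):
    """Indices of the k smallest distances in one row (candidate neighbors)."""
    return sorted(range(len(dist_row)), key=dist_row.__getitem__)[:k]


if __name__ == "__main__":
    queries = [[0.0, 0.0], [1.0, 1.0]]
    points = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
    dists = pairwise_sq_l2(queries, points)
    for row in dists:
        print(row, top_k(row, 2))
```

The point of this design is that the dense arithmetic is isolated in one regular matrix product, while the irregular step (here, picking the nearest candidates per row) runs separately on its output.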

Author Affiliations

Chengying Huan
Nanjing University
Renjie Yao
Nanjing University
Shaonan Ma
Qiyuan Lab
Rong Gu
Nanjing University
Zhengyi Yang
University of New South Wales
Lizheng Chen
Nanjing University
Zhibin Wang
Nanjing University
Mingxing Zhang
Tsinghua University
Fang Xi
Qiyuan Lab
Guihai Chen
Nanjing University
Chen Tian
Nanjing University

BibTeX

@article{huan2026cmanns,
  title = {CMANNS: GPU-Accelerated Graph Index Construction for ANNS via Compute-Memory Disaggregation},
  author = {Huan, Chengying and Yao, Renjie and Ma, Shaonan and Gu, Rong and Yang, Zhengyi and Chen, Lizheng and Wang, Zhibin and Zhang, Mingxing and Xi, Fang and Chen, Guihai and Tian, Chen},
  year = {2026},
  issue_date = {June 2026},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {4},
  number = {3},
  url = {https://doi.org/10.1145/3802027},
  doi = {10.1145/3802027},
  journal = {Proc. ACM Manag. Data},
  month = may,
  articleno = {150},
  numpages = {27},
  keywords = {anns, graph index, gpu acceleration}
}