Conference · 2026 · ICORE 2026 A* · CORE 2023 A* · CCF A

CMANNS: GPU-Accelerated Graph Index Construction for ANNS via Compute-Memory Disaggregation

Chengying Huan, Renjie Yao, Shaonan Ma, Rong Gu*, Zhengyi Yang, Lizheng Chen, Zhibin Wang, Mingxing Zhang, Fang Xi, Guihai Chen, Chen Tian

ACM SIGMOD International Conference on Management of Data (SIGMOD)

RAIDS Lab Authors

Zhengyi Yang

Director

Details

Year

2026

Venue

ACM SIGMOD International Conference on Management of Data (SIGMOD)

Publisher

Association for Computing Machinery (ACM)

Rankings

ICORE 2026 A* · CORE 2023 A* · CCF A

Research Area

Scalable Data Systems

Resources

DOI: https://doi.org/10.1145/3802027

Abstract

Graph-based approximate nearest neighbor search (ANNS) delivers state-of-the-art accuracy-latency trade-offs, yet index construction remains the bottleneck: fusing dense distance evaluation with irregular traversal or pruning collapses GPU throughput, and limited device memory forces costly data movement at scale. To address these problems, we present CMANNS, a GPU-accelerated graph index construction framework that preserves the algorithmic rules of the target graph (e.g., NSG and HNSW) and its query procedure. The core idea is compute-memory (CM) disaggregation: distance evaluation is reformulated as high-arithmetic-intensity GEMMs on Tensor Cores with fused epilogues, while memory-intensive phases employ hot-set-aware on-chip locality (e.g., shared-memory staging, warp-cooperative gathers and scatters) to maximize effective bandwidth. To scale beyond HBM capacity, we stream device-sized shards through a double-buffered pipeline and write back only the compact adjacency. Because data transfers overlap with kernel execution, each shard completes in roughly the time of the slower of the two steps, keeping the GPU highly utilized even under irregular access. Across seven benchmarks, CMANNS reduces end-to-end index build time by up to 13.05× (vs. FAISS) and 2.20× (vs. FLASH), increases the cache hit rate by up to 58.7%, and preserves vector query latency and recall.
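The abstract's central step, recasting distance evaluation as a GEMM with a fused epilogue, rests on the identity ||q − b||² = ||q||² + ||b||² − 2·q·b: all inner products q·b for a block of queries and base vectors form one matrix product, and the norm additions are cheap per-element work applied to its output. The sketch below illustrates that decomposition in plain Python, assuming squared Euclidean distance. It does not use Tensor Cores or CUDA, the epilogue runs as a separate pass rather than being fused into the kernel, and every function name is hypothetical, not part of CMANNS.

```python
import random


def gemm(a, b_rows):
    """Plain GEMM: returns A @ B^T, where B is given as a list of row vectors."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b_rows] for row_a in a]


def pairwise_sq_l2_gemm(queries, base):
    """Squared L2 distances via ||q||^2 + ||b||^2 - 2 q.b (hypothetical helper).

    The GEMM produces every q.b inner product. The list comprehension after it
    stands in for an epilogue: it adds the precomputed norms and clamps tiny
    negative values caused by floating-point cancellation.
    """
    dots = gemm(queries, base)
    q_norms = [sum(v * v for v in q) for q in queries]
    b_norms = [sum(v * v for v in b) for b in base]
    return [
        [max(q_norms[i] + b_norms[j] - 2.0 * dots[i][j], 0.0) for j in range(len(base))]
        for i in range(len(queries))
    ]


def pairwise_sq_l2_direct(queries, base):
    """Reference implementation: elementwise squared differences."""
    return [[sum((x - y) ** 2 for x, y in zip(q, b)) for b in base] for q in queries]


def top_k_neighbors(dist_row, k):
    """Indices of the k smallest distances, e.g. candidate edges for one node."""
    return sorted(range(len(dist_row)), key=lambda j: dist_row[j])[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [[rng.random() for _ in range(8)] for _ in range(32)]
    queries = base[:4]  # during index construction, queries are the base points themselves
    d_gemm = pairwise_sq_l2_gemm(queries, base)
    d_ref = pairwise_sq_l2_direct(queries, base)
    max_err = max(abs(x - y) for ra, rb in zip(d_gemm, d_ref) for x, y in zip(ra, rb))
    print("max abs error vs. direct:", max_err)
    print("4 nearest candidates of node 0:", top_k_neighbors(d_gemm[0], 4))
```

The design point is arithmetic intensity: the direct form streams each pair of vectors once per distance, while the GEMM form reuses each loaded vector across a whole tile of partners, which is what lets dense matrix hardware run near peak.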

Author Affiliations

Chengying Huan

Nanjing University

Renjie Yao

Nanjing University

Shaonan Ma

Qiyuan Lab

Rong Gu

Nanjing University

Zhengyi Yang

University of New South Wales

Lizheng Chen

Nanjing University

Zhibin Wang

Nanjing University

Mingxing Zhang

Tsinghua University

Fang Xi

Qiyuan Lab

Guihai Chen

Nanjing University

Chen Tian

Nanjing University

BibTeX

@article{huan2026cmanns,
  title = {CMANNS: GPU-Accelerated Graph Index Construction for ANNS via Compute-Memory Disaggregation},
  author = {Huan, Chengying and Yao, Renjie and Ma, Shaonan and Gu, Rong and Yang, Zhengyi and Chen, Lizheng and Wang, Zhibin and Zhang, Mingxing and Xi, Fang and Chen, Guihai and Tian, Chen},
  year = {2026},
  issue_date = {June 2026},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {4},
  number = {3},
  url = {https://doi.org/10.1145/3802027},
  doi = {10.1145/3802027},
  journal = {Proc. ACM Manag. Data},
  month = may,
  articleno = {150},
  numpages = {27},
  keywords = {anns, graph index, gpu acceleration}
}
