Preprint, 2026

HyperSU: Corpus-Driven Semantic-Unit Hypergraph for Retrieval-Augmented Generation

Jiate Liu#, Liuyi Chen#, Zhengyi Yang, Chuan He, Mingchen Ju, Bocheng Han, Ruyi Liu, Xu Zhou

Details

Year
2026
Research Area

Responsible Data Intelligence
Scalable Data Systems

Abstract

Recent hypergraph-based retrieval-augmented generation (HyperRAG) methods use hyperedges to connect multiple entities simultaneously, enabling more efficient multi-entity evidence organization than pairwise graph structures. However, existing HyperRAG methods often rely on LLM-generated summaries to construct hyperedges, which can introduce hallucinations and incur high indexing costs. In addition, during retrieval, existing methods typically rely on either one-hop neighbor expansion or PageRank diffusion. The former may miss useful multi-hop evidence, while the latter can suffer from uncontrolled propagation through hub nodes, leading to semantic drift and noisy reasoning chains. To address these challenges, we propose HyperSU, a novel hypergraph-based RAG framework featuring semantic-unit hyperedges and clue-guided bidirectional retrieval. During construction, HyperSU formulates hyperedge construction as an entity-aware minimum-description-length (MDL) optimization problem, inducing source-grounded semantic-unit hyperedges that balance sentence-level semantic coherence and entity compactness. It then constructs a hypergraph by modeling each semantic unit as a hyperedge over its co-mentioned entities. During retrieval, HyperSU performs clue-guided bidirectional expansion over the semantic-unit hypergraph, enabling both multi-hop evidence discovery and answer-aware noise reduction. Experiments show that HyperSU consistently improves answer accuracy over standard, graph-based, and hypergraph-based RAG baselines, achieving up to a 14.7% relative accuracy improvement on GraphRAG-Bench, with larger gains on reasoning-intensive tasks.
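
The data structure described in the abstract, where each semantic unit is a hyperedge over its co-mentioned entities, can be illustrated with a small sketch. This is not the HyperSU implementation: it assumes the semantic units and their entities are already given (the MDL-based segmentation is not shown), and it uses a plain forward breadth-first expansion rather than the clue-guided bidirectional retrieval the paper proposes. The function names `build_entity_index` and `expand`, and the toy units `s1` to `s3`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_entity_index(units):
    """Invert the hypergraph: map each entity to the hyperedges (units) that contain it.

    `units` maps a unit id to the set of entities co-mentioned in that unit,
    so each unit acts as one hyperedge.
    """
    entity_to_units = defaultdict(set)
    for unit_id, entities in units.items():
        for entity in entities:
            entity_to_units[entity].add(unit_id)
    return dict(entity_to_units)


def expand(units, entity_to_units, seed_entities, hops):
    """Plain breadth-first expansion: entities -> hyperedges -> entities, `hops` times.

    Returns the set of unit ids reached. This is a simplified stand-in,
    not the clue-guided bidirectional procedure from the paper.
    """
    seen_entities = set(seed_entities)
    seen_units = set()
    frontier = set(seed_entities)
    for _ in range(hops):
        # Collect hyperedges touching the current frontier entities.
        new_units = set()
        for entity in frontier:
            new_units |= entity_to_units.get(entity, set())
        new_units -= seen_units
        seen_units |= new_units
        # Entities inside the new hyperedges form the next frontier.
        next_frontier = set()
        for unit_id in new_units:
            next_frontier |= units[unit_id]
        frontier = next_frontier - seen_entities
        seen_entities |= frontier
    return seen_units


# Hypothetical toy corpus: three semantic units and their co-mentioned entities.
units = {
    "s1": {"Marie Curie", "polonium"},
    "s2": {"polonium", "Poland"},
    "s3": {"Poland", "Warsaw"},
}
index = build_entity_index(units)
one_hop = expand(units, index, {"Marie Curie"}, hops=1)
two_hop = expand(units, index, {"Marie Curie"}, hops=2)
```

In this toy case the one-hop expansion reaches only `s1`, while two hops also reach `s2`, which is the kind of multi-hop evidence the abstract says one-hop neighbor expansion may miss. Unbounded expansion would eventually reach every unit connected through shared entities, which is why some form of guidance or pruning is needed in practice.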

Author Affiliations

Jiate Liu
University of New South Wales
Liuyi Chen
Hunan University
Zhengyi Yang
University of Sydney
Chuan He
University of New South Wales
Mingchen Ju
Euler AI
Bocheng Han
Vecton AI
Ruyi Liu
University of New South Wales
Xu Zhou
Hunan University

BibTeX

@misc{liu2026hypersu,
  title = {HyperSU: Corpus-Driven Semantic-Unit Hypergraph for Retrieval-Augmented Generation},
  author = {Jiate Liu and Liuyi Chen and Zhengyi Yang and Chuan He and Mingchen Ju and Bocheng Han and Ruyi Liu and Xu Zhou},
  year = {2026},
  eprint = {2606.28351},
  archivePrefix = {arXiv},
  primaryClass = {cs.IR},
  url = {https://arxiv.org/abs/2606.28351}
}