Conference · 2026 · ICORE 2026 A* · CORE 2023 A* · CCF A

C2TC: A Training-Free Framework for Efficient Tabular Data Condensation

Sijia Xu, Fan Li, Xiaoyang Wang, Zhengyi Yang, Xuemin Lin

IEEE International Conference on Data Engineering (ICDE)

RAIDS Lab Authors

Zhengyi Yang

Director

Details

Year

2026

Venue

IEEE International Conference on Data Engineering (ICDE)

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Rankings

ICORE 2026 A* · CORE 2023 A* · CCF A

Research Area

Responsible Data Intelligence

Resources

Code ↗ · arXiv ↗

Abstract

Tabular data is the primary data format in industrial relational databases, underpinning modern data analytics and decision-making. However, the increasing scale of tabular data poses significant computational and storage challenges to learning-based analytical systems. This highlights the need for data-efficient learning, which enables effective model training and generalization using substantially fewer samples. Dataset condensation (DC) has emerged as a promising data-centric paradigm that synthesizes small yet informative datasets to preserve data utility while reducing storage and training costs. However, existing DC methods are computationally intensive due to reliance on complex gradient-based optimization. Moreover, they often overlook key characteristics of tabular data, such as heterogeneous features and class imbalance. To address these limitations, we introduce C2TC (Class-Adaptive Clustering for Tabular Condensation), the first training-free tabular dataset condensation framework that jointly optimizes class allocation and feature representation, enabling efficient and scalable condensation. Specifically, we reformulate the dataset condensation objective into a novel class-adaptive cluster allocation problem (CCAP), which eliminates costly training and integrates adaptive label allocation to handle class imbalance. To solve the NP-hard CCAP, we develop HFILS, a heuristic local search that alternates between soft allocation and class-wise clustering to efficiently obtain high-quality solutions. Moreover, a hybrid categorical feature encoding (HCFE) is proposed for semantics-preserving clustering of heterogeneous discrete attributes. Extensive experiments on 10 real-world datasets demonstrate that C2TC improves efficiency by at least 2 orders of magnitude over state-of-the-art baselines, while achieving superior downstream performance.
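The general idea of training-free condensation by class-wise clustering can be illustrated with a small sketch. This is not the authors' HFILS solver or the CCAP formulation, and it does not implement HCFE. It is a simplified stand-in that assumes purely numeric features, a hypothetical square-root rule for splitting the cluster budget across classes, and plain k-means inside each class. The function names `allocate_budget`, `kmeans`, and `condense` are illustrative only.

```python
import random
from collections import defaultdict


def allocate_budget(class_sizes, budget):
    """Split `budget` clusters across classes (assumed rule, not from the paper).

    Every class gets one cluster; the rest are handed out greedily using
    sqrt(class size) as the weight, so minority classes are not starved.
    """
    classes = sorted(class_sizes)
    if budget < len(classes):
        raise ValueError("budget must allow at least one cluster per class")
    alloc = {c: 1 for c in classes}
    weights = {c: class_sizes[c] ** 0.5 for c in classes}
    for _ in range(budget - len(classes)):
        # A class cannot hold more clusters than it has samples.
        candidates = [c for c in classes if alloc[c] < class_sizes[c]]
        if not candidates:
            break
        best = max(candidates, key=lambda c: weights[c] / alloc[c])
        alloc[best] += 1
    return alloc


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster empties
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def condense(X, y, budget, seed=0):
    """Return a condensed dataset whose rows are per-class cluster centroids."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    alloc = allocate_budget({c: len(rows) for c, rows in by_class.items()}, budget)
    X_small, y_small = [], []
    for c in sorted(by_class):
        for centroid in kmeans(by_class[c], alloc[c], seed=seed):
            X_small.append(centroid)
            y_small.append(c)
    return X_small, y_small


# Toy usage: four majority-class rows, one minority-class row, budget of three.
X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]]
y = [0, 0, 0, 0, 1]
X_small, y_small = condense(X, y, budget=3)
```

In this toy run the minority class keeps its single sample as a centroid while the majority class is summarized by two centroids. The sketch fixes the allocation once before clustering, whereas the abstract describes alternating between allocation and clustering. Categorical columns would also need an encoding step before any distance is computed.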

Author Affiliations

Sijia Xu

University of New South Wales

Fan Li

University of New South Wales

Xiaoyang Wang

University of New South Wales

Zhengyi Yang

University of New South Wales

Xuemin Lin

Shanghai Jiao Tong University

