An Experimental Study of Graph Pattern Mining Systems
Yi Ding#, Yijie Zhao#, Wantong Zhang, Zhengyi Yang*, Wenke Yang, Dong Wen, Xiaoyang Wang
RAIDS Lab
Abstract
Graph pattern mining (GPM) extracts subgraph structures (e.g., motifs and cliques) from large graphs, but traditional vertex-centric graph computing frameworks struggle with the explosive growth of intermediate embeddings and with redundant isomorphism checks. Starting from Arabesque, the first GPM system, this paper presents a systematic survey and an empirical study of modern GPM systems. We conduct a comparative evaluation on six real-world graphs (CiteSeer, Mico, YouTube, LiveJournal, Orkut, and Twitter20) and three representative mining tasks: Motif Counting (MC), Clique Finding (CF), and Triangle Counting (TC). The experiments measure runtime together with CPU and memory usage. The results show clear specialization across systems: Peregrine is the most efficient for MC on graphs ranging from small to large; Sandslash, followed by Pangolin, is the strongest on CF, and Sandslash is the only tested system to finish 4-clique finding on the billion-edge Twitter20 graph; and Arya delivers state-of-the-art TC on large graphs, completing Orkut and Twitter20 in fractions of a second, far faster than CPU-centric baselines. Overall, pattern-aware exploration, decomposition (with or without sampling), and heterogeneous execution are the dominant factors behind scalable performance, while exhaustive embedding enumeration remains the primary bottleneck. These findings provide a unified view of the GPM systems landscape and practical guidance for building the next generation of scalable graph mining systems.
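The contrast the abstract draws between exhaustive embedding enumeration and pattern-aware exploration can be seen in a toy triangle-counting sketch. It assumes a small undirected edge list held in memory, and the function names (`build_adj`, `triangles_by_enumeration`, `triangles_pattern_aware`) are hypothetical; this is not the implementation of Arabesque, Peregrine, Arya, or any other system evaluated in the study. The first function grows every ordered 3-vertex embedding and then removes automorphic copies with a sorted-tuple canonical form, a stand-in for the redundant isomorphism checks. The second applies symmetry breaking (a < b < c) so that each triangle is generated exactly once.

```python
def build_adj(edges):
    """Build undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_by_enumeration(adj):
    """Pattern-oblivious: extend every ordered path a-b-c and keep closed ones.

    Each triangle is reached once per vertex ordering (6 times), so a
    canonical form (sorted tuple) is needed to deduplicate automorphic copies.
    Returns (distinct triangles, embeddings materialized).
    """
    embeddings = 0
    seen = set()
    for a in adj:
        for b in adj[a]:
            for c in adj[b]:
                if c != a and c in adj[a]:
                    embeddings += 1
                    seen.add(tuple(sorted((a, b, c))))  # canonicality check
    return len(seen), embeddings


def triangles_pattern_aware(adj):
    """Pattern-aware: the ordering constraint a < b < c yields each triangle once."""
    count = 0
    for a in adj:
        higher_a = {x for x in adj[a] if x > a}
        for b in higher_a:
            # Close the triangle only through neighbors of b larger than b.
            count += sum(1 for c in adj[b] if c > b and c in higher_a)
    return count


if __name__ == "__main__":
    # Complete graph K4 on {0, 1, 2, 3} plus a pendant edge (3, 4).
    g = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
    print(triangles_by_enumeration(g))  # (4, 24)
    print(triangles_pattern_aware(g))   # 4
```

Both functions report 4 triangles, but the enumeration path materializes 24 embeddings (six automorphic copies per triangle) before deduplicating them. For a k-clique the redundancy factor is k!, which is why avoiding redundant embeddings up front matters more as patterns grow.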
BibTeX
@inproceedings{ding2025experimental,
title = {An Experimental Study of Graph Pattern Mining Systems},
author = {Ding, Yi and Zhao, Yijie and Zhang, Wantong and Yang, Zhengyi and Yang, Wenke and Wen, Dong and Wang, Xiaoyang},
editor = {Borovica-Gajic, Renata and Khan, Arijit and Zheng, Bolong and Wang, Xiaoyang and Gan, Junhao},
booktitle = {Databases Theory and Applications},
year = {2026},
publisher = {Springer Nature Singapore},
address = {Singapore},
pages = {49--63},
isbn = {978-981-95-6196-4}
}
