An Experimental Evaluation of LLM on Image Classification
Jiaxuan Wu, Xushuo Tang, Zhengyi Yang*, Kongzhang Hao, Longbin Lai, Yongfei Liu
RAIDS Lab Authors
Details
Research Area
Tags
Resources
Abstract
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNN) have been widely used to perform this task. However, with the recent emergence of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, their cross-domain applications, including in CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. By using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://github.com/Tanghaha1424/gpt-fashionmnist.
Author Affiliations
BibTeX
@inproceedings{wu2024experimental,
title = {An Experimental Evaluation of LLM on Image Classification},
author = {Wu, Jiaxuan and Tang, Xushuo and Yang, Zhengyi and Hao, Kongzhang and Lai, Longbin and Liu, Yongfei},
editor = {Chen, Tong and Cao, Yang and Nguyen, Quoc Viet Hung and Nguyen, Thanh Tam},
booktitle = {Databases Theory and Applications},
year = {2025},
publisher = {Springer Nature Singapore},
address = {Singapore},
pages = {506--518},
isbn = {978-981-96-1242-0}
}
