Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNNs) have been widely used to perform this task. However, large language models (LLMs) such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, are now being explored for cross-domain applications, including CV. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. Using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://github.com/Tanghaha1424/gpt-fashionmnist.
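
To make the prompt-based setup more concrete, the following is a minimal sketch of how a classification prompt for Fashion-MNIST might be assembled and how a model's free-text reply might be scored. It is not the authors' implementation (see the linked repository for that). The prompt wording and the helper names `build_prompt`, `parse_label`, and `accuracy` are assumptions made for illustration; only the ten class names and their order come from the Fashion-MNIST dataset itself. No model API is called here; the replies are assumed to arrive as plain strings.

```python
import re
from typing import Optional, Sequence

# The ten Fashion-MNIST classes, in dataset label order (0-9).
FASHION_MNIST_LABELS = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def build_prompt(labels: Sequence[str] = FASHION_MNIST_LABELS) -> str:
    """Build a hypothetical instruction to send alongside one image.

    The wording is illustrative only; it is not the prompt used in the paper.
    """
    options = "\n".join(f"{i}: {name}" for i, name in enumerate(labels))
    return (
        "The attached grayscale image shows a single clothing item.\n"
        "Choose the best category from the list below and reply with "
        "its number only.\n" + options
    )


def parse_label(
    reply: str, labels: Sequence[str] = FASHION_MNIST_LABELS
) -> Optional[int]:
    """Map a free-text reply to a class index, or None if it cannot be parsed."""
    text = reply.strip().lower()
    # Prefer a standalone single digit, since the prompt asks for a number.
    match = re.search(r"\b(\d)\b", text)
    if match and int(match.group(1)) < len(labels):
        return int(match.group(1))
    # Otherwise fall back to class names, longest first, so that
    # "T-shirt/top" is checked before the shorter "Shirt".
    for idx in sorted(range(len(labels)), key=lambda i: -len(labels[i])):
        if labels[idx].lower() in text:
            return idx
    return None


def accuracy(preds: Sequence[Optional[int]], truths: Sequence[int]) -> float:
    """Fraction of correct predictions; unparseable replies count as wrong."""
    correct = sum(1 for p, t in zip(preds, truths) if p == t)
    return correct / len(truths)
```

Counting unparseable replies as errors is one possible design choice; it keeps the accuracy directly comparable with that of a conventional classifier, which always emits a label.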

Author Affiliations

Jiaxuan Wu

University of California

Xushuo Tang

Euler AI

Zhengyi Yang

University of New South Wales

Kongzhang Hao

University of New South Wales

Longbin Lai

Alibaba

Yongfei Liu

Euler AI

BibTeX

@inproceedings{wu2024experimental,
  title = {An Experimental Evaluation of LLM on Image Classification},
  author = {Wu, Jiaxuan and Tang, Xushuo and Yang, Zhengyi and Hao, Kongzhang and Lai, Longbin and Liu, Yongfei},
  editor = {Chen, Tong and Cao, Yang and Nguyen, Quoc Viet Hung and Nguyen, Thanh Tam},
  booktitle = {Databases Theory and Applications},
  year = {2025},
  publisher = {Springer Nature Singapore},
  address = {Singapore},
  pages = {506--518},
  isbn = {978-981-96-1242-0}
}

An Experimental Evaluation of LLM on Image Classification
