Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs

Beijing Normal University, Australian National University, Beijing 101 Education Group
✉️ Correspondence to: fangweizhong@bnu.edu.cn

Abstract

Concepts represent generalized abstractions that enable humans to categorize and reason efficiently, yet it is unclear to what extent Large Language Models (LLMs) comprehend these semantic relationships. Existing benchmarks typically focus on factual recall and isolated tasks, failing to evaluate the ability of LLMs to understand conceptual boundaries. To address this gap, we introduce CK-Arena, a multi-agent interaction game built upon the Undercover game, designed to evaluate the capacity of LLMs to reason with concepts in interactive settings. CK-Arena challenges models to describe, differentiate, and infer conceptual boundaries based on partial information, encouraging models to explore commonalities and distinctions between closely related concepts. By simulating real-world interaction, CK-Arena provides a scalable and realistic benchmark for assessing conceptual reasoning in dynamic environments. Experimental results show that LLMs' understanding of conceptual knowledge varies significantly across different categories and is not strictly aligned with parameter size or general model capabilities.

CK-Arena Demo: Undercover Game

Below is an interactive demonstration of the Undercover game used in CK-Arena. In this game, LLM agents are assigned either the main concept ("bee") or an undercover concept ("butterfly"). Players take turns making statements about their concept without revealing it directly. The goal for the civilians is to identify and eliminate the undercover agents through voting, while undercover agents try to blend in without being detected.
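
The game flow described above can be sketched roughly as follows. This is a minimal illustration, not the CK-Arena implementation: the names `Player`, `assign_roles`, `tally_votes`, and `winner` are hypothetical, and the tie handling and win conditions are assumptions borrowed from common Undercover rules rather than taken from the benchmark. In the real benchmark, statements and votes would come from LLM agents; here votes are passed in as plain data.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    concept: str   # the concept this player must describe without naming it
    role: str      # "civilian" or "undercover"
    alive: bool = True


def assign_roles(names, main_concept, undercover_concept, n_undercover, rng):
    """Give most players the main concept and a few the undercover concept."""
    undercover = set(rng.sample(names, n_undercover))
    return [
        Player(
            name=n,
            concept=undercover_concept if n in undercover else main_concept,
            role="undercover" if n in undercover else "civilian",
        )
        for n in names
    ]


def tally_votes(votes):
    """Return the most-voted player name, or None on a tie.

    `votes` maps voter name -> suspected player name.
    Assumption: a tie eliminates nobody.
    """
    ranked = Counter(votes.values()).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def winner(players):
    """Assumed win conditions: civilians win once no undercover agent is left;
    undercover agents win once they are at least as numerous as civilians."""
    alive = [p for p in players if p.alive]
    n_undercover = sum(p.role == "undercover" for p in alive)
    n_civilian = len(alive) - n_undercover
    if n_undercover == 0:
        return "civilians"
    if n_undercover >= n_civilian:
        return "undercover"
    return None  # game continues


# One voting round with the concepts from the demo ("bee" vs. "butterfly").
rng = random.Random(0)
players = assign_roles(["P1", "P2", "P3", "P4"], "bee", "butterfly", 1, rng)
votes = {"P1": "P2", "P2": "P3", "P3": "P2", "P4": "P2"}  # placeholder votes
eliminated = tally_votes(votes)
for p in players:
    if p.name == eliminated:
        p.alive = False
print(eliminated, winner(players))
```

Keeping role assignment, vote tallying, and the win check as separate functions makes it straightforward to swap the placeholder votes for model-generated ones.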

BibTeX

@article{xu2025probe,
  author    = {Xu, Shuhang and Deng, Weijian and Zhou, Yixuan and Zhong, Fangwei},
  title     = {Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs},
  year      = {2025},
}