The rapid advancement of Large Language Models (LLMs) has sparked intense debate over their capacity to replicate or surpass human creativity, yet systematic comparisons of their divergent thinking abilities remain scarce. This study addresses that gap with a large-scale evaluation of divergent creativity, comparing 9,198 human responses with 215,542 observations from a range of LLMs on the Divergent Association Task (DAT). Our results indicate that while the average creativity scores of humans and AI are comparable, humans exhibit significantly greater variability and linguistic diversity, with top-tier human performers consistently outscoring even the most advanced models, such as GPT-4 and DeepSeek. The study further reveals critical limitations in current AI capabilities: prompting strategies designed to simulate "genius" personas or specific demographics failed to reliably enhance creativity and often produced results contrary to empirical human patterns. Moreover, attempts to increase model creativity by raising the temperature parameter traded novelty against semantic coherence, eventually yielding nonsensical outputs. These findings suggest that LLMs currently function best as tools for augmentation rather than replacement, as they lack the unique, experiential cognition that drives expert-level human innovation. This study refines our understanding of the boundaries between biological and artificial creativity, underscoring the necessity of human-in-the-loop approaches for high-stakes creative problem-solving.
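For readers unfamiliar with the DAT, the task asks respondents to name ten nouns that are as semantically unrelated as possible, and scores the first seven valid words by their mean pairwise cosine distance in an embedding space, scaled by 100 (Olson et al., 2021). The sketch below illustrates that scoring scheme only; the embedding lookup and the random stand-in vectors are placeholders for illustration, not the paper's actual pipeline, which relies on pretrained word embeddings and vocabulary filtering.

```python
# A minimal sketch of DAT-style scoring (after Olson et al., 2021):
# mean pairwise cosine distance, scaled by 100, over the first seven
# valid nouns. The embedding lookup is a stand-in; the published
# scorer uses pretrained GloVe vectors.
from itertools import combinations
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 minus cosine similarity between two word vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dat_score(words: list[str], embed) -> float:
    """Score a word list: average cosine distance over all pairs
    of the first seven words, multiplied by 100."""
    vectors = [embed(w) for w in words[:7]]
    pairs = combinations(vectors, 2)
    return 100.0 * float(np.mean([cosine_distance(u, v) for u, v in pairs]))

# Hypothetical usage with random stand-in vectors; real scoring
# requires actual word embeddings and vocabulary filtering.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=300) for w in
         ["river", "algebra", "tulip", "anvil", "jazz", "plasma", "monk"]}
print(round(dat_score(list(vocab), vocab.__getitem__), 2))
```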
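The reported temperature trade-off follows from how sampling works: dividing logits by a temperature T before the softmax flattens the token distribution, so higher T surfaces rarer (more "novel") tokens while eroding coherence. A minimal sketch, assuming standard softmax sampling (the function name is illustrative, not from the paper):

```python
# Temperature scaling sketch: logits are divided by T before the
# softmax, so higher T flattens the distribution and surfaces rarer
# tokens; at the extreme, sampling approaches uniform and loses
# semantic coherence, consistent with the trade-off described above.
import numpy as np

def sample_with_temperature(logits: np.ndarray, T: float,
                            rng: np.random.Generator) -> int:
    """Draw one token index from softmax(logits / T)."""
    scaled = logits / T
    probs = np.exp(scaled - scaled.max())  # subtract max for stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# At T=0.2 the argmax token dominates; at T=5.0 choices are near-uniform.
rng = np.random.default_rng(1)
logits = np.array([3.0, 1.0, 0.5, 0.1])
print([sample_with_temperature(logits, T, rng) for T in (0.2, 1.0, 5.0)])
```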
Publication:
Nature Human Behaviour
https://doi.org/10.1038/s41562-025-02331-1
Author:
Difang Huang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China