Human-like object concept representations emerge naturally in multimodal large language models

Biederman, I. Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94, 115–147 (1987).
Edelman, S. Representation is representation of similarities. Behav. Brain Sci. 21, 449–467 (1998).
Nosofsky, R. M. Attention, similarity, and the identification–categorization relationship. J. Exp. Psychol. Gen. 115, 39–61 (1986).
Goldstone, R. L. The role of similarity in categorization: providing a groundwork. Cognition 52, 125–157 (1994).
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M. & Boyes-Braem, P. Basic objects in natural categories. Cogn. Psychol. 8, 382–439 (1976).
Mahon, B. Z. & Caramazza, A. Concepts and categories: a cognitive neuropsychological perspective. Annu. Rev. Psychol. 60, 27–51 (2009).
Rogers, T. T. & McClelland, J. L. Semantic Cognition: A Parallel Distributed Processing Approach (MIT Press, 2004).
Shepard, R. N. Toward a universal law of generalization for psychological science. Science 237, 1317–1323 (1987).
Battleday, R. M., Peterson, J. C. & Griffiths, T. L. Capturing human categorization of natural images by combining deep networks and cognitive models. Nat. Commun. 11, 5418 (2020).
Jagadeesh, A. V. & Gardner, J. L. Texture-like representation of objects in human visual cortex. Proc. Natl Acad. Sci. USA 119, e2115302119 (2022).
Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 6, 975–987 (2022).
Connolly, A. C. et al. The representation of biological classes in the human brain. J. Neurosci. 32, 2608–2618 (2012).
Downing, P. E., Chan, A.-Y., Peelen, M. V., Dodds, C. & Kanwisher, N. Domain specificity in visual cortex. Cereb. Cortex 16, 1453–1461 (2006).
Kriegeskorte, N. et al. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141 (2008).
Caramazza, A. & Shelton, J. R. Domain-specific knowledge systems in the brain: the animate-inanimate distinction. J. Cogn. Neurosci. 10, 1–34 (1998).
Hebart, M. N., Zheng, C. Y., Pereira, F. & Baker, C. I. Revealing the multidimensional mental representations of natural objects underlying human similarity judgements. Nat. Hum. Behav. 4, 1173–1185 (2020).
Hebart, M. N. et al. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12, e82580 (2023).
Konkle, T. & Oliva, A. A real-world size organization of object responses in occipitotemporal cortex. Neuron 74, 1114–1124 (2012).
Konkle, T. & Oliva, A. Canonical visual size for real-world objects. J. Exp. Psychol. Hum. Percept. Perform. 37, 23–37 (2011).
Bowers, J. S. et al. Deep problems with neural network models of human vision. Behav. Brain Sci. 46, e385 (2023).
Hermann, K., Nayebi, A., van Steenkiste, S. & Jones, M. For human-like models, train on human-like tasks. Behav. Brain Sci. 46, e394 (2023).
Jha, A., Peterson, J. C. & Griffiths, T. L. Extracting low-dimensional psychological representations from convolutional neural networks. Cogn. Sci. 47, e13226 (2023).
Nadler, E. O. et al. Divergences in color perception between deep neural networks and humans. Cognition 241, 105621 (2023).
Cohen, U., Chung, S., Lee, D. D. & Sompolinsky, H. Separability and geometry of object manifolds in deep neural networks. Nat. Commun. 11, 746 (2020).
Dobs, K., Martinez, J., Kell, A. J. & Kanwisher, N. Brain-like functional specialization emerges spontaneously in deep neural networks. Sci. Adv. 8, eabl8913 (2022).
Mahner, F. P., Muttenthaler, L., Güçlü, U. & Hebart, M. N. Dimensions underlying the representational alignment of deep neural networks with humans. Preprint at https://arxiv.org/abs/2406.19087 (2024).
Jacob, G., Pramod, R., Katti, H. & Arun, S. Qualitative similarities and differences in visual object representations between brains and deep networks. Nat. Commun. 12, 1872 (2021).
Goldstein, A. et al. Shared computational principles for language processing in humans and deep language models. Nat. Neurosci. 25, 369–380 (2022).
Muttenthaler, L. & Hebart, M. N. Interpretable object dimensions in deep neural networks and their similarities to human representations. J. Vis. 22, 4516 (2022).
Saxe, A., Nelli, S. & Summerfield, C. If deep learning is the answer, what is the question? Nat. Rev. Neurosci. 22, 55–67 (2021).
Prince, J. S., Alvarez, G. A. & Konkle, T. Contrastive learning explains the emergence and function of visual category-selective regions. Sci. Adv. 10, eadl1776 (2024).
Konkle, T. & Alvarez, G. A. A self-supervised domain-general learning framework for human ventral stream representation. Nat. Commun. 13, 491 (2022).
Zhuang, C. et al. Unsupervised neural network models of the ventral visual stream. Proc. Natl Acad. Sci. USA 118, e2014196118 (2021).
Feather, J., Leclerc, G., Mądry, A. & McDermott, J. H. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat. Neurosci. 26, 2017–2034 (2023).
Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).
Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends Cogn. Sci. 27, 597–600 (2023).
Messeri, L. & Crockett, M. Artificial intelligence and illusions of understanding in scientific research. Nature 627, 49–58 (2024).
Josephs, E. L., Hebart, M. N. & Konkle, T. Dimensions underlying human understanding of the reachable world. Cognition 234, 105368 (2023).
Zheng, C. Y., Pereira, F., Baker, C. I. & Hebart, M. N. Revealing interpretable object representations from human behavior. In International Conference on Learning Representations 1–16 (ICLR, 2019).
Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proc. Natl Acad. Sci. USA 120, e2218523120 (2023).
Webb, T., Holyoak, K. J. & Lu, H. Emergent analogical reasoning in large language models. Nat. Hum. Behav. 7, 1526–1541 (2023).
Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. 1–30 (2022).
Schaeffer, R., Miranda, B. & Koyejo, S. Are emergent abilities of large language models a mirage? Adv. Neural Inf. Process. Syst. 36, 55565–55581 (2024).
Hagendorff, T. Machine psychology: investigating emergent capabilities and behavior in large language models using psychological methods. Preprint at https://arxiv.org/abs/2303.13988v1 (2023).
Hagendorff, T., Fabi, S. & Kosinski, M. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nat. Comput. Sci. 3, 833–838 (2023).
Strachan, J. W. et al. Testing theory of mind in large language models and humans. Nat. Hum. Behav. 8, 1285–1295 (2024).
Kumar, S. et al. Shared functional specialization in transformer-based language models and the human brain. Nat. Commun. 15, 5523 (2024).
Chen, Y., Liu, T. X., Shan, Y. & Zhong, S. The emergence of economic rationality of GPT. Proc. Natl Acad. Sci. USA 120, e2316205120 (2023).
Zhang, R. et al. MathVerse: does your multi-modal LLM truly see the diagrams in visual math problems? In Proc. European Conference on Computer Vision 169–186 (Springer, 2024).
Hebart, M. N. et al. THINGS: a database of 1,854 object concepts and more than 26,000 naturalistic object images. PLoS ONE 14, e0223792 (2019).
Wei, C., Zou, J., Heinke, D. & Liu, Q. CoCoG: controllable visual stimuli generation based on human concept representations. In Proc. Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24) 3178–3186 (International Joint Conferences on Artificial Intelligence Organization, 2024).
Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
Kriegeskorte, N., Mur, M. & Bandettini, P. A. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 249 (2008).
Wang, P. et al. Qwen2-VL: enhancing vision-language model’s perception of the world at any resolution. Preprint at https://arxiv.org/abs/2409.12191 (2024).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations 1–14 (ICLR, 2015).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning 1597–1607 (PMLR, 2020).
Rajalingham, R. et al. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J. Neurosci. 38, 7255–7269 (2018).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning 8748–8763 (PMLR, 2021).
Wang, A. Y., Kay, K., Naselaris, T., Tarr, M. J. & Wehbe, L. Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset. Nat. Mach. Intell. 5, 1415–1426 (2023).
Epstein, R. A. & Baker, C. I. Scene perception in the human brain. Annu. Rev. Vis. Sci. 5, 373–397 (2019).
Downing, P. E., Jiang, Y., Shuman, M. & Kanwisher, N. A cortical area selective for visual processing of the human body. Science 293, 2470–2473 (2001).
Sergent, J., Ohta, S. & Macdonald, B. Functional neuroanatomy of face and object processing: a positron emission tomography study. Brain 115, 15–36 (1992).
Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).
Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 15, 39 (2024).
Minaee, S. et al. Large language models: a survey. Preprint at https://arxiv.org/abs/2402.06196 (2024).
Yin, S. et al. A survey on multimodal large language models. Natl Sci. Rev. 11, nwae403 (2024).
Conwell, C., Prince, J. S., Kay, K. N., Alvarez, G. A. & Konkle, T. What can 1.8 billion regressions tell us about the pressures shaping high-level visual representation in brains and machines? Preprint at bioRxiv https://doi.org/10.1101/2022.03.28.485868 (2022).
Zador, A. et al. Catalyzing next-generation artificial intelligence through neuroAI. Nat. Commun. 14, 1597 (2023).
Thibeault, V., Allard, A. & Desrosiers, P. The low-rank hypothesis of complex systems. Nat. Phys. 20, 294–302 (2024).
Murphy, K. A. & Bassett, D. S. Information decomposition in complex systems via machine learning. Proc. Natl Acad. Sci. USA 121, e2312988121 (2024).
Doerig, A. et al. Semantic scene descriptions as an objective of human vision. Preprint at https://arxiv.org/abs/2209.11737v1 (2022).
Conwell, C., Prince, J., Alvarez, G. & Konkle, T. The unreasonable effectiveness of word models in predicting high-level visual cortex responses to natural images. In Conference on Computational Cognitive Neuroscience (CCN, 2023); https://2023.ccneuro.org/view_paper1c01.html?PaperNum=1642
McMahon, E., Conwell, C., Garcia, K., Bonner, M. F. & Isik, L. Language model prediction of visual cortex responses to dynamic social scenes. J. Vis. 24, 904 (2024).
Conwell, C. et al. Monkey see, model knew: large language models accurately predict human and macaque visual brain activity. In UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models (NeurIPS, 2024); https://openreview.net/pdf?id=IvwgXU20IZ
Tuckute, G., Kanwisher, N. & Fedorenko, E. Language in brains, minds, and machines. Annu. Rev. Neurosci. 47, 277–301 (2024).
Tuckute, G. et al. Driving and suppressing the human language network using large language models. Nat. Hum. Behav. 8, 544–561 (2024).
Popham, S. F. et al. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636 (2021).
Roads, B. D. & Love, B. C. Learning as the unsupervised alignment of conceptual systems. Nat. Mach. Intell. 2, 76–82 (2020).
Sereno, M. I. et al. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 268, 889–893 (1995).
Engel, S. A., Glover, G. H. & Wandell, B. A. Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cereb. Cortex 7, 181–192 (1997).
Hansen, K. A., Kay, K. N. & Gallant, J. L. Topographic organization in and near human visual area V4. J. Neurosci. 27, 11896–11911 (2007).
Huth, A. G., Nishimoto, S., Vu, A. T. & Gallant, J. L. A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76, 1210–1224 (2012).
Harvey, B. M., Klein, B. P., Petridou, N. & Dumoulin, S. O. Topographic representation of numerosity in the human parietal cortex. Science 341, 1123–1126 (2013).
Sha, L. et al. The animacy continuum in the human ventral vision pathway. J. Cogn. Neurosci. 27, 665–678 (2015).
Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).
Margulies, D. S. et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl Acad. Sci. USA 113, 12574–12579 (2016).
Huntenburg, J. M., Bazin, P.-L. & Margulies, D. S. Large-scale gradients in human cortical organization. Trends Cogn. Sci. 22, 21–31 (2018).
Bau, D. et al. Understanding the role of individual units in a deep neural network. Proc. Natl Acad. Sci. USA 117, 30071–30078 (2020).
McGrath, T. et al. Acquisition of chess knowledge in AlphaZero. Proc. Natl Acad. Sci. USA 119, e2206625119 (2022).
Achtibat, R. et al. From attribution maps to human-understandable explanations through concept relevance propagation. Nat. Mach. Intell. 5, 1006–1019 (2023).
Bills, S. et al. Language models can explain neurons in language models. OpenAI https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (2023).
Sanborn, A. N., Griffiths, T. L. & Shiffrin, R. M. Uncovering mental representations with Markov chain Monte Carlo. Cogn. Psychol. 60, 63–106 (2010).
Mahowald, K. et al. Dissociating language and thought in large language models. Trends Cogn. Sci. 28, 517–540 (2024).
Qu, Y. et al. Integration of cognitive tasks into artificial general intelligence test for large models. iScience 27, 109550 (2024).
Meng, J. AI emerges as the frontier in behavioral science. Proc. Natl Acad. Sci. USA 121, e2401336121 (2024).
Marjieh, R., Sucholutsky, I., van Rijn, P., Jacoby, N. & Griffiths, T. Large language models predict human sensory judgments across six modalities. Sci. Rep. 14, 21445 (2024).
Campbell, D., Kumar, S., Giallanza, T., Griffiths, T. L. & Cohen, J. D. Human-like geometric abstraction in large pre-trained neural networks. In ICLR 2024 Workshop on Representational Alignment (Re-Align) (ICLR, 2024); https://openreview.net/pdf?id=h15aZUyxjw
Kawakita, G., Zeleznikow-Johnston, A., Tsuchiya, N. & Oizumi, M. Gromov–Wasserstein unsupervised alignment reveals structural correspondences between the color similarity structures of humans and large language models. Sci. Rep. 14, 15917 (2024).
Li, C. et al. Large language models understand and can be enhanced by emotional stimuli. Preprint at https://arxiv.org/abs/2307.11760 (2023).
Sabour, S. et al. EmoBench: evaluating the emotional intelligence of large language models. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics 5986–6004 (Association for Computational Linguistics, 2024).
Janik, R. A. Aspects of human memory and large language models. Preprint at https://arxiv.org/abs/2311.03839 (2023).
Huff, M. & Ulakçı, E. Towards a psychology of machines: large language models predict human memory. Preprint at https://arxiv.org/abs/2403.05152 (2024).
Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A. & Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nat. Mach. Intell. 4, 258–268 (2022).
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover theories of human decision-making. Science 372, 1209–1214 (2021).
Alsagheer, D. et al. Comparing rationality between large language models and humans: insights and open questions. Preprint at https://arxiv.org/abs/2403.09798 (2024).
Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).
St-Yves, G., Allen, E. J., Wu, Y., Kay, K. & Naselaris, T. Brain-optimized deep neural network models of human visual areas learn non-hierarchical representations. Nat. Commun. 14, 3329 (2023).
Lin, T.-Y. et al. Microsoft COCO: common objects in context. In 13th European Conference on Computer Vision 740–755 (Springer, 2014).
Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations 1–15 (ICLR, 2015).
Hebart, M. N., Kaniuth, P. & Perkuhn, J. Efficiently-generated object similarity scores predicted from human feature ratings and deep neural network activations. J. Vis. 22, 4057 (2022).
Muttenthaler, L., Dippel, J., Linhardt, L., Vandermeulen, R. A. & Kornblith, S. Human alignment of neural network representations. In Proc. 11th International Conference on Learning Representations (ICLR, 2023); https://openreview.net/pdf?id=ReDQ1OUQR0X
Fischl, B. FreeSurfer. Neuroimage 62, 774–781 (2012).
Gao, J. S., Huth, A. G., Lescroart, M. D. & Gallant, J. L. Pycortex: an interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015).
Du, C. & CDDU. ChangdeDu/LLMs_core_dimensions. Zenodo https://doi.org/10.5281/zenodo.15090333 (2025).
2025-06-09