Human-like object concept representations emerge naturally in multimodal large language models

  • Biederman, I. Recognition-by-components: a theory of human image understanding. Psych. Rev. 94, 115–147 (1987).

  • Edelman, S. Representation is representation of similarities. Behav. Brain Sci. 21, 449–467 (1998).

  • Nosofsky, R. M. Attention, similarity, and the identification–categorization relationship. J. Exp. Psychol. Gen. 115, 39–61 (1986).

  • Goldstone, R. L. The role of similarity in categorization: providing a groundwork. Cognition 52, 125–157 (1994).

  • Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M. & Boyes-Braem, P. Basic objects in natural categories. Cogn. Psychol. 8, 382–439 (1976).

  • Mahon, B. Z. & Caramazza, A. Concepts and categories: a cognitive neuropsychological perspective. Annu. Rev. Psychol. 60, 27–51 (2009).

  • Rogers, T. T. & McClelland, J. L. Semantic Cognition: A Parallel Distributed Processing Approach (MIT Press, 2004).

  • Shepard, R. N. Toward a universal law of generalization for psychological science. Science 237, 1317–1323 (1987).

  • Battleday, R. M., Peterson, J. C. & Griffiths, T. L. Capturing human categorization of natural images by combining deep networks and cognitive models. Nat. Commun. 11, 5418 (2020).

  • Jagadeesh, A. V. & Gardner, J. L. Texture-like representation of objects in human visual cortex. Proc. Natl Acad. Sci. USA 119, e2115302119 (2022).

  • Grand, G., Blank, I. A., Pereira, F. & Fedorenko, E. Semantic projection recovers rich human knowledge of multiple object features from word embeddings. Nat. Hum. Behav. 6, 975–987 (2022).

  • Connolly, A. C. et al. The representation of biological classes in the human brain. J. Neurosci. 32, 2608–2618 (2012).

  • Downing, P. E., Chan, A.-Y., Peelen, M. V., Dodds, C. & Kanwisher, N. Domain specificity in visual cortex. Cereb. Cortex 16, 1453–1461 (2006).

  • Kriegeskorte, N. et al. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141 (2008).

  • Caramazza, A. & Shelton, J. R. Domain-specific knowledge systems in the brain: the animate-inanimate distinction. J. Cogn. Neurosci. 10, 1–34 (1998).

  • Hebart, M. N., Zheng, C. Y., Pereira, F. & Baker, C. I. Revealing the multidimensional mental representations of natural objects underlying human similarity judgements. Nat. Hum. Behav. 4, 1173–1185 (2020).

  • Hebart, M. N. et al. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife 12, e82580 (2023).

  • Konkle, T. & Oliva, A. A real-world size organization of object responses in occipitotemporal cortex. Neuron 74, 1114–1124 (2012).

  • Konkle, T. & Oliva, A. Canonical visual size for real-world objects. J. Exp. Psychol. 37, 23–37 (2011).

  • Bowers, J. S. et al. Deep problems with neural network models of human vision. Behav. Brain Sci. 46, e385 (2023).

  • Hermann, K., Nayebi, A., van Steenkiste, S. & Jones, M. For human-like models, train on human-like tasks. Behav. Brain Sci. 46, e394 (2023).

  • Jha, A., Peterson, J. C. & Griffiths, T. L. Extracting low-dimensional psychological representations from convolutional neural networks. Cogn. Sci. 47, e13226 (2023).

  • Nadler, E. O. et al. Divergences in color perception between deep neural networks and humans. Cognition 241, 105621 (2023).

  • Cohen, U., Chung, S., Lee, D. D. & Sompolinsky, H. Separability and geometry of object manifolds in deep neural networks. Nat. Commun. 11, 746 (2020).

  • Dobs, K., Martinez, J., Kell, A. J. & Kanwisher, N. Brain-like functional specialization emerges spontaneously in deep neural networks. Sci. Adv. 8, eabl8913 (2022).

  • Mahner, F. P., Muttenthaler, L., Güçlü, U. & Hebart, M. N. Dimensions underlying the representational alignment of deep neural networks with humans. Preprint at https://arxiv.org/abs/2406.19087 (2024).

  • Jacob, G., Pramod, R., Katti, H. & Arun, S. Qualitative similarities and differences in visual object representations between brains and deep networks. Nat. Commun. 12, 1872 (2021).

  • Goldstein, A. et al. Shared computational principles for language processing in humans and deep language models. Nat. Neurosci. 25, 369–380 (2022).

  • Muttenthaler, L. & Hebart, M. N. Interpretable object dimensions in deep neural networks and their similarities to human representations. J. Vis. 22, 4516 (2022).

  • Saxe, A., Nelli, S. & Summerfield, C. If deep learning is the answer, what is the question? Nat. Rev. Neurosci. 22, 55–67 (2021).

  • Prince, J. S., Alvarez, G. A. & Konkle, T. Contrastive learning explains the emergence and function of visual category-selective regions. Sci. Adv. 10, eadl1776 (2024).

  • Konkle, T. & Alvarez, G. A. A self-supervised domain-general learning framework for human ventral stream representation. Nat. Commun. 13, 491 (2022).

  • Zhuang, C. et al. Unsupervised neural network models of the ventral visual stream. Proc. Natl Acad. Sci. USA 118, e2014196118 (2021).

  • Feather, J., Leclerc, G., Mądry, A. & McDermott, J. H. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat. Neurosci. 26, 2017–2034 (2023).

  • Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).

  • Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends Cogn. Sci. 27, 597–600 (2023).

  • Messeri, L. & Crockett, M. Artificial intelligence and illusions of understanding in scientific research. Nature 627, 49–58 (2024).

  • Josephs, E. L., Hebart, M. N. & Konkle, T. Dimensions underlying human understanding of the reachable world. Cognition 234, 105368 (2023).

  • Zheng, C. Y., Pereira, F., Baker, C. I. & Hebart, M. N. Revealing interpretable object representations from human behavior. In International Conference on Learning Representations 1–16 (ICLR, 2019).

  • Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proc. Natl Acad. Sci. USA 120, e2218523120 (2023).

  • Webb, T., Holyoak, K. J. & Lu, H. Emergent analogical reasoning in large language models. Nat. Hum. Behav. 7, 1526–1541 (2023).

  • Wei, J. et al. Emergent abilities of large language models. In Proc. International Conference on Learning Representations (ICLR 2022) 1–30 (TMLR, 2022)

  • Schaeffer, R., Miranda, B. & Koyejo, S. Are emergent abilities of large language models a mirage? Adv. Neural Inf. Process. Syst. 36, 55565–55581 (2024).

  • Hagendorff, T. Machine psychology: investigating emergent capabilities and behavior in large language models using psychological methods. Preprint at https://arxiv.org/abs/2303.13988v1 (2023).

  • Hagendorff, T., Fabi, S. & Kosinski, M. Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nat. Comput. Sci. 3, 833–838 (2023).

  • Strachan, J. W. et al. Testing theory of mind in large language models and humans. Nat. Hum. Behav. 8, 1285–1295 (2024).

  • Kumar, S. et al. Shared functional specialization in transformer-based language models and the human brain. Nat. Commun. 15, 5523 (2024).

  • Chen, Y., Liu, T. X., Shan, Y. & Zhong, S. The emergence of economic rationality of GPT. Proc. Natl Acad. Sci. USA 120, e2316205120 (2023).

  • Zhang, R. et al. MathVerse: does your multi-modal LLM truly see the diagrams in visual math problems? In Proc. European Conference on Computer Vision 169–186 (Springer, 2024).

  • Hebart, M. N. et al. THINGS: a database of 1,854 object concepts and more than 26,000 naturalistic object images. PLoS ONE 14, e0223792 (2019).

  • Wei, C., Zou, J., Heinke, D. & Liu, Q. CoCoG: controllable visual stimuli generation based on human concept representations. In Proc. Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24) 3178–3186 (International Joint Conferences on Artificial Intelligence Organization, 2024).

  • Allen, E. J. et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).

  • Kriegeskorte, N., Mur, M. & Bandettini, P. A. Representational similarity analysis—connecting the branches of systems neuroscience. Front. Syst. Neurosci. 2, 249 (2008).

  • Wang, P. et al. Qwen2-VL: enhancing vision-language model’s perception of the world at any resolution. Preprint at https://arxiv.org/abs/2409.12191 (2024).

  • Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations 1–14 (ICLR, 2015).

  • Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning 1597–1607 (PMLR, 2020).

  • Rajalingham, R. et al. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J. Neurosci. 38, 7255–7269 (2018).

  • Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning 8748–8763 (PMLR, 2021).

  • Wang, A. Y., Kay, K., Naselaris, T., Tarr, M. J. & Wehbe, L. Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset. Nat. Mach. Intell. 5, 1415–1426 (2023).

  • Epstein, R. A. & Baker, C. I. Scene perception in the human brain. Annu. Rev. Vis. Sci. 5, 373–397 (2019).

  • Downing, P. E., Jiang, Y., Shuman, M. & Kanwisher, N. A cortical area selective for visual processing of the human body. Science 293, 2470–2473 (2001).

  • Sergent, J., Ohta, S. & Macdonald, B. Functional neuroanatomy of face and object processing: a positron emission tomography study. Brain 115, 15–36 (1992).

  • Kanwisher, N., McDermott, J. & Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).

  • Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 15, 39 (2024).

  • Minaee, S. et al. Large language models: a survey. Preprint at https://arxiv.org/abs/2402.06196 (2024).

  • Yin, S. et al. A survey on multimodal large language models. Natl Sci. Rev. 11, nwae403 (2024).

  • Conwell, C., Prince, J. S., Kay, K. N., Alvarez, G. A. & Konkle, T. What can 1.8 billion regressions tell us about the pressures shaping high-level visual representation in brains and machines? Preprint at bioRxiv https://doi.org/10.1101/2022.03.28.485868 (2022).

  • Zador, A. et al. Catalyzing next-generation artificial intelligence through neuroAI. Nat. Commun. 14, 1597 (2023).

  • Thibeault, V., Allard, A. & Desrosiers, P. The low-rank hypothesis of complex systems. Nat. Phys. 20, 294–302 (2024).

  • Murphy, K. A. & Bassett, D. S. Information decomposition in complex systems via machine learning. Proc. Natl Acad. Sci. USA 121, e2312988121 (2024).

  • Doerig, A. et al. Semantic scene descriptions as an objective of human vision. Preprint at https://arxiv.org/abs/2209.11737v1 (2022).

  • Conwell, C., Prince, J., Alvarez, G. & Konkle, T. The unreasonable effectiveness of word models in predicting high-level visual cortex responses to natural images. In Conference on Computational Cognitive Neuroscience (CCN, 2023); https://2023.ccneuro.org/view_paper1c01.html?PaperNum=1642

  • McMahon, E., Conwell, C., Garcia, K., Bonner, M. F. & Isik, L. Language model prediction of visual cortex responses to dynamic social scenes. J. Vis. 24, 904 (2024).

  • Conwell, C. et al. Monkey see, model knew: large language models accurately predict human and macaque visual brain activity. In UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models (NeurIPS, 2024); https://openreview.net/pdf?id=IvwgXU20IZ

  • Tuckute, G., Kanwisher, N. & Fedorenko, E. Language in brains, minds, and machines. Annu. Rev. Neurosci. 47, 277–301 (2024).

  • Tuckute, G. et al. Driving and suppressing the human language network using large language models. Nat. Hum. Behav. 8, 544–561 (2024).

  • Popham, S. F. et al. Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636 (2021).

  • Roads, B. D. & Love, B. C. Learning as the unsupervised alignment of conceptual systems. Nat. Mach. Intell. 2, 76–82 (2020).

  • Sereno, M. I. et al. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science 268, 889–893 (1995).

  • Engel, S. A., Glover, G. H. & Wandell, B. A. Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cereb. Cortex 7, 181–192 (1997).

  • Hansen, K. A., Kay, K. N. & Gallant, J. L. Topographic organization in and near human visual area V4. J. Neurosci. 27, 11896–11911 (2007).

  • Huth, A. G., Nishimoto, S., Vu, A. T. & Gallant, J. L. A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76, 1210–1224 (2012).

  • Harvey, B. M., Klein, B. P., Petridou, N. & Dumoulin, S. O. Topographic representation of numerosity in the human parietal cortex. Science 341, 1123–1126 (2013).

  • Sha, L. et al. The animacy continuum in the human ventral vision pathway. J. Cogn. Neurosci. 27, 665–678 (2015).

  • Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016).

  • Margulies, D. S. et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl Acad. Sci. USA 113, 12574–12579 (2016).

  • Huntenburg, J. M., Bazin, P.-L. & Margulies, D. S. Large-scale gradients in human cortical organization. Trends Cogn. Sci. 22, 21–31 (2018).

  • Bau, D. et al. Understanding the role of individual units in a deep neural network. Proc. Natl Acad. Sci. USA 117, 30071–30078 (2020).

  • McGrath, T. et al. Acquisition of chess knowledge in AlphaZero. Proc. Natl Acad. Sci. USA 119, e2206625119 (2022).

  • Achtibat, R. et al. From attribution maps to human-understandable explanations through concept relevance propagation. Nat. Mach. Intell. 5, 1006–1019 (2023).

  • Bills, S. et al. Language models can explain neurons in language models. OpenAI https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (2023).

  • Sanborn, A. N., Griffiths, T. L. & Shiffrin, R. M. Uncovering mental representations with Markov chain Monte Carlo. Cogn. Psychol. 60, 63–106 (2010).

  • Mahowald, K. et al. Dissociating language and thought in large language models. Trends Cogn. Sci. 28, 517–540 (2024).

  • Qu, Y. et al. Integration of cognitive tasks into artificial general intelligence test for large models. iScience 27, 109550 (2024).

  • Meng, J. AI emerges as the frontier in behavioral science. Proc. Natl Acad. Sci. USA 121, e2401336121 (2024).

  • Marjieh, R., Sucholutsky, I., van Rijn, P., Jacoby, N. & Griffiths, T. Large language models predict human sensory judgments across six modalities. Sci. Rep. 14, 21445 (2024).

  • Campbell, D., Kumar, S., Giallanza, T., Griffiths, T. L. & Cohen, J. D. Human-like geometric abstraction in large pre-trained neural networks. In ICLR 2024 Workshop on Representational Alignment (Re-Align) (ICLR, 2024); https://openreview.net/pdf?id=h15aZUyxjw

  • Kawakita, G., Zeleznikow-Johnston, A., Tsuchiya, N. & Oizumi, M. Gromov–Wasserstein unsupervised alignment reveals structural correspondences between the color similarity structures of humans and large language models. Sci. Rep. 14, 15917 (2024).

  • Li, C. et al. Large language models understand and can be enhanced by emotional stimuli. Preprint at https://arxiv.org/abs/2307.11760 (2023).

  • Sabour, S. et al. EmoBench: evaluating the emotional intelligence of large language models. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics 5986–6004 (Association for Computational Linguistics, 2024).

  • Janik, R. A. Aspects of human memory and large language models. Preprint at https://arxiv.org/abs/2311.03839 (2023).

  • Huff, M. & Ulakçı, E. Towards a psychology of machines: large language models predict human memory. Preprint at https://arxiv.org/abs/2403.05152 (2024).

  • Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A. & Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nat. Mach. Intell. 4, 258–268 (2022).

  • Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover theories of human decision-making. Science 372, 1209–1214 (2021).

  • Alsagheer, D. et al. Comparing rationality between large language models and humans: insights and open questions. Preprint at https://arxiv.org/abs/2403.09798 (2024).

  • Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  • St-Yves, G., Allen, E. J., Wu, Y., Kay, K. & Naselaris, T. Brain-optimized deep neural network models of human visual areas learn non-hierarchical representations. Nat. Commun. 14, 3329 (2023).

  • Lin, T.-Y. et al. Microsoft COCO: common objects in context. In 13th European Conference on Computer Vision 740–755 (Springer, 2014).

  • Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations 1–15 (ICLR, 2015).

  • Hebart, M. N., Kaniuth, P. & Perkuhn, J. Efficiently-generated object similarity scores predicted from human feature ratings and deep neural network activations. J. Vis. 22, 4057 (2022).

  • Muttenthaler, L., Dippel, J., Linhardt, L., Vandermeulen, R. A. & Kornblith, S. Human alignment of neural network representations. In Proc. 11th International Conference on Learning Representations (ICLR, 2023); https://openreview.net/pdf?id=ReDQ1OUQR0X

  • Fischl, B. FreeSurfer. Neuroimage 62, 774–781 (2012).

  • Gao, J. S., Huth, A. G., Lescroart, M. D. & Gallant, J. L. Pycortex: an interactive surface visualizer for fMRI. Front. Neuroinform. 9, 23 (2015).

  • Du, C. & CDDU. ChangdeDu/LLMs_core_dimensions. Zenodo https://doi.org/10.5281/zenodo.15090333 (2025).
