What Neuroscience Can Tell AI About Learning In Continuously Changing Environments

Wong, B. B. M. & Candolin, U. Behavioral responses to changing environments. Behav. Ecol. 26, 665–673 (2014).

Mazza, V. & Šlipogor, V. Behavioral flexibility and novel environments: integrating current perspectives for future directions. Curr. Zool. 70, 304–309 (2024).

Jones, C. B. Behavioral Flexibility in Primates: Causes and Consequences (Springer, 2005).

Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. 34th International Conference on Machine Learning Vol. 70, 1126–1135 (PMLR, 2017).

Koutra, D. et al. Towards agentic AI for science: hypothesis generation, comprehension, quantification, and validation. In ICLR 2025 Workshop Proposals (2025).

Faraboschi, P., Giles, E., Hotard, J., Owczarek, K. & Wheeler, A. Reducing the barriers to entry for foundation model training. Preprint at https://doi.org/10.48550/arXiv.2404.08811 (2024).

Magee, J. C. & Grienberger, C. Synaptic plasticity forms and functions. Annu. Rev. Neurosci. 43, 95–117 (2020).

Wu, Y. & Maass, W. A simple model for behavioral time scale synaptic plasticity (BTSP) provides content addressable memory with binary synapses and one-shot learning. Nat. Commun. 16, 342 (2025).

Zhao, C. et al. Is chain-of-thought reasoning of LLMs a mirage? A data distribution lens. Preprint at https://doi.org/10.48550/arXiv.2508.01191 (2025).

Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).

Kudithipudi, D. et al. Biological underpinnings for lifelong learning machines. Nat. Mach. Intell. 4, 196–210 (2022).

Gupta, R. et al. Personalized artificial general intelligence (AGI) via neuroscience-inspired continuous learning systems. Preprint at https://doi.org/10.48550/arXiv.2504.20109 (2025).

Mazurek, S., Caputa, J., Argasiński, J. K. & Wielgosz, M. Three-factor learning in spiking neural networks: an overview of methods and trends from a machine learning perspective. Preprint at https://doi.org/10.48550/arXiv.2504.05341 (2025).

Bittner, K. C., Milstein, A. D., Grienberger, C., Romani, S. & Magee, J. C. Behavioral time scale synaptic plasticity underlies CA1 place fields. Science 357, 1033–1036 (2017).

Grazzi, R., Siems, J. N., Schrodi, S., Brox, T. & Hutter, F. Is Mamba capable of in-context learning? In Proc. International Conference on Automated Machine Learning 1–26 (AutoML, 2024).

Singh, A. K. et al. The transient nature of emergent in-context learning in transformers. Adv. Neural Inf. Process. Syst. 36, 27801–27819 (2023).

Bai, Y., Chen, F., Wang, H., Xiong, C. & Mei, S. Transformers as statisticians: provable in-context learning with in-context algorithm selection. Adv. Neural Inf. Process. Syst. 36, 57125–57211 (2023).

Brown, T. B. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

Dai, D. et al. Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers. In Workshop on Mathematical and Empirical Understanding of Foundation Models (2023).

Garg, S., Tsipras, D., Liang, P. S. & Valiant, G. What can transformers learn in-context? A case study of simple function classes. Adv. Neural Inf. Process. Syst. 35, 30583–30598 (2022).

Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).

Liu, L. et al. On the variance of the adaptive learning rate and beyond. In Proc. 8th International Conference on Learning Representations (ICLR, 2020).

Hemmer, C. J. & Durstewitz, D. True zero-shot inference of dynamical systems preserving long-term statistics. Adv. Neural Inf. Process. Syst. 39, 1–44 (2025).

Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://doi.org/10.48550/arXiv.2307.09288 (2023).

Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A. & Bengio, Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. Preprint at https://arxiv.org/abs/1312.6211 (2013).

Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).

Ramasesh, V., Lewkowycz, A. & Dyer, E. Effect of scale on catastrophic forgetting in neural networks. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

Carpenter, G. A. & Grossberg, S. ART 2: self-organization of stable category recognition codes for analog input patterns. Appl. Opt. 26, 4919–4930 (1987).

Jung, D. et al. New insights for the stability-plasticity dilemma in online continual learning. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).

McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989).

French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).

Wang, Z., Li, Y., Shen, L. & Huang, H. A unified and general framework for continual learning. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).

Wang, L., Zhang, X., Su, H. & Zhu, J. A comprehensive survey of continual learning: theory, method and application. IEEE Trans. Pattern Anal. Mach. Intell. 46, 5362–5383 (2024).

Zheng, W.-L., Wu, Z., Hummos, A., Yang, G. R. & Halassa, M. M. Rapid context inference in a thalamocortical model using recurrent neural networks. Nat. Commun. 15, 8275 (2024).

Nguyen, C. V., Li, Y., Bui, T. D. & Turner, R. E. Variational continual learning. In Proc. 6th International Conference on Learning Representations (ICLR, 2018).

Wu, Y., Huang, L.-K., Wang, R., Meng, D. & Wei, Y. Meta continual learning revisited: implicitly enhancing online Hessian approximation via variance reduction. In Proc. 12th International Conference on Learning Representations Vol. 2 (ICLR, 2024).

Li, Z. & Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2018).

McDonnell, M. D., Gong, D., Parvaneh, A., Abbasnejad, E. & Van den Hengel, A. RanPAC: random projections and pre-trained models for continual learning. Adv. Neural Inf. Process. Syst. 36, 12022–12053 (2023).

Ostapenko, O., Rodriguez, P., Caccia, M. & Charlin, L. Continual learning via local module composition. Adv. Neural Inf. Process. Syst. 34, 30298–30312 (2021).

Sorscher, B., Ganguli, S. & Sompolinsky, H. Neural representational geometry underlies few-shot concept learning. Proc. Natl Acad. Sci. USA 119, e2200800119 (2022).

Riemer, M. et al. Learning to learn without forgetting by maximizing transfer and minimizing interference. In Proc. 7th International Conference on Learning Representations (ICLR, 2019).

Shin, H., Lee, J. K., Kim, J. & Kim, J. Continual learning with deep generative replay. Adv. Neural Inf. Process. Syst. 30, 2994–3003 (2017).

Dohare, S. et al. Loss of plasticity in deep continual learning. Nature 632, 768–774 (2024).

van de Ven, G. M., Tuytelaars, T. & Tolias, A. S. Three types of incremental learning. Nat. Mach. Intell. 4, 1185–1197 (2022).

Houlsby, N. et al. Parameter-efficient transfer learning for NLP. In Proc. 36th International Conference on Machine Learning Vol. 97, 2790–2799 (PMLR, 2019).

Hu, E. J. et al. LoRA: low-rank adaptation of large language models. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

Mendez, J. A., van Seijen, H. & Eaton, E. Modular lifelong reinforcement learning via neural composition. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

Graves, A., Wayne, G. & Danihelka, I. Neural Turing machines. Preprint at https://arxiv.org/abs/1410.5401 (2014).

Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459–9474 (2020).

Yu, Y. et al. RankRAG: unifying context ranking with retrieval-augmented generation in LLMs. Adv. Neural Inf. Process. Syst. 37, 121156–121184 (2024).

Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. Meta-learning with memory-augmented neural networks. In Proc. 33rd International Conference on Machine Learning Vol. 48, 1842–(PMLR, 2016).

Skaggs, W. E. & McNaughton, B. L. Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science 271, 1870–1873 (1996).

Mallory, C. S., Widloski, J. & Foster, D. J. The time course and organization of hippocampal replay. Science 387, 541–548 (2025).

Grienberger, C. & Magee, J. C. Entorhinal cortex directs learning-related changes in CA1 representations. Nature 611, 554–562 (2022).

Krueger, D. et al. Out-of-distribution generalization via risk extrapolation (REx). In Proc. 38th International Conference on Machine Learning Vol. 139, 5815–5826 (PMLR, 2021).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2009).

Göring, N. A., Hess, F., Brenner, M., Monfared, Z. & Durstewitz, D. Out-of-domain generalization in dynamical systems reconstruction. In Proc. 41st International Conference on Machine Learning Vol. 235, 16071–16114 (PMLR, 2024).

Lampinen, A. K., Chan, S. C., Singh, A. K. & Shanahan, M. The broader spectrum of in-context learning. Preprint at https://arxiv.org/abs/2412.03782 (2024).

Li, Y., Ildiz, M. E., Papailiopoulos, D. & Oymak, S. Transformers as algorithms: generalization and stability in in-context learning. In Proc. 40th International Conference on Machine Learning Vol. 202, 19565–19594 (PMLR, 2023).

Li, Y., Wei, X., Zhao, H. & Ma, T. Can Mamba in-context learn task mixtures? In ICML 2024 Workshop on In-Context Learning (2024).

Oswald, J. V. et al. Transformers learn in-context by gradient descent. In Proc. 40th International Conference on Machine Learning Vol. 202, 35151–35174 (PMLR, 2023).

Shen, L., Mishra, A. & Khashabi, D. Position: do pretrained transformers learn in-context by gradient descent? In Proc. 41st International Conference on Machine Learning Vol. 235, 44712–44740 (PMLR, 2024).

Li, J., Hou, Y., Sachan, M. & Cotterell, R. What do language models learn in context? The structured task hypothesis. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics Vol. 1, 12365–12379 (Association for Computational Linguistics, 2024).

Deutch, G., Magar, N., Natan, T. & Dar, G. In-context learning and gradient descent revisited. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) 1017–1028 (Association for Computational Linguistics, 2024).

Yadlowsky, S., Doshi, L. & Tripuraneni, N. Pretraining data mixtures enable narrow model selection capabilities in transformer models. Preprint at https://arxiv.org/abs/2311.00871 (2023).

Hahn, M. & Goyal, N. A theory of emergent in-context learning as implicit structure induction. Preprint at https://arxiv.org/abs/2303.07971 (2023).

Chan, S. et al. Data distributional properties drive emergent in-context learning in transformers. Adv. Neural Inf. Process. Syst. 35, 18878–18891 (2022).

Snell, C., Lee, J., Xu, K. & Kumar, A. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. In Proc. 13th International Conference on Learning Representations 1–37 (ICLR, 2025).

Domjan, M. The Principles of Learning and Behavior 7th edn (Cengage Learning, 2014).

Shettleworth, S. J. Cognition, Evolution, and Behavior 2nd edn (Oxford Univ. Press, 2009).

Bähner, F. et al. Abstract rule learning promotes cognitive flexibility in complex environments across species. Nat. Commun. 16, 5396 (2025).

Bouchacourt, F., Tafazoli, S., Mattar, M. G., Buschman, T. J. & Daw, N. D. Fast rule switching and slow rule updating in a perceptual categorization task. eLife 11, e82531 (2022).

Stokes, M. G. et al. Dynamic coding for cognitive control in prefrontal cortex. Neuron 78, 364–375 (2013).

Beiran, M., Meirhaeghe, N., Sohn, H., Jazayeri, M. & Ostojic, S. Parametric control of flexible timing through low-dimensional neural manifolds. Neuron 111, 739–753.e738 (2023).

Evenden, J. L. & Robbins, T. W. Win–stay behaviour in the rat. Q. J. Exp. Psychol. B 36, 1–26 (1984).

Cohen, Y., Schneidman, E. & Paz, R. The geometry of neuronal representations during rule learning reveals complementary roles of cingulate cortex and putamen. Neuron 109, 839–851.e839 (2021).

Tang, H., Costa, V. D., Bartolo, R. & Averbeck, B. B. Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep. 38, 110198 (2022).

Passecker, J. et al. Activity of prefrontal neurons predict future choices during gambling. Neuron 101, 152–164.e157 (2019).

Pereira-Obilinovic, U., Hou, H., Svoboda, K. & Wang, X.-J. Brain mechanism of foraging: reward-dependent synaptic plasticity versus neural integration of values. Proc. Natl Acad. Sci. USA 121, e2318521121 (2024).

Egner, T. & Siqi-Liu, A. Insights into control over cognitive flexibility from studies of task-switching. Curr. Opin. Syst. Biol. 55, 101342 (2024).

Uddin, L. Q. Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

Durstewitz, D. & Seamans, J. K. The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-O-methyltransferase genotypes and schizophrenia. Biol. Psychiatry 64, 739–749 (2008).

Goudar, V., Peysakhovich, B., Freedman, D. J., Buffalo, E. A. & Wang, X.-J. Schema formation in a neural population subspace underlies learning-to-learn in flexible sensorimotor problem-solving. Nat. Neurosci. 26, 879–890 (2023).

Driscoll, L. N., Shenoy, K. & Sussillo, D. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. Nat. Neurosci. 27, 1349–1363 (2024).

Bakermans, J. J. W., Warren, J., Whittington, J. C. R. & Behrens, T. E. J. Constructing future behavior in the hippocampal formation through composition and replay. Nat. Neurosci. 28, 1061–1072 (2025).

Gallistel, C. R., Fairhurst, S. & Balsam, P. The learning curve: implications of a quantitative analysis. Proc. Natl Acad. Sci. USA 101, 13124–13131 (2004).

Papachristos, E. B. & Gallistel, C. Autoshaped head poking in the mouse: a quantitative analysis of the learning curve. J. Exp. Anal. Behav. 85, 293–308 (2006).

Durstewitz, D., Vittoz, N. M., Floresco, S. B. & Seamans, J. K. Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron 66, 438–448 (2010).

Powell, N. J. & Redish, A. D. Representational changes of latent strategies in rat medial prefrontal cortex precede changes in behaviour. Nat. Commun. 7, 12830 (2016).

Karlsson, M. P., Tervo, D. G. R. & Karpova, A. Y. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 338, 135–139 (2012).

Russo, E. et al. Coordinated prefrontal state transition leads extinction of reward-seeking behaviors. J. Neurosci. 41, 2406–2419 (2021).

Miles, J. T., Mullins, G. L. & Mizumori, S. J. Flexible decision-making is related to strategy learning, vicarious trial and error, and medial prefrontal rhythms during spatial set-shifting. Learn. Mem. 31, a053911 (2024).

Gottlieb, J. & Oudeyer, P.-Y. Towards a neuroscience of active sampling and curiosity. Nat. Rev. Neurosci. 19, 758–770 (2018).

Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).

Burda, Y., Edwards, H., Storkey, A. & Klimov, O. Exploration by random network distillation. In Proc. 7th International Conference on Learning Representations 1–17 (ICLR, 2019).

Li, D. et al. A survey on deep active learning: recent advances and new frontiers. IEEE Trans. Neural Netw. Learn. Syst. 36, 5879–5899 (2025).

Millidge, B. Deep active inference as variational policy gradients. J. Math. Psychol. 96, 102348 (2020).

Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven exploration by self-supervised prediction. In Proc. 34th International Conference on Machine Learning Vol. 70, 2778–2787 (PMLR, 2017).

Settles, B. Active Learning Literature Survey 1648 (Univ. Wisconsin-Madison Department of Computer Sciences, 1995).

van der Himst, O. & Lanillos, P. in Active Inference (eds Verbelen, T. et al.) 61–71 (Springer, 2020).

Branicky, M. S. Universal computation and other capabilities of hybrid and continuous dynamical systems. Theor. Comput. Sci. 138, 67–100 (1995).

Koiran, P., Cosnard, M. & Garzon, M. Computability with low-dimensional dynamical systems. Theor. Comput. Sci. 132, 113–128 (1994).

Siegelmann, H. T. & Sontag, E. D. On the computational power of neural nets. J. Comput. Syst. Sci. 50, 132–150 (1995).

Fernando, J. & Guitchounts, G. Transformer dynamics: a neuroscientific approach to interpretability of large language models. Preprint at https://arxiv.org/abs/2502.12131 (2025).

Geshkovski, B., Letrouit, C., Polyanskiy, Y. & Rigollet, P. A mathematical perspective on transformers. Bull. Amer. Math. Soc. 62, 427–479 (2025).

Mikhaeil, J. M., Monfared, Z. & Durstewitz, D. On the difficulty of learning chaotic dynamics with RNNs. Adv. Neural Inf. Process. Syst. 35, 11297–11312 (2022).

Monfared, Z. & Durstewitz, D. Transformation of ReLU-based recurrent neural networks from discrete-time to continuous-time. In Proc. 37th International Conference on Machine Learning Vol. 119, 6999–7009 (PMLR, 2020).

Eisenmann, L., Monfared, Z., Göring, N. & Durstewitz, D. Bifurcations and loss jumps in RNN training. Adv. Neural Inf. Process. Syst. 36, 70511–70547 (2023).

Ibayashi, H. & Imaizumi, M. Why does SGD prefer flat minima?: through the lens of dynamical systems. In AAAI Workshop When Machine Learning meets Dynamical Systems: Theory and Applications (2023).

Şimşekli, U., Sener, O., Deligiannidis, G. & Erdogdu, M. A. Hausdorff dimension, heavy tails, and generalization in neural networks. Adv. Neural Inf. Process. Syst. 33, 5138–5151 (2020).

Zhang, Y., Singh, A. K., Latham, P. E. & Saxe, A. Training dynamics of in-context learning in linear attention. In Proc. 42nd International Conference on Machine Learning Vol. 267, 76047–76087 (PMLR, 2025).

Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).

Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

Hinton, G. E. & Sejnowski, T. J. in Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations (eds Rumelhart, D. E. & McClelland, J. L.) 282–317 (MIT Press, 1986).

Ambrogioni, L. In search of dispersed memories: generative diffusion models are associative memory networks. Entropy 26, 381 (2024).

Pham, B. et al. Memorization to generalization: the emergence of diffusion models from associative memory. In NeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning (2024).

Aksay, E., Gamkrelidze, G., Seung, H. S., Baker, R. & Tank, D. W. In vivo intracellular recording and perturbation of persistent activity in a neural integrator. Nat. Neurosci. 4, 184–193 (2001).

Khona, M. & Fiete, I. R. Attractor and integrator networks in the brain. Nat. Rev. Neurosci. 23, 744–766 (2022).

Nair, A. et al. An approximate line attractor in the hypothalamus encodes an aggressive state. Cell 186, 178–193.e115 (2023).

Zhang, K. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. J. Neurosci. 16, 2112–2126 (1996).

Gardner, R. J. et al. Toroidal topology of population activity in grid cells. Nature 602, 123–128 (2022).

Machens, C. K., Romo, R. & Brody, C. D. Flexible control of mutual inhibition: a neural model of two-interval discrimination. Science 307, 1121–1124 (2005).

Durstewitz, D. Self-organizing neural integrator predicts interval times through climbing activity. J. Neurosci. 23, 5342–5353 (2003).

Seung, H. S., Lee, D. D., Reis, B. Y. & Tank, D. W. Stability of the memory of eye position in a recurrent network of conductance-based model neurons. Neuron 26, 259–271 (2000).

Mensh, B. D., Aksay, E., Lee, D. D., Seung, H. S. & Tank, D. W. Spontaneous eye movements in goldfish: oculomotor integrator performance, plasticity, and dependence on visual feedback. Vis. Res. 44, 711–726 (2004).

Gallego, J. A., Perich, M. G., Miller, L. E. & Solla, S. A. Neural manifolds for the control of movement. Neuron 94, 978–984 (2017).

Fransén, E., Tahvildari, B., Egorov, A. V., Hasselmo, M. E. & Alonso, A. A. Mechanism of graded persistent cellular activity of entorhinal cortex layer V neurons. Neuron 49, 735–746 (2006).

Vinograd, A., Nair, A., Kim, J. H., Linderman, S. W. & Anderson, D. J. Causal evidence of a line attractor encoding an affective state. Nature 634, 910–918 (2024).

Schmidt, D., Koppe, G., Monfared, Z., Beutelspacher, M. & Durstewitz, D. Identifying nonlinear dynamical systems with multiple time scales and long-range dependencies. In Proc. 9th International Conference on Learning Representations e1007263 (ICLR, 2021).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

Perko, L. Differential Equations and Dynamical Systems 7 (Springer, 2001).

Rabinovich, M. I., Huerta, R., Varona, P. & Afraimovich, V. S. Transient cognitive dynamics, metastability, and decision making. PLoS Comput. Biol. 4, e1000072 (2008).

Rabinovich, M. I., Varona, P., Selverston, A. I. & Abarbanel, H. D. I. Dynamical principles in neuroscience. Rev. Mod. Phys. 78, 1213–1265 (2006).

Tsuda, I. Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behav. Brain Sci. 24, 793–848 (2001).

Tsuda, I. Chaotic itinerancy and its roles in cognitive neurodynamics. Curr. Opin. Neurobiol. 31, 67–71 (2015).

Koch, D. et al. Ghost channels and ghost cycles guiding long transients in dynamical systems. Phys. Rev. Lett. 133, 047202 (2024).

Lapish, C. C., Balaguer-Ballester, E., Seamans, J. K., Phillips, A. G. & Durstewitz, D. Amphetamine exerts dose-dependent changes in prefrontal cortex attractor dynamics during working memory. J. Neurosci. 35, 10172 (2015).

Komura, Y. et al. Retrospective and prospective coding for predicted reward in the sensory thalamus. Nature 412, 546–549 (2001).

Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible timing by temporal scaling of cortical responses. Nat. Neurosci. 21, 102–110 (2018).

Spisak, T. & Friston, K. Self-orthogonalizing attractor neural networks emerging from the free energy principle. Preprint at https://doi.org/10.48550/arXiv.2505.22749 (2025).

Rouse, N. A. & Daltorio, K. A. Visualization of stable heteroclinic channel-based movement primitives. IEEE Rob. Autom. Lett. 6, 2343–2348 (2021).

Mengers, N., Rouse, N. & Daltorio, K. A. Stable heteroclinic channels for controlling a simulated aquatic serpentine robot in narrow crevices. Front. Electron. 6, 1507644 (2025).

Durstewitz, D. & Seamans, J. K. The computational role of dopamine D1 receptors in working memory. Neural Netw. 15, 561–572 (2002).

Chahine, M. et al. Robust flight navigation out of distribution with liquid neural networks. Sci. Rob. 8, eadc8892 (2023).

Baronig, M., Ferrand, R., Sabathiel, S. & Legenstein, R. Advancing spatio-temporal processing through adaptation in spiking neural networks. Nat. Commun. 16, 5776 (2025).

Wang, G. et al. Hierarchical reasoning model. Preprint at https://doi.org/10.48550/arXiv.2506.21734 (2025).

Doya, K. Bifurcations in the learning of recurrent neural networks. In Proc. 1992 IEEE International Symposium on Circuits and Systems Vol. 6, 2777–2780 (IEEE, 1992).

Beggs, J. M. & Plenz, D. Neuronal avalanches in neocortical circuits. J. Neurosci. 23, 11167–11177 (2003).

Bertschinger, N. & Natschläger, T. Real-time computation at the edge of chaos in recurrent neural networks. Neural Comput. 16, 1413–1436 (2004).

Shew, W. L., Yang, H., Petermann, T., Roy, R. & Plenz, D. Neuronal avalanches imply maximum dynamic range in cortical networks at criticality. J. Neurosci. 29, 15595–15600 (2009).

Cocchi, L., Gollo, L. L., Zalesky, A. & Breakspear, M. Criticality in the brain: a synthesis of neurobiology, models and cognition. Prog. Neurobiol. 158, 132–152 (2017).

Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).

Stemmler, M. & Koch, C. How voltage-dependent conductances can adapt to maximize the information encoded by neuronal firing rate. Nat. Neurosci. 2, 521–527 (1999).

Zhong, L. et al. Unsupervised pretraining in biological neural networks. Nature 644, 741–748 (2025).

Citri, A. & Malenka, R. C. Synaptic plasticity: multiple forms, functions, and mechanisms. Neuropsychopharmacology 33, 18–41 (2008).

Holtmaat, A. & Svoboda, K. Experience-dependent structural synaptic plasticity in the mammalian brain. Nat. Rev. Neurosci. 10, 647–658 (2009).

Fu, M. & Zuo, Y. Experience-dependent structural plasticity in the cortex. Trends Neurosci. 34, 177–187 (2011).

Sagi, Y. et al. Learning in the fast lane: new insights into neuroplasticity. Neuron 73, 1195–1203 (2012).

Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning Vol. 37, 448–456 (PMLR, 2015).

Salimans, T. & Kingma, D. P. Weight normalization: a simple reparameterization to accelerate training of deep neural networks. Adv. Neural Inf. Process. Syst. 29, 901–909 (2016).

Turrigiano, G. G., Leslie, K. R., Desai, N. S., Rutherford, L. C. & Nelson, S. B. Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391, 892–896 (1998).

Kaplanis, C., Shanahan, M. & Clopath, C. Continual reinforcement learning with complex synapses. In Proc. 35th International Conference on Machine Learning 2497–2506 (PMLR, 2018).

Laborieux, A., Ernoult, M., Hirtzlin, T. & Querlioz, D. Synaptic metaplasticity in binarized neural networks. Nat. Commun. 12, 2549 (2021).

Schultz, W. Dopamine reward prediction-error signalling: a two-component response. Nat. Rev. Neurosci. 17, 183–195 (2016).

Doya, K. Metalearning and neuromodulation. Neural Netw. 15, 495–506 (2002).

Izhikevich, E. M. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb. Cortex 17, 2443–2452 (2007).

Huttenlocher, P. R. & Dabholkar, A. S. Regional differences in synaptogenesis in human cerebral cortex. J. Comp. Neurol. 387, 167–178 (1997).

Hensch, T. K. Critical period regulation. Annu. Rev. Neurosci. 27, 549–579 (2004).

Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z. & Ionescu, C. Using fast weights to attend to the recent past. Adv. Neural Inf. Process. Syst. 29, 4331–4339 (2016).

Hofmann, M., Becker, M. F. P., Tetzlaff, C. & Mäder, P. Concept transfer of synaptic diversity from biological to artificial neural networks. Nat. Commun. 16, 5112 (2025).

Benna, M. K. & Fusi, S. Computational principles of synaptic memory consolidation. Nat. Neurosci. 19, 1697–1706 (2016).

Ralambomihanta, T. R. et al. Learning from the past with cascading eligibility traces. Preprint at https://doi.org/10.48550/arXiv.2506.14598 (2025).

Wang, J. X. Meta-learning in natural and artificial intelligence. Curr. Opin. Syst. Biol. 38, 90–95 (2021).

Ostapenko, O., Puscas, M., Klein, T., Jahnichen, P. & Nabi, M. Learning to remember: a synaptic plasticity driven framework for continual learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 11321–11329 (IEEE, 2019).

Ben-Iwhiwhu, E., Nath, S., Pilly, P. K., Kolouri, S. & Soltoggio, A. Lifelong reinforcement learning with modulating masks. Trans. Mach. Learn. Res. https://openreview.net/forum?id=V7tahqGrOq (2023).

Miconi, T., Stanley, K. & Clune, J. Differentiable plasticity: training plastic neural networks with backpropagation. In Proc. 35th International Conference on Machine Learning Vol. 80, 3559–3568 (PMLR, 2018).

Shervani-Tabar, N. & Rosenbaum, R. Meta-learning biologically plausible plasticity rules with random feedback pathways. Nat. Commun. 14, 1805 (2023).

Yu, Y., Jin, Y., Xiao, Y. & Yan, Y. A recurrent spiking network with hierarchical intrinsic excitability modulation for schema learning. Preprint at https://doi.org/10.48550/arXiv.2501.14539 (2025).

Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proc. 26th Annual International Conference on Machine Learning 41–48 (Association for Computing Machinery, 2009).

Brock, A., Lim, T., Ritchie, J. M. & Weston, N. J. FreezeOut: accelerate training by progressively freezing layers. In 10th NIPS Workshop on Optimization for Machine Learning Vol. 10 (NIPS, 2017).

Sorrenti, A. et al. Selective freezing for efficient continual learning. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) 3542–3551 (IEEE, 2023).

Shi, T., Wu, Y., Song, L., Zhou, T. & Zhao, J. Efficient reinforcement finetuning via adaptive curriculum learning. Preprint at https://doi.org/10.48550/arXiv.2504.05520 (2025).

Tolman, E. C. Purposive Behavior in Animals and Men (Appleton-Century-Crofts, 1932).

Tolman, E. C. & Honzik, C. H. Introduction and removal of reward, and maze performance in rats. Univ. Calif. Pub. Psychol. 4, 257–275 (1930).

Ke, N. R. et al. Sparse attentive backtracking: temporal credit assignment through reminding. Adv. Neural Inf. Process. Syst. 31, 7651–7662 (2018).

McClelland, J. L., McNaughton, B. L. & O’Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419 (1995).

Sun, W., Advani, M., Spruston, N., Saxe, A. & Fitzgerald, J. E. Organizing memories for generalization in complementary learning systems. Nat. Neurosci. 26, 1438–1448 (2023).

Samborska, V., Butler, J. L., Walton, M. E., Behrens, T. E. J. & Akam, T. Complementary task representations in hippocampus and prefrontal cortex for generalizing the structure of problems. Nat. Neurosci. 25, 1314–1326 (2022).

Moscovitch, M., Cabeza, R., Winocur, G. & Nadel, L. Episodic memory and beyond: the hippocampus and neocortex in transformation. Annu. Rev. Psychol. 67, 105–134 (2016).

Treves, A. & Rolls, E. T. Computational analysis of the role of the hippocampus in memory. Hippocampus 4, 374–391 (1994).

Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).

Wilson, M. A. & McNaughton, B. L. Reactivation of hippocampal ensemble memories during sleep. Science 265, 676–679 (1994).

Foster, D. J. Replay comes of age. Annu. Rev. Neurosci. 40, 581–602 (2017).

Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T. & Wayne, G. Experience replay for continual learning. Adv. Neural Inf. Process. Syst. 32, 350–360 (2019).

Shi, Q. et al. Hybrid neural networks for continual learning inspired by corticohippocampal circuits. Nat. Commun. 16, 1272 (2025).

Du, J.-L., Wei, H.-P., Wang, Z.-R., Wong, S. T. & Poo, M.-M. Long-range retrograde spread of LTP and LTD from optic tectum to retina. Proc. Natl Acad. Sci. USA 106, 18890–18896 (2009).

Zhang, T. et al. Self-backpropagation of synaptic modifications elevates the efficiency of spiking and artificial neural networks. Sci. Adv. 7, eabh0146 (2021).

Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982).

Kohonen, T. Analysis of a simple self-organizing process. Biol. Cybern. 44, 135–140 (1982).

Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267–273 (1982).

Oja, E. & Karhunen, J. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. J. Math. Anal. Appl. 106, 69–84 (1985).

Hertz, J. A., Krogh, A. & Palmer, R. G. Introduction to the Theory of Neural Computation, I. (Westview Press, 1991).

Kuriscak, E., Marsalek, P., Stroffek, J. & Toth, P. G. Biological context of Hebb learning in artificial neural networks, a review. Neurocomputing 152, 27–35 (2015).

Schmidgall, S. et al. Brain-inspired learning in artificial neural networks: a review. APL Mach. Learn. 2, 021501 (2024).

Drew, P. J. & Abbott, L. F. Extending the effects of spike-timing-dependent plasticity to behavioral timescales. Proc. Natl Acad. Sci. USA 103, 8876–8881 (2006).

Soltoggio, A. Short-term plasticity as cause–effect hypothesis testing in distal reward learning. Biol. Cybern. 109, 75–94 (2015).

Lu, S. & Sengupta, A. Deep unsupervised learning using spike-timing-dependent plasticity. Neuromorphic Comput. Eng. 4, 024004 (2024).

Apolinario, M. P. E. & Roy, K. S-TLLR: STDP-inspired temporal local learning rule for spiking neural networks. Trans. Mach. Learn. Res. https://openreview.net/forum?id=vlQ56aWJhl (2025).

Rahman, N. A. & Yusoff, N. Modulated spike-time dependent plasticity (STDP)-based learning for spiking neural network (SNN): a review. Neurocomputing 618, 129170 (2025).

Kudithipudi, D. et al. Neuromorphic computing at scale. Nature 637, 801–812 (2025).

Bittner, K. C. et al. Conjunctive input processing drives feature selectivity in hippocampal CA1 neurons. Nat. Neurosci. 18, 1133–1142 (2015).

Qian, F. K., Li, Y. & Magee, J. C. Mechanisms of experience-dependent place-cell referencing in hippocampal area CA1. Nat. Neurosci. 28, 1486–1496 (2025).

Pang, R. & Recanatesi, S. A non-Hebbian code for episodic memory. Sci. Adv. 11, eado4112 (2025).

Fusi, S., Asaad, W. F., Miller, E. K. & Wang, X.-J. A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales. Neuron 54, 319–333 (2007).

Russo, E. & Durstewitz, D. Cell assemblies at multiple time scales with arbitrary lag constellations. eLife 6, e19428 (2017).

Cavanagh, S. E., Hunt, L. T. & Kennerley, S. W. A diversity of intrinsic timescales underlie neural computations. Front. Neural Circuits 14, 615626 (2020).

Gao, R., van den Brink, R. L., Pfeffer, T. & Voytek, B. Neuronal timescales are functionally dynamic and shaped by cortical microarchitecture. eLife 9, e61277 (2020).

Zijlmans, M. et al. High-frequency oscillations as a new biomarker in epilepsy. Ann. Neurol. 71, 169–178 (2012).

Spaak, E., de Lange, F. P. & Jensen, O. Local entrainment of alpha oscillations by visual stimuli causes cyclic modulation of perception. J. Neurosci. 34, 3536–3544 (2014).

Momtaz, S. & Bidelman, G. M. Effects of stimulus rate and periodicity on auditory cortical entrainment to continuous sounds. eNeuro 11, ENEURO.0027-23.2024 (2024).

Durstewitz, D. Neural representation of interval time. NeuroReport 15, 745–749 (2004).

Rosenberg, M., Zhang, T., Perona, P. & Meister, M. Mice in a labyrinth show rapid learning, sudden insight, and efficient exploration. eLife 10, e66175 (2021).

Zipser, D. Recurrent network model of the neural mechanism of short-term active memory. Neural Comput. 3, 179–193 (1991).

Rajalingham, R., Piccato, A. & Jazayeri, M. Recurrent neural networks with explicit representation of dynamic latent variables can mimic behavioral patterns in a physical inference task. Nat. Commun. 13, 5865 (2022).

Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).

Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. Preprint at https://doi.org/10.48550/arXiv.2312.00752 (2023).

Bulatov, A., Kuratov, Y. & Burtsev, M. Recurrent memory transformer. Adv. Neural Inf. Process. Syst. 35, 11079–11091 (2022).

Hutchins, D., Schlag, I., Wu, Y., Dyer, E. & Neyshabur, B. Block-recurrent transformers. Adv. Neural Inf. Process. Syst. 35, 33248–33261 (2022).

Kriegeskorte, N. Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446 (2015).

Kumar, S. et al. Shared functional specialization in transformer-based language models and the human brain. Nat. Commun. 15, 5523 (2024).

Whittington, J. C., Warren, J. & Behrens, T. E. Relating transformers to models and neural representations of the hippocampal formation. Preprint at https://doi.org/10.48550/arXiv.2112.04035 (2021).

Miikkulainen, R. Neuroevolution insights into biological neural computation. Science 387, eadp7478 (2025).

Durstewitz, D., Koppe, G. & Thurm, M. I. Reconstructing computational system dynamics from neural data with recurrent neural networks. Nat. Rev. Neurosci. 24, 693–710 (2023).

Brenner, M., Weber, E., Koppe, G. & Durstewitz, D. Learning interpretable hierarchical dynamical systems models from time series data. In Proc. 13th International Conference on Learning Representations 1–37 (ICLR, 2025).

Brenner, M., Hess, F., Koppe, G. & Durstewitz, D. Integrating multimodal data for joint generative modeling of complex dynamics. In Proc. 41st International Conference on Machine Learning Vol. 235, 4482–4516 (PMLR, 2024).

Glaser, J., Whiteway, M., Cunningham, J. P., Paninski, L. & Linderman, S. Recurrent switching dynamical systems models for multiple interacting neural populations. Adv. Neural Inf. Process. Syst. 33, 14867–14878 (2020).

Pals, M., Sağtekin, A. E., Pei, F., Gloeckler, M. & Macke, J. H. Inferring stochastic low-rank recurrent neural networks from neural data. Adv. Neural Inf. Process. Syst. 37, 18225–18264 (2024).

Hess, F., Monfared, Z., Brenner, M. & Durstewitz, D. Generalized teacher forcing for learning chaotic dynamics. In Proc. 40th International Conference on Machine Learning 13017–13049 (ICML, 2023).

Platt, J. A., Penny, S. G., Smith, T. A., Chen, T.-C. & Abarbanel, H. D. I. Constraining chaos: enforcing dynamical invariants in the training of reservoir computers. Chaos 33, 103107 (2023).

Lim, S. et al. Inferring learning rules from distributions of firing rates in cortical neurons. Nat. Neurosci. 18, 1804–1810 (2015).

Mehta, Y. et al. Model based inference of synaptic plasticity rules. Adv. Neural Inf. Process. Syst. 37, 48519–48540 (2024).

Chen, S., Yang, Q. & Lim, S. Efficient inference of synaptic plasticity rule with Gaussian process regression. iScience 26, 106182 (2023).
