
LLMs, the Brain and Reason: A Complicated Relationship

Agents are at the centre of attention in artificial intelligence right now. They promise to bring a new level of automation by solving problems on their own, using thinking and planning similar to humans. This could greatly improve productivity for both individuals and businesses. Yet users may face issues with reliability and trust due to the well-known hallucination problem of large language models (LLMs), which form the core of AI agents. Another concern is that LLMs still struggle with robust planning and reasoning, falling short of the human-like abilities that are promised. [1]

The Big Debate

This gap in reasoning and planning between LLMs and humans is hotly debated. Some claim that models simply need better training to learn how to reason through problems and plan ahead. This idea is fuelled by findings that models specifically trained to reason ('large reasoning models') perform better than their original counterparts [2]. Others believe that LLMs do not fundamentally have the capacity to think or plan like humans or animals. They think the way LLMs work is incompatible with true thinking, and that just giving them more training will not fix this [3].

The Core Problem

To find answers, neuroscientists and AI researchers have been searching LLMs for the very thing that lets humans imagine future scenarios and plan the steps to solve problems: a model of the world [4]. Such a model is essentially organized knowledge that helps us understand how different objects and ideas are connected. In the physical world, this is like a map that shows towns, their surroundings, and how they are linked, for example, by roads.

Such maps help us imagine and figure out how to travel from one place to another. More generally, maps let us imagine an action and its consequences without needing to try it in real life. This is the foundation of how we understand the world and gives us a sense of cause and effect, which is what we know as common sense. Importantly, it also helps us understand what is impossible and goes against our common sense.

If you have a problem to solve or a goal to reach, you can use this map to think through all the possible steps to get there. You might also consider which steps are the fastest, require the least effort or cost, or bring the most satisfaction when you succeed. Traditional AI has developed many algorithms that efficiently search through these maps to find the best sequence of actions to reach a goal [5]. For example, Google Maps uses such an algorithm behind the scenes to find the best route to your destination.
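To make this concrete, here is a minimal sketch in Python of the idea behind such route-finding algorithms: a tiny "map" of towns stored as a graph, and Dijkstra's algorithm, a classic search method of the kind referenced above, finding the cheapest sequence of steps to a goal. The towns and distances are invented for illustration.

```python
import heapq

# A toy "map": towns and road lengths between them (invented for illustration).
roads = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 4},
    "C": {"A": 2, "D": 8},
    "D": {"B": 4, "C": 8},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: search the map for the cheapest path from start to goal."""
    queue = [(0, start, [start])]          # (cost so far, current town, route taken)
    visited = set()
    while queue:
        cost, town, route = heapq.heappop(queue)
        if town == goal:
            return cost, route             # cheapest route found
        if town in visited:
            continue
        visited.add(town)
        for neighbour, distance in graph[town].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + distance, neighbour, route + [neighbour]))
    return None                            # no route exists

print(shortest_route(roads, "A", "D"))     # (9, ['A', 'B', 'D'])
```

The key point is that the algorithm never "travels" anywhere: it reasons over the map, imagining and comparing routes before committing to one.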

Maps and Algorithms in the Brain

In both animal and human brains, the existence of spatial maps was first proposed in theory [6] and later confirmed in a deep brain region called the hippocampus and its surrounding areas [4]. The importance of this discovery was recognized with the 2014 Nobel Prize in Physiology or Medicine. Recently, researchers have found evidence of more abstract maps in the brain that represent objects and their relationships beyond just physical space [4].

However, it is still unclear which algorithms the brain uses to plan and reason based on these maps. An important strategy used by neuroscientists is to test whether algorithms developed by AI researchers also apply to the brain [5].

There are also important findings about what happens in the brain during planning. When you plan a route, the sequence of places you intend to visit is played out in the hippocampus before you carry out the plan. This has been taken as evidence for the neural basis of planning, or imagining the future in your mind [7].

Maps and LLMs

For AI researchers, a key question has been whether LLMs also build internal maps during training and rely on them for their reasoning and planning abilities. Some theoretical studies have been encouraging, suggesting that the transformer architecture could support the construction of maps similar to those found in the hippocampus [8]. A few studies may have found evidence supporting this idea [9]. However, other research has been less encouraging, showing no signs of such map-like structures in LLMs.

One study tested GPT-4 and other LLMs but failed to find evidence that these models can organize knowledge into maps or use them to plan. Instead, the models often hallucinated relationships and proposed plans that were impossible given the task [10]. 

Reasoning is also essential in mathematics. Yet another study showed that LLMs like GPT-4o struggle to reason reliably through math problems. Even small changes to a problem, such as altering numbers or wording without changing the solution strategy, often caused the models to fail. Their performance also broke down when additional, unrelated details were included in the problem [11].
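The logic behind this kind of test can be illustrated with a small sketch (my own simplified illustration, not the benchmark's actual code): the same word problem is generated with different numbers and an optional irrelevant detail, while the solution strategy stays identical. A system that genuinely reasons should solve every variant; a pattern matcher is easily thrown off by the surface changes.

```python
import random

# Simplified illustration of the idea behind templated math benchmarks such as
# GSM-Symbolic [11]: the wording and solution strategy stay fixed, while surface
# details (numbers, an irrelevant clause) change between variants.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)

def make_variant(seed):
    rng = random.Random(seed)
    a, b = rng.randint(3, 50), rng.randint(3, 50)
    name = rng.choice(["Lena", "Omar", "Mia"])
    distractor = rng.choice(["", "Five of them are slightly smaller than the rest. "])
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    answer = a + b   # the correct answer never depends on the distractor
    return question, answer

for seed in range(3):
    question, answer = make_variant(seed)
    print(question, "->", answer)
```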

A recent, widely discussed study by researchers at Apple further challenged the view that LLMs can truly reason [12]. It found that even the most advanced models, including those specifically trained for reasoning, could only handle simple tasks. As the complexity of reasoning increased, model performance collapsed, even for the best models available. This result felt like a major setback to many in the field, although some researchers raised concerns about the study's design and interpretation [13].

How do LLMs reason?

If LLMs do not really reason like humans, then what is happening when they seem to be thinking and solving problems? A widely held view is that LLMs are essentially performing simple pattern matching based on the vast amount of data they have seen during training. They piece together familiar patterns of how others have solved similar problems found in their training data [14].

The idea is that if a task closely resembles one that the model was exposed to during pretraining, it may perform well. But the moment the pattern changes slightly, the model fails, because it cannot truly reason through the new variation of the task.

Humans can also fail for the same reasons when they rely only on memorized patterns without truly understanding the problem. Imagine a math student who memorizes problem-solving steps during a lecture. To test true understanding, the teacher introduces a small twist in the exam, a variation not covered in class. A student who simply memorized the steps will likely fail, while a student who understood the structure of the problem can reason through the new challenge and solve it. LLMs today tend to behave more like the first student: good at recall, but weak at adapting to novelty.

Changing the Architecture

While many remain convinced that LLMs can be improved through better training and increased computing power to eventually solve any problem and even surpass humans, others focus on developing new strategies and architectures to help LLMs reason. One idea is to combine LLMs with reasoning engines based on classic AI, which an LLM can use as a tool whenever it needs to plan and reason through a problem (see Neuro-Symbolic AI or Are We Already There?).
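Here is a minimal sketch of this division of labour, reusing the roads map and shortest_route function from the routing example above: the LLM only translates the user's request into a structured query, and a classical planner does the reliable reasoning step. The call_llm function and its prompt are hypothetical placeholders, not a real API.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a language model API."""
    # In a real system this would query an LLM; here we hard-code a plausible reply.
    return json.dumps({"start": "A", "goal": "D"})

def plan_route(user_request: str):
    # 1. The LLM only extracts a structured query from free-form text.
    query = json.loads(call_llm(
        f"Extract the start and goal towns as JSON from: {user_request!r}"
    ))
    # 2. A classical search algorithm (shortest_route from the earlier sketch)
    #    handles the planning step that LLMs are unreliable at.
    return shortest_route(roads, query["start"], query["goal"])

print(plan_route("I need to get from A to D as quickly as possible."))
```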

Another idea is to design an entirely new architecture that helps AI reason and plan like humans by learning and using a world model that includes structured knowledge and reliable reasoning [15]. This approach of combining AI that excels at pattern recognition, like LLMs, with AI that can reliably reason based on world models is called neurosymbolic AI [16].

One example of such a new architecture is the JEPA (Joint Embedding Predictive Architecture) model, developed by Yann LeCun at Meta [3]. At its core, the JEPA model has a world model that understands causal relationships and their hierarchical structure within the environment, allowing it to predict how the world will change in response to possible actions. It also incorporates components that estimate how much cost is required to reach a goal and the potential reward for achieving it. Additionally, the model includes a mechanism similar to human attention. It focuses only on the goal-relevant aspects of the world model, the input, and the associated cost or reward.
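The following is a highly simplified conceptual skeleton of the components described above, imagining the world model, the cost estimate, and the attention-like focus as plain Python pieces. All names and signatures are invented for illustration; this is not Meta's JEPA code, only a picture of how the parts are said to fit together [3].

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WorldModel:
    # Maps (abstract state, action) -> predicted next abstract state.
    predict: Callable

@dataclass
class Agent:
    world_model: WorldModel
    cost: Callable      # abstract state -> estimated cost of ending up there
    attend: Callable    # raw observation -> only the goal-relevant features

    def plan(self, observation, candidate_actions: List):
        """Pick the action whose imagined outcome has the lowest estimated cost."""
        state = self.attend(observation)                 # focus on what matters for the goal
        best_action, best_cost = None, float("inf")
        for action in candidate_actions:
            imagined = self.world_model.predict(state, action)   # simulate, don't act
            c = self.cost(imagined)
            if c < best_cost:
                best_action, best_cost = action, c
        return best_action
```

The essential point mirrors the description above: the agent acts only after imagining outcomes with its world model and weighing their costs, rather than emitting the next plausible token.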

Outlook

Despite the widespread excitement about LLMs and their capabilities today, studies consistently show that their reasoning abilities remain fragile. Only future research will reveal whether LLMs can catch up to human-level planning and reasoning or if entirely new AI architectures are needed. Until then, caution is warranted when relying on the reasoning skills of LLMs and AI agents in high-stakes situations where trust and accuracy are essential.


References

1. Yu T, Jing Y, Zhang X, et al. Benchmarking Reasoning Robustness in Large Language Models. Published online March 6, 2025. https://arxiv.org/pdf/2503.04550

2. Ferrag MA, Tihanyi N, Debbah M. Reasoning Beyond Limits: Advances and Open Problems for LLMs. Published online March 26, 2025. https://arxiv.org/pdf/2503.22732

3. LeCun Y. A Path Towards Autonomous Machine Intelligence. OpenReview. Published online 2022. https://openreview.net/pdf?id=BZ5a1r-kVsf

4. Behrens TEJ, Muller TH, Whittington JCR, et al. What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior. Neuron. 2018;100(2):490-509.

5. Mattar MG, Lengyel M. Planning in the brain. Neuron. 2022;110(6):914-934.

6. Tolman EC. Cognitive maps in rats and men. Psychol Rev. 1948;55(4):189-208.

7. Ólafsdóttir HF, Bush D, Barry C. The Role of Hippocampal Replay in Memory and Planning. Current Biology. 2018;28(1):R37-R50.

8. Whittington JCR, Warren J, Behrens TEJ. Relating transformers to models and neural representations of the hippocampal formation. ICLR 2022 - 10th International Conference on Learning Representations. Published online December 7, 2021. https://arxiv.org/pdf/2112.04035

9. Yuan Y, Søgaard A. Revisiting the Othello World Model Hypothesis. Published online March 6, 2025. https://arxiv.org/pdf/2503.04421

10. Momennejad I, Hasanbeig H, Vieira Frujeri F, et al. Evaluating Cognitive Maps and Planning in Large Language Models with CogEval. Adv Neural Inf Process Syst. 2023;36:69736-69751.

11. Mirzadeh I, Alizadeh K, Shahrokhi H, Tuzel O, Bengio S, Farajtabar M. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Published online October 7, 2024. https://arxiv.org/pdf/2410.05229

12. Shojaee P, Mirzadeh I, Alizadeh K, Horton M, Bengio S, Farajtabar M. The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Published online 2025. https://machinelearning.apple.com/research/illusion-of-thinking

13. Lawsen A. Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Published online June 10, 2025. https://arxiv.org/pdf/2506.09250

14. Jiang B, Xie Y, Hao Z, et al. A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners. Published online June 16, 2024. https://arxiv.org/pdf/2406.11050

15. Marcus G, Davis E. Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon Books; 2019.

16. Colelough BC, Regli W. Neuro-Symbolic AI in 2024: A Systematic Review. Published online January 9, 2025. https://arxiv.org/pdf/2501.054...