
Inside the Brain of AI Agents

In 1950, Alan Turing asked the famous question: "Can machines think?" [1]. Recognizing that thinking is hard to define, Turing proposed a different approach: instead of trying to describe thought, he suggested we look at how machines behave.

He introduced the "imitation game," now known as the Turing Test: if a machine’s responses are indistinguishable from those of a human, it is said to have passed the test. Turing’s ideas sparked decades of research in artificial intelligence, a field that has recently regained momentum with the development of generative AI and large language models (LLMs).

Recent improvements in large language models have fueled hopes that we are getting closer to answering Turing’s big question. The development of increasingly sophisticated models has given the impression that human abilities such as reasoning, planning, and problem-solving have emerged in LLMs [2].

There is a growing belief that, thanks to these advanced thinking abilities, this new generation of LLMs, when equipped with additional features such as memory and the ability to use external tools (a combination known as “AI agents”), could automate human tasks in ways never seen before, potentially replacing people in many jobs.

Early AI: The Human Connection

The road to AI that seems to think like humans was a long one. Early AI systems were only modestly intelligent by human standards. They were unable to learn on their own and relied on human programmers to define rules that determined how they behaved. These rules also included algorithms that guided AI models to plan and reason through a limited set of problems [3].

The main advantage of this approach was predictability: given a particular input, such as a question, the AI would always produce the same response. However, this type of behavior came at the expense of flexibility, as models were unable to adapt to new circumstances without human intervention, which is quite different from what we consider thinking.

Yet, humans sometimes act in similar ways when following routines or habits. In these cases, there is little to no conscious thought involved. A familiar situation triggers an automatic response. For example, you might wake up and immediately check your emails and the news or instinctively greet your family and colleagues with a “good morning.” This also includes sequences of actions that are familiar, as well as plans that worked out in the past and are now stored in memory.

The advantage of such habitual behavior, much like early AI, is that it is fast, effortless, consistent, and reliable, leading to fewer mistakes [4]. You also rely more on habits when you are under stress [5]. However, unlike rule-based AI, humans acquire these routines through repetition and learning over time.

When Machines Learn Without Thought

Artificial intelligence changed fundamentally with the invention of machine learning. Instead of depending on rules programmed by humans, machines began to learn patterns and rules directly from data [3]. However, there is one major difference from earlier AI: the rules that machine learning models use are based on statistics, not predefined instructions.

Given a particular situation, these systems can only provide a probable response, not a fixed one. This is also true for modern large language models (LLMs), the most widely known machine learning models today.
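
To make this difference concrete, here is a toy sketch in Python (not how any real model is implemented) of how a statistical model turns scores over candidate next words into probabilities and then samples from them, so the same question can yield different answers:

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to candidate next words
# after the prompt "The capital of France is".
candidates = ["Paris", "Lyon", "beautiful", "a"]
scores = [4.0, 1.0, 0.5, 0.2]
probs = softmax(scores)

# Sampling makes the output probable rather than fixed:
# "Paris" is very likely, but not guaranteed, on any single run.
for _ in range(3):
    print(random.choices(candidates, weights=probs, k=1)[0])
```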

When large language models first appeared, they felt closer to human thinking, yet in one key respect they remained similar to older rule-based AI: they responded to a question immediately, without anything that could be considered a true thinking process in between.

Behind the Scenes of Human Thoughts

What does it mean when we say we think? Thinking can describe several different types of processes. We think when we deliberately solve new problems that go beyond familiar solutions or make new plans, considering the steps needed to achieve a goal. When faced with choices, we decide which option is best. Thinking also includes judging whether our ideas or someone else’s opinions are true. Being creative and imagining new situations are also types of thinking.

The human brain engages in thinking when we are relaxed or mildly stressed. Under high stress, however, the brain often shuts down its slower, more error-prone thinking processes [4] and switches into faster, more automatic, and safer routine and habit modes [5].

The Secret Ingredient of Thinking

An essential ingredient for thinking is memory, the ability to remember the past. Without it, you would not be able to reason through facts you have learned, connect different steps to come up with a plan, or compare options to make decisions. You might even forget the plan you just made, like going to the fridge and not remembering why you went there.

Memory comes in different forms [6]. Episodic memory stores personal experiences, keeping track of what happened, when, and where. Another important type is semantic memory, or "knowledge memory," which holds facts about the world, word meanings, and concepts. Working memory lets us keep information for a short period of time, like remembering what you or someone you are talking to just said.

All these types of memory are combined with the information we receive from our senses to shape our thoughts and take action. Unlike reactive behavior, where we respond automatically to certain inputs, directed action involves carrying out a plan aimed at achieving a specific goal.

The Hope of AI Agents

It is all over the news: LLMs have recently received a major upgrade, turning them into AI agents. What sets these new AI agents apart is their attempt to copy human thinking abilities, allowing them to tackle more complex tasks than simply responding passively to queries.

LLMs "sense" their environment by receiving input from various devices such as keyboards, microphones, and cameras. Traditional LLMs would then use their internal knowledge (the semantic memory acquired during pretraining) to interpret the incoming information and answer questions.

All You Need Is Memory

With the agent upgrade, LLMs now have access to a much broader range of memory sources [6]. First, they can expand their semantic memory by accessing information from company databases and searching the internet for up-to-date facts.

Moreover, LLMs can remember what the user wrote just moments ago and keep track of the ongoing conversation, much like working memory in humans.

Some LLMs are now equipped with a human-like form of episodic memory. They can access databases that store conversations and interactions from the more distant past. This allows the LLM to learn about user preferences over time and tailor its responses accordingly.
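
As a rough sketch of how these memory types come together, an agent framework might keep them as separate stores and assemble the relevant pieces before each response. The class and function names below are hypothetical, and the naive keyword matching stands in for the embedding-based search real systems use:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Minimal sketch of the three memory types discussed above."""
    semantic: dict = field(default_factory=dict)      # facts from databases or the web
    conversation: list = field(default_factory=list)  # the ongoing chat
    episodic: list = field(default_factory=list)      # past interactions, preferences

    def remember_turn(self, role: str, text: str) -> None:
        self.conversation.append((role, text))

    def recall_context(self, query: str) -> str:
        """Gather everything relevant to a query into one prompt context."""
        facts = [fact for key, fact in self.semantic.items() if key in query.lower()]
        past = [e for e in self.episodic
                if any(word in e.lower() for word in query.lower().split())]
        recent = [f"{role}: {text}" for role, text in self.conversation[-5:]]
        return "\n".join(facts + past + recent)

memory = AgentMemory()
memory.semantic["refund"] = "Refund policy: purchases can be returned within 30 days."
memory.episodic.append("Last week's refund question: user prefers short answers.")
memory.remember_turn("user", "What is your refund policy?")
print(memory.recall_context("refund policy"))
```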

AI that Thinks and Acts

The next upgrade enables the LLM to plan and reason, moving beyond simple reactive responses. One effective approach is to prompt the model directly to think through a problem or task and generate a step-by-step plan, using techniques like ReAct or chain-of-thought prompting. In newer models, this reasoning capability is often built in during training, allowing the LLM to perform more complex reasoning tasks automatically; such models are called large reasoning models (LRMs). As a result, when you ask some models a question, you can watch them think through the problem, seemingly exploring various viewpoints before reaching what appears to be the final answer.
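
In its simplest form, chain-of-thought prompting is just an instruction wrapped around the question. A minimal sketch, where llm is a hypothetical stand-in for any chat-completion call:

```python
def chain_of_thought_prompt(question: str) -> str:
    """Wrap a question so the model writes out its intermediate steps
    before committing to an answer (chain-of-thought prompting)."""
    return (
        "Answer the question below. Think step by step, writing each step "
        "on its own line, and finish with a line that starts with 'Answer:'.\n\n"
        f"Question: {question}"
    )

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion API call;
    # it returns a canned reply here so the sketch runs end to end.
    return ("Step 1: 9:40 plus 2 hours is 11:40.\n"
            "Step 2: 11:40 plus 35 minutes is 12:15.\n"
            "Answer: 12:15")

response = llm(chain_of_thought_prompt(
    "A train leaves at 9:40 and the trip takes 2 h 35 min. When does it arrive?"))
print(response.splitlines()[-1])  # -> Answer: 12:15
```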

The next step in making LLMs more human-like is allowing them to take actions to execute plans and achieve specific goals. One such action is accessing supplementary memory sources, such as databases or the internet, to retrieve information like daily weather updates, the latest news, or company-specific data.

Beyond information retrieval, LLMs can also interact with various software and applications. This includes using calculators for math, email clients to analyze messages, calendars to access or create events, and chat systems to communicate with other people. All these external resources, apps, software, and databases are referred to as "tools". LLMs are usually given the autonomy to choose which tools to use to accomplish their goals most effectively.
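
One common pattern, sketched here with made-up tool names, is to describe the available tools to the model and have it reply with the name and arguments of the tool it wants to use; the surrounding code then dispatches the call:

```python
import json

# Hypothetical tools: plain Python functions the agent may call.
def calculator(expression: str) -> str:
    # eval on untrusted input is unsafe; acceptable only in a toy sketch.
    return str(eval(expression, {"__builtins__": {}}))

def get_weather(city: str) -> str:
    return f"(stub) Weather for {city}: sunny, 21 °C"  # a real tool would call an API

TOOLS = {"calculator": calculator, "get_weather": get_weather}

def dispatch(tool_request: str) -> str:
    """The model is prompted to reply with JSON such as
    {"tool": "calculator", "args": {"expression": "17 * 23"}};
    this function routes the request to the matching function."""
    request = json.loads(tool_request)
    return TOOLS[request["tool"]](**request["args"])

print(dispatch('{"tool": "calculator", "args": {"expression": "17 * 23"}}'))  # 391
```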

When an LLM is equipped with memory, reasoning and planning skills, and has access to external tools to achieve goals, it becomes what is known as a modern AI agent [7].
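
Putting the pieces together, a minimal agent can be sketched as a ReAct-style loop that alternates between a model-proposed step and an observation fed back into the prompt. The llm and tools arguments are placeholders for any model call and tool registry:

```python
def run_agent(question: str, llm, tools, max_steps: int = 5) -> str:
    """Simplified ReAct-style loop. Assumes the model replies with one line
    per call: either "Action: tool_name(argument)" or "Final Answer: ...".
    Each tool result is appended as an Observation the model sees next turn."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)  # the model proposes the next step
        transcript += reply + "\n"
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        if reply.startswith("Action:"):
            name, _, arg = reply.removeprefix("Action:").strip().partition("(")
            observation = tools[name](arg.rstrip(")"))
            transcript += f"Observation: {observation}\n"
    return "No final answer within the step budget."
```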

AI agents are increasingly being used in a variety of everyday tasks and business operations. These include scheduling meetings and appointments, managing emails (e.g. extracting important information, filtering out unimportant messages, and drafting responses), answering company-specific questions, summarizing data into tables and figures, and supporting decision-making [8].

Team Up!

AI agents are usually designed as single agents, with one LLM handling a specific goal on its own. In contrast, humans often achieve complex goals by working together in teams, where each person is responsible for a subtask managed by their own brain. Coordination via a team leader and clear communication are essential for dividing work and successfully reaching the team's objective.

The human team concept also applies to AI agents. In multi-agent systems, several individual AI agents are brought together, each with a specific task. Sometimes a coordinator agent oversees the process and distributes the tasks. Just like humans, the agents communicate and collaborate to achieve a shared goal.

For example, imagine a simple system where one agent writes a draft on a topic, then passes it to another agent for critical review. The reviewer suggests improvements, which the writer agent incorporates before sending it back for another review. This cycle continues until a final version is ready to be reviewed by a human.
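
That writer-reviewer cycle can be sketched in a few lines, with each "agent" being nothing more than a differently prompted call to the same hypothetical llm function:

```python
def writer(topic: str, feedback: str, llm) -> str:
    return llm(f"Write a draft about {topic}. Address this feedback: {feedback}")

def reviewer(draft: str, llm) -> str:
    return llm("Review this draft critically. Reply 'APPROVED' if it is ready "
               f"for a human, otherwise list concrete improvements:\n{draft}")

def write_with_review(topic: str, llm, max_rounds: int = 3) -> str:
    """Two 'agents' (two differently prompted LLM calls) iterate until
    the reviewer approves the draft or the round budget runs out."""
    feedback, draft = "none yet", ""
    for _ in range(max_rounds):
        draft = writer(topic, feedback, llm)
        feedback = reviewer(draft, llm)
        if feedback.strip().startswith("APPROVED"):
            break
    return draft  # the final version still goes to a human for review
```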

Not so Fast

Are LLMs now truly capable of reasoning, planning, and thinking like humans? The answer is not straightforward. When we examine just the behavior of agents, as Turing once proposed, it appears they have not yet reached human-level thinking abilities.

LLMs have persistent problems that are not solved by turning them into agents. The main challenges are their tendency to hallucinate, that is, to invent information that is not true, and their lack of reliability and consistency in providing correct answers.

Humans rely on structured knowledge, facts, logic, and an understanding of cause and effect ("common sense"), which gives a sense of certainty when we reason and plan.

In contrast, LLMs seem to primarily rely on statistical pattern matching, reproducing patterns learned from their vast training data rather than truly understanding underlying concepts. Thus, when faced with something new and not seen in the training data, LLMs’ performance often drops significantly, making them unreliable in unfamiliar situations (see LLMs, the Brain and Reason: A Complicated Relationship).

The hallucination and reliability issues in LLMs can cause mistakes when an agent extracts information from databases, reasons through problems, or chooses tools [7]. Even with optimized prompting, these problems are not fully resolved.

These issues become even more pronounced in multi-agent systems, where each step introduces another chance for error, and mistakes can accumulate as agents pass wrong information between each other. Furthermore, current memory systems are not advanced enough to support complex planning or reasoning over long periods.
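
A back-of-the-envelope calculation shows why this matters: if we assume each step succeeds independently with the same probability, the chance that a whole chain succeeds shrinks exponentially with its length:

```python
# If each agent step is 95% reliable and failures are independent,
# a 20-step pipeline succeeds only about a third of the time.
per_step = 0.95
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: {per_step ** steps:.0%} chance of success")
# Output: 95%, 77%, 60%, 36%
```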

Another challenge is that as agent systems become more complex, it gets harder to evaluate their performance and identify the source of errors. Additionally, using agents can be expensive, since completing a task often requires many more LLM calls, driving up costs.

Two recent studies highlight these problems. One study found that the main issues in agent systems are LLMs not following prompts, failures in agent communication, and verification systems that do not catch errors [9]. As a result, overall task success rates are often below 50%.

Another study simulated a small software company run entirely by AI agents ("TheAgentCompany"), with agents in roles such as CTO, data scientist, HR, and finance [10]. Even with specialized tools and communication systems, the best result was only 30% correct task completion, raising doubts about claims that AI agents can replace human workers in the near future. The study also confirmed that running multi-agent systems is expensive.

The Brain Remains Out of Reach

While excitement about AI agents is currently on the rise, significant risks remain due to the inconsistent functioning of LLMs. They may boost productivity in low-risk situations, but in high-stakes applications where reliability, trust, and legal compliance are critical, their limitations are a serious concern. Overall, current AI agents with their LLM “brains” are still far from matching the reasoning and goal-achieving abilities of the human brain (see LLMs, the Brain and Reason: A Complicated Relationship for a more detailed explanation).


References

1. Turing AM. Computing Machinery and Intelligence. Mind. 1950;LIX(236):433-460.

2. Kambhampati S, Stechly K, Valmeekam K, et al. Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! Published online May 27, 2025. https://arxiv.org/pdf/2504.09762v2

3. Kautz HA. The Third AI Summer: AAAI Robert S. Engelmore Memorial Lecture. AI Mag. 2022;43(1):105-125.

4. Kahneman D. Thinking, Fast and Slow. Farrar, Straus and Giroux; 2013.

5. Schwabe L, Wolf OT. Stress and multiple memory systems: From “thinking” to “doing.” Trends Cogn Sci. 2013;17(2):60.

6. Wu Y, Liang S, Zhang C, et al. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs. Published online April 23, 2025. https://arxiv.org/pdf/2504.15965v2

7. Krishnan N. AI Agents: Evolution, Architecture, and Real-World Applications. Published online March 16, 2025. https://arxiv.org/pdf/2503.12687

8. Sapkota R, Roumeliotis KI, Karkee M. AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. Published online May 15, 2025. https://arxiv.org/pdf/2505.10468v1

9. Cemri M, Pan MZ, Yang S, et al. Why Do Multi-Agent LLM Systems Fail? Published online March 17, 2025. https://arxiv.org/pdf/2503.13657

10. Xu FF, Song Y, Li B, et al. TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. Published online December 18, 2024. https://arxiv.org/pdf/2412.14161