Your Agent Has Amnesia: Why Short-Term Memory Matters More Than You Think

How agent frameworks like Vercel AI and Autogen approach memory and why forgetting can break reasoning. This isn’t about vector databases — it’s about the kind of memory agents need to think between runs.

Short-Term Memory Keeps LLMs Thinking Straight

Try solving a problem without being able to remember the last step you took. You’d never be able to reach an answer.

Large Language Models (LLMs) face the same challenge. They need a way to retain recent steps to stay logical. Short-term memory gives LLMs that continuity. It helps them remember what was said a few turns ago so each new response fits naturally. Without it, every prompt would stand alone.

A Simple Example: How ChatGPT Keeps Track of a Conversation

To see how this works in practice, let’s take ChatGPT as an example.

Every time you send a message (“explain this”, “what’s that”), ChatGPT runs a brand-new pass of the model. Behind the web interface, it looks something like:

response = runModel({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Explain gravity" }
  ]
})

The model finishes running and returns a response: “Gravity is the force that pulls objects toward each other.”

When you follow up with another message, the model runs again. But this time, let’s remind it of what’s already been said:

response = runModel({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Explain gravity" },
    { role: "assistant", content: "Gravity is the force that pulls objects toward each other." },
    { role: "user", content: "How does that keep the moon in orbit?" }
  ]
})

Notice how the previous question and answer were fed back into the model before asking the new question.

Think of it this way:

Each time you send a new message, the model starts fresh. It’s not constantly running. But it stays coherent because it’s given the past conversation history as context.

This is short-term memory in action.
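
To make that loop concrete, here is a minimal sketch of a chat client built around the same hypothetical runModel() from above. The messages array is the short-term memory: it grows with each turn and is replayed in full on every call.

type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical model call, matching the runModel() examples above
declare function runModel(req: { model: string; messages: Message[] }): Promise<{ content: string }>;

const messages: Message[] = [];

async function sendMessage(userInput: string): Promise<string> {
  // Append the new user turn to the running history
  messages.push({ role: "user", content: userInput });

  // Each call is a fresh model run; coherence comes from replaying the history
  const response = await runModel({ model: "gpt-4o", messages });

  // Remember the reply so the next turn can reference it
  messages.push({ role: "assistant", content: response.content });
  return response.content;
}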

Enter the Agent: Giving LLMs the Power to Act

AI agents are built on top of LLMs. You can think of them as LLMs with a toolbox strapped on — the same language engine, but now with the ability to plan and act. Instead of just generating text, agents can execute functions, call APIs, or use tools to complete goals in the real world.
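
To make “tools” concrete, here is a sketch of a hypothetical checkout tool using the Vercel AI SDK’s tool() helper (field names follow the v4-style API, where the schema field is called parameters; newer releases rename it). The model never runs this code itself: it emits a structured call, the framework executes it, and the result is fed back into context.

import { tool } from "ai";
import { z } from "zod";

// A hypothetical checkout tool. The names and return shape here are
// illustrative, not from any real merchant API.
const checkout = tool({
  description: "Purchase an item from the current merchant",
  parameters: z.object({
    sku: z.string().describe("Merchant SKU of the item to buy"),
    quantity: z.number().int().min(1),
  }),
  execute: async ({ sku, quantity }) => {
    // A real implementation would call the merchant API here
    return { orderId: "order_123", sku, quantity, status: "confirmed" };
  },
});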

As agents take on multi-step tasks, their short-term memory needs to stretch beyond the conversation itself. It must also keep track of what tools were used, how they were used, and what results came back.

This isn’t unique to ChatGPT or OpenAI. Every agent framework needs some way to preserve and reuse context between turns; they just give it different names. Autogen, AWS Strands, and CrewAI call it ‘memory.’ Vercel AI calls it ‘conversation history.’ Google ADK calls it a ‘session.’

Expanding Short-Term Memory for Agents

Our earlier example demonstrates how simple chatbots work. The model reads whatever context it was given, predicts the next response token by token, and stops. It has no access to tools or any other means of executing external, complex tasks.

When an agent is equipped with tools and external resources, a call to runModel() can quickly become more than simple text generation. Within one call, the agent could execute a number of tools, forming new decisions step-by-step, executing functions call by call, until it returns a final response.
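
Conceptually, one such call wraps an inner loop. Here is a rough sketch, with proposeNextStep() and executeTool() as hypothetical stand-ins for what a framework does internally:

type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };
type Step =
  | { type: "final_answer"; text: string }
  | { type: "tool_call"; toolName: string; args: unknown };

// Hypothetical internals: one model pass, and a tool executor
declare function proposeNextStep(messages: Msg[]): Promise<Step>;
declare function executeTool(name: string, args: unknown): Promise<unknown>;

async function runAgentOnce(messages: Msg[]): Promise<string> {
  while (true) {
    const step = await proposeNextStep(messages); // one model pass

    if (step.type === "final_answer") {
      messages.push({ role: "assistant", content: step.text });
      return step.text;
    }

    // The model asked for a tool: run it, then feed the result back
    // so the next pass can reason over what just happened.
    const result = await executeTool(step.toolName, step.args);
    messages.push({ role: "assistant", content: `Calling ${step.toolName}` });
    messages.push({ role: "tool", content: JSON.stringify(result) });
  }
}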

In other words, agents are doing more than responding with words. So we need to give them more context than simple message history.

This additional context can take the forms of:

  • Tool calls (requests and responses)
  • Agent reasoning steps
  • MCP server resources
  • User and system prompts

At first glance, keeping all this context may seem excessive. But whether it’s excessive depends on how capable you want your agent to be.
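
As a rough illustration, here is what a context snapshot covering those categories might look like after one tool-using turn (shapes simplified; real frameworks use richer structured types for tool calls and results):

// A simplified context snapshot. Everything below gets replayed on the
// next model run, not just the user-visible chat messages.
const contextSnapshot = [
  { role: "system", content: "You are a helpful e-commerce agent." },
  { role: "user", content: "Buy me white sneakers." },
  // The agent's reasoning step and its tool request...
  {
    role: "assistant",
    content: "I should search the merchant catalog first.",
    toolCall: { name: "search_catalog", args: { query: "white sneakers" } },
  },
  // ...and the tool's result, which later steps can reference
  { role: "tool", content: JSON.stringify({ sku: "SNKR-001", price: 89 }) },
];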

The Hidden Problem: Agents That Forget Mid-Task

As agents take on more complex tasks, some of those tasks can’t be completed within a single runtime (i.e. a single runModel() call). In those cases, the agent has to restart itself mid-task and resume from where it left off, making proper memory management not just useful, but essential.

To show what that looks like, let’s walk through a concrete example.

I was building a general-purpose e-commerce agent that could make a purchase for any user request, from “Buy me white sneakers” to “Buy me a dining table.”

I gave the agent the following system prompt:

You are a helpful e-commerce agent. Find a relevant merchant, then use that merchant’s tools to complete the checkout.

The idea is that the agent starts by calling the Skyfire Identity & Payment MCP server, which holds a list of merchant MCP servers (shoe websites, furniture websites, etc.). Once it identifies the best-suited merchant, the agent reinitializes itself with that merchant’s MCP toolset and uses it to complete a purchase.

In this context, reinitialize means “stop runModel() and runModel() again with a bigger toolset.” Why? Because most agent frameworks have a static toolkit during runtime. So when the agent (still mid-task, trying to finish a purchase) calls Skyfire’s MCP and discovers the merchant’s MCP, it can’t install the merchant’s MCP tools within the same execution. It has to stop and reinitialize itself. (We dive deeper into this in From Discovery to Execution: How AI Agents Dynamically Install Tools.)

Reinitializing solved the problem of loading new tools, but it introduced another: every time the agent rebooted, it forgot that it had already found a merchant. So it called the Skyfire MCP again, found the same merchant again, restarted again… and again. Groundhog Day, except Phil doesn’t remember he’s reliving the same day.

In this example, the agent needed to restart itself mid-task so it could install additional MCP tools. In other cases, it might need to install new static SDK tools, handle long-running asynchronous calls, or dynamically inject user input during execution.

Whatever the scenario, there will be times when your agent has to restart. And when it does, preserving memory becomes critical. Developers need a clear understanding of their agent’s intended capabilities, because that determines how much context it should retain and what kind of memory strategy it requires.
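
One simple strategy, sketched below under our own assumptions (no particular framework mandates this), is to snapshot the agent’s context to disk before a restart and rehydrate it afterward:

import { promises as fs } from "fs";

// Serializable agent state: message history plus whatever tool/MCP
// configuration the next run needs. (Shape assumed for this sketch.)
type PersistedContext = {
  available_mcp_servers: { url: string; headers: Record<string, string> }[];
  conversation_history: unknown[];
};

async function snapshotContext(ctx: PersistedContext, path: string): Promise<void> {
  await fs.writeFile(path, JSON.stringify(ctx), "utf8");
}

async function restoreContext(path: string): Promise<PersistedContext> {
  return JSON.parse(await fs.readFile(path, "utf8"));
}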

Preserving Context with Vercel AI

Take a look at an example using Vercel AI’s SDK. Let’s initialize our agent’s context with one MCP server and a system prompt.

import { CoreMessage } from "ai";

// API key for the Skyfire MCP server (env var name assumed for this example)
const apiKey = process.env.SKYFIRE_API_KEY || "";

// conversation_history: CoreMessage[]
const agentContext = {
  available_mcp_servers: [
    { url: process.env.SKYFIRE_MCP_URL || "", headers: { "skyfire-api-key": apiKey } },
  ],
  conversation_history: [
    {
      role: "system",
      content: "You are a helpful e-commerce agent. Find a relevant merchant, then use that merchant's tools to complete the checkout.",
    },
  ] as CoreMessage[],
};

Here, conversation_history is typed as CoreMessage[], which Vercel AI’s SDK uses to represent a conversation. Each message includes a role (system, user, or assistant) and content (text or other media parts). You can’t just pass in any object or type and expect the model to understand it — the agent needs data structured in the format its framework expects.
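
For reference, here is a minimal declaration of the AgentContext type that runAgent() below expects (a sketch; the original snippets elide it):

import { CoreMessage } from "ai";

// Minimal shape of the agent's context object
interface AgentContext {
  available_mcp_servers: { url: string; headers: Record<string, string> }[];
  conversation_history: CoreMessage[];
}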

Now, we need to run the agent. This is where conversation_history gets updated and used.

import * as ai from "ai";
import { wrapAISDK } from "langsmith/experimental/vercel";

const { generateText } = wrapAISDK(ai);

// prepareAllTools() (which loads tools from the configured MCP servers) and
// modelWithTracing (the model instance) are assumed to be defined elsewhere.
async function runAgent(userInput: string, agentContext: AgentContext) {
  const allTools = await prepareAllTools(agentContext);

  // Add user input to the agent's context
  agentContext.conversation_history.push({
    role: "user",
    content: userInput,
  });

  // Run the agent with current context and available tools
  const { text: answer, usage, steps, response } = await generateText({
    model: modelWithTracing,
    maxTokens: 5000,
    tools: allTools,
    maxSteps: 20,
    messages: agentContext.conversation_history,
  });

  // Update context with the agent's latest messages
  agentContext.conversation_history.push(...response.messages);

  // Return final result and updated context
  return JSON.stringify({ answer, steps, usage, agentContext });
}

The function generateText() is in charge of the agent’s reasoning loop. It decides which tools to call, executes them if needed, and returns a structured response (e.g. the answer to “Explain gravity”). It has a messages field that accepts an array of message objects in the same structure used by CoreMessage.

When runAgent() executes, two key things happen.

  1. Before generateText() runs, the new user message is appended to conversation_history, which is then passed to the model.
  2. After generateText() completes, the agent’s new messages (including reasoning steps and tool outputs) are appended to its conversation history.

This loop of passing in context, generating a response, and updating memory is what allows the agent to build short-term memory. It’s what lets it think in steps, reference past work, and move toward more complex reasoning instead of starting from zero each time.
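
Putting it together, surviving a mid-task reinitialization becomes straightforward: expand the toolset in agentContext, then call runAgent() again with the same context object. A sketch (the merchant URL here is hypothetical):

// First pass: the agent discovers a suitable merchant MCP server
const firstPass = await runAgent("Buy me white sneakers.", agentContext);

// Reinitialize with a bigger toolset: register the discovered merchant
// (hypothetical URL) so prepareAllTools() can load its tools next run
agentContext.available_mcp_servers.push({
  url: "https://merchant.example.com/mcp",
  headers: {},
});

// Second pass: conversation_history carries over, so the agent already
// "remembers" the merchant it found and goes straight to checkout
const secondPass = await runAgent("Continue the purchase.", agentContext);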

Implementing Memory in Autogen

Let’s take a look at an example with Autogen, where conversation_history plays the same conceptual role as in the Vercel AI example. The difference is that Autogen uses a ListMemory object to manage the agent’s memory. That’s its built-in way of storing and recalling context.

import os

from autogen_core.memory import ListMemory, MemoryContent
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core import CancellationToken

# Initialize memory
conversation_history = ListMemory()

# Define and run the agent
async def run_agent():
  agent = AssistantAgent(
    name="autogen_commerce_agent",
    model_client=OpenAIChatCompletionClient(
      model="gpt-4o",
      api_key=os.getenv("OPENAI_API_KEY"),
    ),
    tools=all_tools,  # all_tools is assumed to be defined elsewhere (the agent's toolset)
    memory=[conversation_history],
    reflect_on_tool_use=True,
    system_message="You are a helpful e-commerce buyer agent assisting the user with purchases.",
  )

  # Run the agent task
  result = await agent.run(
    task="Buy me white sneakers.",
    cancellation_token=CancellationToken()
  )

  # Collect and store tool call information in memory
  for msg in result.messages:
    if msg.type == "ToolCallExecutionEvent":
      for res in msg.content:
        log_str = f"Tool: {res.name}\nOutput: {res.content}"
        await conversation_history.add(MemoryContent(
          content=log_str,
          mime_type="text/plain",
          metadata={"role": "assistant"}
        ))

Take a look at the for loop. This is what gives the agent its memory of tool use, and with it, its multi-step reasoning ability.

Each ToolCallExecutionEvent represents a moment when the agent invoked a tool (e.g. querying a catalog or checking out an order). The loop iterates through these events, extracts the tool name and its output, and stores them in memory as MemoryContent objects.

By recording tool activity in this structured way, the agent maintains a detailed log of what actions were taken and what results they produced. This lets it:

  • Reference past tool calls to avoid repetition,
  • Build multi-step reasoning chains, and
  • Improve decisions based on prior outcomes.

Put simply, this memory makes the agent tool-aware. It doesn’t just recall text, but remembers what tools it used, what they returned, and how those outcomes shaped its reasoning.

Conclusion: Memory Defines Intelligence

Memory is the scaffolding that holds reasoning together.

A chatbot relies on short-term memory to stay coherent from one message to the next. Similarly, an agent needs to remember what tools it called, what results came back, and what decisions it made along the way.

When your agent can recall its own reasoning, it becomes reliable. It can reinitialize tools without losing progress, recover from interruptions, and move through complex goals step by step.

Ultimately, intelligence isn’t just about producing the right output. It’s about remembering enough to know why that output made sense.

Because without memory, even the smartest agent is just guessing.
