When building AI agents, it’s tempting to feed the model every bit of context you have – all the tools, all the data, all the instructions – in hopes that “more context == better results.” In theory, a larger context window and more tools should make an agent more capable. In practice, we’ve learned the hard way that where and how you give context to an agent can make or break its performance.
In this post, we’ll explore why context placement matters and how to diagnose and fix the coordination problems that arise when agents juggle complex prompts, multiple tools, and large Model Context Protocol (MCP) server interactions. We’ll walk through real examples of failures – an agent skipping a required restart instruction, tools being invoked out of order, irrelevant actions being taken – and the strategies that got things back on track. Our focus is on the downstream effects of context changes (like adding big knowledge files or many tools) on the rest of the agent’s behavior across prompts and tool calls, and how to engineer prompts, tools, and MCP servers to keep your agent reliable.
Let’s dive into why “more context” can sometimes become “too much context,” and what to do about it.
The Context: More Tools, More Data, More Problems
With the rise of large context windows and the adoption of MCP servers for external tool use and resources, agents now have access to unprecedented amounts of information. You can connect an agent to multiple MCP servers (each exposing dozens of tools) and put long documents or API specs into its prompt. In theory, this means an AI agent could have everything it needs at its fingertips. In practice, overflowing the context can confuse the model and degrade performance.
Research backs this up. An LLM study aptly titled “Lost in the Middle” showed that large language models often fail to use long contexts effectively. They do best when the relevant information is at the beginning or end of the prompt, and accuracy plummets when critical info is buried in the middle of a long context. In other words, if your key instruction is lost among hundreds or thousands of tokens of other content, the model might not grasp it at all. Another paper found that irrelevant context can distract LLMs and reduce their accuracy – adding extraneous text (e.g., a run of “zzz”s tacked onto the end of a prompt) caused significant drops in task performance for many models. Even though these findings date from 2023 and 2024, models still struggle with these issues even as context windows reach 1M tokens and beyond.
And it’s not just instructions that suffer from excess context – tool usage suffers too. An agent faced with dozens of tools in its toolkit has to pick the right one from a long list. For instance, an analysis of the Berkeley function-calling benchmark shows that many models’ performance worsens when they are provided with more than one tool. Statistically, the agent is not always going to pick the right tool, so more options naturally result in more errors.
In one experiment by the University of Texas at Austin in 2025, a small model was given 46 tools to choose from (all within its context window) and it failed completely, but when the list was trimmed to 19 tools, it succeeded – even though both cases were under the token limit. The extra tools weren’t exceeding the raw capacity, but they exceeded the model’s ability to focus. As one analysis put it: if you include extraneous tool definitions in the prompt, “the model has to pay attention to it” – and it often ends up using tools that aren’t even relevant. In other words, flooding the agent with 100 unrelated capabilities doesn’t make it smarter; it just gives it more ways to get distracted and go off-script. Cursor, a popular AI IDE, has even put a hard cap of 40 on the number of tools its agent can have access to at once. It’s good to remember that, although you don’t generally insert tool definitions into the context yourself, that is exactly what agent frameworks and LLM providers are doing behind the scenes.
All this reinforces a counterintuitive truth: more context or more tools can actually make your agent less capable, not more, if not managed carefully. The negative effects of “information overload” afflict both humans and their AI agents!
Context Management Issues and Solutions
Important instructions get overlooked
Important instructions being overlooked is one of the most common modes of agent failure. Let’s go through an example from my own agent development experience.
Context on the setup: This situation involved two of Skyfire’s MCP servers. The demo server returns only a single, pre-selected seller service when queried—essentially a controlled environment for testing. The official server, on the other hand, returns the full catalog of all available Skyfire seller services (think of seller services as merchants providing some kind of service to buyer agents).
We were switching an agent demo from pointing to the demo MCP server to pointing to Skyfire’s official MCP server. This meant the agent suddenly had to process a much larger list of available seller services, whereas before it only ever saw the one specific service it needed.
The list of seller services was exposed via a resource on our MCP server, which we decided to append to the system prompt. This very quickly proved to be a problem.
Here’s the system prompt for the agent. The prompt uses a prompt engineering technique that leverages the model’s coding abilities: structuring different sections with HTML-style tags like <section></section>. Since these models excel at parsing code syntax, the tag-based structure helps them cleanly distinguish between different parts of the prompt.
<setup>
You are connected to tools from MCP servers and hosted OpenAPI specs (jsons)
and are solving problems step by step. To use an OpenAPI spec and convert it to a tool, use the convertOpenApiSpecToAgentTool tool.
Make sure to include the openapi.json at the end of urls
</setup>
<procedures>
Remember to use only the create-kya-pay-token tool from skyfire before using an external service tool call
</procedures>
MCP servers have /mcp or /sse at the end; if it's a json, then you would connect via the OpenAPI tool.
<terminate>
When connect-mcp-server-tool tool is executed, stop the processing immediately.
You can execute multiple convert-openapi-spec-to-agent-tool calls in sequence, but after all OpenAPI conversions are complete, stop processing.
</terminate>
{inserted MCP seller service resources list}
Services you can access:
{
"data": [
{
"id": "123abc...",
"name": "exampleServiceName",
"description": "Short summary of what the service does...",
"price": "0.001",
"priceModel": "PAY_PER_USE",
"seller": {
"id": "seller123...",
"name": "Example Seller",
},
"websiteUrl": "https://example.com/...",
"acceptedTokens": ["kya", "pay", "kya+pay"],
"termsOfService": { "url": "https://example.com/terms", "required": true },
"openApiSpecUrl": "https://example.com/spec/openapi.json",
}
... other services
]
}
The above prompt instructs the agent to stop processing after all calls to convert-openapi-spec-to-agent-tool are complete (this tool lets the agent turn OpenAPI specifications into callable tools). Most agent frameworks lock in their toolkit at the start of each run, which prevents new tools from being added mid-execution. To use newly discovered tools, the agent must stop and restart with an updated toolkit.
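To make that restart pattern concrete, here is a minimal sketch of the outer loop it implies. The runAgentOnce and buildToolsFromOpenApiSpec helpers are hypothetical stand-ins for your framework’s run call and spec-conversion logic; the point is simply that tools discovered in one run only become usable on the next.
// Hypothetical sketch of the stop-and-restart pattern: the toolkit is fixed
// per run, so specs discovered in run N only become tools in run N+1.
type ToolSet = Record<string, unknown>;

// Stand-ins for your framework's primitives (names are illustrative).
declare function runAgentOnce(prompt: string, tools: ToolSet): Promise<{
  finished: boolean;
  discoveredSpecUrls: string[]; // specs the agent asked to convert this run
}>;
declare function buildToolsFromOpenApiSpec(specUrl: string): Promise<ToolSet>;

async function runWithRestarts(prompt: string, baseTools: ToolSet) {
  let tools: ToolSet = { ...baseTools };
  for (let run = 0; run < 5; run++) { // cap restarts as a simple guardrail
    const result = await runAgentOnce(prompt, tools);
    if (result.finished) return;

    // The agent stopped after its convert-openapi-spec-to-agent-tool calls;
    // materialize the new tools, then restart with the expanded toolkit.
    for (const specUrl of result.discoveredSpecUrls) {
      tools = { ...tools, ...(await buildToolsFromOpenApiSpec(specUrl)) };
    }
  }
}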
When the agent only had access to the small list of sellers, it properly followed this restart instruction. Once we gave it the larger list of sellers, the agent forgot the restart instruction and proceeded to fire off a bunch of irrelevant tool calls. To confirm that the additional information in the system prompt was causing the issue, we moved the restart instruction to the end of the appended MCP resource. The restart instruction was no longer sandwiched between the initial system instructions and the MCP server resources; it now sat at the very end of the system prompt. Just by changing the location of the instruction, the agent was able to restart correctly.
<setup>
You are connected to tools from MCP servers and hosted OpenAPI specs (jsons)
and are solving problems step by step. To use an OpenAPI spec and convert it to a tool, use the convertOpenApiSpecToAgentTool tool.
Make sure to include the openapi.json at the end of urls
</setup>
<procedures>
Remember to use only the create-kya-pay-token tool from skyfire before using an external service tool call
</procedures>
MCP servers have /mcp or /sse at the end; if it's a json, then you would connect via the OpenAPI tool.
{inserted MCP seller service resources list}
Services you can access:
{
"data": [
{
"id": "123abc...",
"name": "exampleServiceName",
"description": "Short summary of what the service does...",
"price": "0.001",
"priceModel": "PAY_PER_USE",
"seller": {
"id": "seller123...",
"name": "Example Seller",
},
"websiteUrl": "https://example.com/...",
"acceptedTokens": ["kya", "pay", "kya+pay"],
"termsOfService": { "url": "https://example.com/terms", "required": true },
"openApiSpecUrl": "https://example.com/spec/openapi.json",
}
... other services
]
}
<terminate>
When connect-mcp-server-tool tool is executed, stop the processing immediately.
You can execute multiple convert-openapi-spec-to-agent-tool calls in sequence, but after all OpenAPI conversions are complete, stop processing.
</terminate>
We wanted to figure out another way to fix the agent’s behavior without having to change the order or location of the instructions in the system prompt. We thought about how LLMs work and realized that the description and output of the convert-openapi-spec-to-agent-tool call that ran immediately prior to restarting were critical to getting the agent back on track. Agents do their best to follow their system prompts at all times, but tool descriptions and outputs also play a big part in determining what agents choose to do next.
We added the line “stop execution after this tool” to the tool description and the agent began executing as intended again.
"convert-openapi-spec-to-agent-tool": {
description: "Gets the OpenAPI spec URL prompted by the user. Stop execution after this tool",
parameters: jsonSchema({
type: "object",
properties: {
openApiSpecUrl: {
type: "string",
description: "URL for OpenAPI spec - ends in a .json",
},
serviceName: {
type: "string",
description: "Name of the service corresponding to the OpenAPI spec",
}
},
required: ["openApiSpecUrl", "serviceName"],
additionalProperties: false,
}),
execute: async () => {
return {
content: [
{
text: "Converting OpenAPI spec to tools...",
},
],
};
},
},
The key was placing the relevant instruction as close to the action as possible. If you want your agent to restart after a specific tool is run, bake that instruction into the tool itself. However, this approach has a drawback: it intermixes agent control logic with tool definitions, which can make your codebase harder to maintain as complexity grows.
In this case, for the sake of maintaining a clear separation between prompt instructions and agent tooling, we could also have created a stop-execution tool. By formalizing the process of stopping as a tool, the agent is doubly aware of this option at all junctions where it needs to make a tool call. In general, having ways to stop is very important for agent development.
Many agent frameworks, like AWS Strands, recognize this and provide a built-in stop tool to help. The Vercel AI SDK didn’t have a dedicated stop tool, but it was quite easy to write my own.
"stop-execution": {
description: "Stop the agent's execution and return control to the user. Use this when you need to restart, when you've encountered an error that requires user intervention, or when you've completed the task at hand.",
parameters: jsonSchema({
type: "object",
properties: {
reason: {
type: "string",
description: "Explanation for why execution is stopping (e.g., 'Task completed successfully', 'Need user input', 'Error encountered')",
}
},
required: ["reason"],
additionalProperties: false,
}),
execute: async ({ reason }) => {
return {
content: [
{
text: `Stopping agent execution. Reason: ${reason}`,
},
],
};
},
},
I tested this approach using the initial system prompt. With the new stopping tool present, the agent now correctly halted execution after calling convert-openapi-spec-to-agent-tool, without requiring any modifications to the tool definition or the order of instructions in the system prompt. Although this approach worked well when tested, for the final implementation I opted to stick with adding a reminder in the convert-openapi-spec-to-agent-tool tool description as it was simple and worked reliably.
In our original setup, our agent was forgetting its stopping criteria due to context overload. In this section, we explored three solutions to this problem. In solution one, we moved the stopping instructions to the bottom of the system prompt. In solution two, we kept the prompt the same and added a restart reminder to the convert-openapi-spec-to-agent-tool tool description. In solution three, we again kept the prompt the same and added an additional stop-execution tool to the agent’s toolkit. All three solutions resulted in the agent reliably stopping at the required times.
If an agent can decide what to do when and has the proper tool call for it, everything works! However, in practice, instructions and reminders can get lost in context. To remind your agents what to do, place the necessary instructions closer and closer to when the action is being taken. In this case, we moved the instruction all the way to the tool call right before stopping the agent. This may not be the right proximity for your agent, but the general strategy still applies!
Tools are used out of order
Another symptom of context overload or misplacement is the agent doing things that technically work but are not in the order or manner the human designer expected. We encountered this when building an e-commerce checkout agent.
The normal flow was: add item to cart, create a payment token, ask for billing/shipping info, then checkout. This was the system prompt for the agent:
You are a helpful e-commerce agent that helps a user shortlist from a catalog to a single product. When connect-mcp-server-tool tool is executed, stop the processing. If the user wants to buy a product, do not ask for confirmation.
Because multiple valid sequences could lead to a successful checkout, the agent didn’t always follow the same order. For example, it might request the user’s address before creating the payment token and still complete the purchase without issue.
From the model’s point of view, either sequence was acceptable. But for our testing purposes, we wanted a consistent, fixed call sequence with token creation coming before billing details. We’d kept the system prompt intentionally broad to make the agent more general purpose, but that meant we couldn’t hard-code a specific action sequence. Still, we needed a way to ensure it reliably followed the intended order without sacrificing flexibility.
The fix turned out to be simple: all it took was a small cue in the tool output of the add-to-cart step, appending a line like “Now creating a Skyfire KYA+PAY token…” to the tool’s success message. By placing that hint immediately where the agent would see it next, the LLM stuck to the intended sequence: it would always begin creating the payment token right after items were added to the cart and then proceed to ask for the user’s shipping information. This is like leaving a trail of breadcrumbs for the LLM to follow so that its outputs align with expectations. The approach can also help sequence tool calls that should generally be called one after the other.
This is a great example of how tool outputs influence agent behavior. Because we placed the contextual instruction right before the moment of action, the agent followed it. When that hint was absent, too far away, or not explicit, the agent followed its own intuition and reasoning. In the agent’s frame of reference, collecting billing information probably makes more sense before doing anything payment related like creating a payment token because that’s how modern checkouts work across e-commerce!
this.server.tool(
"add-to-cart",
`Adds a product to the user's session-associated cart. Requires a valid session token.`,
{
session_token: z.string().describe("User session token"),
product_url: z.string().describe("Product URL"),
},
async ({ session_token, product_url }) => {
const session = await kvGetSession(session_token);
if (!session) {
return { content: [{ type: "text", text: "Invalid or expired session token." }] };
}
session.cart.push({ product_url });
await kvPutSession(session_token, session);
return {
content: [{ type: "text", text: `Added to cart: ${product_url}\n Now creating a Skyfire KYA+PAY token...` }],
};
}
);
In some cases, agents plan out their steps and tool use before they begin tool calling, in what is generally referred to as a planning phase. In such agents, you can often have another LLM vet the plan and engage in plan refinement to fix any inconsistencies in which tools are being called when. If your agent doesn’t do a planning step where it maps out all the tool calls ahead of time and gets feedback, the descriptions and outputs of one tool call will significantly influence the next sequence of tools that are called. Placing instructions so that they are proximal to the next action you want taken increases the likelihood that the agent executes as expected. The outputs of the prior tool call are as close as you can get, and a very strong determinant of what happens next.
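If your agent does have a planning phase, a lightweight reviewer pass can catch ordering mistakes before any tools run. Below is a minimal sketch of that idea using the Vercel AI SDK’s generateText; the prompt wording, the model choice, and the shape of the plan are illustrative assumptions rather than our exact setup.
// Illustrative plan-refinement pass: a second model checks the proposed
// tool-call order before the agent executes anything.
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

async function reviewPlan(plan: string[], toolDescriptions: string): Promise<string> {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"),
    prompt: [
      "You are reviewing an agent's tool-call plan.",
      `Available tools:\n${toolDescriptions}`,
      `Proposed plan:\n${plan.map((step, i) => `${i + 1}. ${step}`).join("\n")}`,
      "Check that required steps (e.g. token creation before any external service call)",
      "appear in the right order, and return a corrected, numbered plan.",
    ].join("\n\n"),
  });
  return text; // feed the refined plan back to the executing agent
}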
This learning offers an approach that tool, API, and MCP server developers can take to ensure agents use their tools as expected and in the expected order. The combination of tool descriptions, tool outputs, and MCP server resources gives these developers a powerful set of controls to nudge agents toward success.
Agents trigger tools wrong or the wrong tools
Calling the Wrong Tools
The LLM research community has found that smaller models are especially bad at juggling multiple tools. When given a large toolbox, a weaker model might invoke tools that have nothing to do with the query, simply because they’re there. This is a direct result of context confusion: the model has too many potential actions in mind and can’t reliably filter out the irrelevant ones.
Recall the first example, where the agent missed its restart cue after being given a larger list of sellers to parse through. The agent called irrelevant tools and connected to unnecessary services to try to complete the task at hand.
LLMs can have trouble knowing when to stop, and sometimes the stopping condition is not as simple as “after a specific tool call.” There are different ways to help them stop in these circumstances. One way is to impose a tool-calling budget, which is a limit on how many tool invocations the agent can make in one run (a limit tuned to the task the agent performs). A more targeted constraint is a maximum chain depth, which restricts how many consecutive tool calls can occur before the agent must stop, restart, or take a moment to think. You can also monitor the agent’s certainty in the steps it’s taking and stop it if its confidence drops below a threshold (measured via logits, the number of reasoning steps, or tool-choice confidence).
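Here is a minimal sketch of the first two guards, a total tool-call budget plus a maximum consecutive-chain depth, wrapped around a generic agent step; the agentStep function and its return shape are hypothetical placeholders for whatever your framework exposes.
// Illustrative guardrails: a total tool-call budget plus a cap on how many
// tool calls can happen back-to-back before the agent must pause.
type StepResult = { madeToolCall: boolean; done: boolean };
declare function agentStep(): Promise<StepResult>; // placeholder for your framework's step

async function runWithGuardrails(maxToolCalls = 20, maxChainDepth = 5) {
  let totalToolCalls = 0;
  let chainDepth = 0;

  while (true) {
    const step = await agentStep();
    if (step.done) return;

    if (step.madeToolCall) {
      totalToolCalls++;
      chainDepth++;
    } else {
      chainDepth = 0; // a non-tool step (e.g. reasoning or a user turn) resets the chain
    }

    if (totalToolCalls >= maxToolCalls) throw new Error("Tool-call budget exhausted");
    if (chainDepth >= maxChainDepth) {
      // Force a pause: hand control back to the user or a supervisor here.
      return;
    }
  }
}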
To reduce wrong tool calls in general, there’s the practice of tool RAG. In the same way that we use RAG to pull only relevant information from databases into the model’s context window, tool RAG limits the number of tools the model considers before each tool call, decreasing the chances of errant calls. The approach is relatively new, but it’s already being adopted by platforms like LlamaIndex and others.
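At its simplest, tool RAG is just semantic search over tool descriptions. The sketch below uses the Vercel AI SDK’s embedding helpers (embed, embedMany, cosineSimilarity); the catalog shape and the top-5 cutoff are illustrative assumptions.
// Illustrative tool RAG: embed tool descriptions, then retrieve only the
// most relevant tools for each user query before calling the model.
import { embed, embedMany, cosineSimilarity } from "ai";
import { openai } from "@ai-sdk/openai";

const embeddingModel = openai.embedding("text-embedding-3-small");

type ToolEntry = { name: string; description: string };

async function selectRelevantTools(
  query: string,
  catalog: ToolEntry[],
  topK = 5
): Promise<ToolEntry[]> {
  const { embeddings } = await embedMany({
    model: embeddingModel,
    values: catalog.map((t) => `${t.name}: ${t.description}`),
  });
  const { embedding: queryEmbedding } = await embed({ model: embeddingModel, value: query });

  return catalog
    .map((tool, i) => ({ tool, score: cosineSimilarity(queryEmbedding, embeddings[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ tool }) => tool);
}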
Implementing these safeguards can prevent your agents from spiraling into repeatedly calling the wrong tools. Now, let’s take a look at how to mitigate cases where your agent is calling the right tools, but with the wrong arguments.
Calling Tools Wrong
In our official MCP server, we have Skyfire tool calls to create KYA tokens, which take a sellerServiceId argument.
export async function createKYAToken(
params: {
sellerServiceId: string
buyerTag?: string | null
expiresAt?: number | null
identityPermissions?: string[] | null
},
apiKey: string
): Promise<ToolResponse> {
...
}
In the list of services we were exposing to the model, each service had an id (the sellerServiceId) and an attached seller key holding an object that described the seller, which also had an id. The model was unable to differentiate between the ids and would pass the seller ID as the seller service ID. The easy fix here was to trim down the context being passed to the model and omit any unimportant keys, especially ones whose names overlap with or resemble a tool call argument. We don’t want to expose the LLM to conflicting information if we can avoid it.
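A simple way to do this is to map the raw service list down to just the fields the agent needs before appending it to the prompt. The sketch below mirrors the example service object from earlier; which fields count as “important,” and the renaming of id to sellerServiceId, are our own illustrative choices.
// Illustrative context trimming: keep only the fields the agent needs and drop
// anything that could collide with tool arguments (like the nested seller.id).
type RawService = {
  id: string; // this is the sellerServiceId the tool expects
  name: string;
  description: string;
  price: string;
  seller: { id: string; name: string }; // seller.id is the confusable field
  openApiSpecUrl: string;
  [key: string]: unknown;
};

function trimServicesForPrompt(services: RawService[]) {
  return services.map((s) => ({
    sellerServiceId: s.id, // rename to match the tool argument exactly
    name: s.name,
    description: s.description,
    price: s.price,
    openApiSpecUrl: s.openApiSpecUrl,
    // seller.id and other internal fields are intentionally omitted
  }));
}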
We have also gotten into the habit of creating an mcp://guide resource for every MCP server we build. The guide gives an overview of the tools and resources available on the server, the common flows an agent should be aware of, and context on the tool inputs, so the agent can pattern match to what’s being requested of it.
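Registering such a guide is straightforward with the MCP TypeScript SDK’s resource API (shown here in the same style as the add-to-cart tool above); the guide text itself is a condensed, hypothetical example rather than our actual guide.
// Illustrative guide resource: a single document the agent can read to learn
// the expected tool-calling flow and the meaning of confusable parameters.
this.server.resource("guide", "mcp://guide", async (uri) => ({
  contents: [
    {
      uri: uri.href,
      text: [
        "# Server guide",
        "Typical flow: create-kya-pay-token -> call the external service tool.",
        "Note: sellerServiceId is the service's own id, NOT the seller.id field.",
      ].join("\n"),
    },
  ],
}));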
Both of these solutions in conjunction increased our agent’s ability to correctly call the tools at its disposal to 100%.
Learnings
As we’ve built increasingly complex agents that juggle prompts, tools, and MCP servers, several key principles have emerged from our trials and errors:
- Put context as close to the action taken as possible
- The most impactful instructions are those placed directly where they’re needed. Don’t bury critical directives in a long system prompt—embed them in tool descriptions, tool outputs, or as contextual hints right before the action. The agent’s attention is highest when it’s actively considering or executing a specific action, so that’s when it’s most receptive to guidance.
- The descriptions and outputs of one tool call significantly influence the next sequence of tools
- By crafting tool outputs that explicitly suggest or describe the next step, you create a breadcrumb trail that keeps the agent on track. Think of each tool output as a chance to whisper in the agent’s ear about what should come next.
- Trim down context you are passing to the model
- Remove unnecessary keys from data structures before inserting them into the context window. Every extra field is potential noise (your agent is looking for needles in a bigger and bigger haystack). If the agent doesn’t need to know the seller’s internal database ID or some other field, don’t include it. Focus on signal over completeness.
- Create MCP guides to reinforce tool use patterns and tool definitions
- A dedicated guide resource can serve as a persistent reference document that the agent can consult. Include common workflows, tool calling sequences, and clarifications about confusing parameters. This helps agents pattern match to whatever workflow the user is requesting of them.
- Use tool RAG to reduce the number of tools exposed at any given invocation
- Don’t show the agent 100 tools when it only needs 5. Implement semantic search over your tool definitions and dynamically retrieve only the most relevant tools for each user query. This keeps the agent’s focus sharp and reduces the likelihood of calling irrelevant tools.
- Implement guardrails to prevent cascading failures
- Agents will make mistakes, especially when uncertain. Rather than hoping they self-correct, implement hard external limits: tool call budgets, chaining depth limits, and confidence thresholds. These guardrails transform unpredictable agent behavior into controlled, debuggable systems.
- Context cleanliness is as important as context placement
- Having the right information in the right place is useless if it’s surrounded by noise. Ambiguous naming, redundant fields, and conflicting terminology all degrade agent performance. Treat your context window like a carefully curated dataset, not a data dump.
The Takeaway
As AI agents become more capable and context windows expand to millions of tokens, it’s tempting to give agents access to everything and let them figure it out. Despite models’ improving reasoning capabilities, this approach fails in production. Constraint and clarity consistently outperform abundance and ambiguity.
The most reliable agents aren’t those with access to the most tools or the longest context windows—they’re the ones with carefully curated context, strategically placed instructions, and intelligent guardrails that prevent them from wandering off the path.
The rise of MCP and tool-calling frameworks is democratizing access to powerful agentic capabilities, but it’s also introducing new failure modes that catch developers off-guard. You can’t just plug in an MCP server with 50 tools and expect your agent to gracefully juggle them all. You need to think carefully about:
- What information to expose (signal vs. noise)
- Where to place instructions (proximity to action)
- How to structure data (clarity and disambiguation)
- When to intervene (guardrails and validation)
The agents that work in production are those that respect the cognitive limitations of even the most advanced LLMs. They recognize that attention is finite, that context can overwhelm as easily as it can empower, and that a well-placed instruction is worth a thousand tokens of background documentation.
As you build your own agents, remember: more isn’t always better. Start with less—fewer tools, clearer instructions, cleaner data—and add complexity only when you’ve proven your agent can handle what it already has. Place your instructions where they’ll actually be seen, knowing how LLMs attend to context; put reminders in proximity to the moment the agent takes action; trim your context ruthlessly to avoid needle-in-a-haystack problems; and build guardrails that keep your agent from spiraling into confusion.
The future of AI agents isn’t just about maximizing context windows—it’s about maximizing the signal-to-noise ratio within them. Engineer your context as carefully as you engineer your prompts, and your agents will reward you with reliability, consistency, and the kind of performance that actually works in the real world.