Solving MCP Context Bloat with Claude's Tool Search API

Learn how Claude's new Tool Search feature dynamically loads only the tools you need, cutting context usage from 46K tokens to under 500.

If you’ve been working with the Model Context Protocol (MCP), you’ve likely encountered a frustrating bottleneck: Context Bloat.

When you connect a powerful MCP server—like the GitHub MCP—it typically loads the definitions for every single tool available. For a server with 91 tools, this can consume roughly 46,000 tokens just for the definitions. That’s a massive chunk of your context window gone before you’ve even sent a single message.

Anthropic has addressed this challenge with a powerful new feature in the Claude API called Tool Search. While this isn’t a change to the MCP protocol itself, it’s a game-changer for anyone building MCP-powered agents with Claude.

Figure: Illustration of Context Bloat. The Struggle is Real: carrying 91 tools when you only need one.

Tool Search is a Claude API feature that lets the model discover and load tools on demand rather than pre-loading everything at the start. Instead of stuffing every tool definition into the context window up front, Claude searches your tool catalog and loads only the 3-5 most relevant tools for the current task.

This solves two critical challenges:

  • Context Efficiency: Tool definitions can consume massive portions of your context window (50 tools ≈ 10-20K tokens)
  • Tool Selection Accuracy: Claude’s ability to correctly select tools degrades significantly with more than 30-50 tools loaded at once
💡 Key Point: Tool Search is a Claude API feature, not an MCP protocol change. It works seamlessly with MCP servers, but the search logic runs on Claude’s side.
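
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted in this article (91 tools, roughly 46,000 tokens). It assumes every tool definition is about the same size, which real servers won't honor, so treat the output as an order-of-magnitude estimate of what the discovered definitions cost once a search has pulled them in.

```python
# Figures quoted in the article; everything derived from them is an estimate.
TOTAL_TOOLS = 91
TOTAL_DEFINITION_TOKENS = 46_000


def avg_tokens_per_tool(total_tokens: int, tool_count: int) -> float:
    """Average definition size, assuming tools are roughly equal in size."""
    return total_tokens / tool_count


def loaded_cost(tools_loaded: int, avg_tokens: float) -> int:
    """Approximate tokens spent on the definitions that actually get loaded."""
    return round(tools_loaded * avg_tokens)


avg = avg_tokens_per_tool(TOTAL_DEFINITION_TOKENS, TOTAL_TOOLS)
print(f"average per tool: ~{avg:.0f} tokens")
for n in (3, 5):
    print(f"{n} tools loaded: ~{loaded_cost(n, avg)} tokens")
```

Even at the upper end of 5 loaded tools, the estimate stays a small fraction of the full 46,000-token catalog.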


Supported Providers & Models

Tool Search is currently in public beta and requires specific beta headers depending on your provider:

| Provider | Beta Header | Supported Models |
| --- | --- | --- |
| Anthropic API | advanced-tool-use-2025-11-20 | Claude Opus 4.5, Claude Sonnet 4.5 |
| Microsoft Foundry | advanced-tool-use-2025-11-20 | Claude Opus 4.5, Claude Sonnet 4.5 |
| Google Cloud Vertex AI | tool-search-tool-2025-10-19 | Claude Opus 4.5, Claude Sonnet 4.5 |
| Amazon Bedrock | tool-search-tool-2025-10-19 | Claude Opus 4.5 only |
⚠️ Bedrock Users: Tool Search is only available via the Invoke API, not the Converse API.


The Two Flavors of Search: Regex vs. BM25

Claude offers two distinct search methods, each optimized for different tool naming conventions:

1. Regex Search (tool_search_tool_regex_20251119)

Best For: Tools with a strict, consistent, and predictable naming structure.

How It Works: Claude constructs regex patterns using Python’s re.search() syntax to find matching tools.

Code
# Example patterns Claude might use:
"weather"                    # Matches tools containing "weather"
"get_.*_data"                # Matches get_user_data, get_weather_data
"database.*query|query.*database"  # OR patterns for flexibility
"(?i)slack"                  # Case-insensitive search

  • Pro: Extremely precise for structured APIs like stripe_customer_get or aws_ec2_stop
  • Con: Fails if tool names are inconsistent or ambiguous
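
If you want a feel for how those patterns behave, you can try them locally with Python's re.search(). This is only a sketch: the tool names are hypothetical, and it matches against names alone, which is a simplification of whatever the real search tool inspects.

```python
import re

# Hypothetical tool names, used only for illustration.
TOOL_NAMES = [
    "get_user_data",
    "get_weather_data",
    "query_database_rows",
    "Slack_send_message",
    "stripe_customer_get",
]


def search_tools(pattern: str, names: list[str]) -> list[str]:
    """Return the names that re.search() matches for the given pattern."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


for pattern in ("weather", "get_.*_data", "database.*query|query.*database",
                "slack", "(?i)slack"):
    print(f"{pattern!r:40} -> {search_tools(pattern, TOOL_NAMES)}")
```

Note how the plain "slack" pattern finds nothing because the hypothetical tool is named Slack_send_message, while "(?i)slack" does. That is the inconsistent-naming failure mode in miniature.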

2. BM25 Search (tool_search_tool_bm25_20251119)

Best For: Tools with natural language names or semantically meaningful descriptions.

How It Works: Claude uses natural language queries, and a BM25 relevance ranking algorithm matches them to your tool definitions.

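  • Pro: More flexible; tolerates natural language variations in how a request is phrased
  • Con: Slightly less precise than exact pattern matching, and because BM25 ranks on shared keywords, a synonym only matches if the word actually appears in a tool's name or description

Figure: Tool Search Solution. Tool Search to the rescue: finding the right needle in the haystack.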
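
To make the BM25 idea concrete, here is a minimal pure-Python sketch of Okapi BM25 ranking over a tiny catalog. It is not Anthropic's implementation: the catalog, the tokenizer, and the k1 and b values are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical catalog for illustration: tool name -> description.
CATALOG = {
    "create_issue": "Create a new issue in a GitHub repository",
    "list_repos": "List repositories for a user or organization",
    "get_forecast": "Get the weather forecast for a city",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, catalog: dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank tools by Okapi BM25 score; tools with no matching term are dropped."""
    if not catalog:
        return []
    docs = {name: tokenize(f"{name} {desc}") for name, desc in catalog.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many tool definitions each term appears.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))

    scores = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(bm25_rank("open a github issue", CATALOG))
print(bm25_rank("weather forecast", CATALOG))
```

Rare words like "github" carry most of the score, while a word that appears in every definition (here, "a") contributes almost nothing. That is why descriptive, distinctive tool descriptions tend to rank better.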

How the defer_loading Flag Works

The magic happens with a simple property: defer_loading: true. When you mark a tool with this flag, Claude won’t load its definition into context until it’s discovered via search.

Code
{
  "name": "get_weather",
  "description": "Get weather for a location",
  "input_schema": { ... },
  "defer_loading": true  // 👈 Load only when needed
}

Key Rules:

  • The Tool Search tool itself must never have defer_loading: true
  • Keep your 3-5 most frequently used tools as non-deferred
  • At least one tool must be non-deferred (or you’ll get an error)
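
These rules are easy to check before you send a request. The helper below is a hypothetical pre-flight check, not part of any SDK; it only encodes the rules listed above and uses the two search tool type strings mentioned in this article.

```python
# Search tool type strings taken from this article.
SEARCH_TOOL_TYPES = {
    "tool_search_tool_regex_20251119",
    "tool_search_tool_bm25_20251119",
}


def check_defer_rules(tools: list[dict]) -> list[str]:
    """Return a list of rule violations; an empty list means the tools look fine."""
    problems = []
    for tool in tools:
        # Rule: the search tool itself must never be deferred.
        if tool.get("type") in SEARCH_TOOL_TYPES and tool.get("defer_loading"):
            problems.append(f"search tool {tool.get('name')!r} must not be deferred")
    # Rule: at least one tool must be non-deferred.
    if tools and all(tool.get("defer_loading") for tool in tools):
        problems.append("at least one tool must be non-deferred")
    return problems


good = [
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    {"name": "get_weather", "description": "Get weather for a location", "defer_loading": True},
]
bad = [dict(tool, defer_loading=True) for tool in good]

print(check_defer_rules(good))
print(check_defer_rules(bad))
```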

Implementation Guide

Here’s how to implement Tool Search with the Claude API:

Step 1: Add the Beta Header

Code
# For Anthropic API / Microsoft Foundry
"anthropic-beta: advanced-tool-use-2025-11-20"

# For MCP integration, add both:
"anthropic-beta: advanced-tool-use-2025-11-20,mcp-client-2025-11-20"

Step 2: Include the Search Tool

Code
{
  "type": "tool_search_tool_regex_20251119",
  "name": "tool_search_tool_regex"
}
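
Steps 1 and 2 can be wrapped in two small helpers if you are calling the Anthropic API over raw HTTP. This is a sketch under assumptions: the function names are made up, the x-api-key and anthropic-version headers apply to the direct Anthropic API rather than the other providers, and the BM25 tool name simply mirrors the regex one.

```python
def build_headers(api_key: str, use_mcp: bool = False) -> dict[str, str]:
    """Request headers for the Anthropic API with the Tool Search beta enabled."""
    betas = ["advanced-tool-use-2025-11-20"]
    if use_mcp:
        # MCP integration needs both beta flags, comma-separated.
        betas.append("mcp-client-2025-11-20")
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
        "anthropic-beta": ",".join(betas),
    }


def search_tool_entry(variant: str = "regex") -> dict[str, str]:
    """Tool list entry for the chosen search flavor ('regex' or 'bm25')."""
    if variant not in ("regex", "bm25"):
        raise ValueError(f"unknown search variant: {variant!r}")
    return {
        "type": f"tool_search_tool_{variant}_20251119",
        # Assumption: the BM25 name follows the same pattern as the regex one.
        "name": f"tool_search_tool_{variant}",
    }


print(build_headers("YOUR_API_KEY", use_mcp=True)["anthropic-beta"])
print(search_tool_entry("regex"))
```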

Step 3: Mark Tools for Deferred Loading

Code
tools = [
    # Search tool - always loaded
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    
    # Frequently used - keep loaded
    {"name": "navigate", "description": "...", "defer_loading": False},
    
    # Everything else - load on demand
    {"name": "github_create_issue", "description": "...", "defer_loading": True},
    {"name": "github_list_repos", "description": "...", "defer_loading": True},
    # ... 88 more tools
]
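
With 91 tools you probably don't want to hand-write the flag on each one. A small helper can do it for you; with_deferred_loading and its parameters are hypothetical names for this sketch, and the tool definitions are trimmed to the fields shown above.

```python
def with_deferred_loading(tool_defs: list[dict], keep_loaded: set[str],
                          search_tool: dict) -> list[dict]:
    """Return a tools list: search tool first, then every tool flagged explicitly."""
    tools = [dict(search_tool)]  # the search tool is never deferred
    for tool in tool_defs:
        # Defer everything that is not on the frequently-used list.
        tools.append({**tool, "defer_loading": tool["name"] not in keep_loaded})
    return tools


catalog = [
    {"name": "navigate", "description": "..."},
    {"name": "github_create_issue", "description": "..."},
    {"name": "github_list_repos", "description": "..."},
]
tools = with_deferred_loading(
    catalog,
    keep_loaded={"navigate"},
    search_tool={"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
)
for tool in tools:
    print(tool["name"], tool.get("defer_loading"))
```

The input definitions are copied rather than mutated, so you can reuse the same catalog with a different keep_loaded set per agent.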

MCP Server Integration

When working with MCP servers, you can use mcp_toolset to defer all tools from a server by default:

Code
{
  "type": "mcp_toolset",
  "mcp_server_name": "github-server",
  "default_config": {
    "defer_loading": true
  },
  "configs": {
    "search_issues": {
      "defer_loading": false  // Override: keep this one loaded
    }
  }
}
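
The override behavior described above boils down to "per-tool config wins, otherwise fall back to the default". Here is a sketch of that lookup; is_deferred is a hypothetical helper that mirrors the description in this article, not the API's actual resolution code, and it assumes a missing default means the tool is loaded.

```python
def is_deferred(toolset: dict, tool_name: str) -> bool:
    """Resolve the effective defer_loading value for one tool in an mcp_toolset."""
    default = toolset.get("default_config", {}).get("defer_loading", False)
    override = toolset.get("configs", {}).get(tool_name, {})
    # A per-tool entry takes precedence over the server-wide default.
    return override.get("defer_loading", default)


toolset = {
    "type": "mcp_toolset",
    "mcp_server_name": "github-server",
    "default_config": {"defer_loading": True},
    "configs": {"search_issues": {"defer_loading": False}},
}

print(is_deferred(toolset, "search_issues"))  # overridden: stays loaded
print(is_deferred(toolset, "create_issue"))   # falls back to the default
```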

When Tool Search Isn’t Available: Manual Curation

Not every environment supports Claude’s Tool Search yet. If you’re building agents in Microsoft Copilot Studio, you’ll need a different approach.

The Alternative: Design-Time Tool Selection

While Copilot Studio doesn’t yet have dynamic tool search, it offers granular tool configuration that lets you manually control which tools are available to your agent.

When you connect an MCP server in Copilot Studio, every tool appears as a distinct action with an Enable/Disable toggle. Instead of dumping all 50+ tools into the agent’s context, you can curate exactly what the agent can access.

Why This Still Matters

Manual curation isn’t as elegant as dynamic search, but it’s effective:

  1. Context Hygiene: Disabling irrelevant tools removes their definitions from the system prompt
  2. Latency Reduction: Fewer tool tokens mean faster response times
  3. Accuracy: With fewer tools to choose from, the model is less likely to select the wrong action
📋 Best Practice: If you’re using Copilot Studio, audit your MCP connections and disable any tools your agent doesn’t need. It’s manual, but it works.
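
Copilot Studio handles this through toggles in its UI, but if you assemble the tool list yourself on another platform, the same curation is just an allowlist filter. The sketch below is a generic illustration with hypothetical tool names; it does not use any Copilot Studio API.

```python
def curate_tools(tools: list[dict], enabled: set[str]) -> list[dict]:
    """Keep only the tools whose name is on the allowlist."""
    return [tool for tool in tools if tool["name"] in enabled]


all_tools = [
    {"name": "search_issues", "description": "..."},
    {"name": "create_issue", "description": "..."},
    {"name": "delete_repo", "description": "..."},
]
enabled = {"search_issues", "create_issue"}

curated = curate_tools(all_tools, enabled)
disabled = [tool["name"] for tool in all_tools if tool["name"] not in enabled]
print("sent to the model:", [tool["name"] for tool in curated])
print("disabled:", disabled)
```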


Technical Constraints

While Tool Search solves the bloat issue, there are boundaries to keep in mind:

| Constraint | Limit |
| --- | --- |
| Maximum tools in catalog | 10,000 |
| Regex pattern length | 200 characters |
| Search results per query | 3-5 tools |
| Model support | Sonnet 4.5+, Opus 4.5+ (no Haiku) |

The Bottom Line

Context bloat has been one of the biggest pain points when scaling MCP-powered agents. Claude’s Tool Search feature provides an elegant, dynamic solution—but it’s important to understand this is a Claude API capability, not an MCP protocol change.

If you’re using Claude (via Anthropic, Microsoft Foundry, Vertex AI, or Bedrock), enable Tool Search and start deferring your tools. Your context window will thank you.

If you’re on another platform, manual tool curation is your friend until similar features arrive. Every tool you disable is tokens saved.

Either way, the days of cramming 46,000 tokens of tool definitions into every request are finally behind us.
