Solving MCP Context Bloat with Claude's Tool Search API

If you’ve been working with the Model Context Protocol (MCP), you’ve likely encountered a frustrating bottleneck: Context Bloat.

When you connect a powerful MCP server—like the GitHub MCP—it typically loads the definitions for every single tool available. For a server with 91 tools, this can consume roughly 46,000 tokens just for the definitions. That’s a massive chunk of your context window gone before you’ve even sent a single message.

Anthropic has addressed this challenge with a powerful new feature in the Claude API called Tool Search. While this isn’t a change to the MCP protocol itself, it’s a game-changer for anyone building MCP-powered agents with Claude.

Illustration of Context Bloat — The Struggle is Real: Carrying 91 tools when you only need one.

What is Tool Search?

Tool Search is a Claude API feature that lets the model discover and load tools on demand rather than pre-loading everything at the start. Instead of stuffing all tool definitions into the context window upfront, Claude searches your tool catalog and loads only the 3-5 most relevant tools for the current task.

This solves two critical challenges:

Context Efficiency: Tool definitions can consume massive portions of your context window (50 tools ≈ 10-20K tokens)
Tool Selection Accuracy: Claude’s ability to correctly select tools degrades significantly with more than 30-50 tools loaded at once
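
To put rough numbers on the context savings, here is a back-of-the-envelope sketch using the GitHub MCP figures quoted earlier (91 tools, about 46,000 tokens). It assumes every tool definition is about the same size and ignores the overhead of the search tool and its results, so treat the output as an order-of-magnitude estimate rather than a measurement.

```python
# Figures quoted in this article for the GitHub MCP server.
TOTAL_TOOLS = 91
TOTAL_DEFINITION_TOKENS = 46_000


def estimate_loaded_tokens(tools_loaded: int) -> int:
    """Approximate definition tokens if only `tools_loaded` tools are in context.

    Assumption: all tool definitions are roughly the same size.
    """
    avg_tokens_per_tool = TOTAL_DEFINITION_TOKENS / TOTAL_TOOLS
    return round(avg_tokens_per_tool * tools_loaded)


for loaded in (3, 5, TOTAL_TOOLS):
    tokens = estimate_loaded_tokens(loaded)
    saved = 1 - tokens / TOTAL_DEFINITION_TOKENS
    print(f"{loaded:>2} tools loaded: ~{tokens:,} tokens ({saved:.0%} saved)")
```

Under these assumptions, loading 3-5 tools costs on the order of a few thousand tokens instead of tens of thousands.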

💡 Key Point: Tool Search is a Claude API feature, not an MCP protocol change. It works seamlessly with MCP servers, but the search logic runs on Claude’s side.

Supported Providers & Models

Tool Search is currently in public beta and requires specific beta headers depending on your provider:

Provider	Beta Header	Supported Models
Anthropic API	`advanced-tool-use-2025-11-20`	Claude Opus 4.5, Claude Sonnet 4.5
Microsoft Foundry	`advanced-tool-use-2025-11-20`	Claude Opus 4.5, Claude Sonnet 4.5
Google Cloud Vertex AI	`tool-search-tool-2025-10-19`	Claude Opus 4.5, Claude Sonnet 4.5
Amazon Bedrock	`tool-search-tool-2025-10-19`	Claude Opus 4.5 only

⚠️ Bedrock Users: Tool Search is only available via the Invoke API, not the Converse API.

The Two Flavors of Search: Regex vs. BM25

Claude offers two distinct search methods, each optimized for different tool naming conventions:

1. Regex Search (`tool_search_tool_regex_20251119`)

Best For: Tools with a strict, consistent, and predictable naming structure.

How It Works: Claude constructs regex patterns using Python’s re.search() syntax to find matching tools.

Code

# Example patterns Claude might use:
"weather"                    # Matches tools containing "weather"
"get_.*_data"                # Matches get_user_data, get_weather_data
"database.*query|query.*database"  # OR patterns for flexibility
"(?i)slack"                  # Case-insensitive search

Pro: Extremely precise for structured APIs like stripe_customer_get or aws_ec2_stop
Con: Fails if tool names are inconsistent or ambiguous
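
If you want to see how the example patterns above behave before relying on them, you can try them locally. This is a minimal sketch, not Claude's actual search implementation: the tool names are hypothetical, and it only matches against names, which is a simplification.

```python
import re

# Hypothetical tool names, used only for illustration.
TOOL_NAMES = [
    "get_user_data",
    "get_weather_data",
    "weather_forecast",
    "database_run_query",
    "Slack_send_message",
    "stripe_customer_get",
]


def match_tools(pattern: str, names: list[str]) -> list[str]:
    """Return the names in which re.search finds the pattern anywhere."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


patterns = (
    "weather",
    "get_.*_data",
    "database.*query|query.*database",
    "(?i)slack",
)
for pattern in patterns:
    print(f"{pattern!r:40} -> {match_tools(pattern, TOOL_NAMES)}")
```

Note that `stripe_customer_get` does not match `get_.*_data`: the pattern needs `get_` at the front and `_data` later in the name, which is exactly why consistent naming matters for regex search.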

2. BM25 Search (`tool_search_tool_bm25_20251119`)

Best For: Tools with natural language names or semantically meaningful descriptions.

How It Works: Claude uses natural language queries, and a BM25 relevance ranking algorithm matches them to your tool definitions.

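Pro: More flexible; matches natural language queries against the words in your tool definitions, so no strict naming pattern is required. Because BM25 ranks by term overlap, a synonym only matches if it actually appears in the tool's text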
Con: Slightly less precise than exact pattern matching
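
To get a feel for how BM25 ranking behaves, here is a small self-contained sketch of the standard Okapi BM25 formula. It is not Anthropic's implementation: the catalog, the tokenizer, and the `k1`/`b` values are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (so snake_case names split too)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def bm25_rank(query: str, docs: dict[str, str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Rank documents by Okapi BM25 score for the query, best first."""
    if not docs:
        return []
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for name, toks in tokenized.items():
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical catalog: each tool's name and description form its searchable text.
CATALOG = {
    "get_weather": "get_weather Get the current weather forecast for a location",
    "github_create_issue": "github_create_issue Create a new issue in a GitHub repository",
    "send_email": "send_email Send an email message to a recipient",
}

ranking = bm25_rank("create an issue on github", CATALOG)
print(ranking[0][0])
```

A query term that appears in no tool text adds nothing to the score, so descriptive names and descriptions do most of the work here.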

Tool Search Solution — Tool Search to the rescue: Finding the right needle in the haystack.

How the `defer_loading` Flag Works

The magic happens with a simple property: defer_loading: true. When you mark a tool with this flag, Claude won’t load its definition into context until it’s discovered via search.

Code

{
  "name": "get_weather",
  "description": "Get weather for a location",
  "input_schema": { ... },
  "defer_loading": true  // 👈 Load only when needed
}

Key Rules:

The Tool Search tool itself must never have defer_loading: true
Keep your 3-5 most frequently used tools as non-deferred
At least one tool must be non-deferred (or you’ll get an error)
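
The two hard rules above are easy to check before you send a request. The following is a hedged sketch: `validate_deferred_tools` is a hypothetical helper, and it assumes search tools can be recognized by a `type` starting with `tool_search_tool_` and that the search tool itself counts as a non-deferred tool.

```python
# Assumption: every search tool type starts with this prefix.
SEARCH_TOOL_PREFIX = "tool_search_tool_"


def validate_deferred_tools(tools: list[dict]) -> list[str]:
    """Return a list of rule violations; an empty list means no problems found."""
    problems = []
    for tool in tools:
        is_search_tool = tool.get("type", "").startswith(SEARCH_TOOL_PREFIX)
        if is_search_tool and tool.get("defer_loading", False):
            problems.append(f"search tool {tool.get('name')!r} must not set defer_loading")
    # If every tool is deferred, nothing is loaded up front.
    if tools and all(tool.get("defer_loading", False) for tool in tools):
        problems.append("at least one tool must be non-deferred")
    return problems


good = [
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    {"name": "get_weather", "description": "Get weather for a location", "defer_loading": True},
]
bad = [
    {"name": "get_weather", "description": "Get weather for a location", "defer_loading": True},
]
print(validate_deferred_tools(good))
print(validate_deferred_tools(bad))
```

The "keep 3-5 frequently used tools loaded" rule is advice rather than a hard constraint, so the sketch does not enforce it.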

Implementation Guide

Here’s how to implement Tool Search with the Claude API:

Step 1: Add the Beta Header

Code

# For Anthropic API / Microsoft Foundry
"anthropic-beta: advanced-tool-use-2025-11-20"

# For MCP integration, add both:
"anthropic-beta: advanced-tool-use-2025-11-20,mcp-client-2025-11-20"

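If you target more than one provider, it can help to keep the beta values in one place. The sketch below copies the values from the provider table in this article; the provider keys (`"anthropic"`, `"foundry"`, `"vertex"`, `"bedrock"`) and the function name are hypothetical. The article only shows the MCP combination for the Anthropic API and Microsoft Foundry header, so using it elsewhere, and how each provider expects the value to be transmitted, are assumptions to verify.

```python
# Beta values copied from the provider table in this article.
# The dictionary keys are hypothetical labels, not official identifiers.
TOOL_SEARCH_BETAS = {
    "anthropic": "advanced-tool-use-2025-11-20",
    "foundry": "advanced-tool-use-2025-11-20",
    "vertex": "tool-search-tool-2025-10-19",
    "bedrock": "tool-search-tool-2025-10-19",
}
MCP_CLIENT_BETA = "mcp-client-2025-11-20"


def beta_header_value(provider: str, use_mcp: bool = False) -> str:
    """Return the comma-separated beta value for the given provider."""
    betas = [TOOL_SEARCH_BETAS[provider]]
    if use_mcp:
        betas.append(MCP_CLIENT_BETA)
    return ",".join(betas)


# Header for the Anthropic API with MCP integration enabled.
headers = {"anthropic-beta": beta_header_value("anthropic", use_mcp=True)}
print(headers)
```
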
Step 2: Include the Search Tool

Code

{
  "type": "tool_search_tool_regex_20251119",
  "name": "tool_search_tool_regex"
}

Step 3: Mark Tools for Deferred Loading

Code

tools = [
    # Search tool - always loaded
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},

    # Frequently used - keep loaded
    {"name": "navigate", "description": "...", "defer_loading": False},

    # Everything else - load on demand
    {"name": "github_create_issue", "description": "...", "defer_loading": True},
    {"name": "github_list_repos", "description": "...", "defer_loading": True},
    # ... 88 more tools
]
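
Marking 90-odd tools by hand gets tedious, so you may prefer to set the flag programmatically. Here is a minimal sketch: `with_deferred_loading` is a hypothetical helper, and the descriptions and empty schemas in the example catalog are placeholders, not real GitHub MCP definitions.

```python
import copy

SEARCH_TOOL = {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"}


def with_deferred_loading(tools: list[dict], keep_loaded: set[str]) -> list[dict]:
    """Return the search tool plus copies of `tools`, deferring all but `keep_loaded`."""
    result = [dict(SEARCH_TOOL)]  # the search tool is never deferred
    for tool in tools:
        marked = copy.deepcopy(tool)  # leave the caller's definitions untouched
        marked["defer_loading"] = tool["name"] not in keep_loaded
        result.append(marked)
    return result


# Placeholder catalog for illustration only.
EMPTY_SCHEMA = {"type": "object", "properties": {}}
catalog = [
    {"name": "navigate", "description": "Open a URL", "input_schema": EMPTY_SCHEMA},
    {"name": "github_create_issue", "description": "Create an issue", "input_schema": EMPTY_SCHEMA},
    {"name": "github_list_repos", "description": "List repositories", "input_schema": EMPTY_SCHEMA},
]

tools = with_deferred_loading(catalog, keep_loaded={"navigate"})
for tool in tools:
    print(tool["name"], tool.get("defer_loading"))
```

Returning copies keeps the original catalog reusable, for example if you want a different set of always-loaded tools per agent.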

MCP Server Integration

When working with MCP servers, you can use mcp_toolset to defer all tools from a server by default:

Code

{
  "type": "mcp_toolset",
  "mcp_server_name": "github-server",
  "default_config": {
    "defer_loading": true
  },
  "configs": {
    "search_issues": {
      "defer_loading": false // Override: keep this one loaded
    }
  }
}
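
If you build requests in Python, the same toolset can be generated from a list of tools you want to keep loaded. This is a small sketch with a hypothetical helper name; it uses only the fields shown in the JSON above.

```python
import json


def build_mcp_toolset(server_name: str, keep_loaded: list[str]) -> dict:
    """Defer every tool from the server except the names in `keep_loaded`."""
    return {
        "type": "mcp_toolset",
        "mcp_server_name": server_name,
        "default_config": {"defer_loading": True},
        # Per-tool overrides: these stay in context from the start.
        "configs": {name: {"defer_loading": False} for name in keep_loaded},
    }


toolset = build_mcp_toolset("github-server", ["search_issues"])
print(json.dumps(toolset, indent=2))
```

Serializing with `json.dumps` also guarantees the payload is valid JSON, which does not allow inline comments like the annotated example above.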

When Tool Search Isn’t Available: Manual Curation

Not every environment supports Claude’s Tool Search yet. If you’re building agents in Microsoft Copilot Studio, you’ll need a different approach.

The Alternative: Design-Time Tool Selection

While Copilot Studio doesn’t yet have dynamic tool search, it offers granular tool configuration that lets you manually control which tools are available to your agent.

When you connect an MCP server in Copilot Studio, every tool appears as a distinct action with an Enable/Disable toggle. Instead of dumping all 50+ tools into the agent’s context, you can curate exactly what the agent can access.

Why This Still Matters

Manual curation isn’t as elegant as dynamic search, but it’s effective:

Context Hygiene: Disabling irrelevant tools removes their definitions from the system prompt
Latency Reduction: Fewer tool tokens mean faster response times
Accuracy: With fewer tools to choose from, the model is less likely to select the wrong action

📋 Best Practice: If you’re using Copilot Studio, audit your MCP connections and disable any tools your agent doesn’t need. It’s manual, but it works.
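
Copilot Studio handles this through toggles in its UI, but on platforms where you assemble the tool list in code, the same design-time curation can be expressed as an allowlist. This is a generic sketch with hypothetical names, not a Copilot Studio API.

```python
def curate_tools(tools: list[dict], enabled: set[str]) -> list[dict]:
    """Keep only the tools whose names appear in the `enabled` allowlist."""
    unknown = enabled - {tool["name"] for tool in tools}
    if unknown:
        # Fail loudly on typos instead of silently dropping a tool.
        raise ValueError(f"allowlist names not found in catalog: {sorted(unknown)}")
    return [tool for tool in tools if tool["name"] in enabled]


# Hypothetical catalog exposed by an MCP server.
all_tools = [
    {"name": "search_issues", "description": "Search issues"},
    {"name": "create_issue", "description": "Create an issue"},
    {"name": "delete_repo", "description": "Delete a repository"},
]

agent_tools = curate_tools(all_tools, enabled={"search_issues", "create_issue"})
print([tool["name"] for tool in agent_tools])
```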

Technical Constraints

While Tool Search solves the bloat issue, there are boundaries to keep in mind:

Constraint	Limit
Maximum tools in catalog	10,000
Regex pattern length	200 characters
Search results per query	3-5 tools
Model support	Sonnet 4.5+, Opus 4.5+ (no Haiku)
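
A couple of these limits can be checked on your side before a request goes out. The sketch below uses the numbers from the table; the function names are hypothetical, and whether the search tool itself counts toward the catalog limit is not stated here, so the check simply counts whatever list you pass in. Claude writes the regex patterns itself, so the pattern check is mainly useful when you test patterns locally.

```python
import re

# Limits taken from the constraints table in this article.
MAX_CATALOG_TOOLS = 10_000
MAX_REGEX_PATTERN_LENGTH = 200


def check_catalog_size(tools: list[dict]) -> None:
    """Raise ValueError if the catalog exceeds the documented tool limit."""
    if len(tools) > MAX_CATALOG_TOOLS:
        raise ValueError(f"{len(tools)} tools exceeds the limit of {MAX_CATALOG_TOOLS}")


def check_pattern(pattern: str) -> None:
    """Raise ValueError if a regex pattern is too long or does not compile."""
    if len(pattern) > MAX_REGEX_PATTERN_LENGTH:
        raise ValueError(f"pattern is {len(pattern)} characters; limit is {MAX_REGEX_PATTERN_LENGTH}")
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex: {exc}") from exc


check_catalog_size([{"name": "get_weather"}])
check_pattern("get_.*_data")
print("catalog and pattern are within limits")
```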

The Bottom Line

Context bloat has been one of the biggest pain points when scaling MCP-powered agents. Claude’s Tool Search feature provides an elegant, dynamic solution—but it’s important to understand this is a Claude API capability, not an MCP protocol change.

If you’re using Claude (via Anthropic, Microsoft Foundry, Vertex AI, or Bedrock), enable Tool Search and start deferring your tools. Your context window will thank you.

If you’re on another platform, manual tool curation is your friend until similar features arrive. Every tool you disable is tokens saved.

Either way, the days of cramming 46,000 tokens of tool definitions into every request are finally behind us.
