Lesson 26 of 55: Multimodal MCP – Images, PDFs, and Audio With Google Gemini

Gemini’s multimodal capabilities are not a bolt-on feature — they are a first-class part of the API. When you combine them with MCP, you unlock tool-calling patterns that no other provider can match today: an agent that reads a PDF invoice and simultaneously queries your accounting database; a vision pipeline that processes uploaded product photos and calls your inventory API; an audio transcription workflow that tags clips with taxonomy from your knowledge base. This lesson covers the full multimodal stack for MCP applications.

Gemini accepts images, PDFs, audio, and video alongside text and tool calls in the same request.

The Multimodal Parts System

Every Gemini request is built from an array of parts. A part can be text, inline data (base64), or a file URI from the Files API. This composability is what makes multimodal tool calling clean:

import { GoogleGenerativeAI } from '@google/generative-ai';
import fs from 'node:fs';

const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Inline image (small files up to ~20MB)
function imageToInlinePart(filePath, mimeType = 'image/jpeg') {
  const data = fs.readFileSync(filePath).toString('base64');
  return { inlineData: { mimeType, data } };
}

// Inline PDF
function pdfToInlinePart(filePath) {
  const data = fs.readFileSync(filePath).toString('base64');
  return { inlineData: { mimeType: 'application/pdf', data } };
}

This parts-based architecture is why Gemini multimodal feels natural to work with. Rather than separate endpoints for vision and text, you compose a single array of parts, mixing modalities freely. When MCP tools are added on top, the model can reason across image content and tool results in the same conversation turn.
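
Nothing stops you from mixing several modalities in one call. A minimal sketch, assuming the helpers above and a model instance from genai.getGenerativeModel(...) (file paths are illustrative):

// One request, three parts: image + PDF + a text instruction
const mixed = await model.generateContent([
  imageToInlinePart('./photos/damage.jpg'),
  pdfToInlinePart('./docs/policy.pdf'),
  { text: 'Does the damage in the photo fall under the coverage described in this policy?' },
]);
console.log(mixed.response.text());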

Image Analysis + MCP Tool Calls

A common pattern: the user uploads a product photo, the model identifies it visually, then calls an MCP tool to fetch live inventory and pricing data for that product:

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'node',
  args: ['./servers/inventory-server.js'],
});
const mcp = new Client({ name: 'vision-agent', version: '1.0.0' });
await mcp.connect(transport);
const { tools: mcpTools } = await mcp.listTools();

const model = genai.getGenerativeModel({
  model: 'gemini-2.0-flash',
  tools: [{ functionDeclarations: mcpTools.map(t => ({ name: t.name, description: t.description, parameters: t.inputSchema })) }],
});

async function analyzeProductImage(imagePath) {
  const chat = model.startChat();
  const imagePart = imageToInlinePart(imagePath);

  let response = await chat.sendMessage([
    imagePart,
    { text: 'Identify this product and check its current inventory and pricing using the available tools.' },
  ]);

  let candidate = response.response.candidates[0];

  // Tool calling loop (same pattern as lesson 25)
  while (candidate.content.parts.some(p => p.functionCall)) {
    const calls = candidate.content.parts.filter(p => p.functionCall);
    const results = await Promise.all(calls.map(async part => {
      const fc = part.functionCall;
      const result = await mcp.callTool({ name: fc.name, arguments: fc.args });
      const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
      return { functionResponse: { name: fc.name, response: { result: text } } };
    }));
    response = await chat.sendMessage(results);
    candidate = response.response.candidates[0];
  }

  return candidate.content.parts.filter(p => p.text).map(p => p.text).join('');
}

const analysis = await analyzeProductImage('./uploads/product-photo.jpg');
console.log(analysis);

In practice, this pattern powers use cases like warehouse quality control (photograph a shelf, look up inventory), insurance claims processing (photograph damage, cross-reference policy), and retail product identification. The key insight is that the model handles the visual recognition while MCP tools provide the live data layer.

Invoice processing: Gemini reads the PDF, extracts line items, then calls MCP tools to verify each item against your ERP.

PDF Processing with MCP Enrichment

PDFs are particularly powerful. A 200-page contract, a scanned invoice, or a technical specification can be passed to Gemini as inline data. The model reads the entire document and can simultaneously call MCP tools to enrich or validate the extracted information:

async function processInvoice(pdfPath) {
  const model = genai.getGenerativeModel({
    model: 'gemini-2.5-pro-preview-03-25',  // Pro for complex document understanding
    tools: [{ functionDeclarations: mcpTools.map(t => ({ name: t.name, description: t.description, parameters: t.inputSchema })) }],
  });

  const chat = model.startChat();
  const pdfPart = pdfToInlinePart(pdfPath);

  let response = await chat.sendMessage([
    pdfPart,
    {
      text: `Extract all line items from this invoice (product name, SKU, quantity, unit price, total).
For each line item, use the verify_product tool to check if the SKU exists in our system.
Flag any discrepancies between the invoice price and our current pricing.
Return a structured JSON summary.`,
    },
  ]);

  let candidate = response.response.candidates[0];
  while (candidate.content.parts.some(p => p.functionCall)) {
    const calls = candidate.content.parts.filter(p => p.functionCall);
    const results = await Promise.all(calls.map(async part => {
      const fc = part.functionCall;
      const result = await mcp.callTool({ name: fc.name, arguments: fc.args });
      const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
      return { functionResponse: { name: fc.name, response: { result: text } } };
    }));
    response = await chat.sendMessage(results);
    candidate = response.response.candidates[0];
  }

  return candidate.content.parts.filter(p => p.text).map(p => p.text).join('');
}

Be careful with PDF size when using inline base64 encoding. A 50-page PDF might be 5-10MB, but base64 inflates that by roughly 33%. If your invoices or contracts regularly exceed 15MB, skip ahead to the Files API section below to avoid hitting request size limits.
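
A rough pre-flight check along those lines – the 15MB threshold here is an assumption, tune it to the limits you actually observe:

import { statSync } from 'node:fs';

// Estimate the base64-encoded size and route oversized files to the Files API
function shouldUseFilesApi(filePath, thresholdBytes = 15 * 1024 * 1024) {
  const rawBytes = statSync(filePath).size;
  const base64Bytes = Math.ceil(rawBytes / 3) * 4; // base64 adds ~33% overhead
  return base64Bytes > thresholdBytes;
}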

The Files API for Large Files

Files over ~20MB (or videos) should use the Files API rather than inline base64. This also supports re-use across multiple requests without re-uploading:

import { GoogleAIFileManager } from '@google/generative-ai/server';

const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);

// Upload once, use multiple times
const uploadedFile = await fileManager.uploadFile('./large-report.pdf', {
  mimeType: 'application/pdf',
  displayName: 'Q1 2026 Report',
});

console.log(`Uploaded: ${uploadedFile.file.uri}`);

// Wait for processing to complete
let fileState = await fileManager.getFile(uploadedFile.file.name);
while (fileState.state === 'PROCESSING') {
  await new Promise(r => setTimeout(r, 2000));
  fileState = await fileManager.getFile(uploadedFile.file.name);
}

if (fileState.state !== 'ACTIVE') {
  throw new Error(`File processing failed: ${fileState.state}`);
}

// Reference in any model call
const filePart = {
  fileData: {
    mimeType: fileState.mimeType,
    fileUri: fileState.uri,
  },
};

// Now use filePart in chat.sendMessage([filePart, { text: 'Summarize this report...' }])

The Files API also opens up video analysis. You can upload a product demo video, have Gemini analyze visual content frame by frame, and then call MCP tools to log findings or trigger workflows. Audio and video share the same upload-then-reference pattern shown above.
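
A sketch of the video variant, reusing the fileManager and polling loop from the example above (file name and prompt are illustrative):

// Upload the video once, poll to ACTIVE as shown earlier, then reference it by URI
const video = await fileManager.uploadFile('./demo.mp4', {
  mimeType: 'video/mp4',
  displayName: 'Product demo',
});
// ...poll fileManager.getFile(video.file.name) until state === 'ACTIVE'...
const videoPart = { fileData: { mimeType: 'video/mp4', fileUri: video.file.uri } };
await chat.sendMessage([videoPart, { text: 'List every feature demonstrated in this video.' }]);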

Audio Transcription + Tool Enrichment

Audio follows the same recipe: pass the clip as an inline part (or a Files API reference for long recordings), then let the model transcribe it and call MCP tools to enrich what it hears:

async function processAudioWithEnrichment(audioPath) {
  const model = genai.getGenerativeModel({
    model: 'gemini-2.0-flash',
    tools: [{ functionDeclarations: mcpTools.map(t => ({ name: t.name, description: t.description, parameters: t.inputSchema })) }],
  });

  const audioData = fs.readFileSync(audioPath).toString('base64');
  const audioPart = { inlineData: { mimeType: 'audio/mpeg', data: audioData } };

  const chat = model.startChat();
  let response = await chat.sendMessage([
    audioPart,
    { text: 'Transcribe this audio. Then identify any product names or order numbers mentioned and look them up in our system.' },
  ]);

  let candidate = response.response.candidates[0];
  while (candidate.content.parts.some(p => p.functionCall)) {
    const calls = candidate.content.parts.filter(p => p.functionCall);
    const results = await Promise.all(calls.map(async part => {
      const fc = part.functionCall;
      const result = await mcp.callTool({ name: fc.name, arguments: fc.args });
      const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
      return { functionResponse: { name: fc.name, response: { result: text } } };
    }));
    response = await chat.sendMessage(results);
    candidate = response.response.candidates[0];
  }

  return candidate.content.parts.filter(p => p.text).map(p => p.text).join('');
}
Image, PDF, and audio inputs all follow the same parts-based pattern – the tool calling loop is identical.

Notice that all three modality examples – image, PDF, and audio – reuse the identical tool-calling loop from the function calling lesson. This is intentional. Multimodal inputs change what the model sees, not how it calls tools. Your MCP server code stays exactly the same regardless of input type.

Multimodal MCP Resources

MCP resources can also return binary content (type 'blob') – useful for image thumbnails, report PDFs, or audio clips. You can fetch these from a resource and pass them directly to Gemini:

// Fetch a binary resource from MCP and pass it to Gemini
const resource = await mcp.readResource({ uri: 'report://monthly/2026-03.pdf' });
const blobContent = resource.contents.find(c => c.mimeType === 'application/pdf');

if (blobContent) {
  const pdfPart = { inlineData: { mimeType: 'application/pdf', data: blobContent.blob } };
  const response = await chat.sendMessage([pdfPart, { text: 'Extract the key metrics from this report.' }]);
}

Audio Content Type in MCP

New in 2025-03-26

MCP added audio as a first-class content type alongside text and image in spec version 2025-03-26. Tool results and resource contents can now include audio blocks with base64-encoded data and a MIME type. This means an MCP server can return audio recordings, synthesised speech, or extracted audio clips directly as tool output – clients that support audio can play or process them without a separate download step.

// MCP tool returning audio content
server.tool('transcribe_meeting', '...', { recording_uri: z.string() }, async ({ recording_uri }) => {
  const audioBuffer = await downloadAudio(recording_uri);  // helper: fetch the recording (assumed defined elsewhere)
  const transcript = await transcribe(audioBuffer);        // helper: run speech-to-text (assumed defined elsewhere)

  return {
    content: [
      { type: 'text', text: transcript },
      {
        type: 'audio',
        data: audioBuffer.toString('base64'),
        mimeType: 'audio/wav',
      },
    ],
  };
});

For Gemini specifically, you can pass MCP audio content directly into the inlineData format shown in the examples above. The audio type supports WAV, MP3, FLAC, OGG, and other standard MIME types. For files over ~10MB, use the Gemini Files API to upload first, then reference by file URI.
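
A small bridging sketch, assuming a tool result shaped like the transcribe_meeting output above:

// Forward an MCP audio content block into Gemini's inlineData format
const audioBlock = result.content.find(c => c.type === 'audio');
if (audioBlock) {
  const geminiAudioPart = { inlineData: { mimeType: audioBlock.mimeType, data: audioBlock.data } };
  await chat.sendMessage([geminiAudioPart, { text: 'Summarize this recording.' }]);
}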

Failure Modes to Watch

  • File too large for inline: Base64 encoding a 50MB video inline will hit request size limits. Use the Files API for anything over ~10MB to be safe.
  • Unsupported MIME types: Not all MIME types work with all models. Test image/webp and application/x-pdf variants – stick to image/jpeg, image/png, and application/pdf for broadest support.
  • Files API cleanup: Uploaded files persist for 48 hours. For GDPR/CCPA compliance, explicitly delete files after processing with fileManager.deleteFile(name) – see the sketch after this list.
  • Audio length limits: Inline audio has a limit of about 20MB; use the Files API for longer recordings. Processing 1 hour of audio uses roughly 1,750 tokens per minute.
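
A minimal cleanup sketch, assuming the fileManager and uploadedFile from the Files API example above:

// Delete the uploaded file as soon as you are done, even if processing throws
try {
  // ...send uploadedFile to the model and handle the response...
} finally {
  await fileManager.deleteFile(uploadedFile.file.name);
}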

What to Build Next

  • Build an invoice processing MCP agent: takes a scanned PDF, extracts line items, calls a lookup_product MCP tool for each SKU, and outputs a reconciled JSON report.
  • Add a get_image resource to an MCP server that returns product photos as blob content – have Gemini analyze them and then call your tag_product tool.

nJoy 😉

Lesson 25 of 55: Gemini 2.0 and 2.5 Pro + MCP – Function Calling at Scale

Google’s Gemini 2.0 Flash and 2.5 Pro bring a distinct approach to function calling that differs meaningfully from both OpenAI and Claude. Understanding those differences — particularly around parallel tool execution, the functionDeclarations schema, and how Gemini handles tool results — will save you hours of debugging when you first wire up an MCP server to the Gemini API.

Gemini function calling bridges the Gemini API to MCP tools via a schema conversion layer.

The Gemini Function Calling Model

Gemini’s tool-calling API is part of its @google/generative-ai package (or the newer @google/genai unified SDK). The key primitives are:

  • FunctionDeclaration – describes a function with a name, description, and JSON Schema parameters
  • FunctionCall – the model’s request to invoke a function (name + args as a plain object)
  • FunctionResponse – your code’s reply to a function call (name + response object)
  • Tool – a wrapper around an array of FunctionDeclaration objects

Gemini can issue multiple FunctionCall parts in a single response, meaning it supports parallel tool calling natively. This is a significant performance advantage when your agent can execute tools concurrently.

Installing the SDK

npm install @google/generative-ai @modelcontextprotocol/sdk zod

Use Node.js 22+ with "type": "module" in your package.json. Store your API key as GEMINI_API_KEY in a .env file and load it with node --env-file=.env.
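
For example (host.js stands in for your own entry point):

# .env (never commit this file)
GEMINI_API_KEY=your-key-here

node --env-file=.env host.js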

Converting MCP Tools to Gemini FunctionDeclarations

The MCP SDK returns each tool with an inputSchema that is plain JSON Schema. Gemini’s FunctionDeclaration also uses JSON Schema for parameters, but the wrapper format differs. The conversion is straightforward:

import { GoogleGenerativeAI } from '@google/generative-ai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Connect to an MCP server
const transport = new StdioClientTransport({
  command: 'node',
  args: ['./servers/product-server.js'],
});

const mcp = new Client({ name: 'gemini-host', version: '1.0.0' });
await mcp.connect(transport);

// List MCP tools
const { tools: mcpTools } = await mcp.listTools();

// Convert to Gemini FunctionDeclarations
function mcpToolToGeminiDeclaration(tool) {
  return {
    name: tool.name,
    description: tool.description,
    parameters: tool.inputSchema,  // MCP already uses JSON Schema - pass through directly
  };
}

const geminiTools = [
  {
    functionDeclarations: mcpTools.map(mcpToolToGeminiDeclaration),
  },
];

This near-zero conversion cost is one of Gemini’s practical advantages over OpenAI. Because both MCP and Gemini use raw JSON Schema for parameters, you avoid the nested function wrapper that OpenAI requires. In a multi-provider setup, fewer transformations mean fewer bugs.
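
Side by side, for a single MCP tool object tool:

// OpenAI: nested under a 'function' wrapper, schema field named 'parameters'
const openaiTool = {
  type: 'function',
  function: { name: tool.name, description: tool.description, parameters: tool.inputSchema },
};

// Gemini: a flat declaration - the MCP inputSchema passes straight through
const geminiDeclaration = {
  name: tool.name,
  description: tool.description,
  parameters: tool.inputSchema,
};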

MCP’s JSON Schema for tool parameters maps directly to Gemini’s parameters field – no deep transformation needed.

The Full Tool Calling Loop

async function runGeminiMcpLoop(userMessage) {
  const model = genai.getGenerativeModel({
    model: 'gemini-2.0-flash',
    tools: geminiTools,
  });

  const chat = model.startChat();

  let response = await chat.sendMessage(userMessage);
  let candidate = response.response.candidates[0];

  while (candidate.finishReason === 'STOP' && hasFunctionCalls(candidate)) {
    const functionCalls = candidate.content.parts.filter(p => p.functionCall);

    // Execute all function calls in parallel (Gemini can issue multiple at once)
    const results = await Promise.all(
      functionCalls.map(async (part) => {
        const fc = part.functionCall;
        const mcpResult = await mcp.callTool({
          name: fc.name,
          arguments: fc.args,
        });
        const text = mcpResult.content
          .filter(c => c.type === 'text')
          .map(c => c.text)
          .join('\n');
        return {
          functionResponse: {
            name: fc.name,
            response: { result: text },
          },
        };
      })
    );

    // Send all function responses back in a single turn
    response = await chat.sendMessage(results);
    candidate = response.response.candidates[0];
  }

  return candidate.content.parts
    .filter(p => p.text)
    .map(p => p.text)
    .join('');
}

function hasFunctionCalls(candidate) {
  return candidate.content.parts.some(p => p.functionCall);
}

Note the key difference from OpenAI and Claude: Gemini uses a Chat session object (model.startChat()) with chat.sendMessage() instead of stateless messages arrays. The chat object maintains conversation history internally.
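
You can inspect that internal state through the SDK's getHistory() accessor – a quick sketch:

// Each history entry is a Content object ({ role, parts }) accumulated by the session
const history = await chat.getHistory();
console.log(`Turns recorded so far: ${history.length}`);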

In production, this loop is the heartbeat of your agent. Every real-world Gemini MCP integration – from customer support bots to internal data dashboards – runs some version of this pattern. Getting it right here means the rest of your application can treat tool calling as a solved problem.

Gemini 2.5 Pro: Longer Context and Better Reasoning

Switch from gemini-2.0-flash to gemini-2.5-pro-preview-03-25 (or the latest stable alias) for tasks requiring deeper reasoning:

const model = genai.getGenerativeModel({
  model: 'gemini-2.5-pro-preview-03-25',  // 1M token context window
  tools: geminiTools,
  generationConfig: {
    temperature: 0.7,
    maxOutputTokens: 8192,
  },
});

Gemini 2.5 Pro’s 1-million-token context window makes it ideal for MCP agents that need to analyze large datasets, entire codebases, or long document collections through resource-fetching tools.

With the model tier covered, the next question is speed. Even the best model is slow if it makes tool calls sequentially when it could run them in parallel. This is where Gemini’s default parallel calling behavior becomes especially valuable.

Parallel Tool Execution: The Gemini Advantage

When Gemini issues multiple function calls in one response, it means it has determined those calls can be satisfied independently. This enables true parallel execution at your application layer:

// Gemini may respond with multiple FunctionCall parts simultaneously
// Example: searching multiple databases at once
// candidate.content.parts = [
//   { functionCall: { name: 'search_products', args: { query: 'laptop' } } },
//   { functionCall: { name: 'get_inventory', args: { category: 'electronics' } } },
//   { functionCall: { name: 'get_pricing', args: { tier: 'enterprise' } } },
// ]

// Your Promise.all() handles them concurrently - real parallelism
const results = await Promise.all(functionCalls.map(callMcpTool));

Compare this to OpenAI (which also supports parallel calls) and Claude (which can likewise return multiple tool_use blocks in one response, as covered in lesson 21). Gemini’s default is to use parallelism aggressively when it makes sense.

Handling Errors in Tool Responses

async function callMcpToolSafe(fc, mcpClient) {
  try {
    const result = await mcpClient.callTool({
      name: fc.name,
      arguments: fc.args,
    });
    if (result.isError) {
      return {
        functionResponse: {
          name: fc.name,
          response: { error: result.content[0]?.text ?? 'Tool returned an error' },
        },
      };
    }
    const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
    return {
      functionResponse: {
        name: fc.name,
        response: { result: text },
      },
    };
  } catch (err) {
    return {
      functionResponse: {
        name: fc.name,
        response: { error: `Execution failed: ${err.message}` },
      },
    };
  }
}

Getting error handling right is critical because MCP tools are external processes that can fail for reasons completely outside your control: a database connection drops, an API rate-limits you, or the tool process crashes. Wrapping every call in a safe executor ensures your agent loop never hangs on a single broken tool.

“Function calling lets you connect Gemini models to external tools and APIs. Rather than processing all data internally, the model generates structured function calls that your application executes.” – Google AI for Developers, Function Calling Guide

With the core integration, model selection, parallel execution, and error handling covered, it is worth cataloging the specific ways things break in practice. These failure modes are drawn from real Gemini MCP deployments, not theoretical edge cases.

Common Failure Modes

  • Empty parameters schema: Gemini is known to reject an object schema with empty properties ({ type: 'object', properties: {} }) with a 400 error – for tools that take no arguments, omit the parameters field instead.
  • Nested arrays in schemas: Gemini is stricter about nested array schemas than OpenAI. Test each tool schema independently with a simple test call before integrating.
  • Chat session state: The Chat object holds history in memory. For multi-user applications, create a new startChat() per session – do not share a chat instance across users.
  • finishReason misread: Always check candidate.finishReason. A value of 'SAFETY' or 'RECITATION' means the response was blocked – handle these as errors rather than silently continuing.

What to Build Next

  • Swap gemini-2.0-flash for gemini-2.5-pro and pass a 500KB document as a user message – observe how the model leverages full-context reasoning alongside tool calls.
  • Add a maximumTurns guard to your tool loop to prevent infinite agent loops (see the sketch after this list).
  • Log response.response.usageMetadata.totalTokenCount to measure the cost of each agent run.
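
A minimal sketch of such a guard around this lesson's loop (the limit of 10 is arbitrary):

// Bound the tool loop so a confused model cannot spin forever
let turns = 0;
const maxTurns = 10;
while (candidate.finishReason === 'STOP' && hasFunctionCalls(candidate)) {
  if (++turns > maxTurns) throw new Error(`Tool loop exceeded ${maxTurns} turns`);
  // ...execute function calls and send responses back, as shown above...
}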

nJoy 😉

Lesson 24 of 55: Production Patterns for Claude + MCP (Caching, Retries, Tool Guards)

Three months of production experience with Claude + MCP teaches you things that no documentation covers. The retry patterns that actually work. The system prompts that reduce hallucinated tool calls. The caching strategies that cut your bill in half. The error classes you will encounter and the ones that silently corrupt output. This lesson consolidates those hard-won patterns into a reference you can apply directly.

Production Claude + MCP: the patterns that separate reliable systems from ones that fail at 2am.

Prompt Caching for Cost Reduction

Anthropic’s prompt caching feature caches portions of the input prompt that do not change between requests. For MCP applications, the tool definitions and system prompt are perfect candidates – they are typically the same for every user in a session. Caching them can reduce costs by 50-90% on repeated calls.

// Enable prompt caching for stable content
const response = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 4096,
  system: [
    {
      type: 'text',
      text: `You are a helpful assistant with access to our product database and order management system.
Always verify product availability before confirming orders.
Format all prices in USD.`,
      cache_control: { type: 'ephemeral' },  // Cache this system prompt
    },
  ],
  // Cache tool definitions - they rarely change. A cache_control marker on the
  // last tool caches the entire tool array up to that point.
  tools: claudeTools.map((t, i) =>
    i === claudeTools.length - 1
      ? { ...t, cache_control: { type: 'ephemeral' } }
      : t
  ),
  messages,
});

// Check cache performance in usage stats
const usage = response.usage;
console.error(`Cache: ${usage.cache_read_input_tokens} hit, ${usage.cache_creation_input_tokens} created`);

In practice, the 5-minute rolling cache window means that prompt caching works best for active sessions with frequent back-and-forth. For batch jobs or infrequent requests, the cache expires between calls and you will not see meaningful savings.

“Prompt caching enables you to cache portions of your prompt. Cached data is stored server-side for a rolling 5-minute period, after which it expires. Cache hits save 90% of input token costs for the cached portion.” – Anthropic Documentation, Prompt Caching

System Prompt Patterns That Work

Claude responds better to system prompts that describe the persona, define tool usage rules, specify output format, and set boundaries – in that order. Vague system prompts produce vague tool use.

const PRODUCTION_SYSTEM_PROMPT = `You are a precise product research assistant for TechStore.

TOOL USAGE RULES:
1. Always call search_products before making any recommendations
2. For price comparisons, call get_product_price for each product separately
3. If a product has less than 3 reviews, note "limited reviews" in your response
4. Never recommend products that are out of stock (use check_availability first)
5. If tools return errors, explain what you could not verify rather than guessing

OUTPUT FORMAT:
- Lead with the recommendation, then supporting evidence
- Include price, rating, and availability for each recommended product
- Use bullet points for product comparisons
- End with "Note: Stock and prices verified at [current timestamp]"

BOUNDARIES:
- You can only recommend products from our catalogue
- Do not speculate about products not in the search results
- If the user asks for something outside our catalogue, say so clearly`;

A well-structured system prompt is the single highest-leverage improvement you can make to a Claude + MCP integration. Vague prompts like “be helpful and use tools when needed” produce erratic tool usage. Specific rules about when to call which tool, in what order, and how to handle errors reduce hallucinated tool calls dramatically.

Prompt caching: static content (system prompt, tool definitions) cached at 90% discount; dynamic messages are not cached.

Production Error Taxonomy

// Claude API errors and how to handle them

// 429 - Rate limit: retry with exponential backoff
// 529 - Overloaded: retry with longer backoff (Anthropic load)
// 400 - Bad request: check tool schema, messages format, max_tokens
// 401 - Auth error: check ANTHROPIC_API_KEY
// 413 - Request too large: trim context or summarize conversation history

// Non-error patterns to watch:
// stop_reason === 'max_tokens' - response was cut off, increase max_tokens
// stop_reason === 'end_turn' but no text - model may be stuck, check context

async function callClaudeWithRetry(params, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (err) {
      const shouldRetry = err.status === 429 || err.status === 529 || err.status >= 500;
      if (!shouldRetry || attempt === maxRetries) throw err;

      const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
      const retryAfter = err.headers?.['retry-after']
        ? parseInt(err.headers['retry-after']) * 1000
        : delay;

      console.error(`[claude] Attempt ${attempt} failed (${err.status}), retrying in ${retryAfter}ms`);
      await new Promise(r => setTimeout(r, retryAfter));
    }
  }
}

Rate limiting (429) is the error you will hit most often during development and load testing. Anthropic’s rate limits are per-organization, so one runaway script can block your entire team. Always implement retry logic before you start scaling, not after.

Context Management for Long Conversations

// Summarise old conversation history when approaching context limits
// Claude 3.5 Sonnet context: 200K tokens (allows very long conversations)
// But cost grows linearly with context - summarize for efficiency

async function summariseHistory(messages, anthropicClient) {
  const summaryRequest = await anthropicClient.messages.create({
    model: 'claude-3-5-haiku-20241022',  // Use cheaper model for summarisation
    max_tokens: 500,
    messages: [
      ...messages,
      { role: 'user', content: 'Summarise our conversation so far in 3 bullet points, preserving all key facts found via tool calls.' },
    ],
  });
  return summaryRequest.content[0].text;
}

// In your main conversation loop, check token usage:
if (response.usage.input_tokens > 50000) {
  const summary = await summariseHistory(messages, anthropic);
  messages = [{ role: 'user', content: `Previous conversation summary:\n${summary}` }];
}

Context summarization is a cost optimization, not just a technical constraint. Even if your conversation fits within Claude’s 200K context window, sending 100K tokens per request is expensive. Summarizing early keeps your per-request cost predictable and your latency consistent.

Failure Mode: The Model Outputs Tool Calls That Do Not Exist

// Claude occasionally hallucinates tool names, especially if tool descriptions are vague
// Guard against this at the execution layer
const toolNames = new Set(mcpTools.map(t => t.name));

for (const toolUse of toolUseBlocks) {
  if (!toolNames.has(toolUse.name)) {
    console.error(`[warn] Claude called non-existent tool: ${toolUse.name}`);
    toolResults.push({
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: [{ type: 'text', text: `Tool '${toolUse.name}' does not exist. Available tools: ${[...toolNames].join(', ')}` }],
      is_error: true,
    });
    continue;
  }
  // ... execute valid tool
}

Hallucinated tool names are surprisingly common when your server exposes many tools with similar names, or when tool descriptions are ambiguous. The validation pattern above is cheap insurance: a few lines of code that prevent cascading failures from a single bad tool call.

What to Check Right Now

  • Enable prompt caching – add cache_control: { type: 'ephemeral' } to your system prompt and the last tool definition. Check the usage.cache_read_input_tokens to measure savings.
  • Add a tool existence check – validate every tool name Claude returns before attempting to execute it via MCP. Hallucinated tool calls happen in production.
  • Monitor stop reasons – log every stop_reason. A high rate of max_tokens stops means you need to increase max_tokens or summarize context sooner.
  • Measure prompt cache hit rates – aim for >70% cache hit rate in sustained conversations. Low hit rates mean your “static” content is actually varying between calls. A quick way to compute the rate is sketched after this list.
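
A quick way to compute that rate from the usage object shown earlier:

// Per-response cache hit rate: cached reads over total input tokens
const u = response.usage;
const cached = u.cache_read_input_tokens ?? 0;
const totalInput = cached + (u.cache_creation_input_tokens ?? 0) + u.input_tokens;
console.error(`[cache] hit rate: ${((100 * cached) / totalInput).toFixed(1)}%`);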

nJoy 😉

Lesson 23 of 55: Claude Code, Agent Skills, and MCP for Autonomous Coding

Claude Code is Anthropic’s autonomous coding agent – and MCP is its extension mechanism. Claude Code ships with built-in tools for reading files, running tests, and executing commands, and it acts as an MCP host for any external capability you want to add: databases, issue trackers, deployment pipelines. The architecture is not a coincidence: it is Anthropic demonstrating how a production-grade autonomous agent should integrate with external systems. Understanding how Claude Code uses MCP is one of the fastest ways to understand how you should build your own agents.

Claude Code architecture diagram showing MCP servers for filesystem bash tools web browser connected to Claude agent dark
Claude Code is an MCP host: it connects to filesystem, bash, and browser servers, and orchestrates them with Claude 3.5.

Claude Code’s MCP Architecture

Claude Code (the CLI tool, claude) operates as an MCP host. Its core capabilities – shell command execution (bash), filesystem read/write, and code search – come from built-in tools, and on startup it also connects to any MCP servers you have configured. Extending Claude Code with your own custom MCP servers makes it immediately capable of working with your specific project tools.

# ~/.claude/config.json - Extend Claude Code with your MCP servers
{
  "mcpServers": {
    "my-project-tools": {
      "command": "node",
      "args": ["./tools/mcp-server.js"],
      "env": {
        "DATABASE_URL": "postgresql://localhost/mydb"
      }
    },
    "github-tools": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/mcp-github"]
    }
  }
}

Once configured, Claude Code can use your custom server’s tools as naturally as it uses its built-in bash or filesystem tools. Your create_github_issue tool becomes as usable as Bash(git commit).

Building Agent Skills for Claude Code

The most powerful Claude Code extension pattern is the “agent skill” – a specialised MCP server that encapsulates a complex workflow as a single callable tool. Instead of Claude figuring out the 20-step process to deploy a microservice, you encode those steps in a deploy_service tool that handles all the complexity.

// deploy-skill-server.js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const exec = promisify(execFile);
const server = new McpServer({ name: 'deploy-skills', version: '1.0.0' });

server.tool(
  'deploy_service',
  `Deploy a microservice to Kubernetes. Handles build, push, and rollout.
  Returns the deployment status and the new pod count.`,
  {
    service_name: z.string().describe('Name of the service to deploy'),
    image_tag: z.string().describe('Docker image tag to deploy'),
    namespace: z.string().default('production').describe('Kubernetes namespace'),
    replicas: z.number().int().min(1).max(10).default(2),
  },
  async ({ service_name, image_tag, namespace, replicas }) => {
    const steps = [];

    // Step 1: Build
    await exec('docker', [
      'build', '-t', `${service_name}:${image_tag}`, './services/' + service_name,
    ]);
    steps.push('Build: OK');

    // Step 2: Push
    await exec('docker', ['push', `myregistry/${service_name}:${image_tag}`]);
    steps.push('Push: OK');

    // Step 3: Deploy - generateK8sManifest is assumed to be defined elsewhere in this server.
    // promisify(execFile) has no `input` option, so pipe the manifest via the child's stdin.
    const manifest = generateK8sManifest(service_name, image_tag, namespace, replicas);
    const apply = exec('kubectl', ['apply', '-f', '-']);
    apply.child.stdin.write(manifest);
    apply.child.stdin.end();
    await apply;
    steps.push(`Deploy: OK (${replicas} replicas)`);

    // Step 4: Wait for rollout
    await exec('kubectl', ['rollout', 'status', `deployment/${service_name}`, '-n', namespace]);
    steps.push('Rollout: Complete');

    return {
      content: [{ type: 'text', text: steps.join('\n') + `\n\nService ${service_name} deployed successfully.` }],
    };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
Agent skills: encode complex multi-step workflows as atomic MCP tools. Claude calls one tool instead of orchestrating ten.

Permission Modes in Claude Code

Claude Code has a permission system that controls what actions it can take without asking for confirmation. MCP tools are subject to the same permission model. You can configure Claude Code to auto-approve specific tools, require confirmation for destructive operations, or run in fully supervised mode.

# .claude/settings.json (project-level) - note: JSON itself does not allow comments
{
  "permissions": {
    "allow": [
      "Bash(git *)",
      "mcp:my-project-tools:read_*",
      "Read(**)"
    ],
    "deny": [
      "mcp:my-project-tools:deploy_*",
      "Bash(rm -rf *)"
    ]
  }
}

The allow list auto-approves all git commands, the read-only tools from the custom server, and reading any file; the deny list ensures deploy tools and recursive deletes are never auto-approved.

“Claude Code is designed to be an autonomous coding agent that can understand and work on complex codebases. It uses a set of built-in tools and can be extended with custom MCP servers to access domain-specific capabilities.” – Anthropic Documentation, Claude Code

Failure Modes with Claude Code MCP Extensions

Case 1: Tools That Are Too Granular

// BAD: Too granular - Claude has to call many tools in sequence and may make mistakes
server.tool('set_k8s_namespace', '...', { ns: z.string() }, handler);
server.tool('set_k8s_image', '...', { image: z.string() }, handler);
server.tool('apply_k8s_manifest', '...', { manifest: z.string() }, handler);
server.tool('watch_k8s_rollout', '...', { deployment: z.string() }, handler);

// BETTER: One atomic skill tool that handles the whole workflow
server.tool('deploy_service', 'Deploy a service to k8s...', { service: z.string(), ... }, handler);

Case 2: Forgetting to Handle Long-Running Operations

// Build + deploy can take minutes
// Don't timeout. Stream progress via notifications or use progress indicators
// Claude Code will wait, but it needs feedback to know the tool is running

server.tool('build_and_deploy', '...', { ... }, async ({ service }) => {
  // Send progress
  process.stderr.write(`[build] Starting build for ${service}...\n`);
  await buildService(service);  // May take 2-10 minutes
  process.stderr.write(`[deploy] Deploying ${service}...\n`);
  await deployService(service);
  return { content: [{ type: 'text', text: 'Done.' }] };
});

What to Check Right Now

  • Install Claude Code – npm install -g @anthropic-ai/claude-code. Then run claude in a project directory to see it in action.
  • Add your MCP server to Claude Code config – add it to ~/.claude/config.json or .claude/config.json (project-level). Then run claude and ask it to use your tool.
  • Design tools as atomic workflows – each tool should complete one meaningful unit of work end-to-end. Avoid exposing low-level implementation details as separate tools.
  • Review the permission system – set appropriate allow and deny rules for your project. Deny destructive tools by default and require explicit confirmation.

nJoy 😉

Lesson 22 of 55: Claude Extended Thinking Mode With MCP Tools

Claude 3.7 Sonnet introduced extended thinking – a mode where the model spends additional compute on internal reasoning before producing its response. When combined with MCP tools, extended thinking transforms how the model approaches complex multi-step tasks: instead of immediately deciding to call a tool, Claude reasons through what it knows, what it needs, which tools would help, and what order to call them in. The result is dramatically fewer redundant tool calls and significantly better decisions on ambiguous tasks.

Extended thinking: Claude reasons internally before deciding to call tools – reducing noise, improving decisions.

Enabling Extended Thinking

Extended thinking is enabled by adding the thinking block to the API request. You control the “budget” – the maximum number of tokens Claude can use for internal reasoning. A higher budget allows deeper reasoning but adds latency and cost.

import Anthropic from '@anthropic-ai/sdk';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const mcpClient = new Client({ name: 'thinking-host', version: '1.0.0' }, { capabilities: {} });
await mcpClient.connect(new StdioClientTransport({ command: 'node', args: ['server.js'] }));
const { tools: mcpTools } = await mcpClient.listTools();
const claudeTools = mcpTools.map(t => ({ name: t.name, description: t.description, input_schema: t.inputSchema }));

async function runWithExtendedThinking(userMessage, thinkingBudget = 8000) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-3-7-sonnet-20250219',
      max_tokens: 16000,  // Must be > thinking budget
      thinking: {
        type: 'enabled',
        budget_tokens: thinkingBudget,  // Min: 1024, no hard max
      },
      tools: claudeTools,
      messages,
    });

    // Response may contain thinking blocks - they appear before text/tool_use
    const thinkingBlocks = response.content.filter(b => b.type === 'thinking');
    const textBlocks = response.content.filter(b => b.type === 'text');
    const toolUseBlocks = response.content.filter(b => b.type === 'tool_use');

    if (process.env.SHOW_THINKING) {
      for (const tb of thinkingBlocks) {
        console.error('\n[thinking]', tb.thinking.slice(0, 500) + '...');
      }
    }

    messages.push({ role: 'assistant', content: response.content });

    if (response.stop_reason === 'tool_use') {
      const toolResults = await Promise.all(
        toolUseBlocks.map(async (toolUse) => {
          const result = await mcpClient.callTool({
            name: toolUse.name,
            arguments: toolUse.input,
          });
          return { type: 'tool_result', tool_use_id: toolUse.id, content: result.content };
        })
      );
      messages.push({ role: 'user', content: toolResults });
    } else {
      return textBlocks.map(b => b.text).join('');
    }
  }
}

// Complex task: extended thinking shines here
const result = await runWithExtendedThinking(
  `I need to buy a laptop for machine learning research. 
   My budget is $2000. I prefer AMD GPUs but would consider NVIDIA. 
   It must have at least 32GB RAM expandable to 64GB, 
   and I work across Windows and Linux so driver support matters.
   Research and recommend the top 3 options.`,
  12000  // Higher budget for complex task
);

console.log(result);
await mcpClient.close();
Choosing the thinking budget: simple tasks need 1,000-4,000 tokens; complex research tasks benefit from 8,000+.

When to Use Extended Thinking with MCP

Extended thinking is not free – it adds significant latency (often 10-30 seconds for high budgets) and substantial token cost. Use it selectively:

  • Use it for: complex research requiring 5+ tool calls, tasks requiring careful tradeoff analysis, situations where tool call order significantly affects outcome quality
  • Skip it for: simple lookups, single-tool tasks, time-sensitive queries, high-volume low-latency applications

// Adaptive thinking budget based on task complexity
function getThinkingBudget(task) {
  const wordCount = task.split(/\s+/).length;
  const hasComparisons = /\b(compare|vs|versus|between|best|recommend)\b/.test(task.toLowerCase());
  const hasMultipleRequirements = task.split(/\b(?:and|also|additionally|plus)\b/).length > 2;

  if (hasComparisons && hasMultipleRequirements) return 10000;
  if (hasComparisons || hasMultipleRequirements) return 5000;
  if (wordCount > 50) return 3000;
  return 0;  // No thinking for simple tasks
}

const budget = getThinkingBudget(userInput);
if (budget > 0) {
  return runWithExtendedThinking(userInput, budget);
} else {
  return runWithClaude(userInput); // Standard tool calling
}

“Extended thinking causes Claude to reason more thoroughly about tasks before responding, which can substantially improve performance on complex tasks. Thinking tokens are not cached and must be included in the context window when continuing a conversation.” – Anthropic Documentation, Extended Thinking

Failure Modes with Extended Thinking

Case 1: Setting max_tokens Less Than Thinking Budget

// WRONG: max_tokens must exceed budget_tokens
const response = await anthropic.messages.create({
  max_tokens: 4096,
  thinking: { type: 'enabled', budget_tokens: 8000 }, // 8000 > 4096 - API error!
});

// CORRECT: max_tokens must be greater than budget_tokens
const response = await anthropic.messages.create({
  max_tokens: 16000,
  thinking: { type: 'enabled', budget_tokens: 8000 }, // Valid: 16000 > 8000
});

Case 2: Not Passing Thinking Blocks Back in Continuation

// When continuing a conversation with extended thinking enabled,
// thinking blocks from previous turns MUST be included in the messages array.
// The SDK handles this automatically if you push the full response.content.
messages.push({ role: 'assistant', content: response.content }); // Include ALL blocks including thinking

What to Check Right Now

  • Test with SHOW_THINKING=1 – run your agent with thinking visible. Reading the thinking output reveals what the model understood about the task and why it chose each tool.
  • Measure latency impact – log response time with and without extended thinking on the same tasks. Quantify the tradeoff for your use case before deploying at scale (see the sketch after this list).
  • Start with budget 4000-8000 – this range gives substantially improved reasoning for most tasks without the extreme latency of budgets above 15,000.
  • Use claude-3-5-sonnet for anything where speed > accuracy – 3.5 Sonnet without thinking is typically faster and cheaper for tasks where the tradeoff makes sense.
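
A minimal timing harness for that comparison, reusing runWithExtendedThinking from this lesson and the standard runWithClaude loop from lesson 21 (the prompt is illustrative):

// Run the same prompt through both paths and log wall-clock latency
const prompt = 'Compare the three best mirrorless cameras under $1500 for travel.';

let t = Date.now();
await runWithExtendedThinking(prompt, 8000);
console.error(`[timing] with thinking: ${Date.now() - t}ms`);

t = Date.now();
await runWithClaude(prompt);
console.error(`[timing] without thinking: ${Date.now() - t}ms`);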

nJoy 😉

Lesson 21 of 55: Claude 3.5 and 3.7 + MCP – Native Tool Calling Patterns

Claude’s tool use is the cleanest tool calling implementation among the major LLM providers. The API is symmetric: you send tools in the request, Claude returns tool_use blocks when it wants to call something, you run the tools, and you send back tool_result blocks. No function/tool naming confusion, no finish_reason gotchas – just a clear, typed message structure. This lesson builds the Claude + MCP integration from scratch, comparing it to the OpenAI pattern where they differ.

Claude’s tool calling: tool_use blocks in assistant messages, tool_result blocks in user messages.

Claude Tool Use Format

Claude’s tool use has a fundamentally different message structure from OpenAI’s. The key difference: tool results go in a user message (not a separate role), nested inside a tool_result content block that references the tool use ID. This is more structured and less ambiguous than OpenAI’s approach.

// Claude tool calling message flow:

// 1. Request: tools defined, user message sent
// 2. Response: Claude returns tool_use block(s)
{
  role: 'assistant',
  content: [
    { type: 'text', text: 'Let me search for that.' },
    {
      type: 'tool_use',
      id: 'toolu_01XY',
      name: 'search_products',
      input: { query: 'wireless headphones', limit: 5 }
    }
  ]
}

// 3. You execute the tool through MCP
// 4. Send result back in a user message with tool_result block
{
  role: 'user',
  content: [{
    type: 'tool_result',
    tool_use_id: 'toolu_01XY',
    content: [{ type: 'text', text: 'Found 5 products: ...' }]
  }]
}

The Complete Claude + MCP Integration

import Anthropic from '@anthropic-ai/sdk';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const mcpClient = new Client({ name: 'claude-host', version: '1.0.0' }, { capabilities: {} });
await mcpClient.connect(new StdioClientTransport({ command: 'node', args: ['server.js'] }));

const { tools: mcpTools } = await mcpClient.listTools();

// Convert MCP tools to Anthropic format
const claudeTools = mcpTools.map(t => ({
  name: t.name,
  description: t.description,
  input_schema: t.inputSchema,  // Note: input_schema, not parameters
}));

async function runWithClaude(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 4096,
      tools: claudeTools,
      messages,
    });

    // Append Claude's response to messages
    messages.push({ role: 'assistant', content: response.content });

    // If Claude stopped due to tool use, execute tools
    if (response.stop_reason === 'tool_use') {
      const toolUseBlocks = response.content.filter(b => b.type === 'tool_use');
      const toolResults = await Promise.all(
        toolUseBlocks.map(async (toolUse) => {
          console.error(`[tool] Calling: ${toolUse.name}`, toolUse.input);
          const result = await mcpClient.callTool({
            name: toolUse.name,
            arguments: toolUse.input,
          });

          return {
            type: 'tool_result',
            tool_use_id: toolUse.id,
            content: result.content,  // MCP content blocks work directly here
          };
        })
      );

      // Tool results go in a user message
      messages.push({ role: 'user', content: toolResults });

    } else {
      // end_turn or other stop reason - extract final text
      const finalText = response.content
        .filter(b => b.type === 'text')
        .map(b => b.text)
        .join('');

      return finalText;
    }
  }
}

const result = await runWithClaude('Compare the best noise-cancelling headphones under $300');
console.log(result);
await mcpClient.close();
Claude vs OpenAI tool calling: the message structure differs but the underlying logic is the same.

Claude 3.5 vs 3.7 Sonnet for Tool Use

Claude 3.5 Sonnet (20241022) is the current production choice for tool-heavy workloads: fast, reliable tool calls, good at following tool descriptions, and competitive pricing. Claude 3.7 Sonnet adds extended thinking (covered in Lesson 22) and improved reasoning for complex multi-step tool chains, at higher latency and cost.

// For fast, reliable tool calling:
model: 'claude-3-5-sonnet-20241022'

// For complex reasoning + tool use:
model: 'claude-3-7-sonnet-20250219'  // Includes extended thinking

// Haiku for high-volume, simple tool tasks:
model: 'claude-3-5-haiku-20241022'

“Claude is trained to use tools in the same way that humans do: by processing what it’s seen before and uses this context to craft appropriate tool calls or final responses. Tool use enables Claude to interact with external services and APIs in a structured way.” – Anthropic Documentation, Tool Use

Key Differences from OpenAI

Aspect               | Claude (Anthropic)               | GPT (OpenAI)
---------------------|----------------------------------|----------------------------------------
Tool result role     | user                             | tool
Schema field         | input_schema                     | parameters
Tool call detection  | stop_reason === 'tool_use'       | finish_reason === 'tool_calls'
Multiple tools       | All results in one user message  | Each result is a separate tool message
Tool call args       | toolUse.input (already parsed)   | JSON.parse(toolCall.function.arguments)

Failure Modes with Claude Tool Use

Case 1: Putting Tool Results in an Assistant Message

// WRONG: Tool results in wrong role
messages.push({ role: 'assistant', content: toolResults }); // API error

// CORRECT: Tool results go in user role
messages.push({ role: 'user', content: toolResults });

Case 2: Forgetting that Claude’s input Is Already Parsed JSON

// WRONG: Trying to JSON.parse Claude's tool input
const args = JSON.parse(toolUse.input); // Error: toolUse.input is already an object

// CORRECT: Use directly - Claude's SDK already parses it
const args = toolUse.input; // Already an object like { query: "...", limit: 5 }
await mcpClient.callTool({ name: toolUse.name, arguments: args });

What to Check Right Now

  • Test with a multi-tool Claude response – ask a question that forces 2-3 tool calls in one response. Verify all tool use blocks are collected and all results are bundled into one user message.
  • Verify input_schema not parameters – this is the single most common copy-paste error when moving from OpenAI to Claude code. Search your code for parameters in Claude tool definitions.
  • Handle vision content in tool results – Claude can process image content blocks in tool results. If your MCP tools return images (base64), pass them through as { type: 'image', source: ... } in the tool result content array – see the sketch after this list.
  • Set a system prompt – Claude responds well to clear system prompts. Define the assistant’s persona, task scope, and output format at the system level.
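
A sketch of that pass-through, assuming MCP image blocks carry base64 data and a mimeType:

// Map MCP image content into Claude's image block format inside a tool_result
const content = result.content.map(c =>
  c.type === 'image'
    ? { type: 'image', source: { type: 'base64', media_type: c.mimeType, data: c.data } }
    : c
);
toolResults.push({ type: 'tool_result', tool_use_id: toolUse.id, content });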

nJoy 😉

Lesson 20 of 55: Building a Production OpenAI Client for MCP Tool Loops

The gap between “demo that calls a tool” and “production client that handles 10,000 daily users” is everything we have not talked about yet: connection pooling, retry logic, cost control, token budget management, error classification, telemetry, and graceful degradation. This lesson builds a production-grade OpenAI MCP client library from scratch – the kind you would actually deploy in a company. Every pattern here comes from real production failure modes.

A production MCP client: connection management, retry, budget control, and telemetry baked in.

The Production Client Library

// mcp-openai-client.js - Production-grade MCP + OpenAI client

import OpenAI from 'openai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const DEFAULT_CONFIG = {
  model: 'gpt-4o',
  maxTokens: 4096,
  maxIterations: 15,
  temperature: 0.1,
  retries: 3,
  retryDelay: 1000,  // ms
  budgetUSD: 0.50,   // Max cost per conversation
  timeoutMs: 120_000, // 2 minute timeout per conversation
};

// Token cost estimates (USD per 1M tokens, approximate)
const MODEL_COSTS = {
  'gpt-4o': { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
  'o3': { input: 15.00, output: 60.00 },
  'o3-mini': { input: 1.10, output: 4.40 },
};

export class McpOpenAIClient {
  constructor(mcpServerConfig, options = {}) {
    this.config = { ...DEFAULT_CONFIG, ...options };
    this.openai = new OpenAI({ apiKey: options.apiKey || process.env.OPENAI_API_KEY });
    this.mcpServerConfig = mcpServerConfig;
    this.mcpClient = null;
    this.tools = [];
    this.totalCostUSD = 0;
  }

  async connect() {
    this.mcpClient = new Client(
      { name: 'production-host', version: '1.0.0' },
      { capabilities: {} }
    );

    const transport = new StdioClientTransport(this.mcpServerConfig);
    await this.mcpClient.connect(transport);

    const { tools } = await this.mcpClient.listTools();
    this.tools = tools.map(t => ({
      type: 'function',
      function: { name: t.name, description: t.description, parameters: t.inputSchema },
    }));

    console.error(`[mcp] Connected - ${this.tools.length} tools available`);
  }

  async disconnect() {
    await this.mcpClient?.close();
  }

  estimateCostUSD(inputTokens, outputTokens, model) {
    const costs = MODEL_COSTS[model] || MODEL_COSTS['gpt-4o'];
    return (inputTokens / 1_000_000) * costs.input + (outputTokens / 1_000_000) * costs.output;
  }

  async executeWithRetry(fn, maxRetries = this.config.retries) {
    let lastError;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await fn();
      } catch (err) {
        lastError = err;
        const isRetryable = err.status === 429 || err.status >= 500;
        if (!isRetryable || attempt === maxRetries) throw err;

        const delay = this.config.retryDelay * Math.pow(2, attempt - 1); // Exponential backoff
        console.error(`[openai] Attempt ${attempt} failed: ${err.message}. Retrying in ${delay}ms`);
        await new Promise(r => setTimeout(r, delay));
      }
    }
    throw lastError;
  }

  async run(userMessage, systemPrompt = null) {
    const startTime = Date.now();
    this.totalCostUSD = 0;  // budgetUSD is per conversation, so reset the accumulator each run
    const messages = [];

    if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });
    messages.push({ role: 'user', content: userMessage });

    let iteration = 0;
    let totalInputTokens = 0;
    let totalOutputTokens = 0;

    while (true) {
      if (++iteration > this.config.maxIterations) {
        throw new Error(`Exceeded max iterations (${this.config.maxIterations})`);
      }

      if (Date.now() - startTime > this.config.timeoutMs) {
        throw new Error(`Conversation timeout after ${this.config.timeoutMs}ms`);
      }

      if (this.totalCostUSD > this.config.budgetUSD) {
        throw new Error(`Budget exceeded: $${this.totalCostUSD.toFixed(4)} > $${this.config.budgetUSD}`);
      }

      // Reasoning models (o3, o3-mini) reject max_tokens and temperature;
      // they take max_completion_tokens instead
      const isReasoningModel = /^o\d/.test(this.config.model);
      const response = await this.executeWithRetry(() =>
        this.openai.chat.completions.create({
          model: this.config.model,
          messages,
          tools: this.tools.length > 0 ? this.tools : undefined,
          ...(isReasoningModel
            ? { max_completion_tokens: this.config.maxTokens }
            : { max_tokens: this.config.maxTokens, temperature: this.config.temperature }),
        })
      );

      const usage = response.usage;
      totalInputTokens += usage?.prompt_tokens || 0;
      totalOutputTokens += usage?.completion_tokens || 0;
      const turnCost = this.estimateCostUSD(
        usage?.prompt_tokens || 0,
        usage?.completion_tokens || 0,
        this.config.model
      );
      this.totalCostUSD += turnCost;

      const choice = response.choices[0];
      const message = choice.message;
      messages.push(message);

      if (choice.finish_reason !== 'tool_calls') {
        const elapsedMs = Date.now() - startTime;
        console.error(`[stats] iterations=${iteration} tokens=${totalInputTokens}+${totalOutputTokens} cost=$${this.totalCostUSD.toFixed(4)} elapsed=${elapsedMs}ms`);
        return {
          content: message.content,
          iterations: iteration,
          totalCostUSD: this.totalCostUSD,
          tokens: { input: totalInputTokens, output: totalOutputTokens },
          elapsedMs,
        };
      }

      // Execute tool calls
      const toolResults = await Promise.all(
        message.tool_calls.map(async (tc) => {
          let args;
          try {
            args = JSON.parse(tc.function.arguments);
          } catch {
            return { role: 'tool', tool_call_id: tc.id, content: 'Error: Invalid tool arguments JSON' };
          }

          try {
            const result = await this.mcpClient.callTool({ name: tc.function.name, arguments: args });
            const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
            const errorFlag = result.isError ? '[TOOL ERROR] ' : '';
            return { role: 'tool', tool_call_id: tc.id, content: errorFlag + text };
          } catch (err) {
            console.error(`[tool] ${tc.function.name} error: ${err.message}`);
            return { role: 'tool', tool_call_id: tc.id, content: `Tool execution failed: ${err.message}` };
          }
        })
      );

      messages.push(...toolResults);
    }
  }
}
OpenAI cost tracking dashboard showing per-model token costs budget control and usage metrics dark
Cost tracking: estimate cost per turn, accumulate per conversation, enforce budget limits before they hit your bill.

Usage Pattern

import { McpOpenAIClient } from './mcp-openai-client.js';

const client = new McpOpenAIClient(
  { command: 'node', args: ['server.js'] },
  {
    model: 'gpt-4o-mini',
    budgetUSD: 0.10,
    maxIterations: 8,
    timeoutMs: 60_000,
  }
);

await client.connect();

const result = await client.run(
  'Find me a good Python book for beginners under $40',
  'You are a helpful book recommendation assistant.'
);

console.log('Answer:', result.content);
console.log('Cost:', `$${result.totalCostUSD.toFixed(4)}`);
console.log('Iterations:', result.iterations);

await client.disconnect();

“For production deployments, implement exponential backoff for rate limit errors (429). The OpenAI API will return Retry-After headers for rate limits – respect these values.” – OpenAI Documentation, Error Codes
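
The retry helper above uses pure exponential backoff. A variant that prefers the server's Retry-After hint might look like the sketch below; it assumes the thrown error exposes response headers (the official openai package attaches them to APIError, though the exact shape varies by SDK version):

// Sketch: prefer the server's Retry-After hint over computed backoff
function retryDelayMs(err, attempt, baseDelay = 1000, maxDelay = 60_000) {
  // err.headers may be a plain object or a Headers instance depending on SDK version
  const raw = typeof err.headers?.get === 'function'
    ? err.headers.get('retry-after')
    : err.headers?.['retry-after'];
  const retryAfterSec = Number(raw);
  if (Number.isFinite(retryAfterSec) && retryAfterSec > 0) {
    return Math.min(retryAfterSec * 1000, maxDelay);
  }
  return Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
}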

Failure Modes in Production

Case 1: No Budget Control

// A single misbehaving agent with no budget cap can cost hundreds of dollars
// Always set a budgetUSD limit per conversation
// Always set a maxIterations limit per conversation
// Log and alert when conversations exceed 80% of budget
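
The 80% alert from the last line can sit next to the hard stop. A minimal sketch (checkBudget is a hypothetical helper, not part of the client above):

// Sketch: warn at 80% of budget before enforcing the hard limit
function checkBudget(totalCostUSD, budgetUSD) {
  if (totalCostUSD > budgetUSD) {
    throw new Error(`Budget exceeded: $${totalCostUSD.toFixed(4)} > $${budgetUSD}`);
  }
  if (totalCostUSD > budgetUSD * 0.8) {
    console.error(`[budget] Warning: at ${((totalCostUSD / budgetUSD) * 100).toFixed(0)}% of budget`);
  }
}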

Case 2: Catching All Errors and Retrying Blindly

// Some errors should NOT be retried - e.g. 400 Bad Request (invalid schema)
// 429 = retry (rate limit)
// 500/503 = retry (server error)
// 400 = do NOT retry (your code is wrong)
// 401/403 = do NOT retry (authentication issue)
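
One way to encode that table is an explicit classifier, which keeps the retry policy auditable in one place. A sketch (isRetryableError is illustrative; it broadens the client's 500/503 check to all 5xx responses):

// Sketch: classify errors before retrying
function isRetryableError(err) {
  if (err.status === 429) return true;                      // rate limit - back off and retry
  if (err.status >= 500 && err.status <= 599) return true;  // server error - retry
  return false;                                             // 4xx - fix the request, do not retry
}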

What to Check Right Now

  • Set per-conversation budgets – $0.10 is a reasonable starting point for most workflows. Adjust based on your model and expected tool call count.
  • Implement exponential backoff – the pattern shown above (doubling delay on each retry) is the industry standard. Start at 1000ms, cap at 60000ms.
  • Log every tool call – production debugging without tool call logs is nearly impossible. Log tool name, arguments, result length, and execution time for every call (see the sketch after this list).
  • Monitor iteration counts – if average iterations are above 8, your tool descriptions or system prompt may be unclear. Investigate and improve before scaling.
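
A thin wrapper around callTool covers the logging point above. A sketch (the wrapper name and log format are illustrative):

// Sketch: log name, arguments, result size, and duration for every tool call
async function loggedCallTool(mcpClient, name, args) {
  const started = Date.now();
  try {
    const result = await mcpClient.callTool({ name, arguments: args });
    const bytes = JSON.stringify(result.content).length;
    console.error(`[tool] ${name} args=${JSON.stringify(args)} bytes=${bytes} ms=${Date.now() - started}`);
    return result;
  } catch (err) {
    console.error(`[tool] ${name} FAILED after ${Date.now() - started}ms: ${err.message}`);
    throw err;
  }
}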

nJoy πŸ˜‰

Lesson 19 of 55: OpenAI Responses API and Agents SDK With MCP

OpenAI released the Responses API and the Agents SDK as a unified approach to building agentic workflows. These are not just new API endpoints – they represent OpenAI’s opinionated view of how production agents should be structured. The Responses API replaces the Chat Completions API for agentic use cases. The Agents SDK wraps it with built-in MCP support, tool orchestration, and a pipeline abstraction that handles the looping automatically. This lesson shows you both layers and where MCP plugs in.

OpenAI Responses API and Agents SDK architecture diagram with MCP tool integration dark
The Agents SDK wraps the Responses API with built-in MCP support and automatic tool orchestration.

The Responses API

The Responses API (openai.responses.create()) is designed for stateful, multi-turn agentic sessions. Unlike Chat Completions, which requires you to manage conversation history manually, the Responses API maintains state server-side via a response ID. You reference previous responses by ID, and the API handles context management including tool call history.

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// First turn - creates a new response
const response = await openai.responses.create({
  model: 'gpt-4o',
  input: 'Search for the best laptops under $1000',
  tools: openAITools,  // Caution: Responses API function tools use a flattened shape
                       // ({ type: 'function', name, description, parameters }), not
                       // Chat Completions' nested `function` object
});

const responseId = response.id;  // Save this for continuations

// Continue the conversation using the response ID (no need to re-send history)
const followUp = await openai.responses.create({
  model: 'gpt-4o',
  input: 'Now filter to only Dell and Lenovo models',
  previous_response_id: responseId,  // References prior context
  tools: openAITools,
});

“The Responses API is designed specifically for agentic workflows. It maintains conversation state server-side, supports native tool execution, and provides a unified interface for building multi-step AI tasks.” – OpenAI API Reference, Responses
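
Where does MCP plug in? The Responses API returns function calls as output items rather than a tool_calls message. Below is a sketch of routing one call through MCP, assuming the documented item shapes ('function_call' out, 'function_call_output' back in) and the mcpClient from earlier lessons; verify the shapes against the current API reference:

// Sketch: one function-call round trip via the Responses API
const first = await openai.responses.create({
  model: 'gpt-4o',
  input: 'Search for the best laptops under $1000',
  tools: openAITools,
});

const call = first.output.find(item => item.type === 'function_call');
if (call) {
  const result = await mcpClient.callTool({
    name: call.name,
    arguments: JSON.parse(call.arguments),
  });
  const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');

  // Continue the chain: return the tool output, referencing the prior response
  const next = await openai.responses.create({
    model: 'gpt-4o',
    previous_response_id: first.id,
    input: [{ type: 'function_call_output', call_id: call.call_id, output: text }],
    tools: openAITools,
  });
  console.log(next.output_text);
}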

The Agents SDK with MCP

The OpenAI Agents SDK (@openai/agents) provides a higher-level abstraction with native MCP support. Instead of writing the tool calling loop yourself, the SDK handles it automatically. You define an agent with tools and an instruction, and the SDK orchestrates the full pipeline.

import { Agent, run, MCPServerStdio } from '@openai/agents';

// Connect to your MCP server via the SDK's native MCP support
const mcpServer = new MCPServerStdio({
  name: 'my-tools',
  fullCommand: 'node ./my-mcp-server.js',
});

await mcpServer.connect();

// Create an agent backed by the MCP server
const agent = new Agent({
  name: 'Research Assistant',
  instructions: `You are a research assistant with access to product search and comparison tools.
    Always search for at least 3 options before recommending.
    Format your final recommendation as a clear list with prices.`,
  mcpServers: [mcpServer],  // The SDK discovers and refreshes the server's tools automatically
  model: 'gpt-4o',
});

// Run the agent - the SDK handles the tool calling loop
const result = await run(agent, 'Find the best wireless headphones under $200');
console.log('Final answer:', result.finalOutput);

// Clean up
await mcpServer.close();
OpenAI Agents SDK pipeline diagram showing Agent definition running with tools and MCP server integration dark
The Agents SDK pipeline: define agent + tools, run with input, SDK handles orchestration automatically.

Handoffs: Multi-Agent Patterns with the SDK
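
A handoff lets one agent delegate the rest of a task to another agent mid-run; the SDK exposes each handoff target to the model as a callable option, so the routing stays declarative. A common split is a cheap search specialist behind a smarter analysis agent: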

import { Agent, run, handoff } from '@openai/agents';

const searchAgent = new Agent({
  name: 'Search Specialist',
  instructions: 'You specialise in searching and retrieving product data.',
  tools: searchMcpTools,
  model: 'gpt-4o-mini',  // Cheaper model for search
});

const analysisAgent = new Agent({
  name: 'Analysis Specialist',
  instructions: 'You specialise in comparing and recommending products based on data.',
  tools: analysisMcpTools,
  model: 'gpt-4o',        // Smarter model for complex reasoning
  handoffs: [handoff(searchAgent, { toolDescriptionOverride: 'Use the search specialist when you need more data' })],
});

const result = await run(analysisAgent, 'Compare the top 5 gaming laptops');
console.log(result.finalOutput);

Failure Modes with the Responses API and Agents SDK

Case 1: Not Handling Tool Call Errors in the Responses API

// The Responses API may return partial results if a tool fails
// Always check response.status and handle incomplete states
const response = await openai.responses.create({ ... });

if (response.status === 'incomplete') {
  console.error('Response incomplete:', response.incomplete_details);
  // Handle: retry, use partial output, or escalate
}

Case 2: State Leakage Between Responses API Sessions

// previous_response_id links responses in a chain
// If you reuse an ID from a different user's session, state leaks
// Always scope response IDs to the authenticated user's session store

const userSession = sessions.get(userId);
const response = await openai.responses.create({
  previous_response_id: userSession.lastResponseId || undefined,
  ...
});
userSession.lastResponseId = response.id;

What to Check Right Now

  • Try the Agents SDK first – if you are building a new agent, start with the Agents SDK. The automatic tool loop saves significant boilerplate.
  • Use the Responses API for long sessions – for multi-turn conversations with many tool calls, the Responses API’s server-side state management avoids sending large context windows repeatedly.
  • Test handoff behaviour – if using multi-agent handoffs, test the edge case where agents hand a task back and forth without ever completing it, and cap the run's turn count accordingly.
  • Check the Agents SDK version – the SDK is actively developed. Pin the version in package.json and read the changelog when upgrading: npm install @openai/agents.

nJoy πŸ˜‰

Lesson 18 of 55: OpenAI Streaming and Structured Outputs With MCP Tools

Tool calling with a single round-trip response is the entry point. But production MCP applications need streaming – the ability to show intermediate results to users as the model thinks – and structured outputs, which guarantee that the model’s final answer conforms to a schema you define. This lesson adds both to your OpenAI + MCP integration, covering the streaming tool call parsing mechanics and the structured output patterns that prevent hallucinated schemas in production.

OpenAI streaming tool calling diagram showing chunks arriving over time with tool call delta parsing dark
Streaming + tool calling: deltas arrive incrementally, tool call arguments accumulate, execution happens when complete.

Streaming with Tool Calls

When you stream a completion that includes tool calls, the tool call arguments arrive incrementally as delta chunks. You must accumulate them before you can parse and execute the tool. The pattern is: buffer all deltas until the stream ends, then parse the completed tool calls and execute them through MCP.

import OpenAI from 'openai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const mcpClient = new Client({ name: 'streaming-host', version: '1.0.0' }, { capabilities: {} });

await mcpClient.connect(new StdioClientTransport({ command: 'node', args: ['server.js'] }));
const { tools: mcpTools } = await mcpClient.listTools();

const openAITools = mcpTools.map(t => ({
  type: 'function',
  function: { name: t.name, description: t.description, parameters: t.inputSchema },
}));

async function runStreamingWithTools(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    // Stream the completion
    const stream = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
      tools: openAITools,
      stream: true,
    });

    // Accumulate the full response
    let assistantMessage = { role: 'assistant', content: '', tool_calls: [] };
    const toolCallMap = {}; // index -> accumulated tool call

    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;
      if (!delta) continue;

      // Stream text content to UI
      if (delta.content) {
        assistantMessage.content += delta.content;
        process.stdout.write(delta.content); // Real-time output
      }

      // Accumulate tool call deltas
      if (delta.tool_calls) {
        for (const tcDelta of delta.tool_calls) {
          const idx = tcDelta.index;
          if (!toolCallMap[idx]) {
            toolCallMap[idx] = { id: '', type: 'function', function: { name: '', arguments: '' } };
          }
          const tc = toolCallMap[idx];
          if (tcDelta.id) tc.id += tcDelta.id;
          if (tcDelta.function?.name) tc.function.name += tcDelta.function.name;
          if (tcDelta.function?.arguments) tc.function.arguments += tcDelta.function.arguments;
        }
      }
    }

    const toolCalls = Object.values(toolCallMap);
    if (toolCalls.length > 0) {
      assistantMessage.tool_calls = toolCalls;
    } else {
      delete assistantMessage.tool_calls; // The API rejects an assistant message with an empty tool_calls array
    }
    messages.push(assistantMessage);

    // No tool calls = we have the final answer
    if (toolCalls.length === 0) {
      return assistantMessage.content;
    }

    // Execute all accumulated tool calls through MCP
    const toolResults = await Promise.all(
      assistantMessage.tool_calls.map(async (tc) => {
        const args = JSON.parse(tc.function.arguments);
        console.error(`\n[tool] Calling: ${tc.function.name}`);
        const result = await mcpClient.callTool({ name: tc.function.name, arguments: args });
        const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
        return { role: 'tool', tool_call_id: tc.id, content: text };
      })
    );

    messages.push(...toolResults);
  }
}

const answer = await runStreamingWithTools('What are the best products under $50?');
console.log('\n\nFinal:', answer);
OpenAI structured output schema enforcement showing response json conforming to zod schema dark
Structured outputs: the model is forced to return JSON that matches your exact schema – no hallucinated fields.

Structured Outputs with MCP Tool Results

OpenAI’s structured outputs feature forces the model to return JSON that exactly matches a schema you specify. This is different from JSON mode (which just returns valid JSON) – structured outputs guarantee that every required field is present and every value is the correct type. You can use structured outputs for the final answer even when intermediate steps use tool calls.

import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

// Define the schema for the final answer
const ProductRecommendationSchema = z.object({
  recommendations: z.array(z.object({
    product_name: z.string(),
    price: z.number(),
    reason: z.string(),
    confidence: z.enum(['high', 'medium', 'low']),
  })),
  total_products_checked: z.number(),
  search_strategy: z.string(),
});

// Use structured output for the final response
const finalResponse = await openai.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [
    ...conversationHistory,
    { role: 'user', content: 'Based on the search results, provide your top 3 recommendations.' },
  ],
  response_format: zodResponseFormat(ProductRecommendationSchema, 'product_recommendations'),
});

const recommendations = finalResponse.choices[0].message.parsed;
// recommendations is now typed as ProductRecommendation - guaranteed to match schema
console.log(recommendations.recommendations[0].product_name);

“Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don’t need to worry about the model omitting a required key, or hallucinating an invalid enum value.” – OpenAI Documentation, Structured Outputs
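
One constraint worth knowing before you design schemas: structured outputs require every field to be present, so data the model may legitimately lack must be modelled as nullable rather than optional. A sketch (ReviewSchema is illustrative):

// Use .nullable(), not .optional(), for fields that may be absent
const ReviewSchema = z.object({
  product_name: z.string(),
  star_rating: z.number(),
  discount_pct: z.number().nullable(), // null when no discount applies
});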

Failure Modes with Streaming Tool Calls

Case 1: Parsing Arguments Before All Deltas Arrive

// WRONG: Parsing tool call arguments during streaming
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (delta.tool_calls?.[0]?.function?.arguments) {
    const args = JSON.parse(delta.tool_calls[0].function.arguments); // WRONG - may be partial JSON
    await mcpClient.callTool({ ... });
  }
}

// CORRECT: Accumulate all deltas first, then parse
// (As shown in the complete streaming loop above)

Case 2: Missing tool_call_id in Tool Result Messages

// WRONG: tool_call_id missing or mismatched
messages.push({ role: 'tool', content: result }); // Missing tool_call_id

// CORRECT: Each tool result must include the exact tool_call_id
messages.push({ role: 'tool', tool_call_id: tc.id, content: result });

What to Check Right Now

  • Test streaming with a multi-tool query – ask a question that forces two tool calls in sequence. Verify the streaming output is coherent and the final answer is correct.
  • Add a progress indicator – during streaming, show a spinner or partial text. Users should see something happening, not a blank screen for 10 seconds.
  • Use structured outputs for all final answers – wherever your application needs to parse the model’s response programmatically, use structured outputs. It eliminates an entire class of parsing bugs.
  • Handle stream errors – wrap the for await (const chunk of stream) loop in a try-catch. Network errors during streaming are common and need graceful handling (see the sketch after this list).
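
A minimal recovery wrapper for the streaming loop above (createStream and onDelta are hypothetical parameters; adapt to your UI):

// Sketch: surface mid-stream failures instead of crashing the turn
async function streamWithRecovery(createStream, onDelta) {
  const stream = await createStream();
  try {
    for await (const chunk of stream) {
      const delta = chunk.choices[0]?.delta;
      if (delta?.content) onDelta(delta.content);
    }
  } catch (err) {
    // Mid-stream network drops are common; log, then let the caller
    // retry the turn without streaming or show the partial output
    console.error(`[stream] interrupted: ${err.message}`);
    throw err;
  }
}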

nJoy πŸ˜‰

Lesson 17 of 55: OpenAI + MCP Tool Calling With GPT-4o and o3

OpenAI’s tool calling is where MCP integration becomes immediately tangible. You have an MCP server with tools registered on it. You have a GPT-4o or o3 model that needs to use those tools. The integration is three steps: list tools from MCP, convert them to OpenAI’s function format, run the completion loop. This lesson builds that integration from scratch, explains every conversion step, and covers the failure modes that will break your agent in the middle of a production run.

OpenAI GPT-4o connected to MCP server via tool calling loop diagram dark technical
OpenAI + MCP: the tool calling loop – list tools, generate, route tool calls to MCP, feed results back.

The OpenAI Tool Calling Model

OpenAI’s tool calling (formerly function calling) works by providing the model with a list of functions it can invoke. When the model decides to use a tool, the API returns a response with tool_calls instead of content. Your application executes the tool, then appends the result to the conversation and calls the API again. This loop continues until the model returns content with no pending tool calls.

MCP tools map cleanly onto OpenAI’s function schema. The conversion is mechanical: take the MCP tool’s name, description, and JSON Schema, and wrap them in OpenAI’s format.

// MCP tool schema (what the MCP server provides)
// {
//   name: "search_products",
//   description: "Search the product catalogue",
//   inputSchema: {
//     type: "object",
//     properties: {
//       query: { type: "string", description: "Search terms" },
//       limit: { type: "number", description: "Max results" }
//     },
//     required: ["query"]
//   }
// }

// OpenAI tool format (what openai.chat.completions.create() expects)
function mcpToolToOpenAITool(mcpTool) {
  return {
    type: 'function',
    function: {
      name: mcpTool.name,
      description: mcpTool.description,
      parameters: mcpTool.inputSchema,  // Direct pass-through - formats are compatible
    },
  };
}

“Tool calls allow models to call user-defined tools. Tools are specified in the request by the user, and the model can call them during message generation.” – OpenAI Documentation, Function Calling
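
The model decides when to call tools by default, but you can constrain that decision with tool_choice: 'auto', 'none', 'required', or a single named function. A sketch (openAITools is the converted array built in the next section):

// Force one specific tool on this turn
const forced = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Find waterproof hiking boots' }],
  tools: openAITools,
  tool_choice: { type: 'function', function: { name: 'search_products' } },
});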

The Complete Integration: MCP Client + OpenAI Loop

// mcp-openai-host.js
import OpenAI from 'openai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Step 1: Connect to MCP server
const mcpClient = new Client(
  { name: 'openai-host', version: '1.0.0' },
  { capabilities: {} }
);

const transport = new StdioClientTransport({
  command: 'node',
  args: ['./my-mcp-server.js'],
  env: process.env,
});

await mcpClient.connect(transport);

// Step 2: Discover tools and convert to OpenAI format
const { tools: mcpTools } = await mcpClient.listTools();
const openAITools = mcpTools.map(tool => ({
  type: 'function',
  function: {
    name: tool.name,
    description: tool.description,
    parameters: tool.inputSchema,
  },
}));

console.log(`Loaded ${openAITools.length} tools from MCP server`);

// Step 3: Build the tool-calling loop
async function runWithTools(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
      tools: openAITools,
      tool_choice: 'auto',  // Let the model decide
    });

    const choice = response.choices[0];
    const message = choice.message;

    // Append the assistant message to conversation
    messages.push(message);

    // If no tool calls, we have the final answer
    if (choice.finish_reason !== 'tool_calls') {
      return message.content;
    }

    // Execute each tool call through MCP
    const toolResults = await Promise.all(
      message.tool_calls.map(async (toolCall) => {
        const args = JSON.parse(toolCall.function.arguments);

        console.log(`Calling tool: ${toolCall.function.name}`, args);

        const result = await mcpClient.callTool({
          name: toolCall.function.name,
          arguments: args,
        });

        // Format result for OpenAI
        const resultText = result.content
          .filter(c => c.type === 'text')
          .map(c => c.text)
          .join('\n');

        return {
          role: 'tool',
          tool_call_id: toolCall.id,
          content: resultText,
        };
      })
    );

    // Append all tool results to conversation
    messages.push(...toolResults);
    // Loop back to get the model's response to the tool results
  }
}

// Run it
const answer = await runWithTools('Find the top 5 electronics products under $100');
console.log('\nFinal answer:', answer);

await mcpClient.close();
OpenAI tool calling loop diagram showing messages array building up with tool calls and results dark
The tool calling loop: send messages, get tool_calls, execute via MCP, append results, repeat until final content.

Using GPT-4o vs o3 with MCP Tools

Different OpenAI models have different tool calling behaviours. GPT-4o is the most reliable for agentic tool use: it calls tools precisely, handles multi-tool scenarios well, and respects tool descriptions. The o3 and o3-mini reasoning models think before calling tools, which improves accuracy on complex multi-step tasks but adds latency and cost.

// For fast, reliable tool calling:
model: 'gpt-4o'

// For complex reasoning tasks where accuracy matters more than speed:
model: 'o3-mini'

// o3 supports a different parameter for "thinking budget":
const response = await openai.chat.completions.create({
  model: 'o3',
  messages,
  tools: openAITools,
  reasoning_effort: 'medium',  // 'low', 'medium', 'high'
});

Failure Modes with OpenAI + MCP

Case 1: Not Handling Multiple Simultaneous Tool Calls

GPT-4o can call multiple tools in a single response. If you only handle the first tool call, you will get protocol errors when the model expects all tool call results before it continues.

// WRONG: Only handles first tool call
const toolCall = message.tool_calls[0];
const result = await mcpClient.callTool({ name: toolCall.function.name, arguments: ... });

// CORRECT: Handle all tool calls, run them in parallel
const toolResults = await Promise.all(
  message.tool_calls.map(async (toolCall) => { ... })
);
messages.push(...toolResults);

Case 2: Infinite Tool Call Loops

If a tool always returns data that prompts another tool call, the loop never terminates. Set a maximum iteration count.

const MAX_ITERATIONS = 10;
let iterations = 0;

while (true) {
  if (++iterations > MAX_ITERATIONS) {
    throw new Error(`Tool calling loop exceeded ${MAX_ITERATIONS} iterations`);
  }
  // ... rest of loop
}

Case 3: Passing MCP Tool Input Schema Directly Without Validation

OpenAI requires tool parameter schemas to be valid JSON Schema. MCP’s inputSchema is JSON Schema, so it should work – but some edge cases (like Zod’s default values, which add non-standard keys) can cause OpenAI API errors. Strip unknown keys before passing to OpenAI.

// Safe schema extraction
function safeInputSchema(mcpTool) {
  const schema = mcpTool.inputSchema;
  // OpenAI does not accept 'default' at the schema root level
  // Strip it to avoid API validation errors
  const { default: _, ...safeSchema } = schema;
  return safeSchema;
}

What to Check Right Now

  • Test tool conversion – print your OpenAI tools array and verify each tool has the correct name, description, and parameter schema (see the sketch after this list).
  • Run with gpt-4o-mini first – use the cheaper model during development to iterate faster and avoid burning GPT-4o quota on debugging.
  • Log tool calls and results – add logging every time a tool is called and its result received. This makes agentic debugging dramatically easier.
  • Cap iteration count – always set a maximum loop iteration and handle the case where the model runs out of allowed turns.
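
A quick sanity check for the conversion point above (a sketch; the validation rules are a minimal baseline):

// Dump the converted tools and fail fast on obviously malformed entries
console.dir(openAITools, { depth: null });
for (const t of openAITools) {
  if (t.type !== 'function' || !t.function.name || !t.function.parameters) {
    throw new Error(`Malformed tool definition: ${JSON.stringify(t)}`);
  }
}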

nJoy πŸ˜‰