Lesson 55 of 55 (Capstone): Full MCP Platform – Registry, Gateway, and Agents

This final capstone assembles everything from the course into a complete MCP platform: a registry for server discovery, an API gateway for authentication and routing, a collection of domain-specific MCP servers, and a web interface where teams can explore available tools, run agent queries, and review audit logs. When you deploy this platform, you have the infrastructure that enterprise teams need to build and manage AI-powered workflows on MCP.

The complete MCP platform: registry, gateway, domain servers, and a management web interface.

Platform Architecture Overview

Component            | Purpose                                  | Lesson Reference
MCP Registry         | Server discovery and health tracking     | Lesson 44
API Gateway          | Auth (OAuth), rate limiting, routing     | Lessons 31, 41
Domain MCP Servers   | Business tools (CRM, docs, analytics)    | Parts I-III
Multi-Provider Agent | Route queries to OpenAI/Claude/Gemini    | Lessons 28-30
Audit Service        | Structured logs, compliance reporting    | Lesson 35
Observability Stack  | Prometheus + Grafana + OpenTelemetry     | Lesson 42
Management UI        | Tool explorer, query interface, logs     | This lesson

Every row in this table maps to a lesson you have already completed. The capstone’s job is not to teach new concepts but to show how they compose into a real system. In production, these components run as separate services that communicate over HTTP and message queues, so a failure in analytics does not bring down the gateway or registry.

Platform Bootstrap Script

// platform/bootstrap.js
// Register all MCP servers with the registry on startup

const REGISTRY_URL = process.env.REGISTRY_URL ?? 'http://localhost:4000';

const MCP_SERVERS = [
  {
    id: 'products',
    name: 'Product Catalog Server',
    description: 'Search, browse, and manage product catalog',
    url: process.env.PRODUCTS_SERVER_URL,
    tags: ['products', 'catalog', 'inventory'],
    auth: { type: 'bearer' },
    healthUrl: `${process.env.PRODUCTS_SERVER_URL}/health`,
  },
  {
    id: 'analytics',
    name: 'Analytics Server',
    description: 'Business metrics, trends, and reports',
    url: process.env.ANALYTICS_SERVER_URL,
    tags: ['analytics', 'metrics', 'reports'],
    auth: { type: 'bearer' },
    healthUrl: `${process.env.ANALYTICS_SERVER_URL}/health`,
  },
  // ... more servers
];

async function registerAll() {
  for (const server of MCP_SERVERS) {
    const res = await fetch(`${REGISTRY_URL}/servers`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(server),
    });
    if (!res.ok) {
      throw new Error(`Failed to register ${server.id}: HTTP ${res.status}`);
    }
    console.log(`Registered: ${server.name}`);
  }
}

await registerAll();

Registry-driven discovery is what makes this platform extensible. When a new team wants to expose their internal API as an MCP server, they register it here and it becomes automatically available to the agent and the management UI. No code changes, no redeployment of the gateway – just a single POST to the registry endpoint.
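For illustration, here is what that single registration might look like in code, using a hypothetical `hr` server; `buildRegistration` is just a convenience wrapper around the same payload shape the `MCP_SERVERS` entries above use:

```javascript
// Illustrative helper: build and validate a registry entry before POSTing it.
// The 'hr' server below is hypothetical; field names mirror MCP_SERVERS above.
function buildRegistration({ id, name, description, url, tags = [], authType = 'bearer' }) {
  for (const [field, value] of Object.entries({ id, name, url })) {
    if (!value) throw new Error(`Registration missing required field: ${field}`);
  }
  return {
    id,
    name,
    description,
    url,
    tags,
    auth: { type: authType },
    healthUrl: `${url}/health`,
  };
}

const entry = buildRegistration({
  id: 'hr',
  name: 'HR Server',
  description: 'Employee directory and PTO tools',
  url: 'http://hr-server:8080',
  tags: ['hr', 'people'],
});

// One POST makes the server discoverable platform-wide:
// await fetch(`${REGISTRY_URL}/servers`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(entry),
// });
```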

Management API

// platform/management-api.js
// REST API for the management UI
// (McpDiscoveryClient, createAgent, getUserScope, and auditDb come from earlier lessons)

import express from 'express';

const REGISTRY_URL = process.env.REGISTRY_URL ?? 'http://localhost:4000';

const app = express();
app.use(express.json());

// List all registered MCP servers with health
app.get('/api/platform/servers', async (req, res) => {
  const response = await fetch(`${REGISTRY_URL}/status`);
  res.json(await response.json());
});

// List all tools from all healthy servers
app.get('/api/platform/tools', async (req, res) => {
  const discovery = new McpDiscoveryClient(REGISTRY_URL);
  await discovery.connect();
  const tools = await discovery.getAllTools();
  res.json({ tools, count: tools.length });
});

// Execute an agent query
app.post('/api/platform/query', async (req, res) => {
  const { question, provider = 'auto', userId } = req.body;
  // Rate limit, auth check, then:
  const agent = await createAgent({ scope: getUserScope(userId), preferredProvider: provider });
  try {
    const answer = await agent.run(question);
    res.json({ answer });
  } finally {
    await agent.close();
  }
});

// Get audit logs for a user
app.get('/api/platform/audit', async (req, res) => {
  const { userId, from, to, limit = 50 } = req.query;
  const logs = await auditDb.query({ userId, from, to, limit });
  res.json({ logs });
});

app.listen(5000, () => console.log('Management API on :5000'));
Component interaction: the discovery client queries the registry, builds the tool set, and routes through the agent.

One risk in a distributed platform like this: if the registry goes down, no new agent sessions can discover tools. The management API’s /tools endpoint depends on a live registry connection. In production, cache the last-known server list in the gateway so it can continue serving requests even during a brief registry outage.
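A minimal sketch of that fallback, run as an ES module; the status fetcher is injected so the cache logic stands alone (in the gateway it would be something like `() => fetch(`${REGISTRY_URL}/status`).then(r => r.json())`):

```javascript
// Sketch: cache the last-known server list so a brief registry outage
// does not take the gateway down with it.
class CachedRegistry {
  constructor(fetchStatus, ttlMs = 30_000) {
    this.fetchStatus = fetchStatus;  // injected: queries the registry /status endpoint
    this.ttlMs = ttlMs;
    this.cache = null;
    this.cachedAt = 0;
  }

  async getServers(now = Date.now()) {
    if (this.cache && now - this.cachedAt < this.ttlMs) return this.cache;
    try {
      this.cache = await this.fetchStatus();
      this.cachedAt = now;
    } catch (err) {
      // Registry is down: serve the stale list if we have one, fail otherwise
      if (!this.cache) throw err;
      console.warn(`Registry unreachable, serving stale list: ${err.message}`);
    }
    return this.cache;
  }
}

const reg = new CachedRegistry(async () => [{ id: 'products' }], 1000);
const fresh = await reg.getServers(0);
```

The TTL keeps the list reasonably current while healthy; only a failed refresh falls back to the stale copy.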

The audit endpoint at /api/platform/audit is what compliance teams will query most frequently. It lets managers review what their team asked the AI, which tools it called, and whether any requests failed. Without this, AI assistants become a black box that security teams will rightly refuse to approve.
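A minimal in-memory stand-in for the query the endpoint relies on (the real `auditDb` would be a database; the event shape mirrors the structured audit writes used elsewhere in the course):

```javascript
// Stand-in for auditDb.query: filter structured audit events by user and time window.
// ISO-8601 timestamps compare correctly as strings.
function queryAuditLogs(events, { userId, from, to, limit = 50 } = {}) {
  return events
    .filter(e => !userId || e.actor.userId === userId)
    .filter(e => (!from || e.timestamp >= from) && (!to || e.timestamp <= to))
    .slice(0, Number(limit));
}

const sample = [
  { eventId: '1', eventType: 'api_request', actor: { userId: 'alice' }, timestamp: '2025-01-10T09:00:00Z' },
  { eventId: '2', eventType: 'api_error',   actor: { userId: 'bob' },   timestamp: '2025-01-11T10:00:00Z' },
  { eventId: '3', eventType: 'api_request', actor: { userId: 'alice' }, timestamp: '2025-01-12T11:00:00Z' },
];

const recent = queryAuditLogs(sample, { userId: 'alice', from: '2025-01-11T00:00:00Z' });
// -> only alice's event from 2025-01-12
```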

Docker Compose – Full Platform

services:
  registry:
    build: ./registry
    ports: ["4000:4000"]
    depends_on: [redis]

  gateway:
    build: ./gateway
    ports: ["3000:3000"]
    environment:
      REGISTRY_URL: http://registry:4000
    depends_on: [registry, redis]

  management-api:
    build: ./platform
    ports: ["5000:5000"]
    depends_on: [gateway, registry]

  products-server:
    build: ./servers/products
    environment:
      DATABASE_URL: ${PRODUCTS_DB_URL}

  analytics-server:
    build: ./servers/analytics
    environment:
      DATABASE_URL: ${ANALYTICS_DB_URL}

  redis:
    image: redis:7-alpine

  prometheus:
    image: prom/prometheus:v2.50.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana:10.3.0
    ports: ["3001:3000"]
    depends_on: [prometheus]

Eight services in a single Compose file. This is a realistic local development setup, but for production you would break these into separate deployment units – the gateway and domain servers behind a load balancer, Prometheus and Grafana in a dedicated monitoring namespace, and the registry behind its own high-availability cluster.

What You Have Built

Across all 55 lessons, including 5 capstone projects, you have built:

  • MCP servers using every primitive: tools, resources, prompts, sampling, elicitation, roots
  • Clients for all three major LLM providers: OpenAI, Claude, and Gemini
  • Production infrastructure: Docker, Kubernetes, Nginx, Redis
  • Security stack: OAuth 2.0, RBAC, input validation, audit logging, secrets management
  • Multi-agent systems: A2A delegation, LangGraph integration, state management
  • Observability: Prometheus metrics, OpenTelemetry tracing, structured logs
  • A complete enterprise platform: registry, gateway, domain servers, management UI

MCP is the connective tissue of the AI application stack. You now know it from protocol fundamentals to enterprise deployment. Go build something important.

nJoy 😉

Lesson 54 of 55 (Capstone): Enterprise Assistant With Auth, RBAC, and Audit Logs

This capstone builds the most complete MCP application in the course: an enterprise AI assistant with OAuth 2.0 authentication, RBAC tool access control, full audit logging, rate limiting, and a multi-provider backend. It brings together patterns from every major part of the course into a single deployable system. Deploy it and you have a production-ready enterprise AI assistant that your security team can audit and your compliance team can sign off on.

Enterprise-grade: OAuth tokens + RBAC scope filtering + audit logs + rate limiting + multi-provider routing.

System Architecture

enterprise-assistant/
├── gateway/
│   ├── server.js          (HTTP API gateway with auth + rate limiting)
│   ├── auth.js            (OAuth 2.0 token validation, JWKS)
│   ├── rbac.js            (Role-to-scope mapping, tool filtering)
│   ├── audit.js           (Structured audit logging)
│   └── rate-limiter.js    (Per-user rate limiting with Redis)
├── agent/
│   ├── router.js          (Multi-provider routing: OpenAI/Claude/Gemini)
│   └── executor.js        (Tool loop with retry, timeout, token budget)
├── servers/
│   ├── knowledge-server.js (Knowledge base search)
│   └── actions-server.js   (Business action tools)
└── docker-compose.yml

The Gateway Server

// gateway/server.js
import express from 'express';
import { validateToken, getRolesFromToken } from './auth.js';
import { getScopeFromRoles, getAllowedTools } from './rbac.js';
import { AuditLogger } from './audit.js';
import { createRateLimiter } from './rate-limiter.js';
import { createAgent } from '../agent/router.js';

const app = express();
app.use(express.json());

const auditLog = new AuditLogger();
const rateLimiter = createRateLimiter(60);  // 60 req/min per user

// Health check
app.get('/health', (req, res) => res.json({ status: 'ok', uptime: process.uptime() }));
app.get('/metrics', (req, res) => res.end(getPrometheusMetrics()));

// Main API endpoint
app.post('/api/ask', async (req, res) => {
  const requestId = crypto.randomUUID();

  // 1. Authenticate
  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Bearer token required' });
  }

  let claims;
  try {
    claims = await validateToken(authHeader.slice(7));
  } catch {
    return res.status(401).json({ error: 'Invalid token' });
  }

  // 2. Rate limit
  try {
    await rateLimiter.consume(claims.sub);
  } catch (rl) {
    res.setHeader('Retry-After', Math.ceil(rl.msBeforeNext / 1000));
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }

  // 3. Determine role and scope
  const roles = getRolesFromToken(claims);
  const scope = getScopeFromRoles(roles);

  // 4. Get question
  const { question, preferredProvider } = req.body;
  if (!question?.trim()) return res.status(400).json({ error: 'question is required' });

  // 5. Build and run the agent
  const agent = await createAgent({ scope, preferredProvider });

  // 6. Run with audit logging
  await auditLog.write({
    eventId: requestId,
    eventType: 'api_request',
    actor: { userId: claims.sub, roles },
    request: { question: question.slice(0, 100) },
    scope: scope.split(' '),
  });

  try {
    const answer = await agent.run(question);

    await auditLog.write({
      eventId: requestId,
      eventType: 'api_response',
      actor: { userId: claims.sub },
      outcome: { success: true },
    });

    res.json({ answer, requestId });
  } catch (err) {
    await auditLog.write({
      eventId: requestId,
      eventType: 'api_error',
      actor: { userId: claims.sub },
      outcome: { success: false, error: err.message },
    });
    res.status(500).json({ error: 'Agent execution failed', requestId });
  } finally {
    await agent.close();
  }
});

const PORT = process.env.PORT ?? 3000;
app.listen(PORT, () => console.log(`Enterprise assistant listening on :${PORT}`));
Request lifecycle: every request goes through 6 stages before the agent runs.

The six-stage pipeline (authenticate, rate limit, resolve roles, validate input, run agent, audit) is the same request lifecycle used by production API gateways at companies like Stripe and Shopify. Each stage can reject the request independently, and the audit log captures the outcome regardless of success or failure. This is what compliance teams actually review during security audits.

Notice that the agent is created fresh per request and closed in the finally block. This prevents one user’s MCP session state from leaking into another user’s query. It costs a bit more in connection overhead, but the isolation guarantee is worth it for a multi-tenant system.

RBAC Configuration

// gateway/rbac.js
const ROLE_SCOPES = {
  employee: 'knowledge:read',
  manager: 'knowledge:read actions:read',
  admin: 'knowledge:read knowledge:write actions:read actions:write',
};

const SCOPE_TOOLS = {
  'knowledge:read': ['search_knowledge', 'get_article', 'list_categories'],
  'knowledge:write': ['create_article', 'update_article', 'publish_article'],
  'actions:read': ['get_ticket', 'list_tickets', 'get_report'],
  'actions:write': ['create_ticket', 'update_ticket', 'trigger_alert'],
};

export function getScopeFromRoles(roles) {
  return [...new Set(roles.flatMap(r => (ROLE_SCOPES[r] ?? '').split(' ')).filter(Boolean))].join(' ');
}

export function getAllowedTools(scope, allTools) {
  const allowed = new Set(
    scope.split(' ').flatMap(s => SCOPE_TOOLS[s] ?? [])
  );
  return allTools.filter(t => allowed.has(t.name));
}

A misconfigured RBAC map is one of the most dangerous bugs in this system. If you accidentally give the employee role actions:write scope, every employee can trigger alerts and modify tickets through the AI assistant. Always test your scope mapping with unit tests, and consider adding a “dry run” mode that logs what a user would be allowed to do without actually executing anything.

Multi-Provider Agent Router

// agent/router.js - select provider based on question complexity
import { OpenAIProvider } from './providers/openai.js';
import { ClaudeProvider } from './providers/claude.js';
import { GeminiProvider } from './providers/gemini.js';
import { getAllowedTools } from '../gateway/rbac.js';

export async function createAgent({ scope, preferredProvider = 'auto' }) {
  // Load MCP servers (connectMcpServers/aggregateTools are helpers from earlier lessons)
  const mcpClients = await connectMcpServers();
  const allTools = await aggregateTools(mcpClients);
  const scopedTools = getAllowedTools(scope, allTools);

  const providers = { openai: OpenAIProvider, claude: ClaudeProvider, gemini: GeminiProvider };

  return {
    async run(question) {
      // With 'auto', pick a provider per question; otherwise honor the caller's choice
      const providerKey = preferredProvider === 'auto'
        ? selectProvider(question)
        : preferredProvider;
      const provider = new providers[providerKey]({ maxTurns: 12, tokenBudget: 50_000 });
      return provider.run(question, scopedTools, mcpClients);
    },
    async close() {
      await Promise.all(mcpClients.map(c => c.close()));
    },
  };
}

The multi-provider router gives you vendor resilience. If OpenAI has an outage, you can fall back to Claude or Gemini without changing any application code. In practice, teams also use this pattern for cost optimization – routing simple queries to cheaper models and complex analytical questions to more capable ones.
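The selectProvider function is left abstract in the router above. One plausible heuristic, shown here as an assumption rather than the course's actual implementation, routes by rough question complexity:

```javascript
// Hypothetical complexity-based routing: cheap/fast model for short factual
// lookups, more capable models for long or analytical questions.
const ANALYTICAL_HINTS = /\b(why|compare|analy[sz]e|trend|forecast|explain)\b/i;

function selectProvider(question) {
  const words = question.trim().split(/\s+/).length;
  if (ANALYTICAL_HINTS.test(question) || words > 40) return 'claude';  // deep analysis
  if (words > 15) return 'openai';                                     // mid-size tasks
  return 'gemini';                                                     // short lookups
}
```

In practice you would tune the thresholds and keyword list against your own query logs, or replace the heuristic entirely with a small classifier.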

Deployment

services:
  gateway:
    build: .
    ports: ["3000:3000"]
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GEMINI_API_KEY: ${GEMINI_API_KEY}
      JWKS_URL: ${JWKS_URL}
      REDIS_URL: redis://redis:6379
    depends_on: [redis]
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    volumes: ["redis-data:/data"]

volumes:
  redis-data:

The Docker Compose file gives you a single docker compose up to launch the entire stack locally. Redis handles both rate limiting state and session caching. For production, you would swap the single Redis container for a managed service (like AWS ElastiCache or GCP Memorystore) and add TLS termination in front of the gateway.

nJoy 😉

Lesson 53 of 55 (Capstone): Multi-API Integration Hub With MCP

Real-world AI assistants need to integrate many APIs: a CRM for customer data, a ticketing system for support requests, a payment processor for billing status, a calendar for scheduling. Each of these becomes an MCP server, and the multi-provider abstraction layer from Lesson 29 routes queries to the right provider. This capstone builds a multi-API integration hub that unifies five real-world APIs behind a single MCP interface, with tool routing, error handling, and a unified context window.

Five MCP servers, one agent: the hub aggregates tools from all servers and routes calls automatically.

Project Architecture

mcp-api-hub/
├── servers/
│   ├── crm-server.js          (Customer data: search, get, update)
│   ├── tickets-server.js      (Support tickets: list, create, update)
│   ├── payments-server.js     (Billing: get_invoice, check_subscription)
│   ├── calendar-server.js     (Meetings: list, create, cancel)
│   └── analytics-server.js    (Metrics: get_report, get_trend)
├── agent/
│   └── hub-agent.js           (Multi-server MCP + OpenAI agent)
└── index.js

The key architectural decision here is one agent, many servers. Each API gets its own MCP server process, which means they are isolated – a crash in the payments server does not take down the CRM. It also means you can develop, test, and deploy each server independently, exactly like microservices.

The Multi-Server Agent

// agent/hub-agent.js
import OpenAI from 'openai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const SERVER_CONFIGS = [
  { id: 'crm', command: 'node', args: ['./servers/crm-server.js'] },
  { id: 'tickets', command: 'node', args: ['./servers/tickets-server.js'] },
  { id: 'payments', command: 'node', args: ['./servers/payments-server.js'] },
  { id: 'calendar', command: 'node', args: ['./servers/calendar-server.js'] },
  { id: 'analytics', command: 'node', args: ['./servers/analytics-server.js'] },
];

export async function createHubAgent() {
  const openai = new OpenAI();
  const connections = new Map();
  const allTools = [];

  // Connect to all servers in parallel
  await Promise.all(SERVER_CONFIGS.map(async config => {
    const transport = new StdioClientTransport({ command: config.command, args: config.args, env: process.env });
    const client = new Client({ name: 'hub-agent', version: '1.0.0' });
    await client.connect(transport);
    connections.set(config.id, client);

    const { tools } = await client.listTools();
    for (const tool of tools) {
      allTools.push({
        serverId: config.id,
        tool,
        openaiFormat: {
          type: 'function',
          function: { name: tool.name, description: `[${config.id}] ${tool.description}`, parameters: tool.inputSchema, strict: true },
        },
      });
    }
  }));

  console.log(`Hub connected to ${connections.size} servers, ${allTools.length} tools total`);

  // Find which server owns a tool
  const toolIndex = new Map(allTools.map(t => [t.tool.name, t]));

  return {
    async query(userMessage) {
      const messages = [
        {
          role: 'system',
          content: `You are a comprehensive business assistant with access to CRM, ticketing, payments, calendar, and analytics systems.
Tools are prefixed with their system: [crm], [tickets], [payments], [calendar], [analytics].
When answering questions, use tools from multiple systems as needed to give a complete answer.
Always check multiple related systems when investigating customer issues.`,
        },
        { role: 'user', content: userMessage },
      ];

      const openaiTools = allTools.map(t => t.openaiFormat);
      let turns = 0;

      while (true) {
        const response = await openai.chat.completions.create({
          model: 'gpt-4o', messages, tools: openaiTools, tool_choice: 'auto',
          parallel_tool_calls: true,
        });
        const choice = response.choices[0];
        const msg = choice.message;
        messages.push(msg);

        if (choice.finish_reason !== 'tool_calls') return msg.content;
        if (++turns > 15) throw new Error('Max turns exceeded');

        const results = await Promise.all(msg.tool_calls.map(async tc => {
          const entry = toolIndex.get(tc.function.name);
          if (!entry) {
            return { role: 'tool', tool_call_id: tc.id, content: `Tool '${tc.function.name}' not found` };
          }
          const client = connections.get(entry.serverId);
          const args = JSON.parse(tc.function.arguments);
          const result = await client.callTool({ name: tc.function.name, arguments: args });
          const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
          return { role: 'tool', tool_call_id: tc.id, content: text };
        }));
        messages.push(...results);
      }
    },

    async close() {
      await Promise.all([...connections.values()].map(c => c.close()));
    },
  };
}
Parallel tool calling: GPT-4o queries CRM, tickets, and payments simultaneously for a complete customer view.

The parallel_tool_calls: true flag is critical for performance. Without it, the model would call CRM, wait for the response, then call tickets, wait again, then call payments. With parallel calls, all three fire simultaneously and the total latency is the slowest server, not the sum of all servers. For customer-facing support bots, this can cut response time from 6 seconds to 2.
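You can see the effect with three simulated servers (run as an ES module; the delays are made up for illustration):

```javascript
// Simulated tool calls: under Promise.all, total time is the max of the
// three delays (~300 ms), not their sum (~600 ms).
const fakeCall = (name, ms) =>
  new Promise(resolve => setTimeout(() => resolve(`${name} done`), ms));

const start = Date.now();
const results = await Promise.all([
  fakeCall('crm', 100),
  fakeCall('tickets', 200),
  fakeCall('payments', 300),
]);
const elapsed = Date.now() - start;
console.log(results, `${elapsed}ms`);  // ~300ms, not ~600ms
```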

One thing that can go wrong here: tool name collisions. If both the CRM server and the tickets server expose a tool called search, the toolIndex map will silently overwrite one with the other. The description prefix ([crm], [tickets]) helps the model distinguish them, but the routing map needs unique names. Namespace your tool names (like crm_search, tickets_search) to avoid this.
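A defensive version of the index build catches this at startup instead of silently overwriting; this is a sketch, with allTools shaped like the entries the hub agent above collects:

```javascript
// Sketch: build the name -> server routing map, but throw on duplicates
// so a tool name collision surfaces at startup rather than at call time.
function buildToolIndex(allTools) {
  const index = new Map();
  for (const entry of allTools) {
    const existing = index.get(entry.tool.name);
    if (existing) {
      throw new Error(
        `Tool name collision: '${entry.tool.name}' is exposed by both ` +
        `'${existing.serverId}' and '${entry.serverId}'. ` +
        `Namespace your tools (e.g. ${existing.serverId}_${entry.tool.name}).`
      );
    }
    index.set(entry.tool.name, entry);
  }
  return index;
}
```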

Sample CRM Server (Condensed)

// servers/crm-server.js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'crm-server', version: '1.0.0' });

server.tool('search_customers', {
  query: z.string().min(1).max(100),
  limit: z.number().int().min(1).max(20).default(10),
}, async ({ query, limit }) => {
  const customers = await crmApi.search(query, limit);
  return { content: [{ type: 'text', text: JSON.stringify(customers) }] };
});

server.tool('get_customer', {
  id: z.string().uuid(),
}, async ({ id }) => {
  const customer = await crmApi.getById(id);
  if (!customer) return { content: [{ type: 'text', text: 'Customer not found' }], isError: true };
  return { content: [{ type: 'text', text: JSON.stringify(customer) }] };
});

const transport = new StdioServerTransport();
await server.connect(transport);

Example Usage

const agent = await createHubAgent();

const answer = await agent.query(
  'Customer john.smith@acme.com says their subscription renewal failed last week. ' +
  'What is their account status, do they have any open support tickets, ' +
  'and what does their payment history look like?'
);
// Agent will call: search_customers, get_subscription, list_tickets, get_payment_history
// in parallel, then synthesize a complete answer

console.log(answer);
await agent.close();

This hub pattern is how enterprise support platforms like Zendesk and Intercom are building their AI assistants. A single user question like “why was this customer charged twice?” requires data from billing, CRM, and ticketing systems simultaneously. Without MCP’s standardized tool interface, you would need custom integration code for every API combination.

nJoy 😉

Lesson 52 of 55 (Capstone): Filesystem Agent With Claude and MCP

This capstone builds a filesystem agent powered by Claude 3.7 Sonnet. The agent can read files, search codebases, analyze code structure, and refactor files under user supervision. It applies the security patterns from Part VIII: roots for filesystem boundaries, tool safety for path validation, and confirmation-based elicitation for destructive file writes. The result is a safe, auditable codebase assistant that you can trust with your actual project files.

Filesystem agent: Claude plans file operations, MCP server executes them within roots-defined boundaries.

The Filesystem MCP Server

// servers/fs-server.js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { RootsListChangedNotificationSchema } from '@modelcontextprotocol/sdk/types.js';
import { z } from 'zod';
import fs from 'node:fs/promises';
import path from 'node:path';

const server = new McpServer({ name: 'fs-server', version: '1.0.0' });

// Get the allowed roots from the client (via the roots capability),
// both at startup and whenever the client reports a change
let allowedRoots = [];
async function refreshRoots() {
  const { roots } = await server.server.listRoots();
  allowedRoots = roots.map(r => r.uri.replace('file://', ''));
}
server.server.oninitialized = refreshRoots;
server.server.setNotificationHandler(RootsListChangedNotificationSchema, refreshRoots);

// Path safety: ensure the path is within an allowed root
function validatePath(filePath) {
  const resolved = path.resolve(filePath);
  if (allowedRoots.length === 0) {
    throw new Error('No filesystem roots configured');
  }
  // Compare against root + separator so '/project-evil' cannot pass a check for '/project'
  const isAllowed = allowedRoots.some(root => {
    const base = path.resolve(root);
    return resolved === base || resolved.startsWith(base + path.sep);
  });
  if (!isAllowed) {
    throw new Error(`Path '${resolved}' is outside allowed roots: ${allowedRoots.join(', ')}`);
  }
  return resolved;
}

// Tool: Read a file
server.tool('read_file', {
  path: z.string().min(1).max(512).refine(p => !p.includes('..'), 'Path traversal not allowed'),
}, async ({ path: filePath }) => {
  const safePath = validatePath(filePath);
  try {
    const content = await fs.readFile(safePath, 'utf8');
    const lines = content.split('\n').length;
    return { content: [{ type: 'text', text: `// ${safePath} (${lines} lines)\n${content}` }] };
  } catch (err) {
    return { content: [{ type: 'text', text: `Cannot read file: ${err.message}` }], isError: true };
  }
});

// Tool: List directory
server.tool('list_directory', {
  path: z.string().min(1).max(512),
  recursive: z.boolean().default(false),
}, async ({ path: dirPath, recursive }) => {
  const safePath = validatePath(dirPath);
  const entries = await listDir(safePath, recursive, 0, []);
  return { content: [{ type: 'text', text: entries.join('\n') }] };
});

async function listDir(dirPath, recursive, depth, results) {
  if (depth > 3) return results;  // Cap recursion depth (levels 0-3)
  const entries = await fs.readdir(dirPath, { withFileTypes: true });
  for (const entry of entries) {
    if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;
    const full = path.join(dirPath, entry.name);
    results.push(`${'  '.repeat(depth)}${entry.isDirectory() ? '[DIR] ' : ''}${entry.name}`);
    if (recursive && entry.isDirectory()) await listDir(full, recursive, depth + 1, results);
  }
  return results;
}

// Tool: Search for text in files
server.tool('search_files', {
  directory: z.string(),
  pattern: z.string().max(200),
  file_extension: z.string().optional(),
}, async ({ directory, pattern, file_extension }) => {
  const safePath = validatePath(directory);
  const regex = new RegExp(pattern, 'i');
  const matches = [];
  await searchFiles(safePath, regex, file_extension, matches);
  return { content: [{ type: 'text', text: matches.slice(0, 50).join('\n') || 'No matches found' }] };
});

// Recursive grep helper used by search_files (skips dotfiles and node_modules)
async function searchFiles(dirPath, regex, ext, matches) {
  const entries = await fs.readdir(dirPath, { withFileTypes: true });
  for (const entry of entries) {
    if (entry.name.startsWith('.') || entry.name === 'node_modules') continue;
    const full = path.join(dirPath, entry.name);
    if (entry.isDirectory()) {
      await searchFiles(full, regex, ext, matches);
    } else if (!ext || entry.name.endsWith(ext)) {
      const content = await fs.readFile(full, 'utf8');
      content.split('\n').forEach((line, i) => {
        if (regex.test(line)) matches.push(`${full}:${i + 1}: ${line.trim()}`);
      });
    }
  }
}

// Tool: Write file (requires confirmation via elicitation)
server.tool('write_file', {
  path: z.string().min(1).max(512),
  content: z.string().max(100_000),
}, async ({ path: filePath, content }) => {
  const safePath = validatePath(filePath);

  // Check if the file already exists
  const exists = await fs.access(safePath).then(() => true).catch(() => false);

  if (exists) {
    // Ask the user to confirm the overwrite via elicitation
    const confirm = await server.server.elicitInput({
      message: `This will overwrite '${safePath}'. Confirm?`,
      requestedSchema: { type: 'object', properties: { confirm: { type: 'boolean' } }, required: ['confirm'] },
    });
    if (confirm.action !== 'accept' || !confirm.content?.confirm) {
      return { content: [{ type: 'text', text: 'Write cancelled.' }] };
    }
  }

  await fs.mkdir(path.dirname(safePath), { recursive: true });
  await fs.writeFile(safePath, content, 'utf8');
  return { content: [{ type: 'text', text: `Written: ${safePath}` }] };
});

const transport = new StdioServerTransport();
await server.connect(transport);
Four filesystem tools with layered safety: roots validation, path sanitization, and elicitation for writes.

The layered validation here is worth studying. The Zod schema rejects path traversal (..) at the input level, validatePath enforces the roots boundary, and the write_file tool adds elicitation as a final gate. Each layer catches different attack vectors: malicious input, logic bugs, and unintended overwrites. Removing any single layer would leave a real gap.

If no roots are configured, every operation fails immediately. This is a deliberate fail-closed design. In production, you never want a misconfiguration to silently grant full filesystem access – it is far safer to break loudly than to expose /etc/passwd because someone forgot to set the project root.

The Claude Filesystem Agent

// agent/fs-agent.js
import Anthropic from '@anthropic-ai/sdk';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const anthropic = new Anthropic();

export async function createFilesystemAgent(projectRoot) {
  const transport = new StdioClientTransport({
    command: 'node',
    args: ['./servers/fs-server.js'],
    env: { ...process.env },
  });
  const mcp = new Client(
    { name: 'fs-agent', version: '1.0.0' },
    { capabilities: { roots: { listChanged: true } } }  // Declare roots support
  );
  await mcp.connect(transport);

  // Set the allowed root to the project directory
  // (roots are set by the client, enforced by the server)
  console.log(`Filesystem agent initialized. Root: ${projectRoot}`);

  const { tools: mcpTools } = await mcp.listTools();
  const tools = mcpTools.map(t => ({
    name: t.name, description: t.description, input_schema: t.inputSchema,
  }));

  return {
    async ask(question) {
      const messages = [{ role: 'user', content: question }];
      let turns = 0;

      while (true) {
        const response = await anthropic.messages.create({
          model: 'claude-3-7-sonnet-20250219',
          max_tokens: 4096,
          system: `You are a codebase assistant. The project root is ${projectRoot}.
Use read_file to examine files, list_directory to explore structure, search_files to find code.
Only use write_file when explicitly asked to modify files.`,
          messages,
          tools,
        });
        messages.push({ role: 'assistant', content: response.content });

        if (response.stop_reason !== 'tool_use') {
          return response.content.filter(b => b.type === 'text').map(b => b.text).join('');
        }

        if (++turns > 15) throw new Error('Max turns exceeded');

        const toolResults = await Promise.all(
          response.content.filter(b => b.type === 'tool_use').map(async block => {
            const result = await mcp.callTool({ name: block.name, arguments: block.input });
            const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
            return { type: 'tool_result', tool_use_id: block.id, content: text };
          })
        );
        messages.push({ role: 'user', content: toolResults });
      }
    },
    async close() { await mcp.close(); },
  };
}

This agent pattern is the same one powering tools like Cursor, Windsurf, and Claude Code. A model reads your files, understands the structure, and proposes edits – but the human confirms destructive writes. The elicitation step in write_file is what separates a helpful assistant from a dangerous one.

One subtle risk: the search_files tool returns up to 50 matches, and large codebases could easily produce hundreds. If the model receives all 50 results in a single tool response, that burns a significant chunk of the context window. Consider adding pagination or relevance ranking if you deploy this against a large repository.
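One way to bound that cost is cursor-based pagination of search results. A minimal sketch (the paginate helper and its field names are illustrative, not part of the server above):

```javascript
// Hypothetical helper: return one page of matches plus an opaque cursor.
// Assumes matches is an array of { file, line, text } objects.
function paginate(matches, cursor = 0, pageSize = 10) {
  const page = matches.slice(cursor, cursor + pageSize);
  const nextCursor = cursor + pageSize < matches.length ? cursor + pageSize : null;
  return { page, nextCursor, total: matches.length };
}
```

The tool would include nextCursor in its result, so the model only requests a second page when the first one does not answer the question.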

What to Extend

  • Add a run_tests tool that executes node --test and returns the output – the agent can then read failing test files and suggest fixes.
  • Add Claude’s extended thinking for architectural analysis queries (Lesson 21 pattern).
  • Add the prompt caching pattern from Lesson 23 to cache the system prompt for long analysis sessions.

nJoy πŸ˜‰

Lesson 51 of 55 (Capstone): PostgreSQL Query Agent With OpenAI and MCP

This capstone project builds a complete, production-ready PostgreSQL query agent using OpenAI GPT-4o and MCP. By the end you will have a fully functional system where a user can ask questions in natural language and the agent translates them to safe, parameterized SQL queries, executes them against a real PostgreSQL database, formats the results, and explains its reasoning. The project incorporates lessons from throughout the course: schema validation, tool safety, audit logging, retry logic, and graceful shutdown.

PostgreSQL query agent architecture diagram OpenAI GPT-4o MCP server database tools natural language SQL dark
The database query agent: user asks a question, GPT-4o plans SQL queries, MCP tools execute them safely.

Project Structure

mcp-db-agent/
β”œβ”€β”€ package.json         (type: module, node 22+)
β”œβ”€β”€ .env                 (DATABASE_URL, OPENAI_API_KEY)
β”œβ”€β”€ servers/
β”‚   └── db-server.js     (MCP server with database tools)
β”œβ”€β”€ agent/
β”‚   └── query-agent.js   (OpenAI + MCP client loop)
β”œβ”€β”€ lib/
β”‚   β”œβ”€β”€ db.js            (PostgreSQL connection pool)
β”‚   β”œβ”€β”€ audit.js         (Audit logger)
β”‚   └── safety.js        (SQL safety checks)
└── index.js             (CLI entry point)

The MCP Database Server

// servers/db-server.js
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const server = new McpServer({ name: 'db-server', version: '1.0.0' });

// Tool 1: List available tables
server.tool('list_tables', {}, async () => {
  const { rows } = await pool.query(
    "SELECT table_name, table_type FROM information_schema.tables WHERE table_schema = 'public' ORDER BY table_name"
  );
  return { content: [{ type: 'text', text: JSON.stringify(rows) }] };
});

// Tool 2: Describe a table's columns
server.tool('describe_table', {
  table_name: z.string().regex(/^[a-zA-Z_][a-zA-Z0-9_]*$/, 'Invalid table name'),
}, async ({ table_name }) => {
  const { rows } = await pool.query(
    'SELECT column_name, data_type, is_nullable, column_default FROM information_schema.columns WHERE table_schema = $1 AND table_name = $2 ORDER BY ordinal_position',
    ['public', table_name]
  );
  if (rows.length === 0) {
    return { content: [{ type: 'text', text: `Table '${table_name}' not found` }], isError: true };
  }
  return { content: [{ type: 'text', text: JSON.stringify(rows) }] };
});

// Tool 3: Execute a read-only query (SELECT only)
server.tool('execute_query', {
  sql: z.string().max(2000),
  params: z.array(z.union([z.string(), z.number(), z.null()])).max(20).default([]),
}, async ({ sql, params }) => {
  // Safety check: only allow SELECT statements
  const normalizedSql = sql.trim().toUpperCase();
  if (!normalizedSql.startsWith('SELECT') && !normalizedSql.startsWith('WITH')) {
    return { content: [{ type: 'text', text: 'Only SELECT queries are allowed' }], isError: true };
  }

  // Forbid dangerous keywords
  const dangerous = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'GRANT', 'REVOKE'];
  if (dangerous.some(kw => normalizedSql.includes(kw))) {
    return { content: [{ type: 'text', text: 'Query contains forbidden keywords' }], isError: true };
  }

  try {
    const { rows, rowCount } = await pool.query(sql, params);
    return {
      content: [{ type: 'text', text: JSON.stringify({ rows: rows.slice(0, 100), total: rowCount, truncated: rowCount > 100 }) }],
    };
  } catch (err) {
    return { content: [{ type: 'text', text: `Query failed: ${err.message}` }], isError: true };
  }
});

// Tool 4: Get row count (for planning queries)
server.tool('count_rows', {
  table_name: z.string().regex(/^[a-zA-Z_][a-zA-Z0-9_]*$/),
  where_clause: z.string().max(500).optional(),
}, async ({ table_name, where_clause }) => {
  // NOTE: where_clause is interpolated into the SQL string unvalidated. The
  // table name is regex-checked, but in production treat where_clause as
  // untrusted input and run this tool through a read-only database user.
  const sql = where_clause
    ? `SELECT COUNT(*) as count FROM ${table_name} WHERE ${where_clause}`
    : `SELECT COUNT(*) as count FROM ${table_name}`;
  const { rows } = await pool.query(sql);
  return { content: [{ type: 'text', text: JSON.stringify(rows[0]) }] };
});

const transport = new StdioServerTransport();
await server.connect(transport);
Four database MCP tools list_tables describe_table execute_query count_rows with safety validation dark
Four tools: schema discovery (list, describe), safe query execution, and row counting for query planning.

In practice, this four-tool design is intentional: it mirrors how a careful human analyst works. Rather than handing the model a single “run any SQL” tool, you force it through a discovery workflow – list tables, inspect columns, then query. This staged approach dramatically reduces hallucinated column names and malformed joins because the model sees the real schema before writing SQL.

Watch the safety check in execute_query closely. The keyword blocklist approach is simple but brittle – a query like SELECT * FROM updates would be rejected because “UPDATE” appears in the table name. In a production system, you would use a proper SQL parser or run queries through a read-only database user instead of string matching.
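A slightly more robust variant of the string check, before you reach for a full SQL parser, matches keywords only on word boundaries so that a table named updates passes. This is a sketch, not a complete defence – comments and string literals can still fool it:

```javascript
const FORBIDDEN = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'GRANT', 'REVOKE'];

// Match forbidden keywords only as whole words, case-insensitively,
// so identifiers that merely contain a keyword are not rejected.
function containsForbiddenKeyword(sql) {
  return FORBIDDEN.some(kw => new RegExp(`\\b${kw}\\b`, 'i').test(sql));
}
```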

The OpenAI Query Agent

// agent/query-agent.js
import OpenAI from 'openai';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const openai = new OpenAI();

export async function createQueryAgent() {
  const transport = new StdioClientTransport({
    command: 'node',
    args: ['./servers/db-server.js'],
    env: { ...process.env },
  });
  const mcp = new Client({ name: 'query-agent', version: '1.0.0' });
  await mcp.connect(transport);
  const { tools: mcpTools } = await mcp.listTools();

  const tools = mcpTools.map(t => ({
    type: 'function',
    // strict: true requires the JSON schema to set additionalProperties: false
    // and list every property as required - drop it if your tool schemas do not.
    function: { name: t.name, description: t.description, parameters: t.inputSchema, strict: true },
  }));

  const SYSTEM_PROMPT = `You are a precise database analyst.
You have access to a PostgreSQL database. To answer questions:
1. First call list_tables to see available tables
2. Call describe_table for tables relevant to the question
3. Plan a safe SELECT query (use parameters for any user values)
4. Call execute_query with the query and parameters
5. Present results clearly with a brief interpretation

Always use parameterized queries. Never build SQL by string concatenation.
If a question cannot be answered with a SELECT, say so clearly.`;

  return {
    async query(userQuestion) {
      const messages = [
        { role: 'system', content: SYSTEM_PROMPT },
        { role: 'user', content: userQuestion },
      ];
      let turns = 0;

      while (true) {
        const response = await openai.chat.completions.create({
          model: 'gpt-4o', messages, tools, tool_choice: 'auto',
        });
        const choice = response.choices[0];
        const msg = choice.message;
        messages.push(msg);

        // finish_reason lives on the choice object, not on the message
        if (choice.finish_reason !== 'tool_calls') {
          return msg.content;
        }

        if (++turns > 10) throw new Error('Max turns exceeded');

        const results = await Promise.all(msg.tool_calls.map(async tc => {
          const args = JSON.parse(tc.function.arguments);
          const result = await mcp.callTool({ name: tc.function.name, arguments: args });
          const text = result.content.filter(c => c.type === 'text').map(c => c.text).join('\n');
          return { role: 'tool', tool_call_id: tc.id, content: text };
        }));
        messages.push(...results);
      }
    },
    async close() { await mcp.close(); },
  };
}

The agent loop here follows the same pattern you have seen throughout the course, but notice the turn cap of 10. Without it, a confusing question could cause the model to loop indefinitely – calling tools, misinterpreting results, and calling more tools. In a billing-sensitive environment, a runaway loop like that translates directly into unexpected API costs.
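The same idea generalises beyond a bare turn counter. A minimal sketch of a budget guard (the createBudget name and limits are illustrative, not from the project code) caps turns and tool calls together:

```javascript
// Hypothetical guard: tracks turns and tool calls against fixed budgets
// and throws before a runaway loop accumulates API costs.
function createBudget({ maxTurns = 10, maxToolCalls = 30 } = {}) {
  let turns = 0, toolCalls = 0;
  return {
    turn() {
      if (++turns > maxTurns) throw new Error(`Max turns exceeded (${maxTurns})`);
    },
    toolCall(n = 1) {
      toolCalls += n;
      if (toolCalls > maxToolCalls) throw new Error(`Max tool calls exceeded (${maxToolCalls})`);
    },
    stats: () => ({ turns, toolCalls }),
  };
}
```

Inside the agent loop you would call budget.turn() at the top of each iteration and budget.toolCall(msg.tool_calls.length) before executing tools.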

Running the Agent

// index.js
import { createQueryAgent } from './agent/query-agent.js';
import readline from 'node:readline';

const agent = await createQueryAgent();
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

console.log('PostgreSQL Query Agent ready. Ask anything about your data.');
console.log('Type "exit" to quit.\n');

rl.on('line', async (line) => {
  if (line.trim() === 'exit') { await agent.close(); process.exit(0); }
  if (!line.trim()) return;
  try {
    const answer = await agent.query(line);
    console.log('\n' + answer + '\n');
  } catch (err) {
    console.error('Error:', err.message);
  }
});

Teams commonly deploy this exact pattern as an internal analytics bot on Slack or Teams. A support engineer asks “how many orders shipped last week from warehouse 3?” and gets an answer in seconds, without needing SQL skills or database access credentials. The read-only constraint means the bot is safe to hand to non-technical staff.

What to Extend

  • Add the audit logging middleware from Lesson 35 to log every execute_query call with the SQL, user, and result count.
  • Add a sample_rows tool that returns 3 rows from any table – helps the model understand data format before writing queries.
  • Connect it to your real production database with a read-only service account.

nJoy πŸ˜‰

Lesson 50 of 55: Custom MCP Transports and Protocol Extensions in Node.js

The MCP SDK ships with two built-in transports: stdio and Streamable HTTP. These cover the vast majority of use cases. But sometimes you need something different: an in-process transport for testing, a WebSocket transport for browser environments, an IPC transport for Electron apps, or a transport that encrypts the JSON-RPC stream at the application layer. The SDK’s transport interface is deliberately minimal, making it straightforward to implement your own. This lesson covers the interface, two reference implementations, and practical extension points.

MCP custom transport interface diagram showing Transport interface implementations InProcess WebSocket IPC dark
The Transport interface is three methods: start, send, and close. Any communication channel can become an MCP transport.

The Transport Interface

// The MCP SDK Transport interface (TypeScript definition for reference)
// interface Transport {
//   start(): Promise<void>;
//   send(message: JSONRPCMessage): Promise<void>;
//   close(): Promise<void>;
//   onmessage?: (message: JSONRPCMessage) => void;
//   onerror?: (error: Error) => void;
//   onclose?: () => void;
// }

// In JavaScript, implement the same shape:
class CustomTransport {
  onmessage = null;   // Called when a message is received
  onerror = null;     // Called on transport errors
  onclose = null;     // Called when the transport closes

  async start() {
    // Initialize the underlying communication channel
  }

  async send(message) {
    // Send a JSONRPCMessage object
  }

  async close() {
    // Clean up the channel
  }
}

The interface is intentionally minimal: three async methods and three event callbacks. This simplicity is the point. Any communication channel that can send and receive JSON objects – WebSockets, Unix domain sockets, shared memory, even a pair of browser MessageChannels – can become an MCP transport by implementing these six members.

In-Process Transport for Testing

An in-process transport connects a client directly to a server in the same Node.js process. Essential for integration tests without spawning subprocesses:

// in-process-transport.js

export function createInProcessTransport() {
  let clientTransport, serverTransport;

  clientTransport = {
    onmessage: null, onerror: null, onclose: null,
    async start() {},
    async send(msg) {
      // Route to server
      if (serverTransport.onmessage) serverTransport.onmessage(msg);
    },
    async close() {
      if (clientTransport.onclose) clientTransport.onclose();
      if (serverTransport.onclose) serverTransport.onclose();
    },
  };

  serverTransport = {
    onmessage: null, onerror: null, onclose: null,
    async start() {},
    async send(msg) {
      // Route to client
      if (clientTransport.onmessage) clientTransport.onmessage(msg);
    },
    async close() {
      if (clientTransport.onclose) clientTransport.onclose();
      if (serverTransport.onclose) serverTransport.onclose();
    },
  };

  return { clientTransport, serverTransport };
}

// Usage in tests:
import { test } from 'node:test';
import assert from 'node:assert/strict';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { createInProcessTransport } from './in-process-transport.js';

test('in-process round trip', async (t) => {
  const { clientTransport, serverTransport } = createInProcessTransport();
  const server = buildServer();   // Your McpServer factory from earlier lessons
  const client = new Client({ name: 'test', version: '1.0.0' });

  await server.connect(serverTransport);
  await client.connect(clientTransport);

  const { tools } = await client.listTools();
  assert.ok(tools.length > 0);

  await client.close();
});

This in-process transport eliminates the main pain point of MCP integration tests: subprocess management. No ports to allocate, no processes to spawn and kill, no race conditions between server startup and client connection. Tests using this pattern typically run 10-50x faster than their subprocess equivalents.

In-process transport diagram client and server connected directly in same process for testing no network dark
In-process transport: no network, no subprocess, instant round trip – ideal for unit and integration testing.

WebSocket Transport

npm install ws
// websocket-transport.js - client side
import WebSocket from 'ws';

export class WebSocketClientTransport {
  #url;
  #ws = null;
  onmessage = null;
  onerror = null;
  onclose = null;

  constructor(url) {
    this.#url = url;
  }

  async start() {
    return new Promise((resolve, reject) => {
      this.#ws = new WebSocket(this.#url);
      this.#ws.once('open', resolve);
      this.#ws.once('error', reject);
      this.#ws.on('message', (data) => {
        try {
          const msg = JSON.parse(data.toString());
          if (this.onmessage) this.onmessage(msg);
        } catch (err) {
          if (this.onerror) this.onerror(err);
        }
      });
      this.#ws.on('close', () => {
        if (this.onclose) this.onclose();
      });
      this.#ws.on('error', (err) => {
        if (this.onerror) this.onerror(err);
      });
    });
  }

  async send(message) {
    this.#ws.send(JSON.stringify(message));
  }

  async close() {
    this.#ws?.close();
  }
}

// WebSocket server transport
export class WebSocketServerTransport {
  #socket;
  onmessage = null;
  onerror = null;
  onclose = null;

  constructor(socket) {
    this.#socket = socket;
    socket.on('message', (data) => {
      try {
        const msg = JSON.parse(data.toString());
        if (this.onmessage) this.onmessage(msg);
      } catch (err) {
        if (this.onerror) this.onerror(err);
      }
    });
    socket.on('close', () => {
      if (this.onclose) this.onclose();
    });
  }

  async start() {}

  async send(message) {
    this.#socket.send(JSON.stringify(message));
  }

  async close() {
    this.#socket.close();
  }
}

// Server side: wrap ws.WebSocketServer
import { WebSocketServer } from 'ws';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

const wss = new WebSocketServer({ port: 9000 });
wss.on('connection', async (socket) => {
  const transport = new WebSocketServerTransport(socket);
  const server = buildMcpServer();
  await server.connect(transport);
});

WebSocket transport is the natural choice when your MCP client runs in a browser. Unlike Streamable HTTP, which requires the client to open new connections for each request, a WebSocket keeps a single persistent bidirectional channel open. The trade-off is that WebSocket connections are harder to load-balance (no standard sticky-session mechanism) and are not part of the official MCP spec, so you take on compatibility risk.

Protocol Extensions: Custom Methods

Beyond custom transports, MCP’s JSON-RPC foundation lets you add entirely new methods outside the spec. Prefixing them with your company namespace (like com.mycompany/) avoids collisions with future spec additions. This is useful for operational tooling – metrics, health checks, debug endpoints – that your internal clients need but that do not belong in the standard tool/resource model.

// MCP allows custom methods beyond the spec - they are prefixed with your namespace
// Use this for proprietary extensions that are specific to your deployment
import { z } from 'zod';

// Server side: handle a custom method. setRequestHandler takes a Zod schema
// whose method literal identifies the request.
const GetServerMetricsRequestSchema = z.object({
  method: z.literal('com.mycompany/getServerMetrics'),
  params: z.object({}).optional(),
});

server.server.setRequestHandler(GetServerMetricsRequestSchema, async () => ({
  uptime: process.uptime(),
  activeSessions: sessionStore.size,
  memoryMB: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
}));

// Client side: call a custom method. A result schema is required so the SDK
// can validate the response.
const MetricsResultSchema = z.object({
  uptime: z.number(),
  activeSessions: z.number(),
  memoryMB: z.number(),
});

const metrics = await client.request(
  { method: 'com.mycompany/getServerMetrics', params: {} },
  MetricsResultSchema
);

One thing to watch out for with custom methods: they are invisible to standard MCP clients. If you add com.mycompany/getServerMetrics, only clients you control will know it exists. Standard MCP clients will not discover or call these methods via listTools, since they are not tools. Use them for internal operational purposes, not for functionality you expect third-party clients to use.

The extensions Capability Field

New in Draft – This feature is in the Draft spec and may be finalised in a future revision.

The Draft specification adds an extensions field to both ClientCapabilities and ServerCapabilities. This provides a standardised place to advertise optional protocol extensions beyond the core spec, replacing the ad-hoc approach of custom methods and namespaced capabilities.

// Server declaring support for a custom extension during initialization
{
  capabilities: {
    tools: {},
    resources: {},
    extensions: {
      'com.mycompany/streaming-progress': {
        version: '1.0.0',
      },
      'com.mycompany/team-collaboration': {
        version: '2.1.0',
      },
    },
  },
}

// Client checking for extension support
const serverCaps = client.getServerCapabilities();
if (serverCaps?.extensions?.['com.mycompany/streaming-progress']) {
  // Enable the streaming progress UI
}

The extensions field gives custom methods a discoverable surface. Instead of blindly calling com.mycompany/getServerMetrics and hoping it exists, a client can check capabilities.extensions during initialisation and adapt its behaviour. Namespace your extensions with a reverse-domain prefix (like Java packages) to avoid collisions with future spec additions or other vendors.
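Version checks on extensions are easy to get subtly wrong, so it helps to keep the rule in one place. A small sketch (the helper name and the major-version compatibility rule are assumptions, not part of the spec):

```javascript
// Hypothetical helper: an extension is usable if the server advertises it and
// its major version matches what this client was built against.
function hasCompatibleExtension(capabilities, id, requiredMajor) {
  const ext = capabilities?.extensions?.[id];
  if (!ext?.version) return false;
  const major = Number.parseInt(ext.version.split('.')[0], 10);
  return major === requiredMajor;
}
```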

What to Build Next

  • Replace subprocess spawning in your integration tests with the in-process transport. Measure the test speedup.
  • If you have a browser-based MCP client, implement the WebSocket transport and test it against your existing MCP server with a WebSocket adapter.

nJoy πŸ˜‰

Lesson 49 of 55: MCP Protocol Versioning, Compatibility, and Migration

The MCP specification evolves. New capabilities are added; some older mechanisms are deprecated; breaking changes occasionally ship. Building MCP servers that handle protocol version negotiation correctly means your clients and servers can interoperate across version boundaries without hard dependencies on a single spec revision. This lesson covers how MCP versioning works, how to negotiate capabilities with older clients, how to write migration guides when your own server schema changes, and the stability guarantees you can rely on from Anthropic.

MCP protocol versioning negotiation diagram client offering versions server selecting compatible version dark
MCP version negotiation: client offers supported versions, server selects the best match.

How MCP Protocol Versioning Works

MCP uses date-stamped version strings like 2024-11-05 or 2025-11-25. During initialization, the client sends the version it wants, and the server responds with the version it will use (typically the same, or the highest version both sides support). If they cannot agree, the connection fails at initialization.

// Initialization exchange (JSON-RPC)
// Client sends:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-11-25",
    "clientInfo": { "name": "my-client", "version": "2.0.0" },
    "capabilities": { "sampling": {}, "elicitation": {} }
  }
}

// Server responds with the version it accepts:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-11-25",
    "serverInfo": { "name": "my-server", "version": "1.5.0" },
    "capabilities": { "tools": {}, "resources": {}, "prompts": {} }
  }
}
// The @modelcontextprotocol/sdk handles version negotiation automatically
// You do not need to implement it manually

// To check the negotiated version in your server:
server.server.oninitialized = () => {
  const version = server.server.negotiatedProtocolVersion;
  console.log(`MCP session initialized with protocol version: ${version}`);
};

In practice, you rarely implement version negotiation by hand – the SDK handles it for you. The important thing is understanding what happens under the hood: if a client sends a version your server’s SDK does not support, the connection fails at initialization with a clear error. Logging the negotiated version on startup (as shown above) helps you quickly diagnose “why can’t this client connect?” issues in production.
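The selection rule can be modelled as a pure function over the date-stamped strings, which sort correctly as plain strings because of the YYYY-MM-DD format. This is a sketch of the negotiation logic, not the SDK's actual implementation; per the spec, a server that does not support the requested version replies with its latest supported version, and the client disconnects if it cannot speak that:

```javascript
// Sketch: echo the requested version if supported, otherwise answer with the
// server's newest supported version (or null if it supports none at all).
function negotiateVersion(clientRequested, serverSupported) {
  if (serverSupported.includes(clientRequested)) return clientRequested;
  return [...serverSupported].sort().at(-1) ?? null;
}
```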

Feature Detection (Capability Negotiation)

// Check if the connected client supports a specific capability
// before using it in your server code

server.server.oninitialized = () => {
  const clientCaps = server.server.getClientCapabilities();

  const supportsElicitation = !!clientCaps?.elicitation;
  const supportsSampling = !!clientCaps?.sampling;
  const supportsRoots = !!clientCaps?.roots;

  console.log(`Client capabilities: elicitation=${supportsElicitation} sampling=${supportsSampling} roots=${supportsRoots}`);

  if (!supportsElicitation) {
    // Fall back to returning instructions in tool result instead of interactive elicitation
    console.warn('Client does not support elicitation - using text fallback');
  }
};

This matters in real deployments because not all MCP clients are equal. Claude Desktop supports elicitation and sampling, but a custom CLI client you built might not. If your server blindly calls server.server.elicitInput() against a client that did not declare the capability, the request will fail. Checking capabilities first and providing a text-based fallback keeps your server compatible with the broadest range of clients.
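A minimal sketch of that fallback decision (the confirmationPlan helper and its result shapes are illustrative, not SDK API):

```javascript
// Hypothetical: decide between an interactive elicitation and a plain text
// tool result instructing the user how to confirm the action manually.
function confirmationPlan(clientCaps, action) {
  if (clientCaps?.elicitation) {
    return { mode: 'elicit', message: `Please confirm: ${action}` };
  }
  return {
    mode: 'text',
    content: [{ type: 'text', text: `Confirmation required. Re-run the tool with confirm=true to ${action}.` }],
  };
}
```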

Capability negotiation table client declares capabilities server checks before using elicitation sampling roots dark
Always check client capabilities before using server-initiated features like elicitation or sampling.

Migrating Your Tool Schema

When you change a tool’s input schema, existing clients that have cached the old schema will break. Follow a compatibility-first migration process:

// Backwards-compatible schema evolution: add optional fields, never remove required ones

// Version 1 schema (existing clients use this)
// search_products: { query: z.string(), limit: z.number().optional().default(10) }

// Version 2: add optional 'category' filter without breaking v1 clients
server.tool('search_products', {
  query: z.string(),
  limit: z.number().optional().default(10),
  category: z.string().optional(),           // New optional field - backwards compatible
  // NEVER remove or rename 'query' or 'limit' - that breaks v1 clients
  // NEVER make an optional field required - that also breaks v1 clients
}, handler);
// Breaking change strategy: add a versioned tool name during transition
// Phase 1: add new tool alongside old one
server.tool('search_products_v2', {
  query: z.string(),
  limit: z.number().optional().default(10),
  filters: z.object({  // New required field - would break v1 if added to original
    category: z.string().optional(),
    priceMax: z.number().optional(),
    inStock: z.boolean().optional().default(true),
  }),
}, handler);

// Phase 2: deprecate old tool via description
// server.tool('search_products', ... 
//   description: 'DEPRECATED: use search_products_v2 instead'

// Phase 3 (after client migration window): remove old tool

The biggest gotcha with schema migration is that LLM clients cache tool definitions. Even after you update the server, an agent might still send arguments matching the old schema until it re-fetches the tool list. Making new fields optional (or using versioned tool names) ensures that stale cached schemas do not cause hard failures during the transition window.
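One defensive tactic during the transition window is to normalise stale v1-shaped arguments into the v2 shape inside the handler rather than rejecting them. A sketch, with field names following the search_products example above:

```javascript
// Sketch: accept v1 args ({ query, limit, category }) and lift them into the
// v2 shape ({ query, limit, filters: {...} }) when 'filters' is missing.
function normalizeSearchArgs(args) {
  if (args.filters) return args;            // Already v2-shaped
  const { query, limit = 10, category } = args;
  return {
    query,
    limit,
    filters: { ...(category !== undefined && { category }), inStock: true },
  };
}
```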

Version Compatibility Matrix

The MCP specification has gone through four published revisions. Each is backwards-incompatible with the previous, which is why the date changes. A Draft version tracks work-in-progress changes that have not yet shipped.

MCP Spec Version Status Key Features Added
2024-11-05 Final Initial release: tools, resources, prompts, sampling, stdio transport, HTTP+SSE transport
2025-03-26 Final OAuth 2.1 authorization framework, Streamable HTTP transport (replaces HTTP+SSE), tool annotations (destructiveHint, readOnlyHint, etc.), JSON-RPC batching, audio content type, completions capability
2025-06-18 Final Elicitation (server asks user for input), structured tool output, resource links in tool results, removed JSON-RPC batching, OAuth resource server classification (RFC 8707), MCP-Protocol-Version header required on HTTP, title field for human-friendly names
2025-11-25 Current Experimental tasks API (durable request tracking), OAuth Client ID Metadata Documents, tool calling in sampling requests, URL mode elicitation, enhanced authorization with incremental scope consent, icon metadata for tools/resources/prompts, OpenID Connect Discovery support, SSE polling
Draft Draft Work in progress: extensions field on capabilities, OpenTelemetry trace context propagation in _meta, SEP workflow formalisation. Do not target Draft in production.

The version jumps tell you something important: 2025-03-26 shipped tool annotations and a new transport. 2025-06-18 then removed JSON-RPC batching that 2025-03-26 had just added – proof that the spec is willing to walk back decisions quickly. Always check the changelog between your current version and the target version before upgrading.

Stability Guarantees

With four published spec revisions in roughly 18 months, a reasonable question is: what can I actually depend on? The list below separates the stable foundations from the parts that have already changed between versions.

  • JSON-RPC 2.0 wire format: Stable. Will not change between spec versions.
  • Core methods (initialize, tools/call, resources/read, prompts/get): Stable across all versions.
  • New capabilities: Always added as optional; never required for a functional server.
  • Removals: Features can be removed between versions (JSON-RPC batching was added in 2025-03-26 and removed in 2025-06-18). Pin your protocol version in production.
  • SDK APIs: The TypeScript/JavaScript SDK minor versions maintain backwards compatibility; only major versions may include breaking changes.

2026 Roadmap Priorities

2026 Roadmap (blog.modelcontextprotocol.io)

The MCP project published a 2026 roadmap organised around Working Group priorities rather than fixed release dates. The two highest-priority areas reflect production deployment needs:

  • Transport Evolution and Scalability: Addressing gaps in Streamable HTTP for production deployments. Focus areas include horizontal scaling without server-side state holding, standard session handling mechanisms, and a .well-known metadata format for server capability discovery. The goal is to keep the set of official transports small (a core MCP principle) while making them production-ready for enterprise-scale clusters.
  • Agent Communication: Expanding the experimental Tasks primitive with lifecycle improvements including retry semantics for transient failures, expiry policies for task results, and better integration with multi-agent orchestration patterns. This builds directly on the Tasks API introduced in 2025-11-25.

The shift from date-driven releases to Working Group-driven priorities signals that MCP is entering a production-hardening phase. For course readers: pin to 2025-11-25 in production, watch the roadmap for transport and tasks changes, and participate in Working Groups if you want to shape the next spec revision.

What to Build Next

  • Add a server://version resource to your MCP server that returns the current protocol version, SDK version, and your tool schema versions. Update it on every release.
  • Review your most-used tools for any fields that are currently optional but should be made required. Use the v2 naming strategy to transition safely.

nJoy πŸ˜‰

Lesson 48 of 55: Cancellation, Progress, and Backpressure in MCP Streams

Streaming responses, long-running tools, and multi-step agent pipelines all share a common challenge: what happens when the client stops listening? Without proper cancellation propagation, cancelled client connections leave expensive operations running on the server indefinitely. This lesson covers three related mechanisms: request cancellation using AbortSignal, progress reporting with real-time updates, and backpressure strategies that prevent fast producers from overwhelming slow consumers.

Cancellation propagation diagram client disconnect AbortSignal tool cleanup chain stop resource release dark
Cancellation must propagate through the entire call chain: from client disconnect to every active resource.

AbortSignal in MCP Tool Handlers

When a client disconnects or cancels a request, the MCP SDK aborts the AbortSignal it passes to each request handler (available as signal on the handler's second argument). Tool handlers should check this signal and abort expensive operations:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'streaming-server', version: '1.0.0' });

server.tool('search_large_dataset', {
  query: z.string(),
  maxResults: z.number().default(100),
}, async ({ query, maxResults }, { signal }) => {
  // Pass signal to database query
  const results = await db.search(query, { maxResults, signal });

  // Pass signal to downstream HTTP calls
  const enriched = await Promise.all(
    results.map(r =>
      fetch(`https://enrichment.api/v1/${r.id}`, { signal })
        .then(res => res.json())
        .catch(err => {
          if (err.name === 'AbortError') throw err;  // Re-throw cancellation
          return r;  // Return unenriched on other errors
        })
    )
  );

  return { content: [{ type: 'text', text: JSON.stringify(enriched) }] };
});

The key insight here is that signal must be threaded through every async boundary. If you pass it to fetch but not to your database query, a cancelled request still hammers the database while the HTTP calls abort cleanly. Every layer of your call stack that does I/O should receive and respect the signal.
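Threading the signal also composes with timeouts. On Node 20.3+ you can merge the caller's signal with a deadline using AbortSignal.any, so an operation aborts on whichever fires first. A sketch under that Node version assumption:

```javascript
// Sketch: combine the request's signal with a per-operation timeout.
// Returns a signal that aborts when either the caller cancels or ms elapses.
function withDeadline(signal, ms) {
  const timeout = AbortSignal.timeout(ms);
  return signal ? AbortSignal.any([signal, timeout]) : timeout;
}
```

You would then pass withDeadline(signal, 5000) everywhere you currently pass the raw signal, e.g. to fetch or to the database client.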

// In database clients that support AbortSignal (node-postgres shown here)
async function search(query, { maxResults, signal } = {}) {
  const client = await pool.connect();

  // Capture the backend PID up front: a busy client only queues further
  // queries, so cancellation must come from a different connection
  const { rows: [{ pid }] } = await client.query('SELECT pg_backend_pid() AS pid');

  const cleanup = () => {
    pool.query('SELECT pg_cancel_backend($1)', [pid]).catch(() => {});
  };
  signal?.addEventListener('abort', cleanup, { once: true });

  try {
    const result = await client.query(
      'SELECT * FROM products WHERE to_tsvector(description) @@ plainto_tsquery($1) LIMIT $2',
      [query, maxResults]
    );
    return result.rows;
  } finally {
    signal?.removeEventListener('abort', cleanup);
    client.release();  // release exactly once, cancelled or not
  }
}

A common mistake in the database cleanup pattern above is forgetting to call removeEventListener in the finally block. Without it, a completed query leaves a dangling abort listener that fires if the signal is aborted later in the request lifecycle, cancelling whatever statement the recycled backend happens to be running at that moment.
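The hazard is easy to reproduce in isolation. Here is a minimal sketch (plain Node, no MCP or database involved): the work finishes, nobody removes the listener, and a later abort still fires it.

```javascript
// Minimal repro of the dangling-listener hazard (illustrative only)
const ac = new AbortController();

let cleanupFired = 0;
const cleanup = () => { cleanupFired++; };  // imagine this cancels a query

ac.signal.addEventListener('abort', cleanup, { once: true });

// ... the "query" completes here, but we forget removeEventListener ...

// Later in the request lifecycle, something aborts the shared signal:
ac.abort();
console.log(cleanupFired);  // 1 - cleanup ran against already-finished work
```

With the removeEventListener call in the finally block, cleanupFired would stay 0.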

Progress Reporting via Streaming Tool Results

Cancellation handles the “stop” case. Progress reporting handles the “how far along are we?” case. For batch operations that take minutes, silence from the server is indistinguishable from a hang. Emitting periodic progress via sendLoggingMessage lets monitoring tools and MCP Inspector show real-time status without changing the tool’s return type.

// MCP tools can emit progress events using the server's notification mechanism
// For now, progress is communicated via the task polling pattern from Lesson 45
// or via streaming text content updates

server.tool('process_batch', {
  items: z.array(z.string()).max(1000),
}, async ({ items }, { signal }) => {
  const results = [];
  const total = items.length;

  for (let i = 0; i < items.length; i++) {
    if (signal?.aborted) {
      return {
        content: [{ type: 'text', text: JSON.stringify({
          status: 'cancelled',
          processed: i,
          total,
          results,
        }) }],
      };
    }

    const result = await processItem(items[i]);
    results.push(result);

    // Emit progress via logs/notification (visible in MCP Inspector)
    if (i % 50 === 0) {
      server.server.sendLoggingMessage({
        level: 'info',
        data: `Progress: ${i + 1}/${total} (${Math.round(((i + 1) / total) * 100)}%)`,
      });
    }
  }

  return { content: [{ type: 'text', text: JSON.stringify({ status: 'complete', results }) }] };
});
Progress reporting pattern batch processing loop checking AbortSignal emitting log notifications at intervals dark
Batch tools check the AbortSignal on each iteration and emit progress via logging notifications.

Backpressure in Streaming Tool Results

Progress tells the client what is happening. Backpressure prevents the server from producing data faster than the client can consume it. Without backpressure, a tool that streams thousands of log lines can exhaust server memory buffering unsent data while the client processes earlier chunks. The generator pattern below yields control between batches, giving the runtime a chance to drain the outbound buffer.

// When a tool generates large amounts of streaming data,
// use a ReadableStream with backpressure control

server.tool('stream_logs', {
  service: z.string(),
  since: z.string(),
}, async ({ service, since }, { signal }) => {
  // Generator-based streaming with backpressure
  async function* generateLogs() {
    const logStream = await getLiveLogStream(service, since, { signal });
    let buffer = [];

    for await (const logLine of logStream) {
      if (signal?.aborted) break;
      buffer.push(logLine);

      // Yield batches of 50 lines to avoid overwhelming the response
      if (buffer.length >= 50) {
        yield buffer.join('\n');
        buffer = [];
        // Yield control to allow backpressure to work
        await new Promise(r => setImmediate(r));
      }
    }

    if (buffer.length > 0) yield buffer.join('\n');
  }

  // Collect all chunks (in practice, return first N lines for tool calls)
  const chunks = [];
  let totalLines = 0;

  for await (const chunk of generateLogs()) {
    chunks.push(chunk);
    totalLines += chunk.split('\n').length;
    if (totalLines > 500) {
      chunks.push('[...truncated, 500 line limit reached]');
      break;
    }
  }

  return { content: [{ type: 'text', text: chunks.join('\n') }] };
});

Handling SSE Client Disconnections

Cancellation, progress, and backpressure all ultimately depend on detecting when the client is gone. For Streamable HTTP servers, the browser or HTTP client closing the connection fires a close event on the underlying socket. The code below wires that event to an AbortController so that a disconnect aborts any in-flight work for the request.

// For Streamable HTTP servers, detect client disconnections via res.on('close')
app.post('/mcp', async (req, res) => {
  const transport = getOrCreateTransport(req);

  // Create an AbortController for this connection
  const controller = new AbortController();
  req.socket.once('close', () => controller.abort());

  // Hand the request to the transport. Depending on your SDK version,
  // handleRequest may not accept extra options - if not, use the controller
  // to abort your own in-flight work when the socket closes.
  await transport.handleRequest(req, res, req.body, { signal: controller.signal });
});

What to Build Next

  • Add signal?.addEventListener('abort', cleanup) to your longest-running tool handler. Test it by disconnecting the client mid-execution and verify resources are released.
  • Add a per-tool timeout using AbortSignal.timeout(ms) to prevent any single tool call from running indefinitely.
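The per-tool timeout from the second bullet composes with the request's own signal. A sketch, assuming Node 20+ for AbortSignal.any and AbortSignal.timeout (withTimeout is our name, not an SDK helper):

```javascript
// Combine the handler's signal with a hard per-tool timeout: the resulting
// signal aborts on client cancellation OR after `ms`, whichever comes first.
function withTimeout(signal, ms) {
  return signal
    ? AbortSignal.any([signal, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
}

// Usage inside a tool handler:
// const results = await db.search(query, { signal: withTimeout(signal, 30_000) });
```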

nJoy πŸ˜‰

Lesson 47 of 55: MCP Tasks API – Long-Running Async Operations and Progress

New in 2025-11-25 (experimental) – The Tasks API replaced older DIY polling patterns with a protocol-level state machine. The entire feature is experimental and may evolve in future spec versions.

Most MCP tool calls complete in under a second: query a database, call an API, read a file. But some operations take minutes or hours: training a model, processing a large dataset, running a batch export, triggering a CI/CD pipeline. For these, a synchronous request-response model breaks down. The 2025-11-25 specification introduced the Tasks API – a protocol-level mechanism for durable, async request tracking. Instead of inventing your own “start_task + poll get_task_status” pattern (which every server implemented differently), the Tasks API provides a standard state machine, standard polling endpoints (tasks/get, tasks/list, tasks/cancel, tasks/result), and per-tool opt-in via execution.taskSupport.

MCP Tasks API async operation diagram task submitted accepted polling progress events completion dark
Tasks API: augment a tools/call request with a task, poll via tasks/get, retrieve the result via tasks/result when done.

When to Use Tasks vs Regular Tools

  • Use regular tools for operations that complete in under 30 seconds. Keep them synchronous – the LLM waits for the result before proceeding.
  • Use task-augmented tools for operations that take longer than 30 seconds, produce intermediate results the user or LLM can act on, or may fail partway through and need resumability.

Before the Tasks API, every server had to invent its own polling scheme (two tools, custom status fields, ad-hoc cancellation). The protocol-level approach standardises the state machine and the polling endpoints, so every client handles async the same way regardless of which server it talks to.

Task State Machine

Every task starts in the working state and follows a strict lifecycle. The three terminal states (completed, failed, cancelled) are irreversible – once a task reaches one of them, it cannot transition to any other state.

//  Task Status State Machine
//
//  [created] --> working --+--> completed (terminal)
//                 |    ^   +--> failed    (terminal)
//                 |    |   +--> cancelled (terminal)
//                 v    |
//           input_required
//
//  working --> input_required: server needs client input to proceed
//  input_required --> working: client provided the requested input
//  Any non-terminal --> cancelled: via tasks/cancel

The input_required state is for cases where the task cannot proceed without additional input from the client – for example, the server needs an MFA code or the user must approve an intermediate step. When the client sees input_required, it should call tasks/result to receive the pending request (an elicitation or sampling request), handle it, and allow the task to transition back to working.
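The diagram can be encoded directly as a transition table. This is a sketch of a validator a server-side task store might use (the names are ours, not part of the SDK):

```javascript
// Legal transitions, derived from the state machine above
const TRANSITIONS = {
  working: ['input_required', 'completed', 'failed', 'cancelled'],
  input_required: ['working', 'cancelled'],
};
const TERMINAL = new Set(['completed', 'failed', 'cancelled']);

function canTransition(from, to) {
  if (TERMINAL.has(from)) return false;  // terminal states are final
  return (TRANSITIONS[from] ?? []).includes(to);
}
```

Rejecting illegal transitions at the store level catches bugs like a worker trying to resurrect a cancelled task.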

Capability Negotiation

Both servers and clients declare task support during initialisation. The capabilities structure is organised by request type – a server declares which of its incoming request types (like tools/call) support task augmentation, and a client declares which of its incoming request types (like sampling/createMessage and elicitation/create) support it.

// Server capabilities: tasks supported for tools/call
{
  capabilities: {
    tasks: {
      list: {},                      // supports tasks/list
      cancel: {},                    // supports tasks/cancel
      requests: {
        tools: { call: {} },         // tools/call can be task-augmented
      },
    },
  },
}

// Client capabilities: tasks supported for sampling and elicitation
{
  capabilities: {
    tasks: {
      list: {},
      cancel: {},
      requests: {
        sampling: { createMessage: {} },    // sampling can be task-augmented
        elicitation: { create: {} },        // elicitation can be task-augmented
      },
    },
  },
}

If a server does not include tasks.requests.tools.call, clients MUST NOT attempt task augmentation on that server’s tools, regardless of per-tool settings.
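A client can enforce that MUST NOT with a small guard before ever attaching a task field (a sketch; serverSupportsTaskCalls is our name):

```javascript
// Check the negotiated server capabilities before task-augmenting tools/call
function serverSupportsTaskCalls(capabilities) {
  return Boolean(capabilities?.tasks?.requests?.tools?.call);
}
```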

Tool-Level Task Support

Individual tools declare their task support via execution.taskSupport in the tools/list response. This is a fine-grained layer on top of the server-level capability.

// In the tools/list response, each tool can declare task support
{
  name: 'generate_report',
  title: 'Generate Report',
  description: 'Generates a PDF report from analytics data. May take several minutes.',
  inputSchema: { /* ... */ },
  execution: {
    taskSupport: 'optional',   // 'forbidden' (default) | 'optional' | 'required'
  },
}
  • "forbidden" (default): the tool cannot be invoked as a task. If a client tries, the server returns error -32601.
  • "optional": the client may invoke the tool normally (synchronous) or as a task (async). Both work.
  • "required": the client MUST invoke the tool as a task. Synchronous invocation returns error -32601.
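On the server side, the three values reduce to a simple dispatch check. A sketch using the -32601 error code described above (checkTaskInvocation is our name):

```javascript
// Validate a tools/call against the tool's declared execution.taskSupport
function checkTaskInvocation(tool, isTaskAugmented) {
  const support = tool.execution?.taskSupport ?? 'forbidden';
  if (support === 'forbidden' && isTaskAugmented) {
    return { code: -32601, message: `Tool '${tool.name}' cannot run as a task` };
  }
  if (support === 'required' && !isTaskAugmented) {
    return { code: -32601, message: `Tool '${tool.name}' must run as a task` };
  }
  return null;  // invocation is allowed
}
```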

Creating a Task-Augmented Request

To invoke a tool as a task, the client includes a task field in the tools/call params. The server accepts the request immediately and returns a CreateTaskResult containing the task metadata – not the actual tool result.

// Client: send a task-augmented tools/call
const response = await client.request({
  method: 'tools/call',
  params: {
    name: 'generate_report',
    arguments: { reportType: 'quarterly', period: '2025-Q3' },
    task: {
      ttl: 300000,   // requested lifetime: 5 minutes
    },
  },
});

// Response is a CreateTaskResult, not the tool result
// {
//   task: {
//     taskId: '786512e2-9e0d-44bd-8f29-789f320fe840',
//     status: 'working',
//     statusMessage: 'Report generation started.',
//     createdAt: '2025-11-25T10:30:00Z',
//     lastUpdatedAt: '2025-11-25T10:30:00Z',
//     ttl: 300000,
//     pollInterval: 5000,
//   }
// }

const { taskId, pollInterval } = response.task;

The ttl (time-to-live in milliseconds) tells the server how long the client wants the task and its results to be retained. The server may override the requested TTL. After the TTL expires, the server may delete the task and its results regardless of status.
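Because the server may override the requested TTL, implementations typically clamp it to a configured maximum. A sketch (resolveTtl, MAX_TTL_MS, and DEFAULT_TTL_MS are our names):

```javascript
const MAX_TTL_MS = 10 * 60 * 1000;   // server policy: retain tasks at most 10 minutes
const DEFAULT_TTL_MS = 60 * 1000;    // used when the client requests no TTL

// Return the TTL the server will actually honor for a requested value
function resolveTtl(requestedTtl) {
  if (requestedTtl == null) return DEFAULT_TTL_MS;
  return Math.min(requestedTtl, MAX_TTL_MS);
}
```

The resolved value is what goes back in the CreateTaskResult, so the client always sees the TTL that will really be enforced.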

Polling With tasks/get

Clients poll for task status using tasks/get. The server returns the current task state including status, statusMessage, and a pollInterval suggestion. Clients SHOULD respect the pollInterval to avoid overwhelming the server.

// Client: poll until terminal status
async function pollTask(client, taskId, initialInterval = 5000) {
  let interval = initialInterval;

  while (true) {
    await new Promise(r => setTimeout(r, interval));

    const status = await client.request({
      method: 'tasks/get',
      params: { taskId },
    });

    console.log(`Task ${taskId}: ${status.status} - ${status.statusMessage ?? ''}`);

    if (['completed', 'failed', 'cancelled'].includes(status.status)) {
      return status;
    }

    if (status.status === 'input_required') {
      // The server needs input - call tasks/result to get the pending request
      return status;
    }

    // Respect the server's suggested poll interval
    if (status.pollInterval) {
      interval = status.pollInterval;
    }
  }
}
Task status polling pattern client calling tasks/get multiple times watching progress working to completed dark
Polling pattern: send task-augmented tools/call, poll with tasks/get respecting pollInterval, retrieve result with tasks/result.

Retrieving Task Results

Once a task reaches a terminal status, the actual tool result is retrieved via tasks/result. This is distinct from tasks/get (which returns task metadata). The result has the same shape as a normal CallToolResult.

// Client: retrieve the actual tool result
const taskStatus = await pollTask(client, taskId);

if (taskStatus.status === 'completed') {
  const result = await client.request({
    method: 'tasks/result',
    params: { taskId },
  });

  // result is a CallToolResult: { content: [...], isError: false }
  console.log('Report ready:', result.content[0].text);
}

if (taskStatus.status === 'failed') {
  const result = await client.request({
    method: 'tasks/result',
    params: { taskId },
  });
  // result may contain an error description
  console.error('Task failed:', result.content?.[0]?.text);
}

If tasks/result is called while the task is still working, the server MUST block until the task reaches a terminal status and then return the result. This makes tasks/result a long-poll alternative to repeated tasks/get calls. However, clients SHOULD still poll with tasks/get in parallel if they want to display progress updates.
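The two approaches compose: long-poll tasks/result for the final answer while a parallel tasks/get loop drives a progress display. A sketch (awaitResultWithProgress is our name; client.request mirrors the call shape used elsewhere in this lesson):

```javascript
// Long-poll tasks/result while reporting progress from periodic tasks/get calls
async function awaitResultWithProgress(client, taskId, onProgress, intervalMs = 5000) {
  const poller = setInterval(async () => {
    try {
      const task = await client.request({ method: 'tasks/get', params: { taskId } });
      onProgress(task);
    } catch { /* ignore transient polling errors */ }
  }, intervalMs);

  try {
    // Blocks server-side until the task reaches a terminal status
    return await client.request({ method: 'tasks/result', params: { taskId } });
  } finally {
    clearInterval(poller);
  }
}
```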

Listing and Cancelling Tasks

// List all tasks (paginated)
const listing = await client.request({
  method: 'tasks/list',
  params: { cursor: undefined },  // or a cursor from a previous response
});
// listing.tasks: array of Task objects
// listing.nextCursor: pagination token (if more tasks exist)

// Cancel a running task
try {
  const cancelled = await client.request({
    method: 'tasks/cancel',
    params: { taskId },
  });
  console.log(`Cancelled: ${cancelled.status}`);  // 'cancelled'
} catch (err) {
  // Error -32602 if the task is already in a terminal state
  console.error('Cannot cancel:', err.message);
}

Cancellation transitions the task to the cancelled terminal state. The server SHOULD attempt to stop the underlying work, but the task MUST be marked cancelled even if the underlying computation continues to run (best-effort cancellation). Clients SHOULD NOT rely on cancelled tasks being retained – retrieve any needed data before cancelling.

Status Notifications

Servers MAY send notifications/tasks/status when a task’s status changes. These are a convenience – clients MUST NOT rely on them for correctness, because notifications are optional and may be dropped. Always poll with tasks/get as the source of truth.

// Server: optionally notify the client of status changes
server.notification({
  method: 'notifications/tasks/status',
  params: {
    taskId: '786512e2-...',
    status: 'completed',
    statusMessage: 'Report generation finished.',
    createdAt: '2025-11-25T10:30:00Z',
    lastUpdatedAt: '2025-11-25T10:35:00Z',
    ttl: 300000,
    pollInterval: 5000,
  },
});

Client-Side Tasks: Sampling and Elicitation

Tasks are not server-only. Servers can also send task-augmented requests to the client for sampling/createMessage and elicitation/create. This is useful when a server initiates a sampling request that might take a long time (the client is calling an LLM), or an elicitation that requires the user to complete an out-of-band flow.

The pattern mirrors the server side: the server sends the request with a task field, the client accepts immediately with a CreateTaskResult, and the server polls the client’s tasks/get and tasks/result endpoints. The client declares which request types support this in its capabilities under tasks.requests.sampling.createMessage and tasks.requests.elicitation.create.

Task Metadata: Related Tasks

All requests, notifications, and responses related to a task MUST include io.modelcontextprotocol/related-task in their _meta field. This links sub-operations (like an elicitation triggered during a task-augmented tool call) back to the parent task.

// Elicitation triggered during a task-augmented tool call
// The _meta links it to the parent task
{
  method: 'elicitation/create',
  params: {
    message: 'Enter the MFA code to continue the deployment.',
    requestedSchema: { /* ... */ },
    _meta: {
      'io.modelcontextprotocol/related-task': {
        taskId: '786512e2-9e0d-44bd-8f29-789f320fe840',
      },
    },
  },
}

Server-Side Implementation Pattern

The SDK does not yet have high-level helpers for the Tasks API (it is experimental). In practice, you implement it by managing a task store, intercepting tool calls that include a task field, and exposing the tasks/* methods. Production systems should use Redis or a database so task state survives server restarts.

import crypto from 'node:crypto';

const taskStore = new Map();

// When a tools/call includes params.task, create a task entry and return immediately
function createTask(ttl = 60000) {
  const task = {
    taskId: crypto.randomUUID(),
    status: 'working',
    statusMessage: null,
    createdAt: new Date().toISOString(),
    lastUpdatedAt: new Date().toISOString(),
    ttl,
    pollInterval: 5000,
    _result: null,    // stored when complete
    _error: null,     // stored on failure
  };
  taskStore.set(task.taskId, task);
  return task;
}

function updateTask(taskId, updates) {
  const task = taskStore.get(taskId);
  if (!task) return;
  Object.assign(task, updates, { lastUpdatedAt: new Date().toISOString() });
}

// Clean up expired tasks
setInterval(() => {
  const now = Date.now();
  for (const [id, task] of taskStore) {
    const created = new Date(task.createdAt).getTime();
    if (task.ttl !== null && now - created > task.ttl) {
      taskStore.delete(id);
    }
  }
}, 60_000);
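Gluing the store to a task-augmented tools/call follows the same shape as the helpers above. This self-contained sketch (handleTaskAugmentedCall and runTool are our names) answers immediately with the task metadata and records the outcome when the background work settles:

```javascript
import crypto from 'node:crypto';

const taskStore = new Map();

// Accept the tools/call immediately, run the tool in the background,
// and return a CreateTaskResult-shaped object while the work continues
function handleTaskAugmentedCall(runTool, params) {
  const now = new Date().toISOString();
  const task = {
    taskId: crypto.randomUUID(),
    status: 'working',
    createdAt: now,
    lastUpdatedAt: now,
    ttl: params.task?.ttl ?? 60_000,
    pollInterval: 5000,
  };
  taskStore.set(task.taskId, task);

  // Fire-and-forget: record the outcome on the stored task
  runTool(params).then(
    result => Object.assign(task, { status: 'completed', _result: result,
                                    lastUpdatedAt: new Date().toISOString() }),
    err => Object.assign(task, { status: 'failed', _error: String(err),
                                 lastUpdatedAt: new Date().toISOString() }),
  );

  return { task: { ...task } };
}
```

tasks/get then reads taskStore by taskId, and tasks/result returns _result (or blocks until the status turns terminal).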

What to Check Right Now

  • Audit your slow tools – any tool that regularly takes over 30 seconds is a candidate for execution.taskSupport: 'optional'.
  • Check server capabilities – if you add task support, declare tasks.requests.tools.call in your server capabilities.
  • Respect pollInterval – never hard-code a polling frequency. Always use the server’s suggested pollInterval from the tasks/get response.
  • Handle all terminal states – completed, failed, and cancelled all need distinct handling in your polling loop.
  • Remember this is experimental – the Tasks API was introduced in 2025-11-25 and may change. Pin your implementation to the spec version and watch for updates.

nJoy πŸ˜‰

Lesson 46 of 55: MCP Registry, Discovery, and Service Mesh Patterns

In large organizations, the number of MCP servers grows quickly. A payments MCP server, a customer data MCP server, a product catalog server, an analytics server – each maintained by different teams. Without a registry, every agent developer must manually configure each server’s URL, credentials, and capabilities. A registry solves this: publish once, discover everywhere. This lesson builds an MCP server registry, a discovery client, and covers service mesh integration patterns for enterprise deployments.

MCP server registry diagram servers publishing capabilities agents discovering via registry service mesh dark
MCP registry: servers publish capabilities, agents query the registry to build their tool set dynamically.

Registry Data Model

// A registry entry describes one MCP server
/**
 * @typedef {Object} RegistryEntry
 * @property {string} id - Unique server identifier (slug)
 * @property {string} name - Human-readable name
 * @property {string} description - What this server does
 * @property {string} url - Base URL for Streamable HTTP transport
 * @property {string} version - Server version (semver)
 * @property {string[]} tags - Capability tags for discovery (e.g., ['products', 'inventory'])
 * @property {Object} auth - Authentication requirements
 * @property {string} auth.type - 'none' | 'bearer' | 'oauth2'
 * @property {string} [auth.tokenEndpoint] - OAuth token endpoint if auth.type === 'oauth2'
 * @property {string} healthUrl - Health check endpoint
 * @property {Date} lastSeen - Last successful health check
 * @property {'healthy' | 'degraded' | 'down'} status - Current health status
 */

A shared schema keeps every team describing servers the same way: URL and version for routing upgrades, tags for capability search, and auth metadata so hosts know whether to attach a bearer token or run an OAuth flow. healthUrl and lastSeen let the registry drop or deprioritize dead endpoints before agents waste time connecting.
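A registry that trusts POST bodies blindly will accumulate junk entries. This is a minimal validation sketch in plain JavaScript (validateEntry is our name; a production service might express the same rules as a zod schema instead):

```javascript
// Return a list of problems with a submitted registry entry ([] means valid)
function validateEntry(entry) {
  const errors = [];
  if (!/^[a-z0-9-]+$/.test(entry.id ?? '')) errors.push('id must be a lowercase slug');
  for (const field of ['name', 'description', 'url', 'healthUrl']) {
    if (typeof entry[field] !== 'string' || entry[field].length === 0) {
      errors.push(`${field} is required`);
    }
  }
  if (!['none', 'bearer', 'oauth2'].includes(entry.auth?.type)) {
    errors.push("auth.type must be 'none', 'bearer', or 'oauth2'");
  }
  if (entry.auth?.type === 'oauth2' && !entry.auth.tokenEndpoint) {
    errors.push('oauth2 entries need auth.tokenEndpoint');
  }
  return errors;
}
```

Wired into POST /servers, a non-empty error list becomes a 400 response instead of a stored entry.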

Simple Registry Server

// registry-server.js - A lightweight HTTP registry for MCP servers
import express from 'express';

const app = express();
app.use(express.json());

// In-memory store (use Redis or PostgreSQL in production)
const registry = new Map();

// Register a server
app.post('/servers', (req, res) => {
  const entry = {
    ...req.body,
    registeredAt: new Date().toISOString(),
    lastSeen: new Date().toISOString(),
    status: 'healthy',
  };
  registry.set(entry.id, entry);
  res.status(201).json({ id: entry.id });
});

// List all healthy servers (with optional tag filter)
app.get('/servers', (req, res) => {
  const { tags, status = 'healthy' } = req.query;
  let servers = [...registry.values()].filter(s => s.status === status);

  if (tags) {
    const filterTags = tags.split(',');
    servers = servers.filter(s => filterTags.some(t => s.tags?.includes(t)));
  }

  res.json({ servers });
});

// Health check runner: poll all registered servers every 30 seconds
setInterval(async () => {
  for (const [id, entry] of registry) {
    try {
      const res = await fetch(entry.healthUrl, { signal: AbortSignal.timeout(5000) });
      entry.status = res.ok ? 'healthy' : 'degraded';
      entry.lastSeen = new Date().toISOString();
    } catch {
      entry.status = 'down';
    }
    registry.set(id, entry);
  }
}, 30_000);

app.listen(4000, () => console.log('Registry listening on :4000'));

The in-memory Map is enough to learn the flow; in a real project, you would persist entries in PostgreSQL or Redis, authenticate POST /servers so only CI or cluster identity can register, and run the health poller as a separate worker or cron so API latency stays predictable under many servers.
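One hardening step worth sketching: entries that keep failing health checks should eventually disappear rather than linger as down forever. A sketch (evictStale and the staleness window are our additions):

```javascript
// Remove entries that have not passed a health check within maxAgeMs
function evictStale(registry, maxAgeMs, now = Date.now()) {
  for (const [id, entry] of registry) {
    const age = now - new Date(entry.lastSeen).getTime();
    if (age > maxAgeMs) registry.delete(id);
  }
}

// e.g. run alongside the health poller:
// setInterval(() => evictStale(registry, 10 * 60_000), 60_000);
```

Servers that come back simply re-register via POST /servers, so eviction is safe for transient outages.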

Discovery Client for Agents

The discovery client connects to servers found in the registry, lists their tools once, builds an index, and routes tool calls to the correct server without repeating the listTools() round trip on every invocation.

// discovery-client.js - Used by agent hosts to discover MCP servers
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamable-http.js';

class McpDiscoveryClient {
  #registryUrl;
  #connections = new Map();   // serverId -> { client, server }
  #toolIndex = new Map();     // toolName -> client (built once, refreshed on change)

  constructor(registryUrl) {
    this.#registryUrl = registryUrl;
  }

  // Discover servers by tags, connect, and build the tool index
  async connect(tags = []) {
    const query = tags.length ? `?tags=${tags.join(',')}` : '';
    const res = await fetch(`${this.#registryUrl}/servers${query}`);
    const { servers } = await res.json();

    const connected = [];
    for (const server of servers) {
      if (this.#connections.has(server.id)) {
        connected.push(server);
        continue;
      }

      try {
        const transport = new StreamableHTTPClientTransport(new URL(`${server.url}/mcp`));
        const client = new Client({ name: 'discovery-host', version: '1.0.0' });
        await client.connect(transport);

        // Listen for tool list changes so the index stays current.
        // (setNotificationHandler expects a Zod schema; matching on the
        // method string via the fallback handler avoids the extra import.)
        client.fallbackNotificationHandler = async (notification) => {
          if (notification.method === 'notifications/tools/list_changed') {
            await this.#rebuildIndex();
          }
        };

        this.#connections.set(server.id, { client, server });
        connected.push(server);
        console.log(`Connected to ${server.name} (${server.id})`);
      } catch (err) {
        console.error(`Failed to connect to ${server.name}: ${err.message}`);
      }
    }

    // Build the tool index once after all connections are established
    await this.#rebuildIndex();
    return connected;
  }

  // Build a Map<toolName, client> from all connected servers
  // Called once on connect() and again if any server signals tools/list_changed
  async #rebuildIndex() {
    this.#toolIndex.clear();
    for (const [id, { client }] of this.#connections) {
      try {
        const { tools } = await client.listTools();
        for (const tool of tools) {
          this.#toolIndex.set(tool.name, client);
        }
      } catch (err) {
        console.error(`Failed to index tools from ${id}: ${err.message}`);
      }
    }
    console.log(`Tool index rebuilt: ${this.#toolIndex.size} tools across ${this.#connections.size} servers`);
  }

  // Get all tools from all connected servers (fresh listTools per server;
  // tool routing in callTool() uses the cached index instead)
  async getAllTools() {
    const allTools = [];
    for (const [id, { client }] of this.#connections) {
      try {
        const { tools } = await client.listTools();
        allTools.push(...tools.map(t => ({ ...t, serverId: id })));
      } catch (err) {
        console.error(`Failed to list tools from ${id}: ${err.message}`);
      }
    }
    return allTools;
  }

  // Route a tool call to the correct server via the pre-built index
  // No listTools() round trip on each call - O(1) lookup
  async callTool(toolName, args) {
    const client = this.#toolIndex.get(toolName);
    if (!client) {
      throw new Error(`Tool '${toolName}' not found in any connected server`);
    }
    return client.callTool({ name: toolName, arguments: args });
  }
}

// Usage
const discovery = new McpDiscoveryClient('https://registry.internal');
await discovery.connect(['products', 'analytics']);
const allTools = await discovery.getAllTools();
console.log(`Discovered ${allTools.length} tools across all servers`);

// Tool calls now use the index - no extra round trip
const result = await discovery.callTool('search_products', { query: 'laptop', limit: 5 });

Without the index, callTool() would need a listTools() round trip to every connected server on every invocation – a cost that grows with both the number of servers and the number of calls. The index is built once at connect time and refreshed automatically when a server emits notifications/tools/list_changed.

Discovery client connecting to registry fetching server list connecting to multiple MCP servers aggregating tools dark
Discovery flow: query registry by tags -> connect to relevant servers -> aggregate tools -> route calls.

Service Mesh Integration (Istio / Linkerd)

In Kubernetes environments, a service mesh handles mutual TLS, traffic routing, and observability for all service-to-service communication, including MCP connections:

# With Istio, MCP server-to-server communication is automatically mTLS
# No code changes required - the sidecar proxy handles it

# Example: VirtualService for traffic splitting during MCP server rollout
# (the v1/v2 subsets must be defined in a companion DestinationRule)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mcp-product-server
spec:
  hosts:
    - mcp-product-server
  http:
    - route:
        - destination:
            host: mcp-product-server
            subset: v2
          weight: 10  # 10% to new version
        - destination:
            host: mcp-product-server
            subset: v1
          weight: 90  # 90% to stable version

Weighted routes let you canary a new MCP build while most sessions stay on the known-good subset. In a real project, you would pair this with metric and trace dashboards so a spike in tool errors on the new subset triggers an automatic rollback or traffic shift.

Server Health Aggregation

// Aggregate health status across all registered servers for a status page
app.get('/status', async (req, res) => {
  const servers = [...registry.values()];
  const healthy = servers.filter(s => s.status === 'healthy').length;
  const degraded = servers.filter(s => s.status === 'degraded').length;
  const down = servers.filter(s => s.status === 'down').length;

  const overall = down > 0 ? 'outage' : degraded > 0 ? 'degraded' : 'operational';

  res.json({
    status: overall,
    summary: { total: servers.length, healthy, degraded, down },
    servers: servers.map(s => ({
      id: s.id, name: s.name, status: s.status, lastSeen: s.lastSeen,
    })),
  });
});

A single /status response gives operators and internal portals a fleet-wide view without opening each MCP server. In a real project, you would cache or rate-limit heavy dashboards and map down servers to paging rules so on-call sees registry-level outages, not only per-pod alerts.

What to Build Next

  • Deploy the registry server alongside your existing MCP servers. Register each server on startup using a POST to the registry.
  • Build a simple status page that reads from /status and shows which MCP servers are healthy.

nJoy πŸ˜‰