Model Context Protocol: How to Build Custom MCP Servers for AI Agent Tooling
What Is MCP and Why It Matters
Model Context Protocol (MCP) is an open standard created by Anthropic for connecting AI models and agents to external tools, data sources, and services. Think of it as a universal adapter layer between an LLM and the world. Before MCP, every AI agent integration was bespoke: custom function-calling schemas, one-off API wrappers, and framework-specific tool definitions that could not be reused across different AI systems. MCP standardizes this into a JSON-RPC 2.0-based protocol with a clear client-server architecture.

An MCP server exposes three types of capabilities: tools (executable functions the model can invoke, like querying a database or creating a Jira ticket), resources (data the model can read, like file contents, database records, or API responses), and prompts (reusable prompt templates with parameters). An MCP client (like Claude Desktop, Claude Code, or any application using the MCP SDK) discovers and invokes these capabilities through a standardized handshake and message format.

The practical impact is significant: you build an MCP server once, and it works with any MCP-compatible AI client. No more rewriting tool integrations for each new framework or model.
MCP Architecture: Transports, Sessions, and Capabilities
MCP supports two transport mechanisms: stdio (standard input/output) for local integrations where the client spawns the server as a subprocess, and SSE (Server-Sent Events) over HTTP for remote servers. Stdio is simpler and is what Claude Desktop and Claude Code use for local tools. SSE is used for shared servers deployed as web services; note that newer protocol revisions replace the standalone SSE transport with a Streamable HTTP transport, though the remote deployment model is the same.

The protocol lifecycle follows a clear sequence: the client sends an initialize request with its protocol version and capabilities; the server responds with its own capabilities (supported tools, resources, and prompts); the client sends an initialized notification to confirm; and the session is then active for tool calls and resource reads. Each tool call follows JSON-RPC 2.0: the client sends a tools/call request with the tool name and arguments as a JSON object, and the server responds with content blocks (text, images, or embedded resources). Error handling uses standard JSON-RPC error codes.

Because the server declares its capabilities during initialization, clients know exactly what is available. Servers can also send notifications to clients, enabling real-time updates for long-running operations. This bidirectional communication model makes MCP significantly more flexible than simple function-calling APIs.
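The lifecycle above can be sketched as plain JSON-RPC 2.0 message objects. The field values here (protocol version, tool name, arguments) are illustrative placeholders, not prescriptive:

```typescript
// 1. The client opens the session with an initialize request.
const initializeRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05", // a protocol revision the client supports
    capabilities: {},              // client-side capabilities
    clientInfo: { name: "example-client", version: "1.0.0" },
  },
};

// 2. After the server responds with its capabilities, the client confirms.
//    Notifications carry no id because they expect no response.
const initializedNotification = {
  jsonrpc: "2.0",
  method: "notifications/initialized",
};

// 3. With the session active, the client invokes a tool.
const toolCallRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "query_database",              // hypothetical tool
    arguments: { sql_query: "SELECT 1" },
  },
};

// 4. The server answers with content blocks, matching the request id.
const toolCallResponse = {
  jsonrpc: "2.0",
  id: 2,
  result: {
    content: [{ type: "text", text: '[{"?column?": 1}]' }],
  },
};
```

The same request/response envelope carries every interaction, which is why clients can talk to any conforming server without custom glue.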
Building a Custom MCP Server: Step by Step
Let us walk through building a production MCP server that provides database access to an AI agent. We will use the official TypeScript SDK (@modelcontextprotocol/sdk). First, install the SDK: npm install @modelcontextprotocol/sdk zod. Create your server entry point: instantiate a new McpServer with a name and version, then define tools using server.tool().

Each tool declaration includes a name, a description (this is what the LLM reads to decide when to use the tool, so make it clear and specific), a Zod schema defining the input parameters, and an async handler function that executes the tool logic and returns a content array. For example, a query_database tool would accept a sql_query parameter (string), validate it against an allowlist of safe operations (SELECT only, no DDL), execute it against your PostgreSQL database using a connection pool, and return the results as a JSON text block.

For the transport layer, if targeting Claude Desktop, use StdioServerTransport and connect it with server.connect(transport). If deploying as a remote service, use SSEServerTransport with an Express.js HTTP server. The server should handle graceful shutdown by listening for SIGINT and closing the database pool. We recommend keeping each MCP server focused on a single domain (database access, file operations, API integration) rather than building monolithic servers with dozens of tools. This follows the single-responsibility principle and makes servers easier to test, deploy, and compose.
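The allowlist check for the query_database tool can be sketched as a small standalone validator. The function name and rules below are illustrative; a production server should prefer a real SQL parser over string matching:

```typescript
// Hypothetical guard for a query_database tool: accept only a single
// read-only SELECT statement and reject anything that could modify the
// database. Keyword matching is a sketch, not a complete defense.
const FORBIDDEN_KEYWORDS = [
  "insert", "update", "delete", "drop", "alter",
  "create", "truncate", "grant", "revoke",
];

function isAllowedQuery(sql: string): boolean {
  const normalized = sql.trim().toLowerCase();

  // Must start with SELECT.
  if (!normalized.startsWith("select")) return false;

  // No stacked statements: a semicolon is only allowed as the final char.
  if (normalized.slice(0, -1).includes(";")) return false;

  // Reject any write/DDL keyword appearing as a whole word.
  return !FORBIDDEN_KEYWORDS.some((kw) =>
    new RegExp(`\\b${kw}\\b`).test(normalized)
  );
}
```

Inside the tool's async handler, this check would run before the statement ever reaches the connection pool, returning an error content block instead of executing disallowed input.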
Deploying MCP Servers in Production
For local MCP servers (stdio transport), deployment means packaging the server as an npm package or Docker container that users configure in their Claude Desktop or Claude Code settings via the mcpServers configuration in claude_desktop_config.json. Specify the command (node or npx), args (path to your server entry point), and any environment variables (database connection strings, API keys) in the env field.

For remote MCP servers (SSE transport), we deploy on ECS Fargate behind an ALB with the following architecture: the Express.js server runs on port 3000, the ALB handles TLS termination and health checks (a GET /health endpoint returning 200), and an API Gateway authorizer or ALB authentication rule handles auth via OAuth 2.0 tokens or API keys.

Four considerations matter most in production. Authentication: MCP itself does not prescribe auth, so you must implement it at the transport layer; we use JWT validation middleware in the Express.js server. Rate limiting: apply per-tool rate limits to prevent runaway agents from overwhelming backend systems; we use express-rate-limit with Redis-backed storage at 100 requests per minute per client. Input validation: validate every tool input rigorously; the Zod schemas in the SDK help, but also validate at the business logic layer, especially for database queries and file operations. Logging: log every tool invocation with the client ID, tool name, sanitized inputs, execution duration, and outcome; ship logs to CloudWatch and create dashboards in Grafana.

We have deployed MCP servers for internal tools like Jira integration, database query access, deployment status checks, and documentation search. One team uses a custom MCP server that wraps their internal API, letting engineers query production metrics, check deployment status, and search error logs directly from Claude Code.
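For the local (stdio) case, a claude_desktop_config.json entry might look like the following sketch — the server name, path, and connection string are placeholders:

```json
{
  "mcpServers": {
    "postgres-tools": {
      "command": "node",
      "args": ["/absolute/path/to/your-server/build/index.js"],
      "env": {
        "DATABASE_URL": "postgresql://readonly_user:password@db.example.internal:5432/app"
      }
    }
  }
}
```

On startup, the client spawns the configured command as a subprocess and speaks the protocol over its stdin/stdout, so no network configuration is needed for local servers.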
Composing MCP Servers for Complex Agentic Workflows
The real power of MCP emerges when you compose multiple servers into an agentic workflow. An AI agent using Claude as its backbone can connect to multiple MCP servers simultaneously, gaining access to a diverse toolkit. For example, an SRE agent we built connects to four MCP servers: a Kubernetes server (queries pod status, reads logs, checks resource utilization via the Kubernetes API), a Prometheus server (runs PromQL queries, retrieves alert status, fetches metric time series), a PagerDuty server (acknowledges incidents, adds notes, escalates to on-call), and a Runbook server (searches internal documentation, retrieves step-by-step remediation procedures). When an alert fires, the agent uses the PagerDuty server to read the alert details, the Prometheus server to query related metrics and confirm the issue, the Kubernetes server to check pod health and recent deployments, and the Runbook server to find the relevant remediation steps. The agent then summarizes its findings and either executes the remediation (for well-understood issues with approved runbooks) or provides a detailed analysis to the on-call engineer. Building this without MCP would require a monolithic integration layer. With MCP, each server is independently developed, tested, and deployed. Adding a new capability (say, a GitHub server for checking recent commits) is as simple as deploying a new MCP server and adding it to the agent's configuration. The protocol's standardized discovery mechanism means the agent automatically learns the new tools without code changes.
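Conceptually, the client side of this composition is a routing table built from each server's advertised tools. The interfaces and names below are an illustrative, dependency-free model of that idea, not the SDK's actual API:

```typescript
// Illustrative model of multi-server composition: an agent discovers tools
// from each connected MCP server (the tools/list step) and routes each
// invocation by tool name (the tools/call step). The real SDK handles the
// wire protocol; this sketch only models the routing.
interface ToolInfo {
  name: string;
  description: string;
}

interface ConnectedServer {
  name: string;
  listTools(): ToolInfo[];
  callTool(tool: string, args: Record<string, unknown>): string;
}

class ToolRouter {
  private routes = new Map<string, ConnectedServer>();

  register(server: ConnectedServer): void {
    // Standardized discovery: index every tool the server advertises.
    for (const tool of server.listTools()) {
      this.routes.set(tool.name, server);
    }
  }

  call(tool: string, args: Record<string, unknown>): string {
    const server = this.routes.get(tool);
    if (!server) throw new Error(`unknown tool: ${tool}`);
    return server.callTool(tool, args);
  }
}
```

Adding a new capability, like the GitHub server mentioned above, is one more register() call at startup; the routing logic and the agent's code are unchanged, which is the composition property the protocol's discovery mechanism provides.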