AI Agents

Building a Company Research Agent with MCP and UKDataAPI

Published 2026-04-23

What Is the Model Context Protocol (MCP)?

The Model Context Protocol, or MCP, is an open standard created by Anthropic that lets AI models connect to external tools and data sources. Think of it as a universal adapter between language models and APIs — instead of writing custom function-calling code for every data source, MCP provides a standardised way for models to discover, understand, and use tools.

Before MCP, connecting Claude or GPT to an API required writing wrapper functions, managing authentication, formatting prompts to explain the available tools, and parsing the model's attempts to call those functions. Every integration was bespoke. MCP replaces all of that with a protocol: the data source publishes its capabilities in a standard format, and any MCP-compatible client can use them immediately.

For developers, the practical impact is significant. You configure your MCP client (Claude Desktop, Cursor, a LangChain agent, or any compatible framework) with a server URL, and the model instantly gains access to all the tools that server exposes. No prompt engineering, no function schema writing, no custom parsing. The model knows what tools are available, what parameters they accept, and how to interpret the responses.

UKDataAPI exposes all 22 of its endpoints as MCP tools at https://www.ukdatapi.com/api/mcp. Once connected, a model can look up companies, check director histories, assess property risk, find government tenders, and generate due diligence reports — all through natural language conversation. This tutorial walks through building a company research agent that uses these tools in a structured workflow.

Connecting UKDataAPI's MCP Server

Setting up the connection takes about two minutes. The exact configuration depends on your MCP client, but the pattern is the same everywhere: point the client at the MCP server URL and provide your API key.

In Claude Desktop, open Settings, navigate to the MCP section, and add a new server. Set the URL to https://www.ukdatapi.com/api/mcp and add your API key as a header: Authorization: Bearer ukd_live_YOUR_KEY. Claude will immediately discover the 22 available tools and list them in the interface.

In Cursor, the configuration goes in your .cursor/mcp.json file. Add an entry with the server URL and your authentication header. After saving, Cursor's AI features will have access to live UK data in every conversation.
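As a sketch, a `.cursor/mcp.json` entry might look like the following. The server name `ukdatapi` and the exact field names are illustrative; check your Cursor version's MCP documentation for the current schema:

```json
{
  "mcpServers": {
    "ukdatapi": {
      "url": "https://www.ukdatapi.com/api/mcp",
      "headers": {
        "Authorization": "Bearer ukd_live_YOUR_KEY"
      }
    }
  }
}
```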

For LangChain and other frameworks, use the MCP client library for your language. In Python, the langchain-mcp package handles the connection. Initialise the client with the server URL and auth header, and add it to your agent's tool list. The library handles tool discovery, parameter validation, and response parsing.

Once connected, verify the setup by asking the model a question that requires live data — something like "What's the company status of Tesco PLC?" If the model calls the entity tool and returns current Companies House data, the connection is working. If it hallucinates a response without calling any tools, check that the MCP configuration is correctly loaded and the API key is valid.

Building a Multi-Step Research Workflow

A useful company research agent doesn't just answer single questions: it chains multiple data sources into a coherent analysis. The three-step workflow below (entity lookup, director analysis, due diligence report) produces a comprehensive company research report.

Step one: Entity lookup. The agent starts with whatever identifier the user provides — a company name, number, or even a director name. It calls the entity tool to get the core company profile: registration details, SIC codes, filing history, and the Corporate Distress Score. This establishes the basic facts and gives the agent context for the next steps.

Step two: Director analysis. Using the director names from the entity response, the agent calls the director tool for each active director. This reveals their other directorships, any dissolved companies in their history, and the Director Risk Score. The agent looks for patterns — directors with multiple recent dissolutions, overlapping directorships that might indicate conflicts of interest, or directors who appear across a network of related companies.

Step three: Due diligence report. With entity and director data in hand, the agent calls the report endpoint. This generates a structured verdict — PROCEED, CAUTION, or DO NOT ENGAGE — with a detailed rationale. The report synthesises the data the agent has already gathered with additional signals from the Gazette, FCA register, and court records.

The agent then composes these results into a structured research memo: company overview, director assessment, risk factors, and overall recommendation. Because it has access to the underlying data, it can cite specific evidence for each finding rather than making vague assertions.
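The three steps above can be sketched as a small orchestration function. This is a minimal illustration, not UKDataAPI's actual schema: the tool callables are injected so the logic stays independent of any particular MCP client, and the field names (`active_directors`, `company_number`, and so on) are assumptions for the sake of the example.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchMemo:
    """Raw material for the final research memo."""
    company: dict
    directors: list = field(default_factory=list)
    report: dict = field(default_factory=dict)

def research_company(identifier, entity_tool, director_tool, report_tool):
    # Step 1: core company profile (registration, SIC codes, distress score).
    company = entity_tool(identifier)

    # Step 2: one lookup per active director, collecting risk scores
    # and directorship histories for pattern analysis.
    directors = [
        director_tool(name)
        for name in company.get("active_directors", [])
    ]

    # Step 3: structured due diligence verdict, synthesising the data
    # above with Gazette, FCA, and court signals.
    report = report_tool(company["company_number"])

    return ResearchMemo(company=company, directors=directors, report=report)
```

The memo-writing step itself is best left to the model: hand it the `ResearchMemo` contents and ask for a structured summary that cites the underlying data.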

Handling Responses and Edge Cases

Real-world data is messy, and your agent needs to handle the edge cases gracefully. The most common issues are ambiguous company names, dissolved companies, and rate limiting.

Ambiguous names are the trickiest. If a user asks about "Smith Consulting", there might be dozens of matches. The entity endpoint accepts company numbers for precise lookups, but users rarely know these. Build your agent to first search by name, present the top matches with enough context to disambiguate (registered address, incorporation date, active status), and let the user confirm before proceeding with the full research workflow.
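A small helper can format the candidate list for the confirmation step. The field names in the match dicts are illustrative stand-ins for whatever the search response actually contains:

```python
def format_candidates(matches, limit=5):
    """Present top company matches with enough context to disambiguate.

    `matches` is a list of dicts; the field names used here (name,
    company_number, status, incorporated, registered_address) are
    assumptions, not the exact UKDataAPI response schema.
    """
    lines = []
    for i, m in enumerate(matches[:limit], start=1):
        lines.append(
            f"{i}. {m['name']} ({m['company_number']}): "
            f"{m['status']}, incorporated {m['incorporated']}, "
            f"{m['registered_address']}"
        )
    lines.append("Reply with a number to continue, or refine the search.")
    return "\n".join(lines)
```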

Dissolved companies still return data, but the information is historical. Your agent should clearly flag when a company is no longer active and adjust its analysis accordingly: a Corporate Distress Score for a dissolved company isn't meaningful in the same way as for an active one. The director data remains valuable, though, since you might be researching the directors' current activities.
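A minimal guard for that distinction might look like this (a sketch; the status strings and phrasing are illustrative):

```python
def frame_distress_score(status, score):
    # A distress score predicts failure risk, which only applies to
    # companies that can still fail, so flag dissolved companies explicitly.
    if status.lower() == "dissolved":
        return "This company is dissolved; its distress score is historical context only."
    return f"Active company. Corporate Distress Score: {score}."
```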

Rate limiting follows standard HTTP patterns. If you hit the rate limit, the API returns a 429 status with a Retry-After header. In an interactive agent, the best approach is to tell the user there's a brief wait and retry after the specified interval. In a batch workflow, implement exponential backoff. The MCP client libraries handle basic retry logic, but you should add application-level handling for extended waits.
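The batch-workflow case can be sketched as a retry wrapper. The request and sleep functions are injected so the logic is testable; they stand in for whatever HTTP client and scheduler your agent actually uses:

```python
import random

def call_with_backoff(request_fn, sleep_fn, max_attempts=5, base_delay=1.0):
    """Retry on 429 responses, honouring Retry-After when present.

    request_fn() returns a (status_code, retry_after_seconds_or_None, body)
    tuple; sleep_fn(seconds) blocks. Both are stand-ins for your real
    HTTP layer.
    """
    for attempt in range(max_attempts):
        status, retry_after, body = request_fn()
        if status != 429:
            return body
        # Prefer the server's Retry-After header; otherwise back off
        # exponentially with a little jitter to avoid thundering herds.
        delay = retry_after if retry_after is not None else (
            base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        )
        sleep_fn(delay)
    raise RuntimeError("Rate limit persisted after retries")
```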

Error responses are consistently formatted with a JSON body containing an error code and message. Common errors include 404 for unknown company numbers, 401 for invalid API keys, and 402 for insufficient credits. Your agent should interpret these errors and communicate them naturally rather than dumping raw error JSON at the user.
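One way to do that translation is a simple status-code lookup. The wording and the `message` field name are assumptions for illustration:

```python
ERROR_MESSAGES = {
    401: "The API key looks invalid or expired; check your MCP configuration.",
    402: "The account has run out of credits; top up to continue.",
    404: "No company was found for that number; double-check the identifier.",
    429: "The API is rate limiting us; retrying shortly.",
}

def explain_error(status, payload):
    """Turn a structured error response into a user-facing sentence.

    `payload` is the parsed JSON error body; the 'message' field name
    is an assumption about the response shape.
    """
    friendly = ERROR_MESSAGES.get(status)
    detail = payload.get("message", "")
    if friendly:
        return friendly + (f" (API said: {detail})" if detail else "")
    return f"The API returned an unexpected error ({status}): {detail or 'no details'}"
```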

Deploying Your Research Agent

Once your research workflow is working in a development environment, there are several deployment options depending on your use case.

For internal tools, the simplest deployment is a Claude Desktop or Cursor configuration shared across your team. Create a standard MCP configuration file with the company API key and distribute it. Every team member gets an AI assistant that can research UK companies on demand. This works well for due diligence teams, investment analysts, and business development — anyone who regularly needs to investigate companies.

For customer-facing applications, you'll want a proper backend. Build the research workflow as a server-side process using LangChain, LlamaIndex, or the Anthropic SDK directly. The MCP connection runs server-side with your API key, and your frontend sends research requests to your backend. This keeps the API key secure and lets you add authentication, usage tracking, and result caching on your side.

For fully autonomous agents, the x402 payment option is compelling. Instead of managing API keys, your agent pays per request using USDC on Base Mainnet. This is ideal for agents that operate independently — crawling company registries, monitoring portfolios, or generating research reports on a schedule. The agent needs a funded wallet but no human-managed API credentials.

Caching is important regardless of deployment model. Company data changes infrequently — a Companies House profile doesn't change daily — so caching entity responses for 24 hours dramatically reduces API costs and response latency. Director data can be cached for a similar period. Only the report endpoint, which may incorporate real-time signals, should be called fresh each time.
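The 24-hour policy can be implemented with a minimal time-based cache. This is a sketch, not a production cache (no size bound, no thread safety); the clock is injectable so the expiry logic can be tested without waiting:

```python
import time

class TTLCache:
    """Minimal time-based cache for entity and director responses."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            # Entry has expired; evict it and force a fresh API call.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock())
```

Keyed by endpoint and identifier (say, `"entity:00000000"`), a 24-hour TTL for entity and director responses matches the caching policy above, while report calls simply bypass the cache.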

Try it yourself

Get started with 200 free credits. No contract, no sales call.