Back to blogs
Trading / Forex
9 min read
Apr 18, 2025

Building an AI Agent on Top of Trading Data: Querying MT5 and CRM for Real Insights

How I built a natural-language AI agent at Islero Capital that lets traders and risk managers query their MT5 trading history, open positions, and CRM data by simply asking questions—powered by LLM function calling over live financial data.

AI AgentMT5CRMLangChainFunction CallingTrading AnalyticsRAG
Table of contents

Islero Capital's risk managers and portfolio analysts spent hours each week writing SQL to answer questions like "which trader had the largest drawdown last month?" or "show me all positions that were open during the London session spike on Tuesday." These are reasonable analytical questions. They shouldn't require SQL expertise or waiting for a report.

We built an AI agent that lets anyone on the team ask these questions in plain English and get accurate, cited answers in seconds.

What the Agent Needed to Do

The requirements were more specific than "talk to a database." The agent needed to:

  1. Answer questions about individual trader performance (P&L, win rate, average hold time)
  2. Query open and closed positions filtered by symbol, date range, lot size, or profit
  3. Look up CRM data—client tier, account balance, recent activity—and join it with trade data
  4. Generate performance summaries that could be copy-pasted into reports
  5. Flag anomalies: unusually large positions, trades outside working hours, abnormal drawdowns

The challenge was that the data lived in three places: TimescaleDB (tick history), PostgreSQL (positions, accounts), and a CRM (Salesforce). A single user question often required data from two or three of these.

Agent Architecture

We used LangChain's agent framework with a ReAct-style reasoning loop. The agent has access to a set of tools and decides which ones to call, in what order, based on the question:

User Question
      ↓
   Planner (GPT-4o)
      ↓
   Tool Selection
      ↓
  ┌───────────────────────────────────┐
  │ get_trader_performance(...)       │
  │ get_positions(...)                │
  │ get_account_info(...)             │
  │ search_crm_client(...)            │
  │ get_tick_context(symbol, ts)      │
  └───────────────────────────────────┘
      ↓
  Tool Results → Synthesizer (GPT-4o)
      ↓
  Final Answer (with citations)

The planner and synthesizer are both GPT-4o. We tried GPT-4o-mini for the planner but it consistently missed multi-hop reasoning (e.g., "get this trader's account → then get their positions → then compare to their CRM tier"). GPT-4o planned correctly on the first try most of the time.

Tool Definitions

Each tool wraps a specific data access pattern. Here's a representative selection:

const tools = [
  {
    name: "get_trader_performance",
    description: `Get performance metrics for a trader over a date range. 
Returns: total_pips, total_profit_usd, win_rate, avg_hold_minutes, 
max_drawdown, trade_count.`,
    schema: z.object({
      trader_id: z.string(),
      from: z.string().describe("ISO date, e.g. 2025-01-01"),
      to: z.string().describe("ISO date, e.g. 2025-04-30"),
    }),
    execute: async ({ trader_id, from, to }) => {
      return db.query(
        `SELECT 
          COUNT(*) AS trade_count,
          SUM(profit) AS total_profit_usd,
          SUM(pips) AS total_pips,
          AVG(EXTRACT(EPOCH FROM (close_time - open_time)) / 60) AS avg_hold_minutes,
          (COUNT(*) FILTER (WHERE profit > 0))::float / COUNT(*) AS win_rate,
          MIN(running_dd) AS max_drawdown
         FROM positions
         WHERE trader_id = $1
           AND close_time BETWEEN $2 AND $3
           AND status = 'closed'`,
        [trader_id, from, to]
      );
    },
  },
 
  {
    name: "get_positions",
    description: "Query positions with flexible filters. Supports status (open/closed), symbol, lot size range, profit range, and date range.",
    schema: z.object({
      trader_id: z.string().optional(),
      symbol: z.string().optional(),
      status: z.enum(["open", "closed", "all"]).default("all"),
      from: z.string().optional(),
      to: z.string().optional(),
      min_lots: z.number().optional(),
      max_lots: z.number().optional(),
      limit: z.number().default(20),
    }),
    execute: buildPositionsQuery,
  },
 
  {
    name: "search_crm_client",
    description: "Look up a client in Salesforce CRM by name, email, or account ID. Returns tier, AUM, relationship manager, and recent activity.",
    schema: z.object({
      query: z.string().describe("Name, email, or account ID"),
    }),
    execute: async ({ query }) => salesforceClient.search(query),
  },
];

The tool descriptions are verbose on purpose. GPT-4o reads them to decide which tool to call—vague descriptions lead to wrong tool choices.

Handling Multi-Hop Questions

Multi-hop questions were where most agents fail. Consider: "Compare the performance of our top-tier CRM clients versus standard clients over Q1."

This requires:

  1. Get the list of top-tier clients from CRM
  2. Get their trader IDs (CRM uses Salesforce IDs, positions table uses MT5 account IDs)
  3. Get performance metrics for each group
  4. Compute aggregated comparison

Without explicit guidance, the LLM sometimes tried to do this in a single get_trader_performance call. We fixed it with a planning prompt that forces decomposition:

const SYSTEM_PROMPT = `You are a trading analytics assistant for Islero Capital.
 
For complex questions requiring data from multiple sources, ALWAYS:
1. Plan your tool calls explicitly before making them
2. Make one tool call at a time and review the result before proceeding
3. Join data across tools manually in your reasoning
4. Cite which tool provided each number in your answer
 
You have access to real-time and historical trading data. Be precise with numbers.
Never estimate or hallucinate figures—if you don't have the data, say so.`;

The "cite which tool provided each number" instruction was added after we caught the model blending numbers from different tools incorrectly.

Anomaly Detection Integration

Beyond answering questions, the agent can proactively flag anomalies. We built a nightly job that runs the agent with an anomaly-focused prompt over the day's trading data:

const ANOMALY_PROMPT = `
Review today's trading activity for Islero Capital.
Flag any of the following:
- Positions larger than 3x the trader's average lot size
- Trades opened or closed outside 07:00-17:00 UTC
- Any trader with a drawdown exceeding 8% today
- Multiple positions in the same direction on correlated pairs (risk concentration)
 
For each flag: state the trader ID, the specific anomaly, and the relevant figures.
`;
 
const report = await agent.run(ANOMALY_PROMPT);
await slackClient.postToChannel("#risk-alerts", report);

The risk team gets this report every morning before markets open. It replaced a manual check that previously took 45 minutes.

Security Considerations

Trading data is sensitive. We implemented three layers of access control:

  1. Tool-level access gates: Each tool checks the calling user's role before executing. The get_crm_client tool is restricted to relationship managers and above.
  2. Read-only database connections: The agent's DB user has SELECT-only privileges. There's no path for the LLM to modify data.
  3. Query parameterization everywhere: Tool inputs are always bound parameters, never string concatenation. LLM output is treated as untrusted user input.
// Never do this:
const bad = `SELECT * FROM positions WHERE trader_id = '${traderIdFromLLM}'`;
 
// Always do this:
const good = db.query("SELECT * FROM positions WHERE trader_id = $1", [traderIdFromLLM]);

We also log every agent interaction—the question, the tools called, and the final answer—for audit purposes.

Results

After three months in production, the risk team's SQL query volume dropped by 70%. The analysts who built the most queries became the heaviest users of the agent. Average time to answer a performance question went from ~15 minutes (write SQL, run, interpret) to under 30 seconds.

The most unexpected value came from the anomaly detection job—it caught a risk concentration issue (a trader with correlated positions on EURUSD, GBPUSD, and AUDUSD) on the day of a major USD news event that would have been missed until the evening reconciliation.

Key Takeaways

  • Tool descriptions are the API surface for the LLM—write them like documentation, not code comments.
  • Separate planner from synthesizer when questions require multi-hop reasoning. The planner's job is decomposition; the synthesizer's job is assembling a coherent answer.
  • Force citation in the system prompt. Without it, GPT-4o will occasionally blend numbers from different tool results.
  • Treat all LLM tool call arguments as untrusted input—parameterize every query, restrict DB users to read-only.
  • Nightly anomaly runs are one of the highest-ROI uses of a trading analytics agent. The model's pattern recognition over structured financial data is genuinely impressive.