Building and Monitoring AI Agents: A Step-by-Step Guide (Part 1)

Yusuf Ishola · May 2, 2025

Time to complete: ~30 minutes

Your AI agent worked perfectly in testing, but now in production it's making bizarre recommendations and you have no idea why. Sound familiar? As AI agents grow increasingly complex, the black box problem is becoming the number one obstacle to reliable deployment.

Building and Monitoring AI Agents

In this first part of our two-part series on AI agent observability, we'll build a financial research assistant that demonstrates the key components of a modern AI agent. In part two, we'll explore how to effectively monitor it with Helicone's agentic AI observability features.

Let's get started!

Prerequisites

Before we dive in, you'll need:

  • Node.js and npm installed
  • An OpenAI API key
  • An Alpha Vantage API key (free tier works)
  • A Helicone API key (free tier works)

Quick Start

Want to skip ahead and try the code immediately? Clone the GitHub repository and run the code:

git clone https://github.com/Yusu-f/helicone-agent-tutorial.git
cd helicone-agent-tutorial
npm install

Create a .env file with your API keys

OPENAI_API_KEY=your_openai_key_here
ALPHA_VANTAGE_API_KEY=your_alpha_vantage_key_here
HELICONE_API_KEY=your_helicone_key_here

Run the assistant

npm start

This gives you a working version of the financial assistant with basic Helicone monitoring.

In part 2, we'll show you how to add comprehensive monitoring to your AI agent with Helicone's Sessions feature.

How We'll Build Our Financial Assistant

Our financial assistant does two things:

  1. Fetches real-time price information and news for specific tickers
  2. Uses RAG to answer questions about financial concepts

The agent intelligently determines which approach to take for each query—a pattern applicable to many domains beyond finance, including customer support, healthcare, and legal applications.

Key Components of Our AI Agent

1. Tools

Our agent uses OpenAI's function-calling tools to decide how to handle different queries:

// Define OpenAI tools for function calling
const tools = [
  {
    type: "function",
    function: {
      name: "getStockData",
      description: "Get current price and other information for a specific stock by ticker symbol",
      parameters: {
        type: "object",
        properties: {
          ticker: {
            type: "string",
            description: "The stock ticker symbol, e.g., AAPL for Apple Inc."
          }
        },
        required: ["ticker"]
      }
    }
  },
  ...
]

This approach allows the model to decide which functions to call based on the user's query.
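The snippet above shows only the first declaration. The other two tools the agent exposes (getStockNews and searchFinancialTerms, described below) follow the same shape. Here's a rough sketch of what their entries might look like; the exact descriptions and parameter schemas in the repo may differ:

// Hypothetical entries for the remaining tools; in the actual code they sit
// directly inside the `tools` array shown above.
const remainingTools = [
  {
    type: "function",
    function: {
      name: "getStockNews",
      description: "Get recent news articles for a specific stock by ticker symbol",
      parameters: {
        type: "object",
        properties: {
          ticker: {
            type: "string",
            description: "The stock ticker symbol, e.g., TSLA for Tesla Inc."
          }
        },
        required: ["ticker"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "searchFinancialTerms",
      description: "Look up definitions and explanations of financial terms and concepts",
      parameters: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "The financial term or concept to look up, e.g., profit margin"
          }
        },
        required: ["query"]
      }
    }
  }
];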

2. Basic Helicone Monitoring

The financial assistant uses Helicone's basic monitoring to track the cost, latency, and error rate of our LLM calls. You can create an account for free here.

import OpenAI from "openai";

// Route OpenAI calls through Helicone's proxy so every request is logged
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
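That's the only change needed; the client is used exactly as before, and every request now shows up in the Helicone dashboard with its cost, latency, and token counts. A quick sanity check you might run (assuming an ESM entry point where top-level await is available):

// Any call made with the client above is routed through Helicone and logged
const check = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say hello in one word." }],
});
console.log(check.choices[0].message.content);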

3. RAG & External API access

For queries about common financial terms, we use a vector store to retrieve relevant information, while for stock-specific queries, such as real-time prices or news, we connect to the Alpha Vantage API:

import axios from "axios";

const ALPHA_VANTAGE_API_KEY = process.env.ALPHA_VANTAGE_API_KEY;

async function searchFinancialTerms(query, vectorStore) {
  console.log("Searching for financial term definitions in knowledge base...");

  // Get relevant documents from vector store with similarity scores
  const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 2);

  // Process results...
}

async function getStockData(ticker) {
  try {
    const url = `https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=${ticker}&apikey=${ALPHA_VANTAGE_API_KEY}`;
    const response = await axios.get(url);

    // Extract and return the quote data...
  } catch (error) {
    // Handle API or rate-limit errors...
  }
}

The RAG implementation provides domain-specific knowledge to the agent. However, as we'll see shortly, without proper monitoring it can be difficult to pinpoint what's causing the system to fail when it does.
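The snippets above don't show how the vector store itself is constructed. A minimal sketch of one common setup, using LangChain's in-memory store with OpenAI embeddings to back the similaritySearchWithScore call above, might look like this (the package choices and document shape are assumptions, not necessarily what the tutorial repo uses):

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

// Hypothetical helper: turn a list of { term, definition } records into
// embedded documents that searchFinancialTerms can query.
async function buildVectorStore(financialDocs) {
  const documents = financialDocs.map(
    (doc) =>
      new Document({
        pageContent: `${doc.term}: ${doc.definition}`,
        metadata: { term: doc.term },
      })
  );

  // Embeds each document once and keeps the vectors in memory
  return MemoryVectorStore.fromDocuments(documents, new OpenAIEmbeddings());
}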

4. Minimal Agent Loop (Tool Calling)

We expose three tools to the LLM:

  • getStockData: Retrieves current price and market information for a specific ticker
  • getStockNews: Fetches the latest news articles related to a stock ticker
  • searchFinancialTerms: Queries our vector database for information about financial concepts

The LLM may call a tool, receive its output as feedback, and then answer the user.

The loop allows our agent to call tools and process results for as long as needed to generate an appropriate response:

async function processQuery(userQuery, vectorStore) {
  let messages = [
    {
      role: "system",
      content: `You're a financial assistant. Use tools when needed. If you have enough information to answer, reply normally.`
    },
    { role: "user", content: userQuery }
  ];
  
  // Add chat history for context if available
  if (chatHistory.length > 0) {
    messages.splice(1, 0, ...chatHistory);
  }
  
  while (true) {
    console.log("Sending query to OpenAI...");
    const llmResp = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      tools,
      messages,
      temperature: 0.1,
    });
    
    const msg = llmResp.choices[0].message;
    
    if (msg.tool_calls && msg.tool_calls.length > 0) {
      // Execute the helper and push message into history...

      continue;
    }
    
    // No tool call → LLM has produced the final answer
    return msg.content;
  }
}
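The tool-dispatch step is abbreviated in the loop above. Here's a sketch of what that branch might do, written as a helper; the function names match the tools defined earlier, but the repo's exact bookkeeping may differ:

// Hypothetical helper for the abbreviated branch: run each requested tool,
// then append the results to the conversation as `tool` messages so the
// model can use them on the next pass through the loop.
async function runToolCalls(msg, messages, vectorStore) {
  // Keep the assistant message that requested the tools in the history
  messages.push(msg);

  for (const toolCall of msg.tool_calls) {
    const args = JSON.parse(toolCall.function.arguments);
    let result;

    switch (toolCall.function.name) {
      case "getStockData":
        result = await getStockData(args.ticker);
        break;
      case "getStockNews":
        result = await getStockNews(args.ticker);
        break;
      case "searchFinancialTerms":
        result = await searchFinancialTerms(args.query, vectorStore);
        break;
    }

    // Each tool result must reference the tool_call_id it answers
    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }
}

Inside the while loop, the tool-call branch would then simply be await runToolCalls(msg, messages, vectorStore) followed by continue.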

Testing Our Financial Assistant

Now, let's take our financial assistant for a spin!

Run the following command to start the assistant:

npm start

We can view the results of our queries in the Helicone dashboard.

Prompt: What is tesla's stock price?

Result:

Example of a financial research assistant responding to the prompt 'What is Tesla's stock price?' displaying the stock information.

Prompt: What is GreenEnergy's profit margin?

Result:

Example of an AI agent failing to retrieve information about GreenEnergy's profit margin despite having it in the knowledge base.

It looks like there's an issue with our agent!

It can't find the profit margin of GreenEnergy despite it being in our knowledge base.

Something is obviously wrong with our RAG implementation—but what?

This is where observability comes in!

Debugging Our Financial Assistant

Looking at our implementation, there are several blind spots that can cause issues:

  • Hallucinations and retrieval issues: Our agent failed to answer the query related to GreenEnergy despite having the requisite information—how do we pinpoint the problem?
  • Cost Visibility: How many tokens is each component of our agent consuming? Which queries are most expensive?
  • Latency Issues: If the agent becomes slow, which step is causing the bottleneck?
  • Error Patterns: Are certain types of queries consistently failing? Where in the pipeline do these failures occur?

In Part 2 of this tutorial on AI agent observability, we'll add Helicone's Sessions feature to our financial assistant to gain comprehensive visibility into every step of the process. Here's a preview of what you can see:

Helicone dashboard showcasing the Sessions feature for debugging AI agents, demonstrating AI observability and agent monitoring capabilities.

We'll monitor each step of the agent's workflow, resolve bugs, and gain insights into useful metrics like cost, latency, and error rates.

Stay tuned!

Observe Your AI Agents with Helicone ⚡️

Stop building AI in the dark. Get complete visibility into every step of your AI workflows, track costs down to the penny, and debug complex issues in minutes instead of days.

Frequently Asked Questions

Why do AI agents need specialized observability tools?

AI agents have unique monitoring challenges including non-deterministic execution paths, multi-step LLM calls, complex branching logic, and dependencies on external systems. Unlike traditional applications with fixed flows, agents' decision trees vary with each request. Standard monitoring tools can't track these dynamic workflows or evaluate response quality across interconnected steps, which is why specialized tools like Helicone's session-based tracing are essential for AI agent observability.

What are the biggest blind spots when deploying AI agents to production?

The most dangerous blind spots include: undetected hallucinations in responses, hidden cost escalations from inefficient prompts, silent failures in multi-step reasoning chains, data leakage in RAG implementations, inconsistent performance across different user segments, and degrading accuracy over time as data or usage patterns change. Without proper observability, these issues can persist for weeks before being discovered, potentially causing significant business impact.

What metrics should I monitor for any AI agent?

Critical metrics for all AI agents include: end-to-end latency of complete workflows, token usage per step and total cost per request, step completion rates showing where agents get stuck, retrieval quality for RAG implementations, routing accuracy between different processing pathways, error rates for external API calls, and user satisfaction with responses. Tracking these metrics helps identify bottlenecks, optimize costs, and ensure reliable agent performance.

How do I implement observability across different AI agent frameworks?

Helicone offers flexible integration options for all major AI frameworks. For LangChain, CrewAI, and LlamaIndex, direct integrations are available. For custom agents or other frameworks, you can typically use either Helicone's proxy approach (changing just the base URL) or the SDK integration. The Sessions feature works consistently across most major frameworks to trace multi-step agent workflows regardless of your technology choices, giving you a unified view of all AI operations.


Questions or feedback?

Is the information out of date? Please raise an issue or contact us; we'd love to hear from you!