Building and Monitoring AI Agents with Helicone: A Step-by-Step Guide (Part 2)

Yusuf Ishola's headshotYusuf Ishola· June 2, 2025

Time to complete: ~15 minutes

In Part 1 of this tutorial, we built a financial research assistant that had some concerning issues. While our agent could fetch Tesla's stock price and news, it failed to properly handle knowledge base queries and inappropriately searched for information it already knew.

Now it's time to fix these problems using Helicone's comprehensive observability features.

Building and Monitoring AI Agents: Part 2

Table of Contents

The Problems We Need to Fix

From Part 1, we identified several critical issues:

  1. RAG retrieval failure: Our agent couldn't find GreenEnergy's profit margin despite having it in the knowledge base
  2. Inappropriate tool usage: The agent searched its knowledge base for information about Elon Musk instead of using its general knowledge
  3. Hidden performance issues: Without proper monitoring, we couldn't see what was actually happening in our agent's decision-making process

Let's fix these systematically.

Adding Comprehensive Monitoring with Helicone Sessions

First, we need visibility into every step of our agent's workflow. Helicone's Sessions feature allows us to trace multi-step interactions and see exactly where things go wrong.

Step 1: Install Helicone Helpers

npm install @helicone/helpers

Step 2: Add Session Tracking

We'll modify our code to track each tool/RAG call and LLM interaction as part of a session:

import { HeliconeManualLogger } from '@helicone/helpers';
import crypto from 'crypto';

// Initialize Helicone manual logger
const heliconeLogger = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY,
  headers: {
    "Helicone-Property-Type": "Financial-Research-Agent"
  }
});

// Session management
let sessionId;
let sessionHeaders;

// Function to start a new session
function startNewSession() {
  sessionId = crypto.randomUUID();
  sessionHeaders = {
    "Helicone-Session-Id": sessionId,
    "Helicone-Session-Name": "Financial Research Session",
    "Helicone-Session-Path": "/financial-research",
  };
  console.log(`\nStarted new session: ${sessionId}`);
}

Step 3: Add Detailed Tool Logging

Now we'll wrap each tool/RAG call with comprehensive logging using the Helicone ManualLogger—this step isn't required, but provides greater visibility.

Snippet for implementing logging
// Function to search for company information with Helicone logging
async function searchCompanyInfo(query, vectorStore) {
  return heliconeLogger.logRequest(
    {
      _type: "vector_db",
      operation: "search",
      text: query,
      topK: 2,
      databaseName: "company_profiles_db",
      metadata: {
        searchType: "similarity_search_with_score",
        threshold: 0.9, // This threshold is too high!
        requestTime: new Date().toISOString()
      }
    },
    async (resultRecorder) => {
      console.log("Searching for company information in knowledge base...");
      
      const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 2);
      
      if (resultsWithScores.length === 0 || resultsWithScores[0][1] < 0.9) {
        const result = {
          found: false,
          message: "No relevant company information found in the knowledge base."
        };
        
        resultRecorder.appendResults({
          results: result,
          matchCount: 0,
          bestScore: resultsWithScores[0]?.[1] || 0,
          searchLatency: 50,
          status: "no_match"
        });
        
        return result;
      }
      
      // Process successful results...
    },
    {
      ...sessionHeaders,
      "Helicone-Session-Path": `/financial-research/company-search/${query.toLowerCase().replace(/\s+/g, '-')}`
    }
  );
}

With this setup, let's test our agent and see what's happening in the Helicone dashboard.

Helicone Sessions dashboard showing agent behavior before fixes

The dashboard reveals the exact problems:

  • Vector DB searches, while successful, are failing to return appropriate results
  • The agent is making unnecessary knowledge base searches for general information
  • We can see the complete decision tree leading to these failures

Fix #1: Adjusting the RAG Threshold

The first issue is clear from our dashboard—the similarity threshold of 0.9 is too restrictive (can be viewed by clicking on the vector_db call row).

Even relevant documents about GreenEnergy aren't meeting this threshold.

Let's update our code to use a lower threshold of 0.7.

// Change this line in searchCompanyInfo function
if (resultsWithScores.length === 0 || resultsWithScores[0][1] < 0.7) { // Reduced from 0.9 to 0.7

Fix #2: Improving Tool Selection Logic

The second issue requires a more sophisticated approach. Our agent is using tools when it should rely on its general knowledge. Let's update the system prompt to be more specific:

async function processQuery(userQuery, vectorStore) {
  let messages = [
    {
      role: "system",
      content: `You're a financial assistant. Use the provided tools to answer the user's questions, but only when necessary.  

              Available tools:
              1. getStockData - Fetches current stock price information for a specific ticker symbol
              2. getStockNews - Retrieves recent news articles about a company based on its ticker symbol
              3. searchCompanyInfo - Searches the knowledge base for detailed information about companies or financial concepts

              NOTES:
              - DO NOT use tools for general knowledge questions that you can answer directly
              - ONLY use tools when you need:
                * Current stock prices or market data
                * Recent news about specific companies
                * Detailed information about unknown companies
              - If you have sufficient information to provide an accurate and complete answer from your existing knowledge, respond directly without using tools
              `
    },
    { role: "user", content: userQuery }
  ];
  
  // Rest of the function...
}

However, testing this change alone doesn't fully resolve the issue. The model still tends to overuse tools.

Fix #3: Upgrading the Model

Sometimes more sophisticated reasoning requires a more capable model. Let's upgrade from GPT-3.5 Turbo to GPT-4o Mini:

const llmResp = await openai.chat.completions.create({
  model: "gpt-4o-mini", // Upgraded from "gpt-3.5-turbo"
  tools,
  messages,
  temperature: 0.1,
}, {
  headers: {
    ...currentSessionHeaders,
    "Helicone-Prompt-Id": lastToolUsed ? `${lastToolUsed}-response` : "financial-research-reasoning"
  }
});

Why Model Upgrades Matter 💡

GPT-4o Mini has better reasoning capabilities for understanding when NOT to use tools, leading to more appropriate tool selection and fewer unnecessary API calls.

Testing Our Fixes

Let's run our improved agent and see the results:

Prompt: Who is Elon Musk?

Before fixes: Agent searches knowledge base, finds nothing, then provides answer

After fixes: Agent directly answers from general knowledge without unnecessary tool calls

Prompt: What is GreenEnergy's profit margin?

Before fixes: Agent fails to find information despite having it in knowledge base

After fixes: Agent successfully retrieves "GreenEnergy Corp has a profit margin of 14%"

Helicone Sessions dashboard showing improved agent behavior after fixes, with proper tool usage and successful retrieval

The improved dashboard shows:

  • Appropriate tool selection (no unnecessary knowledge base searches)
  • Successful vector database retrievals with lower threshold
  • Clean decision paths with better reasoning
  • Reduced token usage from fewer unnecessary tool calls

Discovering Hidden Issues

One powerful benefit of comprehensive monitoring is discovering problems you didn't know existed. In our Helicone dashboard, we can see 429 rate limit errors that were previously invisible in the CLI:

Status: 429 Error
Error: Rate limit exceeded for model gpt-4o-mini

These errors didn't cause complete failures but added significant latency to responses. Without Helicone's observability, we might never have noticed these delays in a production environment.

To fix rate limiting issues, you can:

  • Implement exponential backoff retry logic
  • Add request queuing
  • Monitor usage patterns to optimize request timing
  • Consider upgrading API tier limits if needed

The Complete Solution

Here's our final agent with all fixes applied:

Click to see the complete code
import dotenv from 'dotenv';
import axios from 'axios';
import { OpenAI } from 'openai';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { OpenAIEmbeddings } from '@langchain/openai';
import { Document } from '@langchain/core/documents';
import { HeliconeManualLogger } from '@helicone/helpers';
import readline from 'readline';
import crypto from 'crypto';

// Load environment variables
dotenv.config();

// Initialize OpenAI client with Helicone monitoring
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

// Initialize Helicone manual logger
const heliconeLogger = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY,
  headers: {
    "Helicone-Property-Type": "Financial-Research-Agent"
  }
});

// Alpha Vantage API key
const ALPHA_VANTAGE_API_KEY = process.env.ALPHA_VANTAGE_API_KEY;

// Session management
let sessionId;
let sessionHeaders;
let chatHistory = [];

// Sample company profiles for RAG - fake financial data
const companyProfiles = [
  `
  # TechVision Inc. (TVIX)
  
  Industry: Technology
  Founded: 2005
  Headquarters: San Francisco, CA
  
  TechVision is a leading AI and machine learning company specializing in computer vision solutions. Their flagship product, VisionCore, is used by major automotive manufacturers for autonomous driving systems.
  
  Recent developments:
  - Announced partnership with AutoDrive to enhance autonomous vehicle safety features
  - Introduced new AI chip with 40% better performance than previous generation
  - Expanding into healthcare imaging with acquisition of MedSight Technologies
  
  Financial highlights:
  - Annual revenue: $3.2B (up 18% YoY)
  - Profit margin: 22%
  - R&D spending: $780M (24% of revenue)
  `,
  `
  # GreenEnergy Corp (GRNE)
  
  Industry: Renewable Energy
  Founded: 2010
  Headquarters: Austin, TX
  
  GreenEnergy specializes in solar and wind energy solutions with a focus on energy storage technology. Their battery systems are used in both residential and commercial applications.
  
  Recent developments:
  - Launched next-generation home battery with 30% increased capacity
  - Secured $500M contract to build solar farm in Nevada
  - Expanding manufacturing facilities in Texas and Arizona
  
  Financial highlights:
  - Annual revenue: $1.8B (up 25% YoY)
  - Profit margin: 14%
  - Net cash position: $620M
  `,
  `
  # HealthPlus Inc. (HLTH)
  
  Industry: Healthcare
  Founded: 1998
  Headquarters: Boston, MA
  
  HealthPlus develops innovative medical devices and digital health platforms. Their diabetes management system has captured significant market share in the US.
  
  Recent developments:
  - FDA approval for next-generation continuous glucose monitor
  - Expanded telemedicine platform to include mental health services
  - Strategic partnership with major insurance providers
  
  Financial highlights:
  - Annual revenue: $2.4B (up 12% YoY)
  - Profit margin: 18%
  - International sales: 35% of revenue
  `,
  `
  # DigitalFinance Group (DFG)
  
  Industry: Fintech
  Founded: 2015
  Headquarters: New York, NY
  
  DigitalFinance provides blockchain-based payment solutions and digital banking services to both consumers and businesses.
  
  Recent developments:
  - Launched small business lending platform with AI-powered risk assessment
  - Obtained banking license in European Union
  - Integrated with major e-commerce platforms
  
  Financial highlights:
  - Annual revenue: $950M (up 40% YoY)
  - Profit margin: 8%
  - User base: 12 million (up 30% YoY)
  `,
  `
  # ConsumerBrands Corp (CNBC)
  
  Industry: Consumer Goods
  Founded: 1975
  Headquarters: Chicago, IL
  
  ConsumerBrands manages a portfolio of household products, personal care items, and food brands with strong presence in North America and Europe.
  
  Recent developments:
  - Sustainability initiative to make all packaging recyclable by 2026
  - Expansion into Asian markets
  - Divested underperforming snack food division
  
  Financial highlights:
  - Annual revenue: $8.5B (up 5% YoY)
  - Profit margin: 15%
  - Dividend yield: 3.2%
  `
];

// Function to initialize the vector store with company profiles
async function initializeVectorStore() {
  const embeddings = new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
  });
  
  const docs = companyProfiles.map(
    (text) => new Document({ pageContent: text })
  );
  
  return MemoryVectorStore.fromDocuments(docs, embeddings);
}

// Function to start a new session
function startNewSession() {
  sessionId = crypto.randomUUID();
  sessionHeaders = {
    "Helicone-Session-Id": sessionId,
    "Helicone-Session-Name": "Financial Research Session",
    "Helicone-Session-Path": "/financial-research",
  };
  console.log(`\nStarted new session: ${sessionId}`);
}

// Function to generate session path based on tool and arguments
function getToolSessionPath(toolName, args) {
  switch(toolName) {
    case "getStockData":
      return `/financial-research/stock-data/${args.ticker.toLowerCase()}`;
    case "getStockNews":
      return `/financial-research/news/${args.ticker.toLowerCase()}`;
    case "searchCompanyInfo":
      return `/financial-research/company-search/${args.query.toLowerCase().replace(/\s+/g, '-')}`;
    default:
      return "/financial-research";
  }
}

// Function to get stock price data from Alpha Vantage with Helicone logging
async function getStockData(ticker) {
  return heliconeLogger.logRequest(
    {
      _type: "tool",
      toolName: "get_stock_data",
      input: { ticker },
      metadata: {
        source: "alpha_vantage",
        operation: "global_quote",
        requestTime: new Date().toISOString()
      }
    },
    async (resultRecorder) => {
      try {
        const url = `https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=${ticker}&apikey=${ALPHA_VANTAGE_API_KEY}`;
        const response = await axios.get(url);
        
        if (response.data['Global Quote'] && Object.keys(response.data['Global Quote']).length > 0) {
          const quote = response.data['Global Quote'];
          const result = {
            symbol: ticker.toUpperCase(),
            price: parseFloat(quote['05. price']),
            change: parseFloat(quote['09. change']),
            changePercent: quote['10. change percent'],
            volume: parseInt(quote['06. volume']),
            latestTradingDay: quote['07. latest trading day']
          };
          
          resultRecorder.appendResults({
            output: result,
            status: "success",
            responseTime: Date.now()
          });
          
          return result;
        } else {
          const error = { error: `No data found for ticker ${ticker}` };
          resultRecorder.appendResults({
            output: error,
            status: "no_data",
            responseTime: Date.now()
          });
          return error;
        }
      } catch (error) {
        console.error('Error fetching stock data:', error);
        const errorResult = { error: `Failed to get stock data for ${ticker}` };
        resultRecorder.appendResults({
          output: errorResult,
          status: "error",
          error: error.message,
          responseTime: Date.now()
        });
        return errorResult;
      }
    },
    {
      ...sessionHeaders,
      "Helicone-Session-Path": `/financial-research/stock-data/${ticker.toLowerCase()}`
    }
  );
}

// Function to get latest news for a ticker with Helicone logging
async function getStockNews(ticker) {
  return heliconeLogger.logRequest(
    {
      _type: "tool",
      toolName: "get_stock_news",
      input: { ticker },
      metadata: {
        source: "alpha_vantage",
        operation: "news_sentiment",
        requestTime: new Date().toISOString()
      }
    },
    async (resultRecorder) => {
      try {
        const url = `https://www.alphavantage.co/query?function=NEWS_SENTIMENT&tickers=${ticker}&apikey=${ALPHA_VANTAGE_API_KEY}`;
        const response = await axios.get(url);
        
        if (response.data.feed && response.data.feed.length > 0) {
          const result = response.data.feed.slice(0, 3).map((item) => ({
            title: item.title,
            summary: item.summary,
            source: item.source,
            url: item.url,
            timePublished: item.time_published
          }));
          
          resultRecorder.appendResults({
            output: result,
            status: "success",
            newsCount: result.length,
            responseTime: Date.now()
          });
          
          return result;
        } else {
          const error = { error: `No news found for ticker ${ticker}` };
          resultRecorder.appendResults({
            output: error,
            status: "no_data",
            responseTime: Date.now()
          });
          return error;
        }
      } catch (error) {
        console.error('Error fetching news:', error);
        const errorResult = { error: `Failed to get news for ${ticker}` };
        resultRecorder.appendResults({
          output: errorResult,
          status: "error",
          error: error.message,
          responseTime: Date.now()
        });
        return errorResult;
      }
    },
    {
      ...sessionHeaders,
      "Helicone-Session-Path": `/financial-research/news/${ticker.toLowerCase()}`
    }
  );
}

// Function to search for company information with Helicone logging
async function searchCompanyInfo(query, vectorStore) {
  return heliconeLogger.logRequest(
    {
      _type: "vector_db",
      operation: "search",
      text: query,
      topK: 2,
      databaseName: "company_profiles_db",
      metadata: {
        searchType: "similarity_search_with_score",
        threshold: 0.7,
        requestTime: new Date().toISOString()
      }
    },
    async (resultRecorder) => {
      console.log("Searching for company information in knowledge base...");
      
      // Get relevant documents from vector store with similarity scores
      const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 2);
      
      // Check if we have relevant results with good similarity scores
      if (resultsWithScores.length === 0 || resultsWithScores[0][1] < 0.7) {
        console.log("No relevant company information found in the knowledge base.");
        const result = {
          found: false,
          message: "No relevant company information found in the knowledge base."
        };
        
        resultRecorder.appendResults({
          results: result,
          matchCount: 0,
          bestScore: resultsWithScores[0]?.[1] || 0,
          searchLatency: 50,
          status: "no_match"
        });
        
        return result;
      }
      
      // Extract just the documents from the results for context
      const relevantDocs = resultsWithScores.map(([doc]) => doc);
      const result = {
        found: true,
        documents: relevantDocs
      };
      
      resultRecorder.appendResults({
        results: result,
        matchCount: relevantDocs.length,
        bestScore: resultsWithScores[0][1],
        searchLatency: 50,
        status: "success"
      });
      
      return result;
    },
    {
      ...sessionHeaders,
      "Helicone-Session-Path": `/financial-research/company-search/${query.toLowerCase().replace(/\s+/g, '-')}`
    }
  );
}

// Define the tools for our agent
const tools = [
  {
    type: "function",
    function: {
      name: "getStockData",
      description: "Get current price and other market information for a specific stock by ticker symbol",
      parameters: {
        type: "object",
        properties: {
          ticker: {
            type: "string",
            description: "The stock ticker symbol, e.g., AAPL for Apple Inc."
          }
        },
        required: ["ticker"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "getStockNews",
      description: "Get the latest news articles for a specific stock by ticker symbol",
      parameters: {
        type: "object",
        properties: {
          ticker: {
            type: "string",
            description: "The stock ticker symbol, e.g., AAPL for Apple Inc."
          }
        },
        required: ["ticker"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "searchCompanyInfo",
      description: "Search for detailed company information in the knowledge base",
      parameters: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "The company name or topic to search for"
          }
        },
        required: ["query"]
      }
    }
  }
];

// Simple helper to map tool calls to functions
async function callHelper(name, args, vectorStore) {
  switch(name) {
    case "getStockData":
      return await getStockData(args.ticker);
    case "getStockNews":
      return await getStockNews(args.ticker);
    case "searchCompanyInfo":
      return await searchCompanyInfo(args.query, vectorStore);
    default:
      return { error: `Unknown tool: ${name}` };
  }
}

// Agent loop for processing queries with session tracking
async function processQuery(userQuery, vectorStore) {
  let messages = [
    {
      role: "system",
      content: `You're a financial assistant. Use the provided tools to answer the user's questions, but only when necessary.  

              Available tools:
              1. getStockData - Fetches current stock price information for a specific ticker symbol
              2. getStockNews - Retrieves recent news articles about a company based on its ticker symbol
              3. searchCompanyInfo - Searches the knowledge base for detailed information about companies or financial concepts

              NOTES:
              - DO NOT use tools for general knowledge questions that you can answer directly
              - ONLY use tools when you need:
                * Current stock prices or market data
                * Recent news about specific companies
                * Detailed information about unknown companies
              - If you have sufficient information to provide an accurate and complete answer from your existing knowledge, respond directly without using tools
              `
    },
    { role: "user", content: userQuery }
  ];
  
  // Add chat history for context if available
  if (chatHistory.length > 0) {
    messages.splice(1, 0, ...chatHistory);
  }

  // Track the last tool used for session path context
  let lastToolUsed = null;
  let lastToolArgs = null;
  
  while (true) {    
    // Determine session headers based on context
    let currentSessionHeaders = { ...sessionHeaders };
    if (lastToolUsed && lastToolArgs) {
      // If we just used a tool, update the session path for the LLM response
      currentSessionHeaders["Helicone-Session-Path"] = getToolSessionPath(lastToolUsed, lastToolArgs);
    }
    
    const llmResp = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      tools,
      messages,
      temperature: 0.1,
    }, {
      headers: {
        ...currentSessionHeaders,
        "Helicone-Prompt-Id": lastToolUsed ? `${lastToolUsed}-response` : "financial-research-reasoning"
      }
    });
    
    const msg = llmResp.choices[0].message;
    
    if (msg.tool_calls && msg.tool_calls.length > 0) {
      // Execute the helper
      const toolCall = msg.tool_calls[0];
      const functionName = toolCall.function.name;
      const functionArgs = JSON.parse(toolCall.function.arguments);
      
      // Store the tool information for the next LLM response
      lastToolUsed = functionName;
      lastToolArgs = functionArgs;
      
      console.log(`Executing ${functionName} with args:`, functionArgs);
      const toolResult = await callHelper(functionName, functionArgs, vectorStore);
      
      // Push feedback & loop
      messages.push(msg);  // LLM's tool call
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        name: functionName,
        content: JSON.stringify(toolResult)
      });
      continue;
    }
    
    // No tool call → LLM has produced the final answer
    // Reset tool tracking for next query
    lastToolUsed = null;
    lastToolArgs = null;
    return msg.content;
  }
}

// Main function
async function main() {
  console.log("Initializing Financial Research Assistant with Helicone Sessions...");
  
  // Check for API keys
  if (!process.env.OPENAI_API_KEY) {
    console.error("Error: OPENAI_API_KEY not found in environment variables");
    process.exit(1);
  }
  
  if (!process.env.ALPHA_VANTAGE_API_KEY) {
    console.error("Error: ALPHA_VANTAGE_API_KEY not found in environment variables");
    process.exit(1);
  }
  
  if (!process.env.HELICONE_API_KEY) {
    console.error("Error: HELICONE_API_KEY not found in environment variables");
    process.exit(1);
  }
  
  // Initialize vector store
  const vectorStore = await initializeVectorStore();
  
  // Start new session
  startNewSession();
  
  // Create readline interface
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  
  console.log("\n===== Financial Research Assistant with Helicone =====");
  console.log("Ask about stock prices, news, or company information.");
  console.log("Type 'exit' to quit or 'new' to start a new session.");
  console.log("======================================================\n");
  
  // Start conversation loop
  const askQuestion = () => {
    rl.question("\nWhat would you like to know? ", async (query) => {
      if (query.toLowerCase() === "exit") {
        console.log("Thank you for using the Financial Research Assistant. Goodbye!");
        rl.close();
        return;
      }
      
      if (query.toLowerCase() === "new") {
        startNewSession();
        chatHistory = []; // Clear chat history for new session
        console.log("Started a new research session.");
        askQuestion();
        return;
      }
      
      try {
        console.log("\nResearching your question...");
        
        // Process query with the agentic approach
        const answer = await processQuery(query, vectorStore);
        
        // Display answer
        console.log("\nAnswer:", answer);
        
        // Update chat history (maintain a limited history)
        chatHistory.push({ role: "user", content: query });
        chatHistory.push({ role: "assistant", content: answer || "" });
        
        // Keep only the last 4 exchanges (8 messages)
        if (chatHistory.length > 8) {
          chatHistory.splice(0, 2);
        }
        
        // Continue conversation
        askQuestion();
      } catch (error) {
        console.error("Error processing query:", error);
        console.log("\nSorry, I encountered an error while researching your question. Please try again.");
        askQuestion();
      }
    });
  };
  
  // Start the conversation
  askQuestion();
}

// Run the assistant
main().catch(console.error);

Key Insights from Monitoring

Through this process, we learned several critical lessons:

  1. Similarity thresholds need empirical testing: Our initial 0.9 threshold was too conservative
  2. Tool selection requires explicit guidance: Even advanced models need clear instructions about when NOT to use tools
  3. Model capabilities matter: GPT-4o Mini's better reasoning significantly improved tool selection
  4. Hidden errors impact performance: Rate limiting and other issues are invisible without proper monitoring
  5. Session-based tracing is essential: Understanding the complete decision flow is crucial for debugging complex agents

Master AI Agent Observability with Helicone ⚡️

Stop building agents in the dark. Get complete visibility into every decision, optimize performance, and catch issues before they impact users.

Conclusion

Building reliable AI agents requires more than just getting the basic functionality working. Through comprehensive monitoring with Helicone Sessions, we:

  • Identified the root causes of retrieval failures and inappropriate tool usage
  • Fixed threshold issues and improved reasoning with better prompts and model upgrades
  • Discovered hidden performance issues like rate limiting that would have been invisible otherwise
  • Optimized our agent's decision-making process for better performance and lower costs

The difference between a working agent and a production-ready agent is observability. With the right monitoring tools, you can build agents that not only work correctly but continue to perform reliably as they scale.

You might also like:

Frequently Asked Questions

Why did changing the similarity threshold fix the retrieval issue?

Vector similarity scores can vary significantly based on the embedding model and text content. A threshold of 0.9 was too restrictive, filtering out relevant documents that had slightly lower similarity scores. Reducing to 0.7 allowed the system to retrieve contextually relevant information while maintaining quality.

How do I determine the right similarity threshold for my RAG system?

Start with empirical testing using your actual data. Test various thresholds (0.5, 0.6, 0.7, 0.8) and manually evaluate the quality of retrieved documents. Helicone's logging can easily reveal where RAG retrieval is failing, helping you know what needs to be improved. Consider implementing dynamic thresholds based on query type or confidence levels.

Why did upgrading from GPT-3.5 Turbo to GPT-4o Mini improve tool selection?

GPT-4o Mini has better reasoning capabilities for understanding context and following complex instructions about when NOT to use tools. It's more effective at distinguishing between queries that require external data versus those that can be answered from general knowledge, leading to more appropriate tool selection and reduced unnecessary API calls.

How can I prevent rate limiting issues in production?

Implement exponential backoff retry logic, add request queuing to smooth out traffic spikes, monitor usage patterns to identify peak times, and consider upgrading your API tier if you consistently hit limits. Helicone's monitoring can help you identify rate limiting patterns before they become critical issues.

What other metrics should I monitor for production AI agents?

Beyond what we covered, monitor: user satisfaction scores, end-to-end response times, cost per successful interaction, tool success rates, error patterns by user segment, and prompt effectiveness over time. Set up alerts in Helicone for sudden changes in these metrics to catch issues early.


Questions or feedback?

Are the information out of date? Please raise an issue or contact us, we'd love to hear from you!