Why Every Native Provider Must Wire the Same Tool-Persistence Hook

A companion deep-dive for the NeuroLink blog, with architectural detail and code examples.

We built NeuroLink to give agents memory. A recent production incident showed us just how fragile that memory can be. An agent running on Juspay’s internal support desk, powered by a Google Vertex Gemini model, would correctly invoke a tool to look up a customer’s transaction status, receive the correct data, and then immediately forget it had ever happened. When the user asked a follow-up question, the model would apologize for not having the information it had literally just processed. The root cause wasn’t a bug in the model or the tool, but a missing hook in one of our own provider implementations, the GoogleVertexProvider.

The Anatomy of a Tool Call

In a perfect world, a tool call is a simple, stateful round trip. The process should feel atomic to the end-user, but internally it’s a multi-step sequence.

  1. The user sends a message.
  2. The model, seeing the user’s intent, responds not with text but with a request to call one or more tools. This model-generated response is a specific data structure, often a JSON object, that names the tool (e.g., getTransactionStatus) and specifies the arguments (e.g., { "transactionId": "TXN_123" }).
  3. NeuroLink’s runtime intercepts this request. Our analyzeAIResponse function is responsible for parsing the model’s output and identifying these structured tool-call requests.
  4. It executes the specified tools with the arguments the model provided. This involves looking up the tool function in a registry and invoking it, often within a sandbox for security.
  5. It receives the output from the tools. This output, which could be anything from a simple string to a complex JSON object, is the raw result of the function call. A failure here, like an exception thrown by the tool, must also be caught and handled.
  6. It packages that output into a new message. This isn’t a user message; it’s a special “tool result” message type that contains the tool_call_id and the serialized output from the tool. This format is rigid and required by the model APIs.
  7. It sends this new message back to the model so it can formulate a final, text-based response. This is part of a new request to the model, with the tool result appended to the conversation history.

// A simplified view of the messages appended to history
const toolRequestMessage = {
  role: 'assistant',
  content: null,
  tool_calls: [
    {
      id: 'call_abc123',
      type: 'function',
      function: {
        name: 'getTransactionStatus',
        arguments: '{"transactionId":"TXN_123"}',
      },
    },
  ],
};

const toolResponseMessage = {
  role: 'tool',
  tool_call_id: 'call_abc123',
  content: '{"status":"SUCCESS","authorizedBy":"finance-dept"}',
};
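
Steps 3 through 6 of the list above can be sketched in a few lines. This is a minimal illustration and not NeuroLink's implementation: the toolRegistry map, the runToolCalls helper, and the shape of the error payload are assumptions made for the example. Only the message shapes follow the ones shown above.

```typescript
// Hypothetical sketch: execute the requested tools and package the results.
type ToolCallRequest = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

type ToolResultMessage = { role: "tool"; tool_call_id: string; content: string };

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// Assumed registry; a real runtime would build this from its tool definitions.
const toolRegistry = new Map<string, ToolFn>([
  [
    "getTransactionStatus",
    async (args) => ({ transactionId: args.transactionId, status: "SUCCESS" }),
  ],
]);

async function runToolCalls(
  calls: ToolCallRequest[],
  registry: Map<string, ToolFn>,
): Promise<ToolResultMessage[]> {
  const results: ToolResultMessage[] = [];
  for (const call of calls) {
    let content: string;
    try {
      const tool = registry.get(call.function.name);
      if (!tool) throw new Error(`Unknown tool: ${call.function.name}`);
      // Arguments arrive as a JSON string and must be parsed before use.
      const output = await tool(JSON.parse(call.function.arguments));
      content = JSON.stringify(output);
    } catch (err) {
      // Failures are reported back to the model instead of being thrown.
      content = JSON.stringify({
        error: err instanceof Error ? err.message : String(err),
      });
    }
    results.push({ role: "tool", tool_call_id: call.id, content });
  }
  return results;
}
```

Each returned message carries the tool_call_id of the request it answers, which is what lets the model pair a result with the call that produced it.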

This flow is the foundation of any useful agent. But there’s a hidden, critical step. For an agent to have a coherent, multi-turn conversation that involves tools, it must remember the tool call sequence. The agent’s own history—the transcript of its internal monologue—must include the tool request and the tool result.

Without this record, the model has amnesia. It enters the next turn of conversation with no memory of the tool it just used, making follow-up questions impossible.
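
To make the amnesia concrete, here is a small sketch of how the next request might be assembled from stored history. The ChatMessage type and the buildNextRequest and canAnswerFollowUp helpers are hypothetical names used for illustration. The point is only that the model sees exactly what was persisted and nothing more.

```typescript
type ChatMessage =
  | { role: "user"; content: string }
  | {
      role: "assistant";
      content: string | null;
      tool_calls?: { id: string; type: "function"; function: { name: string; arguments: string } }[];
    }
  | { role: "tool"; tool_call_id: string; content: string };

// The next request is the stored transcript plus the new user message.
function buildNextRequest(stored: ChatMessage[], followUp: string): ChatMessage[] {
  return [...stored, { role: "user", content: followUp }];
}

// Transcript when the tool request and result were persisted.
const withToolRecord: ChatMessage[] = [
  { role: "user", content: "What's the status of TXN_123?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      {
        id: "call_abc123",
        type: "function",
        function: { name: "getTransactionStatus", arguments: '{"transactionId":"TXN_123"}' },
      },
    ],
  },
  {
    role: "tool",
    tool_call_id: "call_abc123",
    content: '{"status":"SUCCESS","authorizedBy":"finance-dept"}',
  },
  { role: "assistant", content: "The status is SUCCESS." },
];

// Same turn, but the tool request and the tool result were never stored.
const withoutToolRecord = withToolRecord.filter(
  (m) => m.role !== "tool" && !(m.role === "assistant" && m.tool_calls),
);

// Crude proxy for "the model has the data it needs" in this example.
const canAnswerFollowUp = (messages: ChatMessage[]): boolean =>
  messages.some((m) => m.role === "tool" && m.content.includes("authorizedBy"));
```

In the second transcript the only trace of the tool call is the assistant's final sentence, so a follow-up such as "Who authorized it?" has nothing to draw on.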

The BaseProvider Contract

Every provider in NeuroLink, from OpenAI to Anthropic to Google Vertex, inherits from our BaseProvider abstract class. This is not just for code reuse. It’s an architectural contract. As we detail in “What You Actually Inherit When You Extend BaseProvider,” this class provides a large amount of shared infrastructure for logging, error handling, analytics, and lifecycle management. The BaseProvider handles common setup like creating analytics contexts via createAnalytics and validating options with validateOptions.

A key part of this contract is ensuring that critical side-effects, like tool execution, are recorded reliably. The BaseProvider defines a method specifically for this purpose: handleToolExecutionStorage. This method is designed to be called after the executeGeneration step completes but before the final result is returned to the user. It is a mandatory checkpoint in the request lifecycle. It works in tandem with other lifecycle hooks like logGenerationComplete and recordPerformanceMetrics to create a full picture of the generation event.

// A simplified view of the BaseProvider contract
export abstract class BaseProvider implements AIProvider {
  // ... many other methods

  /**
   * A protected method that all subclasses are expected to call
   * after a tool-using generation step completes. This ensures
   * the interaction is persisted for conversational memory.
   */
  protected async handleToolExecutionStorage(
    // ... arguments related to the tool call and result
  ): Promise<void> {
    // ... logic to persist the tool interaction
  }

  // ...
}

This method is the designated hook. It’s the single, unified entrypoint for persisting a tool interaction to the agent’s memory. Its presence in BaseProvider signals to the developer of any new provider: “If you handle a tool call, you MUST call this.” Failing to do so is a direct violation of the provider contract, with immediate and obvious consequences for the agent’s stateful behavior.
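
As an illustration of what honoring the contract looks like from a subclass, consider the following sketch. SketchBaseProvider, EchoProvider, and the in-memory persisted array are stand-ins invented for the example, and the hook's argument list is assumed, since the real signature is elided above.

```typescript
type SketchToolCall = { id: string; name: string; args: Record<string, unknown> };
type SketchToolResult = { toolCallId: string; output: string };

abstract class SketchBaseProvider {
  // Stand-in for durable storage (an assumption for this example).
  readonly persisted: { calls: SketchToolCall[]; results: SketchToolResult[] }[] = [];

  // The shared hook that every subclass is expected to call.
  protected async handleToolExecutionStorage(
    calls: SketchToolCall[],
    results: SketchToolResult[],
  ): Promise<void> {
    this.persisted.push({ calls, results });
  }

  abstract generate(prompt: string): Promise<string>;
}

class EchoProvider extends SketchBaseProvider {
  async generate(prompt: string): Promise<string> {
    // Pretend the model requested one tool and the runtime executed it.
    const calls: SketchToolCall[] = [{ id: "call_1", name: "echo", args: { prompt } }];
    const results: SketchToolResult[] = [
      { toolCallId: "call_1", output: prompt.toUpperCase() },
    ];

    // The contract: persist the interaction before returning to the caller.
    await this.handleToolExecutionStorage(calls, results);

    return `Tool said: ${results[0].output}`;
  }
}
```

Nothing in the type system forces EchoProvider to make that call, which is why a provider can omit it and still compile.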

Where Memory Fails: A Missing Hook

The challenge is that high-performance providers often need to bypass the generic BaseProvider flow to implement native, streaming-first SDK integrations. Our GoogleVertexProvider has optimized paths for Gemini and Anthropic-on-Vertex models, like executeNativeGemini3Stream and executeNativeAnthropicStream. These methods talk directly to the underlying Google AI SDK for maximum performance and feature support, bypassing the more generic executeGeneration path. This allows us to leverage model-specific features that aren’t available in the common interface.

In the push for performance, it’s easy to forget the contract. The native SDKs return their own stream objects, which don’t automatically integrate with our wrapStreamWithLifecycleCallbacks function. This means the provider developer becomes responsible for manually invoking the necessary callbacks at the right time.
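
A wrapper of this kind can be approximated with an async generator. The sketch below is not the actual wrapStreamWithLifecycleCallbacks. The withLifecycle name and the onChunk and onFinish callbacks are assumptions chosen for illustration. It shows why a stream consumed outside the wrapper never triggers the completion callback.

```typescript
type LifecycleCallbacks<T> = {
  onChunk?: (chunk: T) => void;
  onFinish?: (chunks: T[]) => Promise<void> | void;
};

// Hypothetical wrapper: forwards every chunk and fires callbacks around it.
async function* withLifecycle<T>(
  source: AsyncIterable<T>,
  callbacks: LifecycleCallbacks<T>,
): AsyncGenerator<T, void, undefined> {
  const seen: T[] = [];
  for await (const chunk of source) {
    seen.push(chunk);
    callbacks.onChunk?.(chunk);
    yield chunk;
  }
  // Runs only when the source is fully consumed through this wrapper.
  await callbacks.onFinish?.(seen);
}

// Stand-in for a stream object returned by a native SDK.
async function* fakeNativeStream(): AsyncGenerator<string, void, undefined> {
  yield "The status ";
  yield "is SUCCESS.";
}
```

Iterating fakeNativeStream() directly yields the same text, but nothing calls onFinish. That is the manual responsibility the native paths take on.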

The bug that caused our agent’s amnesia was a missing call to handleToolExecutionStorage deep inside the executeNativeGemini3Stream implementation. The code correctly executed the tool and sent the result back to the model for the current turn, but it never called the hook to save that interaction to the agent’s long-term memory. The native stream was consumed, the text was sent to the user, and the crucial tool-call/tool-result pair was dropped on the floor.

The agent wasn’t forgetting; we were failing to tell it what to remember.

// Simplified logic showing where the hook was missed.
// Declared as an async generator (function*) because the body yields text chunks.
async function* processNativeStream(nativeSDKStream: AsyncIterable<any>) {
  for await (const chunk of nativeSDKStream) {
    if (chunk.isToolCall) {
      // 1. Correctly identify and execute the tool
      const toolResult = await executeTool(chunk.toolCall);
      // 2. Correctly send the result back into the stream
      await sendToolResultToModel(toolResult);
    } else {
      // 3. Yield the text chunk to the end user
      yield chunk.text;
    }
  }
  // 4. The stream ends.
  // ... but the tool interaction was never persisted globally.
  // The call to handleToolExecutionStorage should have happened here!
}

graph TD
    subgraph "Conversation Turn N"
        A["User asks: What's the status of TXN_123?"] --> B{Model decides to use getTransactionStatus tool};
        B --> C{NeuroLink executes tool};
        C --> D["Tool returns: SUCCESS"];
    end

    subgraph "GoogleVertexProvider (Buggy Path)"
        D --> E{executeNativeGemini3Stream};
        E --> F["Model generates response: 'The status is SUCCESS.'"];
        E -.-> G((State Dropped));
    end

    subgraph "Any Correctly Implemented Provider"
        D --> H{executeStandardGenerateFlow};
        H --> I["Model generates response: 'The status is SUCCESS.'"];
        H --> J[handleToolExecutionStorage];
        J --> K((State Persisted));
    end

    subgraph "Conversation Turn N+1"
        F --> L["User asks: Who authorized it?"];
        L --> M{Model has no memory of TXN_123};
        M --> N["Response: 'I'm sorry, which transaction are you referring to?'"];

        K --> O["User asks: Who authorized it?"];
        O --> P{Model remembers TXN_123};
        P --> Q["Response: 'It was authorized by the finance department.'"];
    end

This diagram shows the divergence. The correctly implemented path calls the persistence hook, creating a durable memory of the event. The buggy native path skips it, leading directly to state loss and a broken user experience.

The Fix: Uniformity is Non-Negotiable

The fix itself was trivial: add the await this.handleToolExecutionStorage(...) call to the end of executeNativeGemini3Stream and executeNativeAnthropicGenerate before returning the final result.

This required using our extractToolInformation utility, a crucial adapter that knows how to parse the slightly different tool call formats from various provider responses—whether it’s an OpenAI-style function call or an Anthropic-style tool use block—and normalize them into the canonical ToolCall and ToolResult objects our system expects. This function ensures that the rest of our system, including handleToolExecutionStorage, can operate on a standardized data structure.
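
To illustrate the kind of normalization involved, here is a hedged sketch. It is not the real extractToolInformation. The normalizeToolCalls function and the CanonicalToolCall shape are hypothetical. The two input shapes follow the publicly documented OpenAI tool_calls array, where arguments are a JSON string, and the Anthropic tool_use content block, where arguments are an input object.

```typescript
type CanonicalToolCall = { id: string; name: string; args: Record<string, unknown> };

type OpenAIStyleMessage = {
  tool_calls?: {
    id: string;
    type: "function";
    function: { name: string; arguments: string };
  }[];
};

type AnthropicStyleMessage = {
  content: Array<
    | { type: "text"; text: string }
    | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  >;
};

function normalizeToolCalls(
  msg: OpenAIStyleMessage | AnthropicStyleMessage,
): CanonicalToolCall[] {
  if ("content" in msg) {
    // Anthropic style: tool calls are content blocks with structured input.
    const calls: CanonicalToolCall[] = [];
    for (const block of msg.content) {
      if (block.type === "tool_use") {
        calls.push({ id: block.id, name: block.name, args: block.input });
      }
    }
    return calls;
  }
  // OpenAI style: arguments are a JSON string that must be parsed.
  return (msg.tool_calls ?? []).map((call) => ({
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  }));
}
```

Once both formats collapse into one shape, downstream code such as a persistence hook does not need to know which provider produced the call.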

// A conceptual fix within a native provider implementation
private async executeNativeGemini3Stream(
  // ... options
): Promise<StreamResult> {
  // ... existing native SDK logic to stream from Gemini
  // ... tool execution happens here

  const finalResult = // ... result from the stream
  const toolInfo = this.extractToolInformation(finalResult);

  if (toolInfo.hasToolCalls) {
    // THE FIX: Ensure the tool interaction is saved.
    await this.handleToolExecutionStorage(
      toolInfo.toolCalls,
      toolInfo.toolResults
    );
  }

  return finalResult;
}

The architectural lesson is far more important than the fix. When you have twenty-four providers behind one BaseProvider (see “Twenty-four providers, one BaseProvider: the adapter catalog”), uniformity isn’t a “nice to have.” It is the system’s core defense against entropy. Any deviation in a critical path, especially one related to state management, creates an entire class of bugs that are incredibly difficult to diagnose. This commitment to a uniform contract is the only way functions such as getToolsForStream and buildMessagesForStream can work reliably without provider-specific branching.

The user doesn’t report “the handleToolExecutionStorage hook was missed.” They report that the multi-million dollar AI investment has the memory of a goldfish.

Centralizing the Machinery with TelemetryHandler

The BaseProvider itself doesn’t contain the persistence logic. Following the principle of separation of concerns, the handleToolExecutionStorage method is a lightweight wrapper that delegates the actual work to a centralized module: the TelemetryHandler. This handler is instantiated once and passed into the provider’s constructor, ensuring a single point of control.

This design is crucial. It means we have one canonical implementation for tool persistence logic. The TelemetryHandler is responsible for writing the tool call and result pair to our durable storage layer, typically a database like Postgres or a distributed log store. It ensures the interaction is associated with the correct agent and conversation ID. Whether the call originates from a standard generate flow or a highly optimized executeNativeAnthropicStream path, the same underlying machinery is used. This is fundamental to the pipeline described in “From User Input to Provider API: The Five-Stage Message Flow,” which defines NeuroLink’s operation.

// src/lib/core/baseProvider.ts
protected async handleToolExecutionStorage(
  toolCalls: ToolCall[],
  toolResults: ToolResult[],
) {
  // ... prepare data
  await this.telemetryHandler.handleToolExecutionStorage({
    // ... pass context
  });
}

// src/lib/core/modules/TelemetryHandler.ts
export class TelemetryHandler {
  private dbClient: DatabaseClient;

  constructor(dbClient: DatabaseClient) {
    this.dbClient = dbClient;
  }

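  async handleToolExecutionStorage(
    // Parameter shape inferred from the fields used below; the real signature may differ.
    args: { conversationId: string; toolCall: ToolCall; toolResult: ToolResult },
  ): Promise<void> {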
    // This is the single place where the logic for persisting
    // tool calls to the agent's memory lives.
    await this.dbClient.insert('tool_interactions', {
      conversationId: args.conversationId,
      toolCall: args.toolCall,
      toolResult: args.toolResult,
      timestamp: new Date(),
    });
  }
  // ...
}

By enforcing that all provider paths funnel through this single hook, we ensure that improvements, bug fixes, and new features related to agent memory are applied universally. We only have to get it right once, in one place. The provider’s only job is to remember to call the hook.
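
The single-funnel idea can be demonstrated with an in-memory stand-in. In the sketch below, InMemoryTelemetryHandler, StandardPathProvider, and NativeStreamProvider are hypothetical names, and an array replaces the database. What it shows is one handler instance injected into two differently shaped providers, so any change to the persistence logic applies to both.

```typescript
type StoredInteraction = { conversationId: string; toolName: string; output: string };

class InMemoryTelemetryHandler {
  readonly rows: StoredInteraction[] = [];

  async handleToolExecutionStorage(interaction: StoredInteraction): Promise<void> {
    // The single place where persistence logic lives in this sketch.
    this.rows.push(interaction);
  }
}

class StandardPathProvider {
  private readonly telemetry: InMemoryTelemetryHandler;

  constructor(telemetry: InMemoryTelemetryHandler) {
    this.telemetry = telemetry;
  }

  async run(conversationId: string): Promise<void> {
    await this.telemetry.handleToolExecutionStorage({
      conversationId,
      toolName: "getTransactionStatus",
      output: "SUCCESS",
    });
  }
}

class NativeStreamProvider {
  private readonly telemetry: InMemoryTelemetryHandler;

  constructor(telemetry: InMemoryTelemetryHandler) {
    this.telemetry = telemetry;
  }

  async run(conversationId: string): Promise<void> {
    // ...native streaming work would happen here...
    // The native path still ends by calling the same shared hook.
    await this.telemetry.handleToolExecutionStorage({
      conversationId,
      toolName: "getTransactionStatus",
      output: "SUCCESS",
    });
  }
}

// One handler instance, passed to both providers at construction time.
const sharedHandler = new InMemoryTelemetryHandler();
const standardProvider = new StandardPathProvider(sharedHandler);
const nativeProvider = new NativeStreamProvider(sharedHandler);
```

After both providers run, sharedHandler.rows holds one entry per conversation regardless of which path produced it.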

The Real Cost of Divergence

When a single provider implementation diverges from the core contract, it doesn’t just create a bug in that provider. It undermines the integrity of the entire platform. An application developer building on NeuroLink should not need to know if they are using Gemini, Claude, or GPT. They should certainly not have to worry that the agent’s fundamental ability to remember will change based on the selected model.

This incident served as a powerful reminder. The adapter pattern is only as strong as the discipline of its implementers. The divergence broke more than just conversation flow; it broke observability. Our debugging tools, which rely on a complete and accurate history of all messages, were showing an incomplete picture. Functions like calculateActualCost were underreporting costs because they never saw the tool-related messages, which often involve significant token counts.

// A hypothetical debugging function that would fail
async function getConversationHistory(convoId: string): Promise<Message[]> {
  // This reads from the database where TelemetryHandler writes.
  const messages = await db.query(
    'SELECT * FROM messages WHERE conversation_id = ? ORDER BY timestamp ASC',
    [convoId]
  );
  // If a provider misses the hook, the tool_calls and tool_results
  // messages are simply not in the database, leading to an
  // incomplete and misleading history.
  return messages;
}
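
The under-reporting of cost mentioned above can be made concrete with a small arithmetic sketch. This is not the real calculateActualCost. The estimateCost helper, the flat per-token price, and the token counts are made-up values chosen only to make the effect visible.

```typescript
type UsageRecord = {
  kind: "user" | "assistant" | "tool_call" | "tool_result";
  tokens: number;
};

// Hypothetical flat price in dollars per 1,000 tokens.
const PRICE_PER_1K_TOKENS = 0.002;

function totalTokens(records: UsageRecord[]): number {
  return records.reduce((sum, r) => sum + r.tokens, 0);
}

function estimateCost(records: UsageRecord[]): number {
  return (totalTokens(records) / 1000) * PRICE_PER_1K_TOKENS;
}

// What actually happened during the turn (illustrative numbers).
const fullTurn: UsageRecord[] = [
  { kind: "user", tokens: 20 },
  { kind: "tool_call", tokens: 40 },
  { kind: "tool_result", tokens: 900 },
  { kind: "assistant", tokens: 40 },
];

// What the history contains when the persistence hook was skipped.
const persistedTurn = fullTurn.filter(
  (r) => r.kind === "user" || r.kind === "assistant",
);

// Tokens that the estimate never sees.
const missedTokens = totalTokens(fullTurn) - totalTokens(persistedTurn);
```

With these illustrative numbers, 940 of the 1,000 tokens in the turn are invisible to the estimate, so the reported figure covers only a small fraction of the real usage.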

For critical cross-cutting concerns like state persistence, logging, and security, there can be no exceptions. Every native path must pay the same tax and call the same hooks. It’s the only way to build a complex AI system that is predictable, observable, and ultimately, reliable.



This post is licensed under CC BY 4.0 by the author.