How AI agents actually work: A practical guide for Treasury & Finance

Download the full Guide

At Kyriba, we deliver innovative, secure solutions to ensure customers thrive. With a proven record of practical, effective AI‑powered capabilities, we’ve consistently applied cutting‑edge technology to address the evolving needs of our customers. Recognizing the transformative potential of Large Language Models (LLMs), we introduced our agentic AI, TAI. A common question we hear is: how does it actually work?

To help demystify AI agents for treasury and finance, we’ve developed this guide to explain how they function and why understanding their capabilities—and limitations—matters for your organization. We walk through what an agent can do and how the right approach maintains the trust, compliance, and control that treasury and finance operations demand.

What makes AI agents different: Unlike chatbots that simply respond with pre-trained knowledge, agents can reason through complex problems, call tools to gather real-time data, and recommend or perform actions.

Why treasury operations are perfect for AI agents: Treasury operations hold the richest, most fragmented financial data in the enterprise, and treasury decisions are time-sensitive, data-intensive, and governed by clear policies. Whether managing global liquidity, analyzing FX exposure, or optimizing cash positions, agents excel at aggregating multi-system data instantly and proposing actions with proper approvals and audit trails, translating to basis-point yield improvements and hours returned to treasury teams.

The Kyriba approach: Our agentic AI, TAI, is embedded within our platform, respecting role-based permissions, leaving complete audit trails, and ensuring every action follows the security and compliance frameworks you've already established. It's designed as a policy-aware teammate that explains its reasoning and proves what it did.

What TAI delivers from day one: TAI allows treasurers to improve forecast accuracy with earlier visibility into pattern shifts; optimize cash positioning to capture yield on balances that would otherwise sit idle; and accelerate exception resolution—cutting investigation time from hours to minutes.

TAI's Trust Foundation

  • Privately hosted on Kyriba's secure infrastructure

  • No customer data used for training

  • Role-based access control enforced

  • Complete audit trails for every action

Permissions, privacy, and control (non-negotiables)

Trust, security, and control are at the core of Kyriba’s AI agent. In finance, where decisions are high-stakes and data is sensitive, these principles are non-negotiable:

  • Data privacy: No customer data is used in training public models. Ever.

  • Role-based access: The agent can only see and do what you are authorized to see and do in Kyriba.

  • API scopes & least privilege: Each tool is strictly scoped (e.g., read payments vs. create payments). Sensitive actions always require explicit approval.

  • Data residency & logging: All calls are fully auditable. The agent passes only structured, relevant snippets to the LLM, never full datasets. Reasoning steps are transparently traced and shown to the user in the “Thinking Steps,” including which tools were called. Machine-generated audit trails mapped to policies, together with segregation-of-duties logs, significantly shorten audit preparation and close cycles.

  • Human-in-the-loop: The agent recommends; you decide. Approvals can be enforced at any action level (e.g., payments, transfers, FX), ensuring full control.

  • Reversibility: Agent actions are reviewable and reversible within policy, ensuring an additional safety layer for treasury operations.


How AI agents work for treasury

An agent interprets the user’s request and then engages in a chain of thought, a reasoning process where it breaks down the problem into steps. Based on this reasoning, the agent identifies which tools or data sources might help, calls them, observes the results, and adapts its plan if needed. This loop continues until the agent can provide a complete, reliable answer to the user.

Large Language Model (LLM)

The key component of an agent is the Large Language Model (LLM), the “brain” that performs reasoning and planning.

LLMs were popularized by OpenAI with the release of ChatGPT in November 2022, but the underlying breakthrough came earlier, with Google researchers’ 2017 paper “Attention Is All You Need.” This work introduced the transformer architecture, which made it possible to train models much more effectively.

The idea is revolutionary: instead of hard-coding rules, the model is trained on vast amounts of text, such as content from millions of web pages and books, so that it learns statistical patterns of language and can generate coherent, well-formed sentences. Modern LLMs achieve this scale through billions of parameters, the adjustable weights inside the neural network. Smaller models may have around 7–8 billion parameters, while the largest frontier models can exceed 100 billion parameters. More parameters generally allow the model to capture more complex patterns, though they also require vastly greater computing power.
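To make this scale concrete, here is a back-of-the-envelope sketch of the memory needed just to store a model’s weights, assuming 16-bit (2-byte) parameters; real deployments vary with precision and quantization:

```python
# Rough estimate only: memory to hold a model's weights in 16-bit precision.
def weight_memory_gb(parameters: float, bytes_per_param: int = 2) -> float:
    """Gigabytes of memory needed to store the raw weights."""
    return parameters * bytes_per_param / 1e9

print(weight_memory_gb(7e9))    # a 7-billion-parameter model: ~14 GB
print(weight_memory_gb(100e9))  # a 100-billion-parameter model: ~200 GB
```

This is storage alone; training additionally requires gradients, optimizer state, and activations, which is why large GPU clusters are needed.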

Frontier LLMs are powerful; Kyriba’s variant runs in a private, governed environment so no customer data trains any public models.

To improve usefulness and safety, an additional process called Reinforcement Learning from Human Feedback (RLHF) is applied. In this step, humans review outputs and guide the model toward answers that are more accurate, relevant, and aligned with user expectations.

The initial training of an LLM, compressing vast amounts of knowledge into billions of parameters, requires enormous computational power, typically using large clusters of GPUs or specialized AI accelerators. RLHF then adds weeks or months of fine-tuning, as humans guide the model to produce more useful and aligned answers.

Overall, it can take several months to produce a single model, which explains why new generations (e.g., from GPT-3.5 to GPT-4) are released on a multi-month cycle rather than continuously.

In treasury, where operations are rule-bound and high-stakes, agents don't just "chat"—they execute policy-aware workflows that close the last-mile gap from answers to actions.

Tokens

LLMs process language using tokens, small units of text that can be as short as a single character or as long as a word, but on average about four characters in English. Instead of reading sentences directly, the model breaks everything down into these tokens.

The model’s core task is to predict the next token in a sequence, based on all the tokens it has already seen. It does this using the transformer architecture: layers of attention mechanisms and neural networks that apply advanced mathematics and statistical patterns learned during training.

This process, called inference, is what happens each time you interact with the model. Unlike the months-long training process, inference is much faster and requires far less computation, which makes real-time use possible.
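As a toy illustration of next-token prediction, consider a bigram model: vastly simpler than a transformer, but built on the same idea of repeatedly predicting the most likely next token from what came before. The corpus here is an invented example:

```python
from collections import Counter, defaultdict

# Toy illustration, not a real LLM: a bigram model that, like an LLM,
# repeatedly predicts the most likely next token given the tokens so far.
corpus = "cash forecast improves when cash data improves".split()

# "Training": count which token tends to follow each token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start: str, steps: int) -> str:
    """Greedy 'inference': pick the most frequent next token at each step."""
    tokens = [start]
    for _ in range(steps):
        candidates = follows[tokens[-1]]
        if not candidates:
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("cash", 3))
```

A transformer replaces these simple frequency counts with billions of learned parameters and attention over the full context, but the generation loop, one token at a time, is the same.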

Prompt

Every interaction with an LLM begins with a prompt, the input text that frames the task. A prompt can be as simple as a direct question (“Explain corporate treasury”), which the model can answer from its trained knowledge.

However, there is a limit to how much text an LLM can consider at once. This limit is called the context window, and it is measured in tokens. Depending on the model, context windows today range from around 32,000 tokens (roughly 20–25 pages of text) to more than 200,000 tokens (an entire book). This ebook, for example, has roughly 2800 tokens. Prompts must not exceed the context window size.
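The four-characters-per-token average suggests a simple way to budget a prompt against a context window. This sketch uses that heuristic (real tokenizers give exact counts; the limit and reserve values are illustrative):

```python
# Rough prompt budgeting using the ~4 characters-per-token English average.
CONTEXT_WINDOW = 32_000   # tokens; example model limit
CHARS_PER_TOKEN = 4       # rough English average

def estimate_tokens(text: str) -> int:
    """Heuristic token count; real tokenizers give exact numbers."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(prompt: str, reserved_for_answer: int = 1_000) -> bool:
    """Check that a prompt leaves room in the window for the model's reply."""
    return estimate_tokens(prompt) + reserved_for_answer <= CONTEXT_WINDOW

print(fits_in_context("Explain corporate treasury."))  # short prompt fits
```

Note that the reply also consumes the window, which is why some budget must be reserved for the answer, not just the prompt.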

With TAI, the agent sends the LLM only the structured, relevant snippets needed to answer each query—never full datasets.

In the initial version of ChatGPT, the main innovation was a chat interface that allowed users to have natural conversations with the LLM. The model responded using patterns and knowledge it had learned during training, but that knowledge was static and limited to the information available up to its training cut-off date.

This limitation meant that ChatGPT could answer general questions, explain concepts, or generate text in many styles, but it could not access the latest news, facts, or real-time data, since it wasn’t connected to the internet or external systems.

Smart context window management lets Kyriba’s agentic AI, TAI, compare multi-bank positions, forecasts, and payment queues in a single conversation—tasks that are hugely time-consuming when done ad hoc.


Tools

The next breakthrough came with the introduction of tools. Instead of relying only on its static training, the LLM could now decide to call an external tool, for example, a search engine. Based on the user’s request, the model could trigger a search query, process the results, combine them with the ongoing conversation, and then produce a final answer.

For instance, if you ask “What’s the weather tomorrow?”, the LLM must call a tool to fetch real-time weather data, since that information isn’t part of its training. Once the result is returned, the LLM integrates it into the conversation and provides the final answer.
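This round trip can be sketched in a few lines. The model is a stand-in function and the tool names are illustrative (real systems use the function-calling pattern of LLM APIs): the model emits a structured tool request, the runtime executes it, and the result is fed back into the conversation.

```python
# Sketch of a tool-call round trip. `fake_model` stands in for an LLM;
# tool names and the message format are illustrative placeholders.
def fake_model(messages):
    """First asks for a tool; once a tool result arrives, answers with it."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "get_weather", "args": {"day": "tomorrow"}}}
    return {"content": f"Tomorrow's weather: {last['content']}"}

TOOLS = {"get_weather": lambda day: "sunny, 22°C"}

def run(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    reply = fake_model(messages)
    while "tool_call" in reply:                        # model wants a tool
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])   # runtime executes it
        messages.append({"role": "tool", "content": result})  # feed back
        reply = fake_model(messages)                   # model continues
    return reply["content"]

print(run("What's the weather tomorrow?"))
```

The key point is that the model never touches the weather service directly; it only emits a structured request that the surrounding runtime chooses to execute.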

This capability removed one of the biggest limitations of early LLMs: it allowed them to access up-to-date information and respond accurately to questions about events or facts that happened after the training cut-off date.

Tools are the way for an LLM to observe a system and perform actions.

They extend the model beyond language prediction: instead of only generating text, the LLM can decide to call a tool to look up data, run a calculation, query a database, or trigger an external process.

In treasury, tools must follow least-privilege design: each tool is scoped precisely—"read balances" does not grant "create payments." Sensitive tools are segregated and always approval-gated.
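A least-privilege tool registry might look like the following sketch. All names and scope strings are illustrative, not Kyriba’s actual API: each tool carries an explicit scope, and sensitive tools are gated behind approval even when the scope is granted.

```python
from dataclasses import dataclass, field

# Illustrative sketch of least-privilege tool scoping (not a real API).
@dataclass
class Tool:
    name: str
    scope: str             # e.g. "payments:read" vs "payments:create"
    sensitive: bool = False

@dataclass
class Agent:
    granted_scopes: set
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool):
        self.tools[tool.name] = tool

    def call(self, name: str, approved: bool = False) -> str:
        tool = self.tools[name]
        if tool.scope not in self.granted_scopes:      # scope check first
            return f"denied: missing scope {tool.scope}"
        if tool.sensitive and not approved:            # approval gate
            return "pending: explicit approval required"
        return f"ok: {name} executed"

agent = Agent(granted_scopes={"balances:read", "payments:create"})
agent.register(Tool("read_balances", "balances:read"))
agent.register(Tool("create_payment", "payments:create", sensitive=True))

print(agent.call("read_balances"))                  # plain read: allowed
print(agent.call("create_payment"))                 # sensitive: held
print(agent.call("create_payment", approved=True))  # allowed with approval
```

The design point is that reading balances never implies permission to move money, and even a granted payment scope still routes through human approval.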

Reasoning loop

A typical agent works through a reasoning loop: Plan → Act (call tools) → Observe → Refine → Answer. The agent repeats this loop until it has gathered enough information to respond—or, in some cases, to request approval before taking an action.

In treasury operations, this loop applies domain-specific intelligence. For example:

  • Plan: Identify trapped cash and yield opportunities.

  • Act: Call tools to retrieve balances and forecasts.

  • Observe: Screen urgent payments for approval.

  • Refine: Respect minimum balances and cutoff times.

  • Answer: Propose transfers and FX hedges with policy citations.
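The loop above can be sketched in code. The tool, data, and output strings are invented placeholders, not real treasury APIs; the point is the iterate-until-enough-context control flow with an iteration budget:

```python
# Minimal sketch of the Plan → Act → Observe → Refine → Answer loop.
# Tool names and figures are illustrative placeholders.
def get_balances():
    return {"USD operating": 5_000_000, "EUR operating": 2_000_000}

def reasoning_loop(goal: str, max_iterations: int = 5) -> str:
    observations = {}
    for _ in range(max_iterations):
        # Plan: decide what information is still missing for the goal.
        if "balances" not in observations:
            # Act: call a tool; Observe: record what it returned.
            observations["balances"] = get_balances()
            continue  # Refine: re-enter the loop with the new context.
        # Answer: enough has been gathered to propose an action.
        total = sum(observations["balances"].values())
        return (f"Proposed sweep of idle cash across "
                f"{len(observations['balances'])} accounts (total {total:,})")
    return "Unable to complete within iteration budget"

print(reasoning_loop("optimize cash positioning"))
```

Real agents plan with the LLM rather than hard-coded conditions, but the shape is the same: a bounded loop that alternates tool calls with reasoning until it can answer or must ask for approval.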

The effectiveness of this loop also depends on the context window of the LLM, which defines how much of the conversation and tool outputs it can keep in memory at once. A larger context window allows the agent to consider longer conversations, detailed reports, or multiple tool calls in sequence, making its reasoning more accurate and reliable.

Figure: The agentic loop

Download the complete PDF to explore TAI’s architecture, real treasury applications, and more.


Written By

Félix Grévy


SVP Platform, Data & AI

Félix Grévy is SVP of Platform, Data & AI at Kyriba, where he leads innovation across platform engineering, data, AI, and advanced analytics. With more than 20 years of experience in financial technology spanning product development, product management, and commercial management, Félix joined Kyriba in 2020 to lead API and connectivity strategy. He has since spearheaded Kyriba's agentic AI initiatives, including the Trusted AI (TAI) portfolio, which embeds governed intelligence directly into treasury and finance workflows by integrating LLMs and predictive analytics, without "black boxes" or training external models on customer data.
