June → November 2025: The Experiment Timeline
| Month | Focus | Outcome | Key Metric |
|---|---|---|---|
| June | Voice prototypes with synthetic data | ✓ Proof that complex finance could be spoken aloud | 15 test sessions |
| July – Aug (early) | Guardrails + “financial clerk” model | ⚠️ Trust gap discovered | 70% error capture |
| Mid-Aug | System strain | ✗ Performance collapse | 3.2 s avg latency |
| Late Aug – Sept | Architecture pivot | ⚙️ New foundation designed | 3 core layers |
| Sept – Oct | Governance fabric build | ✓ Real-time provenance achieved | First L/C workflow validated |
| Oct – Nov | Market validation | 🎯 Infrastructure thesis confirmed | Clear moat identified |
The Beginning: Giving Finance a Voice
TradeQu didn’t begin as infrastructure; it began as a conversation.
In June 2025 I built early voice-first prototypes to test whether trade-finance workflows could be spoken aloud. Using ElevenLabs for natural speech synthesis and custom prompt orchestration, the system simulated end-to-end funding interactions on synthetic data [29][30]:
“Show me eligible invoices for buyer.”
“Create a funding request for US$ 200,000.”
“Summarise my open invoices by buyer and currency.”
The goal was simple: could AI make complex finance feel human?
In short demos, it worked beautifully — until it didn’t.
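One way to picture what sat beneath those commands: each transcribed utterance has to become a structured intent before any finance logic can run. A minimal, hypothetical sketch of that routing step (the function and intent names are invented for illustration, not the actual prototype code):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    name: str                # e.g. "create_funding_request", "list_eligible_invoices"
    amount: Optional[float]  # parsed monetary amount, if any
    raw_text: str            # original transcription, kept for later audit

AMOUNT_RE = re.compile(r"(?:US\$|\$|€|£)\s?([\d,]+(?:\.\d+)?)")

def route_utterance(text: str) -> Intent:
    """Map a transcribed voice command to a coarse intent (illustrative only)."""
    lowered = text.lower()
    match = AMOUNT_RE.search(text)
    amount = float(match.group(1).replace(",", "")) if match else None

    if "funding request" in lowered or lowered.startswith("fund"):
        return Intent("create_funding_request", amount, text)
    if "eligible invoices" in lowered:
        return Intent("list_eligible_invoices", None, text)
    if "summarise" in lowered or "summarize" in lowered:
        return Intent("summarise_open_invoices", None, text)
    return Intent("clarify", None, text)  # unknown request: ask the user to rephrase

# route_utterance("Create a funding request for US$ 200,000")
# -> Intent(name="create_funding_request", amount=200000.0, ...)
```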
Designing for Conversation and Control
As complexity grew, voice alone wasn’t enough. Users trusted what the system sounded like more than what it knew.
I introduced guardrails: confirmation prompts, inline compliance checks, and escalation when confidence dropped. Across 30+ sessions, these measures caught roughly 70% of potential errors, but the remaining 30% exposed a deeper risk: perceived certainty without proof.
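In shape, this is a single decision step: each proposed action carries a confidence score and a compliance result, and anything low-confidence or non-compliant is escalated while high-value actions get read back for confirmation. A minimal sketch, with thresholds and field names that are assumptions rather than the production values:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str         # e.g. "Fund invoice #4721 at spot rate"
    amount: float            # transaction value in the facility currency
    confidence: float        # system's confidence in its own interpretation, 0..1
    passes_compliance: bool  # result of the inline compliance checks

CONFIDENCE_FLOOR = 0.85  # assumed threshold: below this, never act automatically
CONFIRM_ABOVE = 50_000   # assumed threshold: larger amounts always need confirmation

def decide(action: ProposedAction) -> str:
    """Return "execute", "confirm", or "escalate" for a proposed voice action."""
    if not action.passes_compliance:
        return "escalate"                    # hand off to a human reviewer
    if action.confidence < CONFIDENCE_FLOOR:
        return "escalate"                    # low confidence: never guess
    if action.amount >= CONFIRM_ABOVE:
        return "confirm"                     # read the action back and wait for a yes
    return "execute"

# decide(ProposedAction("Fund invoice #4721", 85_000, 0.92, True)) -> "confirm"
```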
By late July I'd built a financial clerk agent that could fetch data, render cards, and execute synthetic funding requests entirely by voice. It could process a complete workflow in under 90 seconds—but couldn't prove a single decision. For a moment, it felt like the future of interfaces. Then the architecture collapsed under its own ambition [33].
“The system could talk beautifully. We needed it to think correctly.”
Cracks Beneath the Surface
Example: A tester said, “Fund the latest invoice at spot rate.”
The system:
Identified invoice #4721 (≈€85,000)
Retrieved the live GBP/EUR rate
Applied it to the wrong buyer’s credit facility
Nearly approved a transaction 23% above policy limits, a breach that would have triggered regulatory reporting requirements [36][37]
The guardrails caught it, but the near miss revealed the core issue.
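What saved the transaction was ordinary arithmetic applied to the right record. A minimal sketch of the facility-limit check, with reconstructed, illustrative figures (the actual limits and amounts differ):

```python
# Hypothetical facility limits; the near-miss came from checking the wrong buyer's facility.
FACILITY_LIMITS = {"buyer_intended": 150_000, "buyer_wrong": 69_000}

def limit_check(buyer_id: str, requested_eur: float) -> dict:
    """Block any funding request that exceeds the buyer's approved facility limit."""
    limit = FACILITY_LIMITS[buyer_id]
    over_pct = max(0.0, (requested_eur / limit - 1.0) * 100)
    return {"within_limit": requested_eur <= limit, "over_limit_pct": round(over_pct, 1)}

print(limit_check("buyer_wrong", 85_000))     # roughly 23% over limit -> blocked
print(limit_check("buyer_intended", 85_000))  # within the intended buyer's limit
```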
Voice had amplified both convenience and risk.
Industry research showed the same pattern: conversational polish raises user trust by ≈40% without a corresponding reliability gain, a miscalibration documented across multiple human–AI interaction studies [11][12][13][14]. In financial services specifically, this trust gap creates regulatory exposure [34][35].
“Voice taught us that trust needs structure.”
The Turning Point: From UX to Architecture
Through August I built fuzzy matching and normalisation to handle slang and shortened names. Each improvement added latency, fragility, and new failure modes.
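For a sense of what that layer looked like, here is a standard-library sketch of name normalisation and fuzzy matching; the buyer names, noise words, and cutoff are placeholders rather than the real implementation:

```python
import difflib

# Placeholder buyer directory; real names and the matching engine differ.
KNOWN_BUYERS = ["Acme Industrial GmbH", "Northline Logistics Ltd", "Horizon Foods SA"]

def normalise(spoken: str) -> str:
    """Lowercase and strip filler so shortened or slangy names line up."""
    cleaned = spoken.lower().strip()
    for noise in ("uh ", "um ", "the "):
        cleaned = cleaned.replace(noise, "")
    return cleaned

def match_buyer(spoken: str, cutoff: float = 0.6) -> str | None:
    """Return the closest known buyer name, or None if nothing is close enough."""
    candidates = {normalise(name): name for name in KNOWN_BUYERS}
    hits = difflib.get_close_matches(normalise(spoken), candidates.keys(), n=1, cutoff=cutoff)
    return candidates[hits[0]] if hits else None  # None -> ask the user to confirm

# match_buyer("northline logistics") -> "Northline Logistics Ltd"
```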
It became clear: voice needs infrastructure beneath it—context, policy logic, and explainability.
So I started over.
Building the Governance Fabric
From September onwards, TradeQu evolved into three interlocking layers (a minimal policy-as-code sketch follows the list):
Policy Store — turns legal and organisational rules into versioned, executable code [5][35].
Trade Graph — connects entities, instruments, obligations, and jurisdictions for reasoning, not reporting [6][23].
Governance Fabric — enforces compliance, records provenance, and provides end-to-end audit trails in real time [7][34].
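A minimal sketch of what “versioned, executable” rules mean in practice; the rule, field names, and version scheme are illustrative assumptions, not the production Policy Store schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PolicyRule:
    rule_id: str      # stable identifier, cited by every decision record
    version: str      # rules are versioned, never edited in place
    effective: date
    description: str

# Hypothetical rule: a single funding request may not exceed the buyer's facility.
MAX_FACILITY_UTILISATION = PolicyRule(
    rule_id="credit.facility.max_utilisation",
    version="2025-09-01.r3",
    effective=date(2025, 9, 1),
    description="A funding request may not exceed 100% of the buyer's approved facility.",
)

def evaluate(rule: PolicyRule, requested: float, facility_limit: float) -> dict:
    """Executable form of the rule: the outcome plus the exact version applied."""
    return {
        "rule_id": rule.rule_id,
        "rule_version": rule.version,  # makes the decision traceable to one rule text
        "passed": requested <= facility_limit,
    }
```

Because rules are immutable and versioned, every decision can point back to the exact rule text it was evaluated against.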
Our first proof of concept, a Letter of Credit workflow, ran seamlessly: every decision logged, validated, and explainable.
Tier-1 banks have reached ≈99% STP rates with OCR + RPA + ML stacks [20][21][22], but they can’t explain why any individual transaction was approved. Our approach adds policy-aware provenance by default, turning black-box automation into auditable reasoning.
“Compliance stopped being paperwork — it became computation.”
The Explainability Paradox
Voice did something unexpected: it made explainability feel solved before it actually was.
Users could ask "Why was this invoice flagged?" and receive instant answers. Compared to raising support tickets or hunting through documentation, this felt revolutionary. The conversational interface created an illusion of transparency.
But there was a critical gap: the system could answer questions—it just couldn't prove its answers were correct.
Example:
User: "Why was this buyer's limit reduced?"
Agent: "Based on their payment history and current exposure."
Reality: The system had pulled data from the wrong time period.
The answer sounded authoritative. It was delivered confidently, in natural language. But it wasn't grounded in versioned policy logic or linked to auditable evidence.
This revealed something deeper: conversational interfaces don't just need to explain—they need to explain correctly, with provenance. Voice had shown us what financial explainability should feel like. Now we needed to make it real. That realisation reframed voice from a product feature to a diagnostic lens — a way to surface how intelligence needs to justify itself.
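In data terms, the difference is whether an answer carries its own evidence. A minimal sketch of a provenance-grounded explanation record (field names and example values are illustrative, not the production schema):

```python
from dataclasses import dataclass, field

@dataclass
class GroundedAnswer:
    question: str
    answer: str
    policy_rule_id: str                                      # which versioned rule drove the outcome
    policy_version: str
    evidence_refs: list[str] = field(default_factory=list)   # ledger / Trade Graph record IDs
    data_as_of: str = ""                                      # the exact period the inputs cover

# The same reply as above, but now checkable (values are invented for illustration).
reply = GroundedAnswer(
    question="Why was this buyer's limit reduced?",
    answer="Exposure exceeded the review threshold for two consecutive months.",
    policy_rule_id="credit.facility.review_trigger",
    policy_version="2025-09-01.r3",
    evidence_refs=["exposure:2025-08", "exposure:2025-09"],
    data_as_of="2025-09-30",
)
```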
Why This Matters
Financial institutions are entering the post-chatbot era. Conversational AI proved usability; it failed compliance.
Every major bank is now grappling with the same question: how do you make AI decisions defensible in court? [35][36]
TradeQu is building that proof layer—the infrastructure that lets AI earn trust.
Banks globally spend approximately $206 billion annually on financial crime compliance alone [15][16][17]—yet struggle to make AI-driven decisions defensible in audit or court.
The Broader Realisation
By October 2025, the insight was undeniable:
UI ≠ trust — Voice without provenance is performance.
Architecture = safety — Provenance, policy logic, and reasoning layers are non-negotiable.
Transparency = scale — Compliance must be embedded, not bolted on.
While these experiments ran, model capabilities kept advancing: GPT-4 Turbo with a 128K-token context, the o-series at 200K, and GPT-4.1 at 1 million tokens [1][6][10]. Despite these gains in context, none solved the provenance problem. Hardware and models will keep evolving; governance remains the bottleneck [34][35].
What Voice Experiments Taught Us
✓ Conversational interfaces reveal architectural gaps faster than dashboards — because users speak their intent directly, exposing where systems lack reasoning
✓ Trust scales with provenance, not politeness
✓ Compliance automation needs reasoning layers, not rule engines
✓ Finance is unforgiving—the perfect proving ground for trustworthy AI
What We’re Not Building
❌ A chatbot platform
❌ A compliance dashboard
❌ A workflow tool
We're building the reasoning and governance layer beneath all of them—Policy Store + Trade Graph + Governance Fabric—so any interface (voice, chat, UI) can be provable by design [26][27][28][35][36][37].
Looking Ahead (2026 →)
We’ll return to voice — but this time on top of the governance fabric, not ahead of it.
Planned pilots for 2026 will re-introduce conversational interfaces where:
Every command is grounded in the Trade Graph
Every response is validated through the Policy Store
Every dialogue line links to evidence and version history (a minimal sketch of this flow follows)
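Composed, that flow looks roughly like the sketch below: ground the command in the Trade Graph, validate it against the Policy Store, and record evidence before responding. The interfaces and names here are assumptions about the planned design, not shipped code:

```python
from typing import Protocol

class TradeGraph(Protocol):
    def resolve(self, intent: dict) -> dict: ...

class PolicyStore(Protocol):
    def evaluate(self, intent: dict, entities: dict) -> dict: ...

class EvidenceLedger(Protocol):
    def record(self, intent: dict, entities: dict, verdict: dict) -> str: ...

def handle_intent(intent: dict, graph: TradeGraph,
                  policies: PolicyStore, ledger: EvidenceLedger) -> dict:
    """Illustrative pipeline: every command is grounded, validated, and evidenced."""
    entities = graph.resolve(intent)                          # ground in the Trade Graph
    verdict = policies.evaluate(intent, entities)             # validate against versioned policy
    evidence_ref = ledger.record(intent, entities, verdict)   # append the provenance record
    return {
        "response": verdict["explanation"],         # what the user hears
        "policy_version": verdict["rule_version"],  # how the decision can be audited
        "evidence_ref": evidence_ref,               # where the proof lives
        "executed": verdict["passed"],
    }
```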
Commercialisation timeline: Q2 2026 controlled pilots with Tier-2 trade-finance providers → Q4 2026 enterprise deployments.
Future trade systems won’t look like dashboards; they’ll sound like trusted conversations, grounded in verifiable truth.
“We weren’t chasing automation. We were chasing intuition.”
References (Grouped by Theme)
AI & Context
[1] Qodo AI — Context window overview (128K)
[2] OpenAI Dev Day updates — attention scaling and limits
[6] YourGPT / TokenCalculator — o-series 200K models
[10] OpenAI — GPT-4.1 announcement (1M token context)
[33] arXiv 2402.10171 — Long-context system strain
Trust & Calibration
[11] Okamura et al., PLOS ONE (2020) — Adaptive trust calibration
[12] UMB CSE study — Conversational AI trust mechanisms
[13] ACM (2022) — Trust calibration in CAs
[14] ACM (2024) — Trust cues and over-reliance
Industry Benchmarks
[20] LinkedIn — HSBC & JPM L/C automation 99% STP
[21] Neurogaint — AI simplifying L/C processes
[22] LowCodeMinds — L/C automation case study
[23] DeepOpinion — L/C document intelligence
Compliance & Regulation
[5] FINRA Machine-Readable Rulebook Initiative (2024)
[7] ISO 42001 & NIST AI RMF (2023)
[34] BIS FSI Paper (2024) — Regulating AI in Finance
[35] BIS & FSI Guidance (2024) — Governance bottleneck
[36] Consultancy.eu — EU AI Act impact and fines
[37] EY / Goodwin — EU AI Act traceability requirements
Market Context
[15] LexisNexis Risk Solutions (2024) — True Cost of Financial Crime Compliance Study: $206B global annual spend
[16] LexisNexis (2024) — US/Canada compliance costs
[17] Oxford Economics — UK compliance costs
[18] OpenText / Gartner — Bank IT and compliance spend
[26] Atlan — Gartner governance category (2025)
[27] DataGalaxy — Gartner MQ summary (2025)
[28] Ataccama — Gartner MQ brief (2025)
Voice Technology
[29] BenePVC — ElevenLabs Conversational AI 2.0
[30] ElevenLabs Blog — Realtime financial applications
Authorship Declaration
Written by Sam Carter — TradeQu Labs.
Research and drafting assisted by ChatGPT (GPT-5), Perplexity Research, and Claude 3 Opus. All sources verified through human review. This article adheres to TradeQu’s principle of transparent AI-assisted research and publication. Authored and verified by TradeQu Labs.
We’re always looking for collaborators exploring how intelligence can become verifiable.
Let’s build the future of compliant AI together.
If your institution is exploring AI governance, policy-as-code, or explainable infrastructure, we’d like to collaborate.



