Architecture · Apr 22, 2026 · 7 min read

Retrieval at the orchestration layer


Giampaolo Marzetti

Founder & CEO · Aicomlogic

Most teams think of retrieval as a database. You write some documents, you embed them, you query them. That mental model is fine for prototypes. It breaks the moment retrieval has to support an actual workflow.

The mental model that breaks first

The first assumption to break is the single index. Production retrieval is almost never one index. It is a routing problem: every query needs to know which corpus to hit, how fresh the results must be, which access policy to honour, and which post-processing to run before context is handed back to the model.
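Written down as data rather than scattered conditionals, those four decisions look roughly like the sketch below. The Route type, the route names and the field values are all illustrative, not a real API:

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical routing table: every query kind fans out into a corpus,
# a freshness bound, an access policy and a post-processing step.
@dataclass(frozen=True)
class Route:
    corpus: str            # which index family to hit
    max_age: timedelta     # how fresh the documents must be
    policy: str            # access policy to honour at query time
    post_process: str      # e.g. "dedupe", "cite", "summarise"

ROUTES = {
    "support_ticket": Route("tickets-v3", timedelta(days=30), "role:agent", "cite"),
    "contract_review": Route("legal-docs", timedelta(days=365), "role:counsel", "dedupe"),
}

def resolve(query_kind: str) -> Route:
    """Pick the corpus, freshness, policy and post-processing for a query."""
    return ROUTES[query_kind]
```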

If you build that routing logic inside your application code, it gets duplicated across every flow that needs it. Every team adds another knob. Within a few quarters, retrieval becomes the most fragile part of the stack.

Why retrieval belongs at the orchestration layer

The cleanest design we've found is to push retrieval down to the workflow runtime — not as a primitive the workflow calls, but as a typed contract the workflow declares. A step says: 'give me documents relevant to X, scoped to user role Y, no older than Z, with citations.' The runtime figures out which indexes to hit, which embedding model to use, which filters to apply.

The workflow declares intent. The runtime figures out the topology.
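A minimal sketch of what that declaration could look like, assuming a hypothetical RetrievalContract type and runtime.retrieve call (neither is a real runtime API):

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical contract type: the step states what it needs, nothing more.
@dataclass(frozen=True)
class RetrievalContract:
    relevant_to: str       # "X": what the step needs context about
    scoped_to_role: str    # "Y": the access scope results must honour
    max_age: timedelta     # "Z": freshness bound on the documents
    with_citations: bool   # whether sources must be attributed

contract = RetrievalContract(
    relevant_to="renewal terms for enterprise contracts",
    scoped_to_role="account_manager",
    max_age=timedelta(days=90),
    with_citations=True,
)

# The runtime, not the step, resolves this into indexes, an embedding
# model and filters, e.g. context = runtime.retrieve(contract).
```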

Once retrieval is declarative, three things get much easier. You can rebuild indexes without changing workflows. You can A/B test embedding models without coordinating six teams. You can enforce access policy at retrieval time rather than at read time, which is the only place where it actually works for AI.
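That last point is worth a concrete picture. In a minimal sketch, assuming documents carry an allowed_roles field and a generic filter shape, the policy becomes part of the query itself, so documents the caller cannot see never enter the candidate set, rather than being stripped out after the context has already been assembled:

```python
def policy_filter(user_roles: set[str]) -> dict:
    """Translate the caller's roles into a query-time filter clause.

    Hypothetical filter shape: restrict candidates to documents whose
    allowed_roles tags intersect the caller's roles, before any scoring
    or ranking happens.
    """
    return {"terms": {"allowed_roles": sorted(user_roles)}}

# The runtime attaches this to every index query it issues for the step:
# query = {"text": "...", "filter": policy_filter({"account_manager"})}
```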

Hybrid is not optional

Pure vector retrieval still loses to hybrid (lexical + vector) retrieval on most enterprise corpora. The reason is that enterprise text is full of tokens — order numbers, customer codes, internal acronyms — that no embedding model handles well. BM25-style scoring catches those. Vector scoring catches the semantic neighbours. A fused score, normalised carefully, is what production needs.
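One common way to fuse is min-max normalisation of each retriever's scores onto [0, 1] followed by a weighted blend (reciprocal rank fusion is a popular alternative). The normalisation matters: raw BM25 and cosine scores live on different scales, and blending them unnormalised quietly biases the result toward whichever retriever produces bigger numbers. A sketch, with the weight and the zero-for-missing convention as choices rather than requirements:

```python
def fuse(lexical: dict[str, float], vector: dict[str, float],
         alpha: float = 0.5) -> dict[str, float]:
    """Blend BM25-style and vector scores after min-max normalisation."""
    def norm(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against all-equal scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    lex, vec = norm(lexical), norm(vector)
    # A document found by only one retriever scores 0.0 on the other.
    return {d: alpha * lex.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
            for d in lex.keys() | vec.keys()}
```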

Managed cloud retrieval services that ship hybrid scoring out of the box are a significant productivity unlock: you can combine hybrid scoring with custom analysers, semantic ranking and filters in a single query. That collapses three layers of glue code we'd otherwise have to write and maintain. Pick a substrate that doesn't force you to reimplement what the platform already does well.

What we got wrong twice

Each of those decisions felt right in isolation and became technical debt within a quarter. The general lesson: the choices that look like implementation details at week one are exactly the choices that bind you at month six. Treat retrieval as a system early.

