
Pablo Romeo
Co-Founder & CTO
For years, Retrieval-Augmented Generation (RAG) has been the go-to approach for building AI solutions involving documents. Think of scenarios like searching through thousands of legal contracts or technical manuals to answer specific questions. But as models evolve and real-world use cases expose RAG’s limitations, new techniques are emerging that are more aligned with how humans reason. Some of these are far better suited to the complex environments where enterprises operate.
Let’s walk through why this shift is happening, where RAG falls short, and how progressive context and tool discovery are redefining what’s possible in AI.
We recently worked with a multinational organization managing a global HR policy portal, where employees and managers need to navigate complex, jurisdiction-specific employment regulations. Imagine a scenario where a regional HR manager wants to know: “What is the maximum allowable annual leave for an employee based in Singapore, and are there any special carryover rules for employees with more than five years of service?”
They came to us with an AI product that failed to deliver, and they didn’t understand why. Their initial solution was classic RAG: index every manual and every procedure, paragraph by paragraph, and let the model retrieve the relevant chunks. This approach wasn’t right for their specific needs.
RAG is effective at surfacing paragraphs that mention “annual leave” or “Singapore”, but the answer to our example question isn’t found in a single paragraph. The policy might reference “Asia-Pacific” as a region, with sub-policies for each country, and further distinctions based on employee tenure or role. The answer is the result of traversing a hierarchical decision tree: first by region, then by country, then by employee level, and finally by years of service. RAG, by design, can’t follow this path. It might return leave policies for the wrong country or miss the carryover rules entirely, because it doesn’t respect the underlying structure of the knowledge.
The right approach would have been to mimic how a human expert navigates these documents: start at the regional policy, drill down into the country-specific rules for Singapore, and then apply the distinctions for employee level and years of service.
Or, even better, build a dedicated tool that queries the policy database by jurisdiction and employee attributes, returning the precise answer. Sometimes, the simplest solution is the best: create targeted tools for common queries and let the agent use them as needed.
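To make that concrete, here is a rough sketch of what such a targeted tool might look like. Everything in it is a purely illustrative assumption: the policy store, the field names, and the figures are invented for the example, not the client’s actual data or Singapore’s actual regulations.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy store keyed by the same hierarchy a human expert follows:
# region -> country, with tenure-based carryover rules inside each policy.
# All figures are made up for illustration.
POLICIES = {
    ("APAC", "SG"): {
        "annual_leave_days": 14,
        "carryover_rules": [  # ordered from longest tenure to shortest
            {"min_years_of_service": 5, "max_carryover_days": 10},
            {"min_years_of_service": 0, "max_carryover_days": 5},
        ],
    },
}

@dataclass
class LeaveAnswer:
    annual_leave_days: int
    max_carryover_days: int

def get_leave_policy(region: str, country: str, years_of_service: int) -> Optional[LeaveAnswer]:
    """Tool exposed to the agent: resolve the policy by jurisdiction and tenure."""
    policy = POLICIES.get((region, country))
    if policy is None:
        return None
    # Pick the first carryover rule whose tenure threshold the employee meets.
    for rule in policy["carryover_rules"]:
        if years_of_service >= rule["min_years_of_service"]:
            return LeaveAnswer(policy["annual_leave_days"], rule["max_carryover_days"])
    return None

# For the Singapore question, the agent calls the tool instead of retrieving chunks:
print(get_leave_policy("APAC", "SG", years_of_service=7))
```

The point is not this particular implementation; it’s that the tool encodes the region → country → tenure hierarchy explicitly, so the agent gets a precise answer in a single call.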
This example illustrates a key lesson: not every document interaction problem calls for RAG. Careful analysis of the knowledge structure and user needs is vital. Otherwise, organizations risk wasting time and resources on the wrong solution.
Every Large Language Model (LLM) has a limit to how much information it can “see” at once, called the “context window”. While these windows have grown dramatically over time (from 8,000 tokens to over 1M in the latest models), they’re still not infinite.
RAG’s original appeal was efficiency: inject only the most relevant snippets into the context window, saving tokens and money. But this comes at a price.
With today’s expanded context windows, the equation has changed. Instead of risking accuracy by injecting small, possibly irrelevant fragments, it’s often better to pay a bit more to include entire documents or larger amounts of data. This enables the LLM to reason holistically, which improves reliability.
There are still many ideal use cases for RAG, but its overuse has led many to treat it as a universal hammer. In reality, it’s just one tool in a much larger toolbox.
To illustrate with a real-life case where RAG shines: at CloudX, we use this technique in our own CloudBot, an internal chatbot embedded into Slack that assists our entire company, helping with tasks, scheduling actions, answering questions, and much more. In CloudBot, RAG helps us quickly surface information about past projects from a large, relatively static source. For example, a sales representative might need to know: “Which fintech projects used Java?” For this use case, RAG is efficient and effective: the data doesn’t change often, so we don’t need to reindex constantly, and it’s a great candidate for semantic search. Even here, though, we perform the similarity searches against the vectorized technical descriptions of the projects, but provide the full technical descriptions when generating answers, not chunks.
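A minimal sketch of that pattern is below, assuming a toy in-memory index and a placeholder embedding function; CloudBot’s real stack, models, and data are not shown here.

```python
import numpy as np

# Illustrative in-memory index: one embedding per project's full technical
# description. In practice this lives in a vector database built at indexing time.
PROJECTS = [
    {"name": "payments-platform", "description": "Fintech payments backend built in Java with Spring Boot..."},
    {"name": "analytics-portal", "description": "Analytics dashboard built in Python and React..."},
]

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model (e.g. an embeddings API call)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vector = rng.random(384)
    return vector / np.linalg.norm(vector)

EMBEDDINGS = np.array([embed(p["description"]) for p in PROJECTS])

def retrieve_full_descriptions(question: str, top_k: int = 3) -> list[str]:
    """Similarity search over the vectorized descriptions, but return the FULL
    descriptions of the best-matching projects, not isolated chunks."""
    scores = EMBEDDINGS @ embed(question)
    best = np.argsort(scores)[::-1][:top_k]
    return [PROJECTS[i]["description"] for i in best]

# The full descriptions then go into the LLM's context to generate the answer.
context = "\n\n".join(retrieve_full_descriptions("Which fintech projects used Java?"))
```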
For more complex or multidimensional tasks, RAG falls short. For instance, modern code assistants rarely use RAG. Instead, they open files and traverse project structures like a human developer would: observing the project’s filesystem, navigating entry points, following class names, and reading relevant modules in sequence. This is “multi-hop reasoning”: connecting facts across multiple locations, following a logical path. This approach might consume more tokens, but it’s the most cost-efficient solution long-term because it performs far better than RAG.
RAG, by contrast, is a “one-hop” solution. It can answer “what does this function do?” but struggles with “how does this variable propagate through the system?” Even the Agentic RAG variant, which runs multiple searches, still misses the mark for these use cases. For tasks such as writing code, agents need tools that let them explore, plan, read full content, and reason step by step.
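As a rough idea of the difference, here is a hedged sketch of the kind of exploration tools such an agent gets instead of a retriever. The tool names and signatures are illustrative assumptions, not any particular assistant’s API.

```python
import os

def list_directory(path: str = ".") -> list[str]:
    """Let the agent observe the project's structure before deciding where to look."""
    return sorted(os.listdir(path))

def read_file(path: str, max_chars: int = 20_000) -> str:
    """Let the agent read the full content of a file it has judged relevant."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()[:max_chars]

def search_text(root: str, needle: str) -> list[str]:
    """Let the agent follow a symbol (e.g. a variable name) across the codebase."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, "r", encoding="utf-8", errors="replace") as f:
                    if needle in f.read():
                        hits.append(full_path)
            except OSError:
                continue
    return hits

# A multi-hop trace might look like: list_directory(".") -> read_file("main.py")
# -> search_text(".", "user_id") -> read_file("services/billing.py"),
# with the agent reasoning between each hop.
```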
Today, an approach some call “progressive discovery” or “progressive disclosure” is quickly gaining traction among AI engineers.
Progressive context enrichment means that the agent doesn’t load all potentially relevant data at once, but fetches information only as needed for each step. For example, when writing a database query, the agent first calls a tool to get just the relevant table and column names, rather than loading the entire schema upfront.
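A minimal sketch of that two-step disclosure, with an invented in-memory schema standing in for real catalog queries:

```python
# Illustrative stand-in for querying the database catalog (e.g. information_schema).
SCHEMA = {
    "employees": {"columns": ["id", "name", "country", "hire_date"]},
    "leave_requests": {"columns": ["id", "employee_id", "days", "approved_at"]},
    "audit_log": {"columns": ["id", "actor", "action", "created_at"]},
}

def list_tables() -> list[str]:
    """Step 1: the agent sees only table names - a handful of tokens."""
    return list(SCHEMA)

def describe_table(name: str) -> list[str]:
    """Step 2: column details are fetched only for the tables the agent picked."""
    return SCHEMA[name]["columns"]

# The agent calls list_tables(), decides that "employees" and "leave_requests"
# matter for the question, calls describe_table() on just those two, and only
# then writes the SQL query.
```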
It is commonly implemented by giving the agent hints, summarized content, and a mechanism for requesting the full context as needed. One clear example is Anthropic’s Skills implementation: the agent always sees the list of available Skills with short descriptions, and expands a Skill’s full details only when it needs them.
But the same concept can be applied to many things, such as an index of nested titles that could reference full document content on demand. It’s a very simple, yet extremely effective approach.
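A similarly small sketch of that index-plus-expansion pattern, with invented section titles and content:

```python
# The agent's context holds only the outline; full sections are pulled in
# one at a time, and only when the agent asks for them.
DOCUMENT = {
    "1. Leave policy": {
        "1.1 Asia-Pacific": "Full text of the APAC leave policy...",
        "1.2 Europe": "Full text of the European leave policy...",
    },
    "2. Carryover rules": {
        "2.1 Tenure over five years": "Full text of the long-tenure carryover rules...",
    },
}

def outline() -> list[str]:
    """Cheap view: nested titles only."""
    return [f"{section} > {sub}" for section, subs in DOCUMENT.items() for sub in subs]

def expand(section: str, sub: str) -> str:
    """Expensive view, fetched on demand: the full text of a single subsection."""
    return DOCUMENT[section][sub]
```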
Progressive tool discovery addresses the tool overload bottleneck. If an agent is connected to dozens of APIs, calculators, and databases, loading every tool’s full technical definition would quickly exhaust the context window. Instead, the agent receives a lightweight manifest; for example, just the tool names and one-sentence descriptions. When it needs a specific capability (e.g.: “send email”), it reasons about which tool to use, and only then loads the full technical schema for execution.
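A minimal sketch of that manifest pattern, with invented tool names and a simplified schema format:

```python
import json

# Lightweight manifest the agent always sees: name plus a one-sentence description.
TOOL_MANIFEST = [
    {"name": "send_email", "summary": "Send an email to one or more recipients."},
    {"name": "create_ticket", "summary": "Open a ticket in the issue tracker."},
    {"name": "query_crm", "summary": "Look up accounts and contacts in the CRM."},
]

# Full parameter schemas are kept out of the context until they are needed.
FULL_SCHEMAS = {
    "send_email": {
        "name": "send_email",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "array", "items": {"type": "string"}},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    # ...one entry per tool in the manifest
}

def load_tool_schema(name: str) -> str:
    """Called only after the agent has decided which capability it needs."""
    return json.dumps(FULL_SCHEMAS[name])

# Flow: the agent reads TOOL_MANIFEST, reasons that "send_email" matches the
# request, then loads its full schema right before emitting the actual tool call.
print(load_tool_schema("send_email"))
```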
At CloudX, we’ve implemented progressive discovery in our Talk to Database accelerator, which autonomously explores and understands the schema of connected databases, fetching only what’s needed when it’s needed. Beyond navigating schemas, we also use it for the accelerator’s knowledge base and its Skills functionality.
RAG solves a real problem, but not all of them. As context windows expand and enterprise use cases grow more complex, new approaches are emerging to equip agents with the same capabilities we humans have: reasoning, planning, using tools, and much more.
In AI, as in software engineering in general, no single technique is suitable for every challenge. The most effective solutions come from understanding the structure of the problem and the real needs of the end users. In a scenario where most AI initiatives fail, success depends on our ability to really understand the problem, and provide an adequate solution by combining the right tools and methods for each unique scenario.
