How RAG Works
RAG stands for Retrieval-Augmented Generation. It's how Hubrix lets AI answer questions about your documents without sending the entire document to the model on every request.

The four steps
Step 1: Chunking
When you upload a document, Hubrix splits it into small segments called chunks. Each chunk is roughly 500 tokens — about 350–400 words. Chunking happens once when the document is first processed.
Why chunks instead of the whole document? Because AI models have a limited context window. A 100-page PDF is far too large to send to an AI in a single request. Chunks make documents queryable.
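The splitting step above can be sketched in a few lines. This is a simplified illustration, not Hubrix's actual chunker: it counts whitespace-separated words as a stand-in for tokens, and the 50-word overlap between neighbouring chunks is an assumption (overlap is a common technique so a sentence on a boundary isn't lost, but the real chunker may behave differently).

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    chunk_size and overlap are counted in words as a proxy for tokens;
    a real pipeline would use the model's tokenizer.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final chunk already reaches the end of the text
    return chunks
```

A 1,200-word document with these defaults produces three chunks, each sharing 50 words with its neighbour.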
Step 2: Embedding
Each chunk is converted into a vector — a list of numbers that mathematically represents the meaning of that text. Two chunks about similar topics will have vectors that are mathematically close to each other; two chunks about unrelated topics will have vectors that are far apart.
This conversion is done by a text embedding model (not the same model that generates answers). The vectors are stored in a vector database alongside the original chunk text.
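To make "vectors that are close" concrete, here is a toy sketch. The bag-of-words "embedding" below is an illustration only: a real embedding model produces dense vectors that capture meaning even when no words are shared, whereas this toy version only counts shared words. The distance measure, cosine similarity, is the standard one used by vector search.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "vector": a token -> count mapping. A trained embedding model
    # would return a dense list of floats instead (assumption: this is
    # only a stand-in to show the text -> vector idea).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: 1.0 for identical direction, 0.0 for no overlap.
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With this, a query about refunds scores higher against a refund chunk than against a chunk about server uptime, which is exactly the property retrieval relies on.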
Step 3: Retrieval
When you ask a question (in chat, via an agent, or in a workflow), Hubrix converts your question into a vector using the same embedding model. It then searches the vector database for the chunks whose vectors are closest to your question's vector.
The top K closest chunks are retrieved — typically 3–10, depending on the context. These chunks are the most semantically relevant passages from your documents.
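The top-K search can be sketched as a ranking over stored (chunk, vector) pairs. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database would use an approximate nearest-neighbour index rather than this linear scan, but the ranking idea is the same.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    # store holds (chunk_text, vector) pairs, as a vector database would.
    best = heapq.nlargest(k, store, key=lambda item: cosine(query_vec, item[1]))
    return [text for text, _ in best]
```

Given a query vector pointing in roughly the same direction as the "refund policy" and "returns process" vectors, those two chunks come back first.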
Step 4: Generation
The retrieved chunks are included in the prompt sent to the AI model, alongside your original question. The AI reads the chunks and uses them as context to generate a precise, grounded answer.
Because the answer is based on your specific document chunks, it is more accurate and relevant than if the AI were guessing from training data alone.
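The generation step amounts to assembling the retrieved chunks and the question into a single prompt. The template below is hypothetical (Hubrix's actual prompt is internal and will differ), but the shape is the core of the step: numbered context chunks, then the user's question.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    # Hypothetical template (assumption): the real prompt wording is
    # internal to Hubrix, but the structure, retrieved chunks plus the
    # user's question in one model request, is what RAG generation does.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks is a common design choice because it lets the model cite which passage an answer came from.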
RAG is why AI answers can reference specific sections of your documents — the retrieved chunks are the source of that precision. If the AI seems to be ignoring your documents, the most likely cause is that the document hasn't finished indexing, or the RAG tool isn't enabled for that agent.
Why not just send the whole document?
You might wonder: why not just paste the entire document into the prompt? A few reasons:
- Context limits — most models have a context window of 4,000–200,000 tokens. A large document can exceed this.
- Cost — larger prompts consume more tokens and therefore more credits.
- Quality — retrieving the 5 most relevant chunks often produces better answers than sending 500 pages of text, because the model doesn't have to search through noise.
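Some back-of-envelope arithmetic makes the cost and context-limit points concrete. Every number here is an illustrative assumption, not a Hubrix measurement.

```python
# Rough token math. All figures are illustrative assumptions:
# 500 pages, ~400 words per page, ~1.3 tokens per English word.
pages = 500
words_per_page = 400
tokens_per_word = 1.3
full_doc_tokens = int(pages * words_per_page * tokens_per_word)  # 260000

# A RAG prompt instead carries a handful of ~500-token chunks.
chunks_retrieved = 5
tokens_per_chunk = 500
rag_prompt_tokens = chunks_retrieved * tokens_per_chunk  # 2500
```

The full document lands around 260,000 tokens, beyond most context windows, while the retrieved chunks total about 2,500 tokens, which fits comfortably in any of them.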
When RAG doesn't work well
- Scanned PDFs with no OCR — if the document contains no extractable text, there's nothing to embed or retrieve.
- Very vague queries — a query like "tell me something interesting" won't match specific chunks accurately. Be specific.
- Cross-document relationships — RAG retrieves individual chunks. If answering a question requires synthesising information across dozens of separate chunks, a single RAG query may not capture everything.
If you update a document (by re-uploading a newer version), you need to re-index it so the new content's chunks and embeddings replace the old ones. Use the Reindex button on the document detail page.