RAG Assistant Case Study

At a glance

The problem

35+ feature modules in a multi-tab spreadsheet, queryable only by opening the file and scanning rows manually.

What I built

A natural-language Q&A tool that answers questions about product features, roles, and behaviours from the team's own use case documentation.

My role

Solo. Architecture, document parsers, retrieval logic, prompt engineering, and deployment.

Stack

R · Shiny · ragnar · DuckDB · BM25 · Anthropic Claude Sonnet API · shinyapps.io

The problem

The documentation was good. Finding things in it didn't scale.

ECD Connect's use case documentation covers every feature the platform supports: who can do what, under which role, online or offline, and how the points engine rewards it. That documentation lives in a single Excel file with four structurally different tab types — use case specs, an offline/online matrix, a points table, and a notifications schedule.

Answering a question like "which features work offline for practitioners?" meant opening the file, navigating to the right tab, and scanning rows. For the PM, QA lead, and user support staff, this was a recurring cost across every sprint — not large on any individual query, but constant.

Architecture

Four format-aware parsers feeding a BM25 index.

The main design decision was how to parse the source file. The four tab types use meaningfully different structures, so chunking by character count would have thrown away that structure and degraded retrieval. Instead I wrote one parser per tab type, each preserving the tab's structure and prepending contextual metadata to every chunk: feature set, functional area, online/offline availability, role context, and an example question the chunk can answer. The example question significantly improves BM25 recall by giving the index something to match against user phrasing.

Frontend

R Shiny app on shinyapps.io. Custom UI with markdown rendering, conversation history, and example questions in the sidebar.

Retrieval

BM25 full-text search over a ragnar/DuckDB index built in-memory from the parsed source at app start. Default: top 5 chunks. Points, notifications, and offline queries automatically retrieve top 8 — those domains span multiple feature sets and need more context.

Generation

Claude Sonnet via the Anthropic Messages API. Each request passes the retrieved chunks as context plus the previous Q&A exchange, enabling follow-up questions without rebuilding a full chat history.

Retrieval routing

Chunk count is bumped from 5 to 8 when a query mentions points, notifications, or offline behaviour — detected via keyword matching before the BM25 call. A simple heuristic that meaningfully improves results on the most common query types.

Prompt engineering

The system prompt instructs the model to answer only from the provided context, name the specific use case or feature set it's drawing from, address all relevant use cases when a question spans more than one, and say clearly when the context is insufficient. The priority was a tool the team could trust — which meant being explicit about the boundary between what the documentation says and what the model might otherwise infer.

        Sample interaction: "Which feature set covers attendance and how can users track attendance for children?" → The model returns a grounded answer citing Feature Set W6, with role-based business rules (principals vs practitioners with and without permission) drawn directly from the indexed documentation.
      

Status

Live since March 2026.

Used by the product team (PM, QA lead, user support) since March 2026. Questions about feature availability, role permissions, points calculations, and notification triggers are answered in seconds rather than requiring a manual spreadsheet lookup.

Usage logging is in progress — query volume and a retrieval quality signal — to identify which query types return weak results and whether changes to the parsers or index would help.

Limitations & next steps

BM25 is a strong baseline for this use case, but has known gaps:

Synonym and paraphrase mismatches

BM25 matches on exact terms. A query about "marking attendance" may miss chunks that use "taking a register." A lightweight embedding layer as semantic fallback would improve coverage on the long tail of queries.

No retrieval quality feedback loop yet

Usage logging is in progress. Until it's live, iteration on chunking and routing relies on ad hoc testing rather than signal from real queries.

Source file coupling

The index rebuilds from the Excel source at app start. If the source structure changes significantly, the parsers need updating. A more robust version would decouple ingestion from the app startup cycle entirely.

No source citations in the UI

The model names the feature set it's drawing from, but the interface doesn't surface the raw retrieved chunks. A "show source" toggle would let users verify answers against the documentation directly.

Building a RAG assistant for product documentation