CASE STUDY · ENTERPRISE · KNOWLEDGE
We built a RAG knowledge base on Azure that pulls content from SharePoint and Confluence into a governed index and delivers a citation-first chat. The result: answers that link back to sources, with faster time-to-answer and shorter onboarding.

Knowledge was scattered across SharePoint, Confluence, wikis, and project spaces. Finding the right answer meant repeated requests to the same people or long searches with no way to verify currency or accuracy.
Trust was the harder problem: an AI that answers without provenance creates risk. The system had to link back to sources, respect access boundaries, and stay maintainable as documentation changed — and teams had to actually use it.
Ingestion from SharePoint and Confluence into Azure Blob Storage staging so re-indexing is possible without re-scraping sources.
Indexers and skillsets chunk content (with overlap), build vector and keyword/semantic indexes, and run incremental updates on schedule.
Retrieval combining vector similarity with semantic and keyword signals for robust recall on meaning and exact-match queries.
Answers always include source links so users can verify and drill down to the original document.
A lightweight evaluation set plus checks (including citation presence) to prevent silent regressions as indexing and prompts change.
User feedback signals feeding retrieval tuning and content hygiene, improving quality over time.
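The chunking step above can be sketched in a few lines. This is a conceptual illustration only: Azure AI Search's split skill performs chunking natively during indexing, and the function name, chunk sizes, and character-based splitting here are illustrative assumptions, not the production configuration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap, so a sentence cut
    at one chunk boundary still appears intact in the neighbouring chunk.
    (Illustrative sketch; real pipelines often split on token counts.)"""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
    return chunks
```

The overlap is what makes retrieval robust: a fact straddling a boundary is fully contained in at least one chunk, at the cost of some index redundancy.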
We designed the system around one principle: every answer must cite sources.
We considered vector-only search; we went with hybrid (vector + keyword/semantic) so acronyms and exact IDs are retrievable. We evaluated real-time indexing on every doc change; we chose scheduled incremental runs to control cost and keep indexing predictable.
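The intuition behind the hybrid choice can be shown with Reciprocal Rank Fusion, the standard way to merge a vector ranking with a keyword ranking (and the method Azure AI Search documents for its hybrid queries). The sketch below is a minimal stand-alone illustration, not the service's internal code; document ids and the constant `k=60` are conventional assumptions.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked result lists into one.
    A document ranked highly in *either* list gets a strong fused score,
    which is why hybrid search recovers both exact IDs and fuzzy meaning."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document like an acronym-heavy runbook may rank first in the keyword list and not appear in the vector list at all; fusion still surfaces it near the top.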
Content is ingested from SharePoint and Confluence into Blob staging. Azure AI Search chunks content, generates embeddings, and builds the indexes. The chat app runs hybrid retrieval and uses Azure AI Foundry to generate a grounded answer strictly from retrieved context — with citations in the output.
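The incremental-update step in this pipeline can be sketched as a diff between the staged blobs and the last indexed state. Azure AI Search indexers track blob change detection natively (via last-modified metadata); the hash-based manifest below is an assumed, simplified model of the same idea, with hypothetical field names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Stable fingerprint of a staged document's content."""
    return hashlib.sha256(data).hexdigest()


def plan_incremental_run(
    staged: dict[str, str], manifest: dict[str, str]
) -> dict[str, list[str]]:
    """Compare content hashes of staged blobs against the manifest from
    the last indexing run: only new or changed documents are re-indexed,
    and documents gone from staging are removed from the index."""
    to_index = [p for p, digest in staged.items() if manifest.get(p) != digest]
    to_delete = [p for p in manifest if p not in staged]
    return {"index": to_index, "delete": to_delete}
```

Because the staging layer holds the raw content, a full re-index (say, after changing the chunking strategy) replays from Blob Storage instead of re-scraping SharePoint and Confluence.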
Every answer must cite sources.
Blob staging makes ingestion predictable and auditable, and unlocks re-indexing without re-scraping.
Vector-only misses exact terms; keyword-only misses meaning. We combined both signals.
We optimized for precision and provenance, not impressive free-form answers.
Retrieval quality is a measurable system, not a one-time setup.
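A citation gate of the kind described above can be sketched as two checks: the answer must cite at least one source, and every cited id must come from the retrieved context rather than being invented. The bracket citation format, function names, and eval-case shape here are illustrative assumptions about the harness, not the production format.

```python
import re


def check_citations(answer: str, allowed_sources: set[str]) -> bool:
    """Gate: the answer cites at least one source, and every cited id
    appears in the retrieved context (no hallucinated sources)."""
    cited = set(re.findall(r"\[(\S+?)\]", answer))
    return bool(cited) and cited <= allowed_sources


def run_eval(cases: list[dict], answer_fn) -> float:
    """Fraction of eval cases whose answer passes the citation gate;
    run on every indexing or prompt change to catch silent regressions."""
    passed = sum(
        check_citations(answer_fn(c["question"]), set(c["sources"])) for c in cases
    )
    return passed / len(cases)
```

Wiring a threshold on this score into the deployment pipeline turns "citations are the quality contract" from a principle into an enforced gate.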
A knowledge base teams trust because it shows its work: faster time-to-answer for recurring questions, shorter onboarding, fewer repeated requests to senior experts, and higher confidence through transparent source linking.
Answers without sources do not work in serious environments. Citations are the quality contract.
Vector-only misses exact terms. Keyword-only misses meaning. Combining both improves recall and precision.
Blob staging makes ingestion auditable and re-indexing predictable without guesswork.
Retrieval is a system. Evaluation and gates prevent silent regressions as data and prompts change.
We support expansion into agentic workflows and deeper governance: continuous indexing, permission-aware retrieval, runbook execution paths, and integration with ticketing and change management — while keeping citations and traceability as the baseline.
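Permission-aware retrieval, mentioned above as an expansion path, is commonly implemented as security trimming: each chunk carries the source document's access groups captured at ingestion, and results are filtered against the querying user's groups. The sketch below assumes a hypothetical `allowed_groups` field; in Azure AI Search this is typically done with a filter on a group-ids field at query time.

```python
def filter_by_access(results: list[dict], user_groups: set[str]) -> list[dict]:
    """Drop retrieved chunks the user may not see: keep a chunk only if
    the user shares at least one group with its ACL captured at ingestion."""
    return [r for r in results if user_groups & set(r["allowed_groups"])]
```

Trimming before the context is assembled means the model never sees, and therefore can never leak, content outside the user's access boundary.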