© 2026 · codefunded services sp. z o.o.

CASE STUDY · ENTERPRISE · KNOWLEDGE

RAG knowledge base with citations

We built a RAG knowledge base on Azure that pulls content from SharePoint and Confluence into a governed index and delivers a citation-first chat. The result: answers that link back to sources, with faster time-to-answer and shorter onboarding.

Role

End-to-end GenAI knowledge base delivery partner

Scope

Ingestion ETL · Blob staging · Azure AI Search (hybrid) · Chat app · Evaluation gates · Azure AI Foundry

Scale

Citation-first answers · Hybrid retrieval · Maintainable indexing pipeline · Governed source linking

Services

GenAI · RAG · Azure · Knowledge base

Tags

  • AI
  • RAG
  • Software Development

The challenge

Knowledge was scattered across SharePoint, Confluence, wikis, and project spaces. Finding the right answer meant repeated requests to the same people or long searches with no way to verify currency or accuracy.

Trust was the harder problem: an AI that answers without provenance creates risk. The system had to link back to sources, respect access boundaries, and stay maintainable as documentation changed — and teams had to actually use it.

What we delivered

  • Document ingestion ETL with staging

    Ingestion from SharePoint and Confluence into Azure Blob Storage staging so re-indexing is possible without re-scraping sources.

  • Indexing with Azure AI Search

    Indexers and skillsets chunk content (with overlap), build vector and keyword/semantic indexes, and run incremental updates on schedule.

  • Hybrid retrieval by default

    Retrieval combining vector similarity with semantic and keyword signals for robust recall on meaning and exact-match queries.

  • Chat experience with citations

    Answers always include source links so users can verify and drill down to the original document.

  • Evaluation and quality gates

    A lightweight evaluation set plus checks (including citation presence) to prevent silent regressions as indexing and prompts change.

  • Feedback loop

    User feedback signals feeding retrieval tuning and content hygiene, improving quality over time.
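
The chunk-with-overlap step in the indexing pipeline is worth making concrete. A minimal sketch in Python, character-based for brevity (real pipelines typically chunk by tokens, and the sizes here are illustrative, not our production settings):

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    straddles a chunk boundary is still retrievable from either side."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already covers the tail
    return chunks
```

The overlap means the tail of each chunk is repeated at the head of the next, which costs a little index space but prevents answers from being lost at boundaries.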

How we built it

We designed the system around one principle: every answer must cite sources.

We considered vector-only search; we went with hybrid (vector + keyword/semantic) so acronyms and exact IDs are retrievable. We evaluated real-time indexing on every doc change; we chose scheduled incremental runs to control cost and keep indexing predictable.
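
The fusion behind hybrid retrieval can be sketched with Reciprocal Rank Fusion, a standard way to merge a vector ranking with a keyword ranking (and the method Azure AI Search documents for hybrid queries). A simplified Python version, with k=60 as the conventional default:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists (e.g. vector and keyword retrieval)
    into one ranking. A document scores 1/(k + rank) in each list that
    contains it; scores are summed across lists and sorted descending."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks well in either list surfaces in the fused result, which is exactly why acronyms and exact IDs (keyword-strong) survive alongside paraphrased queries (vector-strong).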

Content is ingested from SharePoint and Confluence into Blob staging. Azure AI Search chunks content, generates embeddings, and builds the indexes. The chat app runs hybrid retrieval and uses Azure AI Foundry to generate a grounded answer strictly from retrieved context — with citations in the output.

Every answer must cite sources.
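
To make the grounding step concrete, here is a sketch of how a citation-first prompt can be assembled from retrieved chunks. The wording and bracketed-id convention are illustrative, not our exact production prompt:

```python
def build_grounded_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a prompt that instructs the model to answer strictly from
    the retrieved context and to cite sources by bracketed id.
    `chunks` is a list of (source_url, text) pairs from retrieval."""
    context_lines = []
    for i, (source, text) in enumerate(chunks, start=1):
        context_lines.append(f"[{i}] ({source}) {text}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using ONLY the context below. "
        "Cite sources as [n] after each claim. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because each context entry carries its source URL, the bracketed ids in the model's answer can be mapped back to clickable links in the chat UI.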

Key decisions

Staging layer before indexing

Blob staging makes ingestion predictable and auditable, and unlocks re-indexing without re-scraping.
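
A sketch of the staging idea, using a local directory and a content-hash manifest as a stand-in for Blob Storage (names and layout are illustrative):

```python
import hashlib
import json
from pathlib import Path

def stage_document(staging_dir: Path, source_url: str, content: bytes) -> Path:
    """Write a scraped document into a staging area keyed by content hash,
    alongside a manifest recording its source. Re-indexing can then read
    from staging instead of hitting SharePoint or Confluence again."""
    digest = hashlib.sha256(content).hexdigest()
    doc_path = staging_dir / f"{digest}.bin"
    doc_path.write_bytes(content)
    manifest = {"source": source_url, "sha256": digest}
    doc_path.with_suffix(".json").write_text(json.dumps(manifest))
    return doc_path
```

Content-addressed names also make duplicate detection free: re-scraping an unchanged page writes to the same path, so the index sees no churn.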

Hybrid search is non-negotiable

Vector-only misses exact terms; keyword-only misses meaning. We combined both signals.

Grounding over creativity

We optimized for precision and provenance, not impressive free-form answers.

Evaluate continuously

Retrieval quality is a measurable system, not a one-time setup.
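
One such gate can be as simple as checking that answers over a fixed evaluation set still carry citations. A minimal sketch (the 95% threshold is illustrative):

```python
import re

def citation_gate(answers: list[str], min_pass_rate: float = 0.95) -> bool:
    """Quality gate: fail if too few answers carry at least one bracketed
    citation like [1]. Run over a fixed evaluation set on every indexing
    or prompt change to catch silent regressions before they ship."""
    cited = sum(1 for a in answers if re.search(r"\[\d+\]", a))
    pass_rate = cited / len(answers) if answers else 0.0
    return pass_rate >= min_pass_rate
```

Wired into CI, a failing gate blocks the change that broke citation behavior rather than letting it reach users.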

Outcomes

A knowledge base teams trust because it shows its work: faster time-to-answer for recurring questions, shorter onboarding, fewer repeated requests to senior experts, and higher confidence through transparent source linking.

Citation-first answers · Hybrid retrieval · Governed indexing · Incremental updates

What we took away

Citations create trust

Answers without sources do not work in serious environments. Citations are the quality contract.

Hybrid search wins

Vector-only misses exact terms. Keyword-only misses meaning. Combining both improves recall and precision.

Staging unlocks control

Blob staging makes ingestion auditable and re-indexing predictable without guesswork.

Quality must be measured

Retrieval is a system. Evaluation and gates prevent silent regressions as data and prompts change.

What's next

We support expansion into agentic workflows and deeper governance: continuous indexing, permission-aware retrieval, runbook execution paths, and integration with ticketing and change management — while keeping citations and traceability as the baseline.

Bring us the hard part

A first version you need shipped, a second phase you've outgrown, or a decision your team can't agree on: write a paragraph and we'll come back within a day on whether it's a shape we take on.