Services
Development Services
SEO Services
Automation & AI
Specialized Services
Industries
Claude RAG Systems
RAG pipelines that make Claude answer from your knowledge, not its training
Retrieval-Augmented Generation lets Claude answer questions from your specific knowledge base rather than generic training data. The architecture is well-understood. The execution (chunking strategy, embedding model selection, retrieval scoring, hybrid search configuration, and citation implementation) determines whether your RAG system produces trustworthy, grounded answers or plausible-sounding hallucinations. We built and run the seo_query_kb RAG system that powers our fine-tuned Qwen3.5-27B model, delivering a 100% citation rate across all knowledge base queries.
What We Build
Six components of a production-grade Claude RAG system.
A RAG system that works in a demo and a RAG system that works reliably in production require different decisions at every layer.
Vector Database Architecture
Supabase pgvector is our primary recommendation for teams already on Postgres. It puts vector search inside your existing database, eliminating a separate vector store with its own auth, scaling, and operational overhead. We design the schema: the documents table, the embeddings column, the HNSW or IVFFlat index configuration, and the similarity search functions. For teams with specific requirements, we also build on Pinecone, Weaviate, or Chroma.
- Supabase pgvector schema design
- HNSW index configuration for performance
- Similarity search function optimization
- Multi-tenant document isolation
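As a minimal sketch of what this schema layer looks like: the DDL below sets up pgvector with an HNSW index on cosine distance, plus a similarity query using pgvector's `<=>` operator. Table and column names (`documents`, `chunks`, `embedding`) and the 1536 dimension are illustrative assumptions, not fixed conventions.

```python
# Illustrative pgvector schema for a RAG corpus. Adjust the vector
# dimension to match your embedding model's output size.

CREATE_SCHEMA = """
create extension if not exists vector;

create table documents (
    id        bigserial primary key,
    source    text not null,          -- e.g. URL or file path
    tenant_id text not null           -- for multi-tenant isolation
);

create table chunks (
    id          bigserial primary key,
    document_id bigint references documents(id),
    content     text not null,
    embedding   vector(1536)          -- must match your embedding model
);

-- HNSW index for approximate nearest-neighbor search on cosine distance.
create index chunks_embedding_idx
    on chunks using hnsw (embedding vector_cosine_ops);
"""

# Similarity search: <=> is pgvector's cosine-distance operator,
# so 1 - distance gives cosine similarity.
MATCH_CHUNKS = """
select content, 1 - (embedding <=> %(query)s::vector) as similarity
from chunks
order by embedding <=> %(query)s::vector
limit %(k)s;
"""
```

Because the schema lives in Postgres, the same `where` clause that enforces tenant isolation can sit directly in the similarity query.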
Document Processing Pipeline
How you chunk your documents determines retrieval quality more than any other factor. Chunks that are too large retrieve too much context. Chunks that are too small lose the semantic coherence that makes retrieval work. We design chunking strategies specific to your document type: markdown documents, PDFs, structured data, code files, and web content each have different optimal chunking patterns.
- Document-type-specific chunking strategies
- Semantic coherence preservation
- Metadata extraction and indexing
- Incremental update pipeline design
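For markdown documents, one pattern we rely on is splitting on heading boundaries and packing sections up to a size cap, so each chunk keeps one coherent topic. A minimal sketch (the 1500-character default is an illustrative starting point, not a recommendation):

```python
import re

def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]:
    """Split markdown on heading boundaries, then pack sections into
    chunks under max_chars so each chunk stays semantically coherent."""
    # Split before any heading line (#, ##, ...), keeping the heading
    # attached to the section it introduces.
    sections = re.split(r"\n(?=#{1,6} )", text)
    chunks: list[str] = []
    current = ""
    for section in sections:
        if current and len(current) + len(section) + 1 > max_chars:
            chunks.append(current.strip())
            current = section
        else:
            current = f"{current}\n{section}" if current else section
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

PDFs, code files, and structured data each need their own splitter; the packing loop stays the same, only the boundary detection changes.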
Hybrid Search Implementation
Pure vector search misses exact keyword matches. Pure keyword search misses semantic equivalents. Hybrid search combines both: a weighted combination of cosine similarity (vector) and BM25 or Postgres full-text search (keyword). The right weighting depends on your query patterns. We implement hybrid search, tune the weighting against your real queries, and measure the retrieval quality improvement over single-mode search.
- Cosine similarity + Postgres full-text combination
- Weight tuning against representative queries
- Retrieval quality measurement
- Query-time relevance scoring
Citation Integration
A RAG system without citations is a hallucination risk. When Claude cannot see what it is citing, it sometimes fills gaps with training data instead of retrieved content. We implement Claude's native citations feature alongside our RAG retrieval layer: Claude receives retrieved chunks with source metadata and is instructed to cite every factual claim. Our seo_query_kb system achieves 100% citation rate on all knowledge base queries using this pattern.
- Anthropic citations API integration
- Source metadata preservation through chunking
- Citation formatting for different output types
- 100% citation enforcement via system prompt
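The wiring pattern: each retrieved chunk goes into the request as a document content block with citations enabled, so Claude's citations point at specific sources rather than free text. The sketch below builds the messages payload only (no API call); the block shape follows Anthropic's published citations feature, but the chunk dict fields (`text`, `source`) are our own illustrative convention.

```python
def build_rag_message(question: str, chunks: list[dict]) -> list[dict]:
    """Build a Claude messages payload where every retrieved chunk is a
    document block with citations enabled. Each chunk dict is assumed
    to carry {"text": ..., "source": ...}."""
    content: list[dict] = [
        {
            "type": "document",
            "source": {
                "type": "text",
                "media_type": "text/plain",
                "data": chunk["text"],
            },
            "title": chunk["source"],          # surfaced in citations
            "citations": {"enabled": True},
        }
        for chunk in chunks
    ]
    # The user's question goes last, after the grounding documents.
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]
```

The system prompt then does the enforcement half: any factual claim without a citation is treated as a failure, which is how the 100% citation rate is held.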
Access Control and Document Governance
Multi-tenant RAG systems need to ensure that queries from one user or team only retrieve documents they are authorized to access. We design the access control layer: document-level permission tags, query-time permission filtering, and audit logging for every retrieval operation. For regulated industries, we add data residency controls and retrieval audit trails.
- Document-level permission tagging
- Query-time permission filtering
- Retrieval audit logging
- Data residency compliance controls
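In shape, the filter is a one-liner; what matters is where it runs and that every retrieval is recorded. A sketch with hypothetical field names (`permission_tag`), noting that in production the filter belongs in the SQL `where` clause so unauthorized rows never leave the database:

```python
import logging

audit_log = logging.getLogger("rag.audit")

def retrieve_authorized(
    chunks: list[dict], user_id: str, allowed_tags: set[str]
) -> list[dict]:
    """Query-time permission filter with an audit record per retrieval.
    Each chunk is assumed to carry a single permission_tag; real systems
    often need tag sets or role hierarchies."""
    visible = [c for c in chunks if c["permission_tag"] in allowed_tags]
    audit_log.info(
        "retrieval user=%s visible=%d/%d", user_id, len(visible), len(chunks)
    )
    return visible
```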
Retrieval Quality Evaluation
Retrieval quality is measurable. For a given test question, did the system retrieve the chunk that actually contains the answer? We build retrieval eval frameworks: a ground-truth set of questions with known source documents, precision and recall measurement against that ground truth, and a monitoring pipeline that runs the eval as your document corpus evolves. You should know your retrieval quality number, not just trust that it works.
- Ground-truth retrieval test set design
- Precision and recall measurement
- Retrieval quality dashboarding
- Ongoing monitoring as corpus changes
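The two core numbers are precision and recall at k against the ground-truth set. A minimal sketch of the metrics themselves (the eval harness around them, question sampling, scheduling, dashboarding, is where the real work sits):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for cid in retrieved[:k] if cid in relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are ground-truth chunks."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for cid in top if cid in relevant) / len(top)
```

Run these per test question and average; a drop after a corpus update is the signal that a chunking or indexing change regressed retrieval.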
How We Build RAG Systems
Four stages from document audit to citation-grounded production system.
Step 01
Knowledge base audit and architecture design
We review your document corpus: volume, document types, update frequency, and query patterns. This drives every design decision: chunking strategy, embedding model choice, index configuration, and whether hybrid search adds enough value for your query distribution to justify the additional complexity.
Step 02
Chunking and embedding pipeline build
We build the document processing pipeline: ingestion, chunking, embedding generation, metadata extraction, and index insertion. For document types you already have, we run the first full ingestion and validate chunk quality by reviewing the retrieved results on a sample of representative queries.
Step 03
Retrieval optimization and hybrid search tuning
We build the retrieval layer, implement hybrid search if the query distribution warrants it, and tune the weighting against your representative queries. Retrieval quality is measured against a ground-truth set before we connect Claude to the system.
Step 04
Citation integration and production deployment
We connect Claude to the retrieval layer, implement citations, and validate the end-to-end system on your actual queries. The deployment includes monitoring for retrieval quality, citation rate, and response latency. Documentation covers how to add new documents, modify the chunking strategy, and debug retrieval failures.
seo_query_kb in Production
How we built a RAG system that achieves 100% citation rate on 15 daily client workflows.
The Challenge
Our team needs to answer SEO questions that are grounded in specific, citable research: Ahrefs data studies, Moz methodology documentation, Google Search Central guidance, Search Engine Journal analysis. Generic Claude responses, while often correct, lack the citation accountability that professional SEO advice requires. We needed a system where every factual claim pointed to a specific source.
Our Solution
We built seo_query_kb: a RAG system with Supabase pgvector backing a corpus from Ahrefs, Backlinko, Moz, Search Engine Journal, and Google Search Central. The system uses hybrid search (vector similarity plus keyword matching) to retrieve the most relevant chunks, feeds them to our fine-tuned Qwen3.5-27B model running at 5-6 tok/s on an M4 Pro, and enforces citation via Anthropic's citations API plus a system prompt that blocks any uncited factual claim. The system is exposed via an MCP tool called seo_query_kb, accessible to Claude in every client session.
Results Achieved
Our RAG Stack
The exact stack powering seo_query_kb with 100% citation rate.
We do not recommend tools we have not used in production. This is the stack we run.
Data Layer
- Supabase pgvector for vector storage
- Postgres full-text for keyword search
- HNSW index for approximate nearest neighbor
- Document metadata in structured columns
Processing Layer
- Document chunking by semantic unit
- Embedding generation with text-embedding models
- Metadata extraction and preservation
- Incremental index update on new documents
Retrieval Layer
- Hybrid search: vector + keyword combination
- Query-time permission filtering
- Relevance re-ranking before context injection
- Context window budget management
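Of these, budget management is the least discussed: retrieved chunks must fit a token budget without truncating any chunk mid-thought. A sketch of the greedy packing we mean, with the tokenizer injected as an assumption rather than hard-coded:

```python
from typing import Callable

def pack_context(
    chunks: list[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Greedily pack the highest-ranked chunks into the token budget.
    Chunks that would overflow are skipped whole, never truncated.
    Assumes chunks arrive already sorted by relevance."""
    packed: list[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # use `break` instead if rank order must be strict
        packed.append(chunk)
        used += cost
    return packed
```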
Generation Layer
- Claude with retrieved chunks in context
- Anthropic citations API for source attribution
- System prompt enforcing citation requirement
- Confidence signaling on low-retrieval queries
FAQ
Claude RAG system frequently asked questions
Ready to Build Your Knowledge Base?
Let's design your RAG system.
Document audit, architecture design, retrieval pipeline build, and citation integration. Your knowledge base, accessible to Claude, with 100% grounded answers.
- Document corpus audit and chunking strategy included
- Hybrid search tuned to your query patterns
- Citation integration for grounded, auditable outputs