Architecture
RavenRustRAG is a Cargo workspace with 10 crates organized in a layered dependency hierarchy.
Crate Overview
raven-core (no internal deps)
↑
raven-embed (depends on raven-core)
raven-split (depends on raven-core)
raven-load (depends on raven-core)
raven-store (depends on raven-core)
↑
raven-search (depends on core, embed, store, split)
↑
raven-server (depends on core, embed, store, split, search)
raven-mcp (depends on core, search, split)
↑
raven-cli (depends on all)
↑
ravenrustrag (top-level facade, re-exports public API)
Crate Responsibilities
raven-core
Foundation types shared by all crates:
- Document: input document with content and metadata
- Chunk: text segment with embedding vector and metadata
- SearchResult: ranked result with score and source info
- Config: TOML-based configuration
- RavenError: unified error type using thiserror
- fingerprint(): content hashing for incremental indexing
- cosine_similarity(): optimized vector comparison
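To make the two helper functions concrete, here is a minimal sketch of what cosine_similarity() and fingerprint() could look like. The signatures, the handling of mismatched lengths, and the use of the standard library's DefaultHasher are assumptions for illustration; the actual crate may use an optimized loop and a different hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cosine similarity between two vectors.
/// Returns 0.0 for mismatched lengths or zero-magnitude input (an assumed policy).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}

/// Hypothetical content fingerprint: hashes the text so that unchanged
/// documents can be skipped on re-index. The hash choice is an assumption.
pub fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}
```

Identical content always yields the same fingerprint, which is what lets an indexer compare the stored value against the current one and skip unchanged files.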
raven-embed
Embedding and generation abstraction layer:
- Embedder trait: async interface for text-to-vector conversion
- Generator trait: async interface for LLM text generation (with streaming)
- OllamaBackend: local embedding via the Ollama HTTP API
- OpenAIBackend: OpenAI-compatible embedding API (also used by vLLM, LiteLLM)
- OllamaGenerator: text generation via Ollama /api/generate
- OpenAIGenerator: text generation via OpenAI Chat Completions (vLLM, LiteLLM, OpenAI, TGI, LocalAI)
- HttpEmbedder: generic HTTP endpoint for custom backends
- OnnxEmbedder: local ONNX Runtime inference (behind the onnx feature)
- DummyEmbedder: deterministic fake embeddings for testing
- EmbeddingCache: lock-free DashMap cache with LRU-style eviction
- CachedEmbedder: transparent caching wrapper for any Embedder
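The relationship between the Embedder trait, DummyEmbedder, and CachedEmbedder can be sketched as follows. This is a simplified, synchronous approximation using only the standard library: the real trait is async and the real cache is DashMap-based with eviction, so the method name embed, the struct fields, and the misses() counter are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Simplified, synchronous stand-in for the async Embedder trait.
pub trait Embedder: Send + Sync {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Deterministic fake embedder: the same text always yields the same vector.
pub struct DummyEmbedder {
    pub dim: usize,
}

impl Embedder for DummyEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut vector = vec![0.0f32; self.dim];
        if self.dim == 0 {
            return vector;
        }
        // Fold the bytes of the text into a fixed number of buckets.
        for (i, byte) in text.bytes().enumerate() {
            vector[i % self.dim] += f32::from(byte) / 255.0;
        }
        vector
    }
}

/// Caching wrapper around any Embedder (no eviction in this sketch).
pub struct CachedEmbedder<E: Embedder> {
    inner: E,
    cache: Mutex<HashMap<String, Vec<f32>>>,
    misses: AtomicUsize,
}

impl<E: Embedder> CachedEmbedder<E> {
    pub fn new(inner: E) -> Self {
        Self {
            inner,
            cache: Mutex::new(HashMap::new()),
            misses: AtomicUsize::new(0),
        }
    }

    /// Number of calls that had to reach the wrapped embedder.
    pub fn misses(&self) -> usize {
        self.misses.load(Ordering::Relaxed)
    }
}

impl<E: Embedder> Embedder for CachedEmbedder<E> {
    fn embed(&self, text: &str) -> Vec<f32> {
        // A poisoned lock still guards a usable map, so recover it instead of panicking.
        let mut cache = self.cache.lock().unwrap_or_else(|p| p.into_inner());
        if let Some(hit) = cache.get(text) {
            return hit.clone();
        }
        self.misses.fetch_add(1, Ordering::Relaxed);
        let vector = self.inner.embed(text);
        cache.insert(text.to_owned(), vector.clone());
        vector
    }
}
```

Because the wrapper itself implements Embedder, callers do not need to know whether caching is enabled, which is the sense in which the wrapper is transparent.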
raven-split
Text chunking strategies:
- Splitter trait: async interface for text splitting
- TextSplitter: character-based with configurable overlap
- SentenceSplitter: respects sentence boundaries
- TokenSplitter: approximate token-based splitting
- SemanticSplitter: groups sentences by embedding similarity
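As an illustration of character-based splitting with overlap, the sketch below uses a sliding window over characters. The function name split_with_overlap and its parameters are hypothetical, and the real TextSplitter may handle boundaries differently.

```rust
/// Split `text` into chunks of at most `chunk_size` characters, where each
/// chunk starts with the last `overlap` characters of the previous one.
pub fn split_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    if chunk_size == 0 {
        return Vec::new();
    }
    // Clamp the overlap so the window always advances by at least one character.
    let overlap = overlap.min(chunk_size - 1);
    let step = chunk_size - overlap;

    // Work on chars rather than bytes so multi-byte UTF-8 is never cut in half.
    let chars: Vec<char> = text.chars().collect();
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

For example, splitting "abcdefghij" with a chunk size of 4 and an overlap of 2 gives "abcd", "cdef", "efgh", "ghij": each chunk repeats the last two characters of its predecessor, so text that straddles a boundary appears whole in at least one chunk.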
raven-load
Document ingestion:
- Loader: detects format by extension and loads documents
- Format handlers: text, markdown (with frontmatter), CSV, JSON, JSONL, HTML, PDF, DOCX
- LoaderRegistry: plugin system for custom format handlers
- export_jsonl() / import_jsonl(): backup and restore
- Directory traversal with extension filtering
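Extension-based format detection might look roughly like the sketch below. The Format enum, the detect_format function, and the exact extension-to-format mapping are assumptions for illustration; only the list of supported formats comes from the description above.

```rust
use std::path::Path;

/// Formats named in the handler list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Text,
    Markdown,
    Csv,
    Json,
    Jsonl,
    Html,
    Pdf,
    Docx,
}

/// Map a file extension (case-insensitive) to a format.
/// Returns None for unknown or missing extensions.
pub fn detect_format(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    let format = match ext.as_str() {
        "txt" => Format::Text,
        "md" | "markdown" => Format::Markdown,
        "csv" => Format::Csv,
        "json" => Format::Json,
        "jsonl" => Format::Jsonl,
        "html" | "htm" => Format::Html,
        "pdf" => Format::Pdf,
        "docx" => Format::Docx,
        _ => return None,
    };
    Some(format)
}
```

Returning None rather than an error for unknown extensions makes the same function usable as the filter during directory traversal.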
raven-store
Vector storage backends:
- VectorStore trait: async CRUD + similarity search
- SqliteStore: persistent storage with WAL, mmap, cosine similarity
- MemoryStore: in-memory store for testing
- HnswIndex: approximate nearest neighbor index
- MetadataFilter: key-value filtering on search results
- Fingerprint tracking for incremental indexing
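The following sketch shows how an in-memory store with brute-force similarity search and a key-value metadata filter could fit together. It is synchronous and uses hypothetical names (StoredChunk, the search signature, a single key-value pair as the filter); the real VectorStore trait is async and its API is likely richer.

```rust
use std::collections::HashMap;

/// Hypothetical stored record: an id, its embedding, and string metadata.
pub struct StoredChunk {
    pub id: String,
    pub vector: Vec<f32>,
    pub metadata: HashMap<String, String>,
}

#[derive(Default)]
pub struct MemoryStore {
    chunks: Vec<StoredChunk>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

impl MemoryStore {
    pub fn add(&mut self, chunk: StoredChunk) {
        self.chunks.push(chunk);
    }

    /// Brute-force top-k search. When `filter` is Some((key, value)), only
    /// chunks whose metadata contains that exact pair are considered.
    pub fn search(
        &self,
        query: &[f32],
        top_k: usize,
        filter: Option<(&str, &str)>,
    ) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .chunks
            .iter()
            .filter(|c| match filter {
                Some((key, value)) => c.metadata.get(key).map(String::as_str) == Some(value),
                None => true,
            })
            .map(|c| (c.id.clone(), cosine(query, &c.vector)))
            .collect();
        // Highest similarity first.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(top_k);
        scored
    }
}
```

A linear scan like this is adequate for tests and small collections; an approximate index such as HNSW trades exactness for sub-linear query time on larger ones.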
raven-search
Pipeline orchestrator:
- DocumentIndex: builder-pattern coordinator (split → embed → store → search)
- BM25Index: keyword search with TF-IDF scoring
- KnowledgeGraph / GraphRetriever: entity extraction and graph traversal
- MultiQueryExpander: query expansion via keyword extraction
- Reranker trait + KeywordReranker + OnnxReranker: result re-scoring
- EvalMetrics: MRR, NDCG, Precision@k, Recall@k
- Watch mode with debounce for live re-indexing
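For readers unfamiliar with the evaluation metrics, the sketch below gives standard definitions of Precision@k, Recall@k, and MRR over ranked document ids (NDCG is omitted for brevity). The function names and signatures are illustrative assumptions, not the EvalMetrics API.

```rust
use std::collections::HashSet;

/// Fraction of the top-k results that are relevant.
pub fn precision_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let hits = ranked.iter().take(k).filter(|&&id| relevant.contains(id)).count();
    hits as f64 / k as f64
}

/// Fraction of all relevant documents that appear in the top-k results.
pub fn recall_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let hits = ranked.iter().take(k).filter(|&&id| relevant.contains(id)).count();
    hits as f64 / relevant.len() as f64
}

/// 1 / rank of the first relevant result, or 0.0 if none is relevant.
pub fn reciprocal_rank(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|&id| relevant.contains(id))
        .map_or(0.0, |pos| 1.0 / (pos as f64 + 1.0))
}

/// Mean reciprocal rank over a set of (ranked results, relevant ids) queries.
pub fn mean_reciprocal_rank(queries: &[(Vec<&str>, HashSet<&str>)]) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(ranked, relevant)| reciprocal_rank(ranked, relevant))
        .sum();
    total / queries.len() as f64
}
```

With the ranking d1, d2, d3, d4 and relevant set {d2, d4}, Precision@2 and Recall@2 are both 0.5, and the reciprocal rank is 0.5 because the first relevant hit sits at rank 2.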
raven-server
HTTP API:
- Axum-based server with Tower middleware
- Bearer token authentication
- CORS configuration
- Rate limiting
- OpenAPI 3.0.3 schema generation
- Prometheus metrics endpoint
- SSE streaming for the /ask endpoint (source/token/done events)
- WebSocket support for interactive queries
- Read-only mode (--read-only)
- Structured JSON responses
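To illustrate the SSE streaming item, the sketch below frames an event in the standard Server-Sent Events wire format. The event names source, token, and done come from the list above; the helper name sse_event and the payload contents are assumptions, and the actual server presumably relies on Axum's SSE support rather than hand-built strings.

```rust
/// Frame one Server-Sent Event: an `event:` line, one `data:` line per
/// line of payload, and a blank line that terminates the event.
pub fn sse_event(event: &str, data: &str) -> String {
    let mut out = format!("event: {event}\n");
    // The SSE format requires multi-line payloads to be split across data lines.
    for line in data.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}
```

A client would typically receive source events first, then a sequence of token events as the answer is generated, and finally a done event that signals the end of the stream.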
raven-mcp
AI assistant integration:
- MCP (Model Context Protocol) server over stdio
- JSON-RPC 2.0 message handling (protocol version 2024-11-05)
- Tools: search, get_prompt, collection_info, index_documents
- Resources: raven://index/stats
- Prompts: rag_answer, summarize_index
- --filter flag to restrict exposed tools
- Input validation and error reporting
ravenrustrag
Top-level library crate:
- Stable public API re-exports from all internal crates
- Single dependency for downstream consumers
raven-cli
User-facing binary:
- Clap-derived command parser
- 20 commands: index, query, ask, prompt, serve, watch, graph build/query, info, status, clear, export, import, backup, mcp, doctor, benchmark, init, diff, completions
- Structured and JSON output modes
- Config file loading
Design Principles
- Thread-safe by default: All public types are Send + Sync
- No unwrap in library code: Proper error propagation with ?
- Batch operations: Embedding and storage work in batches for throughput
- Zero-copy where possible: mmap for SQLite, &str parameters
- Builder pattern: Complex types use builders (DocumentIndex)
- Trait-based abstraction: All major components are behind traits for testability
- Lock-free concurrency: DashMap + atomics where possible, Mutex only when necessary
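Several of these principles can be shown together in a small sketch: a builder that validates on build(), error propagation with ? instead of unwrap, a &str parameter, and a compile-time Send + Sync check. All type and function names here (SplitConfig, SplitConfigBuilder, ConfigError, config_from_str) and the default values are hypothetical; the project's own builder is DocumentIndex and its error type is RavenError.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ZeroChunkSize,
    OverlapTooLarge { chunk_size: usize, overlap: usize },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::ZeroChunkSize => write!(f, "chunk_size must be greater than zero"),
            ConfigError::OverlapTooLarge { chunk_size, overlap } => {
                write!(f, "overlap {overlap} must be smaller than chunk_size {chunk_size}")
            }
        }
    }
}

impl Error for ConfigError {}

#[derive(Debug, Clone, PartialEq)]
pub struct SplitConfig {
    pub chunk_size: usize,
    pub overlap: usize,
}

#[derive(Debug, Clone)]
pub struct SplitConfigBuilder {
    chunk_size: usize,
    overlap: usize,
}

impl Default for SplitConfigBuilder {
    fn default() -> Self {
        // Arbitrary illustrative defaults.
        Self { chunk_size: 512, overlap: 64 }
    }
}

impl SplitConfigBuilder {
    pub fn chunk_size(mut self, n: usize) -> Self {
        self.chunk_size = n;
        self
    }

    pub fn overlap(mut self, n: usize) -> Self {
        self.overlap = n;
        self
    }

    /// Validation happens once, here, and failures are returned rather than panicking.
    pub fn build(self) -> Result<SplitConfig, ConfigError> {
        if self.chunk_size == 0 {
            return Err(ConfigError::ZeroChunkSize);
        }
        if self.overlap >= self.chunk_size {
            return Err(ConfigError::OverlapTooLarge {
                chunk_size: self.chunk_size,
                overlap: self.overlap,
            });
        }
        Ok(SplitConfig { chunk_size: self.chunk_size, overlap: self.overlap })
    }
}

/// Both the parse error and the validation error propagate with `?`.
pub fn config_from_str(raw: &str) -> Result<SplitConfig, Box<dyn Error>> {
    let chunk_size: usize = raw.trim().parse()?;
    let config = SplitConfigBuilder::default().chunk_size(chunk_size).overlap(0).build()?;
    Ok(config)
}

/// Compiles only if T is Send + Sync, turning the thread-safety rule into a build-time check.
fn assert_send_sync<T: Send + Sync>() {}

pub fn thread_safety_check() {
    assert_send_sync::<SplitConfig>();
    assert_send_sync::<ConfigError>();
}
```

Placing a check like assert_send_sync in a test module is one common way to keep the "Send + Sync by default" guarantee from regressing when a field type changes.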
Data Flow
Indexing Pipeline
Files → Loader → Documents → Splitter → Chunks → Embedder → Vectors
↓
VectorStore (SQLite)
↓
BM25 terms stored
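The indexing flow can be approximated end to end with a toy indexer that skips unchanged files by fingerprint and embeds chunks in batches. Everything here is a stand-in chosen for illustration: ToyIndexer, the blank-line splitter, and the two-number "embedding" are hypothetical, and the sketch omits BM25 term storage and the removal of stale chunks when a file changes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Toy batch embedder standing in for a real backend:
/// each vector is [character count, word count].
fn embed_batch(texts: &[String]) -> Vec<Vec<f32>> {
    texts
        .iter()
        .map(|t| vec![t.chars().count() as f32, t.split_whitespace().count() as f32])
        .collect()
}

#[derive(Default)]
pub struct ToyIndexer {
    /// path -> fingerprint of the content that was last indexed
    fingerprints: HashMap<String, u64>,
    /// (chunk text, vector) pairs, standing in for the vector store
    pub store: Vec<(String, Vec<f32>)>,
}

impl ToyIndexer {
    /// Index one file. Returns the number of chunks written,
    /// or 0 when the content is unchanged since the last call.
    pub fn index(&mut self, path: &str, content: &str, batch_size: usize) -> usize {
        let fp = fingerprint(content);
        if self.fingerprints.get(path) == Some(&fp) {
            return 0; // incremental indexing: nothing to do
        }

        // Split: blank-line-separated paragraphs become chunks.
        let chunks: Vec<String> = content
            .split("\n\n")
            .map(str::trim)
            .filter(|c| !c.is_empty())
            .map(String::from)
            .collect();

        // Embed and store in batches rather than one chunk at a time.
        let mut written = 0;
        for batch in chunks.chunks(batch_size.max(1)) {
            let vectors = embed_batch(batch);
            for (text, vector) in batch.iter().zip(vectors) {
                self.store.push((text.clone(), vector));
                written += 1;
            }
        }

        self.fingerprints.insert(path.to_owned(), fp);
        written
    }
}
```

Calling index twice with the same path and content writes chunks only the first time, which is the behavior fingerprint tracking is meant to provide.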
Query Pipeline
Query → Embedder → Query Vector → VectorStore.search() → Results
↓
(optional) → BM25.search() → RRF fusion → Reranker → Final Results
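The RRF fusion step combines the vector and BM25 rankings by rank rather than by raw score. A minimal sketch of reciprocal rank fusion follows; the function name rrf_fuse, the id-based input, and the choice of the constant k are assumptions (k = 60 is a commonly used value, but the value used by the project is not stated here).

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each document scores the sum of 1 / (k + rank)
/// over every ranking it appears in, with ranks starting at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (position, id) in ranking.iter().enumerate() {
            let rank = position as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }

    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_owned(), score))
        .collect();
    // Highest fused score first; ties are broken by id for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, cosine similarities and BM25 scores never need to be normalized against each other; a document that ranks well in both lists rises above one that tops only a single list.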