Architecture¶

RavenRustRAG is a Cargo workspace with 10 crates organized in a layered dependency hierarchy.

Crate Overview¶

raven-core  (no internal deps)
  ↑
raven-embed (depends on raven-core)
raven-split (depends on raven-core)
raven-load  (depends on raven-core)
raven-store (depends on raven-core)
  ↑
raven-search (depends on core, embed, store, split)
  ↑
raven-server (depends on core, embed, store, split, search)
raven-mcp    (depends on core, search, split)
  ↑
raven-cli    (depends on all)
  ↑
ravenrustrag (top-level facade, re-exports public API)

Crate Responsibilities¶

raven-core¶

Foundation types shared by all crates:

Document — input document with content and metadata
Chunk — text segment with embedding vector and metadata
SearchResult — ranked result with score and source info
Config — TOML-based configuration
RavenError — unified error type using thiserror
fingerprint() — content hashing for incremental indexing
cosine_similarity() — optimized vector comparison

raven-embed¶

Embedding and generation abstraction layer:

Embedder trait — async interface for text-to-vector conversion
Generator trait — async interface for LLM text generation (with streaming)
OllamaBackend — local embedding via Ollama HTTP API
OpenAIBackend — OpenAI-compatible embedding API (also used by vLLM, LiteLLM)
OllamaGenerator — text generation via Ollama /api/generate
OpenAIGenerator — text generation via OpenAI Chat Completions (vLLM, LiteLLM, OpenAI, TGI, LocalAI)
HttpEmbedder — generic HTTP endpoint for custom backends
OnnxEmbedder — local ONNX Runtime inference (behind onnx feature)
DummyEmbedder — deterministic fake embeddings for testing
EmbeddingCache — lock-free DashMap cache with LRU-style eviction
CachedEmbedder — transparent caching wrapper for any Embedder

raven-split¶

Text chunking strategies:

Splitter trait — async interface for text splitting
TextSplitter — character-based with configurable overlap
SentenceSplitter — respects sentence boundaries
TokenSplitter — approximate token-based splitting
SemanticSplitter — groups sentences by embedding similarity

raven-load¶

Document ingestion:

Loader — detects format by extension and loads documents
Format handlers: text, markdown (with frontmatter), CSV, JSON, JSONL, HTML, PDF, DOCX
LoaderRegistry — plugin system for custom format handlers
export_jsonl() / import_jsonl() — backup and restore
Directory traversal with extension filtering

raven-store¶

Vector storage backends:

VectorStore trait — async CRUD + similarity search
SqliteStore — persistent storage with WAL, mmap, cosine similarity
MemoryStore — in-memory store for testing
HnswIndex — approximate nearest neighbor index
MetadataFilter — key-value filtering on search results
Fingerprint tracking for incremental indexing

raven-search¶

Pipeline orchestrator:

DocumentIndex — builder-pattern coordinator (split → embed → store → search)
BM25Index — keyword search with TF-IDF scoring
KnowledgeGraph / GraphRetriever — entity extraction and graph traversal
MultiQueryExpander — query expansion via keyword extraction
Reranker trait + KeywordReranker + OnnxReranker — result re-scoring
EvalMetrics — MRR, NDCG, Precision@k, Recall@k
Watch mode with debounce for live re-indexing

raven-server¶

HTTP API:

Axum-based server with Tower middleware
Bearer token authentication
CORS configuration
Rate limiting
OpenAPI 3.0.3 schema generation
Prometheus metrics endpoint
SSE streaming for /ask endpoint (source/token/done events)
WebSocket support for interactive queries
Read-only mode (--read-only)
Structured JSON responses

raven-mcp¶

AI assistant integration:

MCP (Model Context Protocol) server over stdio
JSON-RPC 2.0 message handling (protocol version 2024-11-05)
Tools: search, get_prompt, collection_info, index_documents
Resources: raven://index/stats
Prompts: rag_answer, summarize_index
--filter flag to restrict exposed tools
Input validation and error reporting

ravenrustrag¶

Top-level library crate:

Stable public API re-exports from all internal crates
Single dependency for downstream consumers

raven-cli¶

User-facing binary:

Clap-derived command parser
20 commands: index, query, ask, prompt, serve, watch, graph build/query, info, status, clear, export, import, backup, mcp, doctor, benchmark, init, diff, completions
Structured and JSON output modes
Config file loading

Design Principles¶

Thread-safe by default: All public types are Send + Sync
No unwrap in library code: Proper error propagation with ?
Batch operations: Embedding and storage work in batches for throughput
Zero-copy where possible: mmap for SQLite, &str parameters
Builder pattern: Complex types use builders (DocumentIndex)
Trait-based abstraction: All major components are behind traits for testability
Lock-free concurrency: DashMap + atomics where possible, Mutex only when necessary

Data Flow¶

Indexing Pipeline¶

Files → Loader → Documents → Splitter → Chunks → Embedder → Vectors
                                                                  ↓
                                                            VectorStore (SQLite)
                                                                  ↓
                                                            BM25 terms stored

Query Pipeline¶

Query → Embedder → Query Vector → VectorStore.search() → Results
                                                              ↓
                  (optional) → BM25.search() → RRF fusion → Reranker → Final Results

Knowledge Graph¶

Chunks → Entity Extraction → Graph Nodes + Edges → JSON file
                                                        ↓
Query → Entity Extraction → Graph Traversal → Related Chunks → Fusion with vector results