Skip to content

Architecture

RavenRustRAG is a Cargo workspace with 10 crates organized in a layered dependency hierarchy.

Crate Overview

raven-core  (no internal deps)
raven-embed (depends on raven-core)
raven-split (depends on raven-core)
raven-load  (depends on raven-core)
raven-store (depends on raven-core)
raven-search (depends on core, embed, store, split)
raven-server (depends on core, embed, store, split, search)
raven-mcp    (depends on core, search, split)
raven-cli    (depends on all)
ravenrustrag (top-level facade, re-exports public API)

Crate Responsibilities

raven-core

Foundation types shared by all crates:

  • Document — input document with content and metadata
  • Chunk — text segment with embedding vector and metadata
  • SearchResult — ranked result with score and source info
  • Config — TOML-based configuration
  • RavenError — unified error type using thiserror
  • fingerprint() — content hashing for incremental indexing
  • cosine_similarity() — optimized vector comparison

raven-embed

Embedding and generation abstraction layer:

  • Embedder trait — async interface for text-to-vector conversion
  • Generator trait — async interface for LLM text generation (with streaming)
  • OllamaBackend — local embedding via Ollama HTTP API
  • OpenAIBackend — OpenAI-compatible embedding API (also used by vLLM, LiteLLM)
  • OllamaGenerator — text generation via Ollama /api/generate
  • OpenAIGenerator — text generation via OpenAI Chat Completions (vLLM, LiteLLM, OpenAI, TGI, LocalAI)
  • HttpEmbedder — generic HTTP endpoint for custom backends
  • OnnxEmbedder — local ONNX Runtime inference (behind onnx feature)
  • DummyEmbedder — deterministic fake embeddings for testing
  • EmbeddingCache — lock-free DashMap cache with LRU-style eviction
  • CachedEmbedder — transparent caching wrapper for any Embedder

raven-split

Text chunking strategies:

  • Splitter trait — async interface for text splitting
  • TextSplitter — character-based with configurable overlap
  • SentenceSplitter — respects sentence boundaries
  • TokenSplitter — approximate token-based splitting
  • SemanticSplitter — groups sentences by embedding similarity

raven-load

Document ingestion:

  • Loader — detects format by extension and loads documents
  • Format handlers: text, markdown (with frontmatter), CSV, JSON, JSONL, HTML, PDF, DOCX
  • LoaderRegistry — plugin system for custom format handlers
  • export_jsonl() / import_jsonl() — backup and restore
  • Directory traversal with extension filtering

raven-store

Vector storage backends:

  • VectorStore trait — async CRUD + similarity search
  • SqliteStore — persistent storage with WAL, mmap, cosine similarity
  • MemoryStore — in-memory store for testing
  • HnswIndex — approximate nearest neighbor index
  • MetadataFilter — key-value filtering on search results
  • Fingerprint tracking for incremental indexing

Pipeline orchestrator:

  • DocumentIndex — builder-pattern coordinator (split → embed → store → search)
  • BM25Index — keyword search with TF-IDF scoring
  • KnowledgeGraph / GraphRetriever — entity extraction and graph traversal
  • MultiQueryExpander — query expansion via keyword extraction
  • Reranker trait + KeywordReranker + OnnxReranker — result re-scoring
  • EvalMetrics — MRR, NDCG, Precision@k, Recall@k
  • Watch mode with debounce for live re-indexing

raven-server

HTTP API:

  • Axum-based server with Tower middleware
  • Bearer token authentication
  • CORS configuration
  • Rate limiting
  • OpenAPI 3.0.3 schema generation
  • Prometheus metrics endpoint
  • SSE streaming for /ask endpoint (source/token/done events)
  • WebSocket support for interactive queries
  • Read-only mode (--read-only)
  • Structured JSON responses

raven-mcp

AI assistant integration:

  • MCP (Model Context Protocol) server over stdio
  • JSON-RPC 2.0 message handling (protocol version 2024-11-05)
  • Tools: search, get_prompt, collection_info, index_documents
  • Resources: raven://index/stats
  • Prompts: rag_answer, summarize_index
  • --filter flag to restrict exposed tools
  • Input validation and error reporting

ravenrustrag

Top-level library crate:

  • Stable public API re-exports from all internal crates
  • Single dependency for downstream consumers

raven-cli

User-facing binary:

  • Clap-derived command parser
  • 20 commands: index, query, ask, prompt, serve, watch, graph build/query, info, status, clear, export, import, backup, mcp, doctor, benchmark, init, diff, completions
  • Structured and JSON output modes
  • Config file loading

Design Principles

  1. Thread-safe by default: All public types are Send + Sync
  2. No unwrap in library code: Proper error propagation with ?
  3. Batch operations: Embedding and storage work in batches for throughput
  4. Zero-copy where possible: mmap for SQLite, &str parameters
  5. Builder pattern: Complex types use builders (DocumentIndex)
  6. Trait-based abstraction: All major components are behind traits for testability
  7. Lock-free concurrency: DashMap + atomics where possible, Mutex only when necessary

Data Flow

Indexing Pipeline

Files → Loader → Documents → Splitter → Chunks → Embedder → Vectors
                                                            VectorStore (SQLite)
                                                            BM25 terms stored

Query Pipeline

Query → Embedder → Query Vector → VectorStore.search() → Results
                  (optional) → BM25.search() → RRF fusion → Reranker → Final Results

Knowledge Graph

Chunks → Entity Extraction → Graph Nodes + Edges → JSON file
Query → Entity Extraction → Graph Traversal → Related Chunks → Fusion with vector results