ekoDB White Paper
A Multi-Model Database System
By Sean M. Vazquez, Creator of ekoDB
Executive Summary
ekoDB is an in-memory database with disk persistence for unstructured data. Built with Rust, it combines in-memory performance with durability guarantees through Write-Ahead Logging (WAL).
Key Capabilities:
- Multi-Model Architecture: Key-value, document, vector search, and real-time messaging in a single system
- High Performance: Optimized for high-throughput workloads with configurable durability
- Adaptive Scaling: Automatically tunes resource usage to the host, from IoT devices to enterprise servers
- Secure by Default: HTTPS/WSS-only, AES-GCM encryption at rest, TLS/SSL in transit
- Memory Efficient: Low memory footprint with adaptive allocation
- Distributed: Ripple system for horizontal scaling and cross-database propagation
1. Introduction
1.1 The Problem
Applications often require multiple database systems to handle different workloads:
- MongoDB for document storage
- Redis for caching and real-time data
- Elasticsearch for full-text search
- Pinecone for vector search
- PostgreSQL for relational data
This multi-database approach introduces complexity, operational overhead, and integration challenges.
1.2 The Solution
ekoDB combines features from multiple database types into a single system. It provides document storage, key-value operations, full-text search, and vector search in one database.
1.3 History
ekoDB originated from a practical challenge: integrating multiple databases required complex layers of abstraction to achieve feature parity and consistent usage patterns.
Development Timeline:
- 2013: Initial development as SOLO (Single Object Language Operator), an API gateway for multi-database integration
- 2013-2022: SOLO operated as an API gateway connecting various database systems
- 2022: Decision to eliminate the abstraction layer and build a unified database
- 2022-2025: Complete from-scratch rewrite as ekoDB
- Current: v0.20.0 (October 2025) - Active development
2. Architecture
2.1 Storage Architecture
ekoDB uses a hybrid in-memory architecture with disk persistence:
┌─────────────────────────────────────────┐
│ In-Memory Layer (Hot Data) │
│ ┌────────────────────────────────────┐ │
│ │ Concurrent Hash Maps │ │
│ │ - O(1) lookups │ │
│ │ - Lock-free reads │ │
│ │ - Reader-writer locks │ │
│ └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
↕
┌─────────────────────────────────────────┐
│ Multi-Tier LRU Cache (Warm Data) │
│ ┌────────────────────────────────────┐ │
│ │ - Record Cache │ │
│ │ - Query Cache │ │
│ │ - Search Cache │ │
│ │ - KV Cache │ │
│ └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
↕
┌─────────────────────────────────────────┐
│ Disk Persistence (Cold Data) │
│ ┌────────────────────────────────────┐ │
│ │ Write-Ahead Log (WAL) │ │
│ │ - Encrypted (AES-GCM) │ │
│ │ - Compressed (ZSTD/LZ4) │ │
│ │ - Dual-mode (Fast/Durable) │ │
│ └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
Key Features:
- In-Memory Primary Storage: All active data stored in memory for fast access
- Automatic Eviction: LRU-based eviction when memory limits are reached
- Transparent Reload: Cold data automatically loaded from disk when accessed
- Larger-than-Memory Support: Datasets can exceed available RAM
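The eviction and reload behavior listed above can be illustrated with a minimal LRU cache sketch. This is not ekoDB's implementation: the class name `LruCache`, the `capacity` limit (counted in entries rather than bytes), and the `loadFromDisk` callback are hypothetical names chosen for illustration.

```typescript
// Minimal LRU cache sketch (hypothetical names, not ekoDB internals).
// A Map iterates in insertion order, so the first key is the least recently used.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;
  private loadFromDisk: (key: string) => V | undefined;

  constructor(capacity: number, loadFromDisk: (key: string) => V | undefined) {
    this.capacity = capacity;
    this.loadFromDisk = loadFromDisk;
  }

  get(key: string): V | undefined {
    if (this.entries.has(key)) {
      // Hit: re-insert so the key becomes the most recently used.
      const value = this.entries.get(key) as V;
      this.entries.delete(key);
      this.entries.set(key, value);
      return value;
    }
    // Miss: transparently reload from the cold tier, if the key exists there.
    const loaded = this.loadFromDisk(key);
    if (loaded !== undefined) this.set(key, loaded);
    return loaded;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

// The "disk" here is just another Map standing in for cold storage.
const disk = new Map<string, string>([["a", "A"], ["b", "B"], ["c", "C"]]);
const cache = new LruCache<string>(2, (k) => disk.get(k));
cache.get("a");
cache.get("b");
cache.get("c"); // capacity is 2, so "a" is evicted from memory
```

Because evicted keys remain on the cold tier, a later `cache.get("a")` still returns the value; it is simply slower than a memory hit.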
2.2 Data Model
ekoDB supports multiple data models in a unified system:
Document Model
{
  "name": "John Doe",
  "email": "john@example.com",
  "age": 30,
  "tags": ["developer", "rust"],
  "address": {
    "city": "San Francisco",
    "country": "USA"
  }
}
Key-Value Model
{
  "session:user123": {
    "token": "abc123",
    "expires": "2025-10-15T00:00:00Z"
  }
}
Vector Model
{
  "content": "ekoDB is a high-performance database",
  "embedding": [0.1, 0.2, 0.3, ...], // 384-dimensional vector
  "metadata": {
    "source": "documentation"
  }
}
2.3 Type System
ekoDB provides a flexible type system with optional type enforcement:
Supported Types:
- String: UTF-8 text
- Integer: 64-bit signed integers
- Float: 64-bit floating point
- Decimal: High-precision decimals
- Boolean: true/false
- Null: Null/empty value
- Array: Ordered lists
- Set: Unordered unique values
- Map: Key-value pairs
- Vector: Fixed-dimension float arrays (for embeddings)
- Object: Nested documents
- Binary: Raw binary data
- Date: ISO 8601 date strings
- DateTime: ISO 8601 date-time strings with timezone
Response Formats:
- Typed: Includes type metadata (e.g., {"type": "String", "value": "text"})
- Non-Typed: Traditional NoSQL format (e.g., "text")
2.4 Configuration Options
ekoDB provides flexible configuration options to tune behavior for different use cases:
Response Format Configuration
- Typed Responses: Include type metadata for strong typing
- Non-Typed Responses: Traditional NoSQL format for simplicity
- Per-Request Override: Configure via query parameters or headers
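To make the difference between the two response formats concrete, the following sketch converts a typed record into its non-typed form on the client side. It assumes every typed field has exactly the shape `{"type": ..., "value": ...}` shown earlier and that typed values may nest inside arrays and objects; the helper names `isTypedValue` and `toPlain` are hypothetical and not part of the ekoDB client library.

```typescript
// Assumed shape of a typed field, based on the {"type": "String", "value": "text"} example.
type TypedValue = { type: string; value: unknown };

function isTypedValue(v: unknown): v is TypedValue {
  return (
    typeof v === "object" &&
    v !== null &&
    !Array.isArray(v) &&
    typeof (v as Record<string, unknown>).type === "string" &&
    "value" in v
  );
}

// Recursively strip type metadata to obtain the non-typed form.
// Caveat: a user document that itself has "type" and "value" fields
// would be mistaken for a typed wrapper by this naive check.
function toPlain(v: unknown): unknown {
  if (isTypedValue(v)) return toPlain(v.value);
  if (Array.isArray(v)) return v.map((item) => toPlain(item));
  if (typeof v === "object" && v !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(v)) out[key] = toPlain(inner);
    return out;
  }
  return v;
}

const typedRecord = {
  name: { type: "String", value: "John Doe" },
  age: { type: "Integer", value: 30 },
  tags: { type: "Array", value: [{ type: "String", value: "rust" }] },
};
const plainRecord = toPlain(typedRecord);
// plainRecord is { name: "John Doe", age: 30, tags: ["rust"] }
```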
Durability Configuration
- Fast WAL Mode: High throughput, buffered writes
- Durable WAL Mode: Guaranteed persistence, immediate fsync
- Per-Operation Override: Choose durability level per write operation
- Collection-Level Settings: Configure default durability per collection
Example Configuration
// Configure client for typed responses
const client = new EkoDBClient({
  baseURL: "https://your-instance.ekodb.io",
  apiKey: "your-api-key",
  useTypedValues: true // Enable typed responses
});

// Override durability per operation
await client.insert("critical_data", record, {
  durableWrite: true // Force durable WAL for this write
});
await client.insert("logs", logEntry, {
  durableWrite: false // Use fast WAL for high throughput
});
Configuration Benefits:
- Flexibility: Choose the right trade-offs per use case
- Performance: Optimize for throughput or latency
- Type Safety: Enable strong typing when needed
- Durability: Balance speed vs. guaranteed persistence
3. Performance
3.1 Write Performance
ekoDB offers two WAL modes optimized for different use cases:
| Mode | Throughput | Durability | Use Case |
|---|---|---|---|
| Fast WAL | High throughput | Buffered writes | High-throughput ingestion |
| Durable WAL | Moderate throughput | Immediate fsync | Critical data |
Dual-Node Strategy: Deploy a primary node with Fast WAL and a secondary node with Durable WAL (via Ripple replication) to achieve both high performance and durability.
3.2 Read Performance
- Indexed Lookups: O(1) hash index, O(log n) B-tree index
- Point Queries: Sub-millisecond latency
- Range Queries: O(log n) to locate the start of the range in a B-tree index, plus time proportional to the number of matching records
- Full-Text Search: O(k), where k is the number of matching terms
- Vector Search: Optimized flat index with early termination
3.3 Memory Efficiency
- Base Memory: Low memory footprint optimized for efficiency
- Compression: Significant space savings with ZSTD (configurable)
- Adaptive Allocation: 1%-80% of available RAM
- LRU Eviction: Automatic memory management
4. Indexing
4.1 Index Types
ekoDB implements multiple index types optimized for different query patterns:
Hash Indexes (Default)
- Complexity: O(1) for equality queries
- Use Case: WHERE field = value
- Automatic: Created for frequently queried fields
B-Tree Indexes
- Complexity: O(log n) for range queries
- Use Case: WHERE field > value, sorting
- Features: Supports <, >, <=, >=, BETWEEN
Inverted Indexes
- Complexity: O(k) where k = matching terms
- Use Case: Full-text search
- Features: Stemming, fuzzy matching, tokenization
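As a rough illustration of how an inverted index maps terms to documents, here is a toy version. It is a sketch only: `tokenize` is a deliberately naive stand-in (lowercasing and splitting on non-alphanumeric characters) with no stemming or fuzzy matching, and the `InvertedIndex` class and its AND-only query semantics are assumptions made for the example.

```typescript
// Naive tokenizer: lowercase, split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Toy inverted index: each term maps to the set of document IDs containing it.
class InvertedIndex {
  private postings = new Map<string, Set<string>>();

  add(docId: string, text: string): void {
    for (const term of tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(docId);
    }
  }

  // Returns the sorted IDs of documents containing every query term.
  search(query: string): string[] {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    let result: string[] | null = null;
    for (const term of terms) {
      const ids = this.postings.get(term) ?? new Set<string>();
      result = result === null ? Array.from(ids) : result.filter((id) => ids.has(id));
    }
    return (result ?? []).sort();
  }
}

const textIndex = new InvertedIndex();
textIndex.add("d1", "ekoDB is a high-performance database");
textIndex.add("d2", "Redis is a key-value database");
textIndex.search("database");       // ["d1", "d2"]
textIndex.search("ekodb database"); // ["d1"]
```

Lookup cost is driven by the number of query terms and the size of their posting sets, not by the total number of documents.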
Vector Indexes
- Current: Flat index with optimizations
- Planned: HNSW (Hierarchical Navigable Small World)
- Use Case: Semantic similarity search
- Metrics: Cosine similarity, Euclidean distance, dot product
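The three metrics named above have standard definitions, sketched below for plain number arrays. The function names are illustrative, and returning 0 for a zero-length vector in `cosine` is a convention chosen here, not documented ekoDB behavior.

```typescript
// Dot product: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean distance: straight-line distance (smaller means more similar).
function euclidean(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine similarity: dot product of the vectors divided by the product of
// their lengths, ranging from -1 (opposite) to 1 (same direction).
function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

dot([1, 2, 3], [4, 5, 6]);   // 32
euclidean([0, 0], [3, 4]);   // 5
cosine([1, 0], [0, 1]);      // 0 (orthogonal vectors)
```

Cosine similarity ignores vector magnitude, which is why it is a common default for text embeddings; dot product and Euclidean distance are sensitive to magnitude.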
4.2 Index Management
- Automatic Creation: Indexes created based on query patterns
- Automatic Maintenance: Updated on insert/update/delete
- Concurrent Access: Thread-safe operations
- Memory Efficient: Weak references and LRU eviction
5. Concurrency & Isolation
5.1 Concurrency Control
ekoDB uses Two-Phase Locking (2PL) with reader-writer locks:
- Collection-Level Granularity: Locks per collection
- Multiple Readers: Concurrent reads allowed
- Single Writer: One writer per collection at a time
- Lock-Free Cross-Collection: Operations on different collections don't block each other
5.2 Isolation Levels
Within a Single Collection: Serializable
- Reader-writer locks ensure serializable execution
- No dirty reads, non-repeatable reads, or phantom reads
Across Multiple Collections: Read Uncommitted (effectively)
- No coordination between collections
- Applications must handle cross-collection consistency
- Multi-collection transactions planned for future release
6. Durability & Recovery
6.1 Write-Ahead Logging (WAL)
ekoDB implements a dual-mode WAL system:
Fast WAL Mode:
- Buffered writes with periodic fsync
- High throughput
- Suitable for high-throughput ingestion
Durable WAL Mode:
- Immediate fsync after every write
- Moderate throughput
- Guaranteed persistence
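The trade-off between the two modes can be sketched with a toy writer that counts how often it "syncs". This is an illustration under stated assumptions, not ekoDB's WAL: the `WalWriter` class is hypothetical, the `persisted` array stands in for data that has reached stable storage, and the fast mode flushes after a fixed number of entries (`flushEvery`), whereas a real periodic fsync would more likely be time-based.

```typescript
type WalMode = "fast" | "durable";

// Toy WAL writer (hypothetical). Entries in `buffer` would be lost on a crash;
// entries in `persisted` are treated as safely on disk.
class WalWriter {
  private buffer: string[] = [];
  private mode: WalMode;
  private flushEvery: number;
  persisted: string[] = [];
  syncCount = 0;

  constructor(mode: WalMode, flushEvery: number = 4) {
    this.mode = mode;
    this.flushEvery = flushEvery;
  }

  append(entry: string): void {
    this.buffer.push(entry);
    // Durable mode syncs on every write; fast mode waits for the buffer to fill.
    if (this.mode === "durable" || this.buffer.length >= this.flushEvery) {
      this.sync();
    }
  }

  // Stand-in for fsync: moves buffered entries to "stable storage".
  sync(): void {
    if (this.buffer.length === 0) return;
    this.persisted.push(...this.buffer);
    this.buffer = [];
    this.syncCount++;
  }
}

const durableWal = new WalWriter("durable");
const fastWal = new WalWriter("fast", 4);
for (let i = 0; i < 5; i++) {
  durableWal.append("entry-" + i);
  fastWal.append("entry-" + i);
}
// durableWal: 5 syncs, 5 entries persisted.
// fastWal: 1 sync, 4 entries persisted, 1 entry still buffered.
```

The fast writer does one fifth of the sync work in this run, at the cost of one entry that would not survive a crash until the next flush.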
WAL Management:
- Automatic rotation at 50MB-1GB (adaptive)
- Manual rotation available
- Log compaction removes redundant entries
- Automatic cleanup after replication
6.2 Recovery Process
- Scan WAL Files: Identify all WAL files in order
- Validate: Check integrity and checksums
- Replay Entries: Apply operations in order
- Rebuild Indexes: Reconstruct all indexes
- Resume Operations: Database ready for queries
Recovery Time: Depends on WAL size, typically seconds to minutes
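The validate-and-replay steps above can be sketched as follows. The entry layout (`WalEntry`), the toy non-cryptographic checksum, and the policy of stopping at the first corrupt entry are all assumptions made for illustration; the source does not specify ekoDB's on-disk format or its corruption handling, and index rebuilding is omitted.

```typescript
// Hypothetical WAL entry layout for illustration only.
type WalEntry = {
  seq: number;
  op: "put" | "delete";
  key: string;
  value?: string;
  checksum: number;
};

// Toy rolling checksum over the entry's fields (not cryptographic).
function checksumOf(seq: number, op: string, key: string, value?: string): number {
  const text = `${seq}|${op}|${key}|${value ?? ""}`;
  let sum = 0;
  for (let i = 0; i < text.length; i++) sum = (sum * 31 + text.charCodeAt(i)) >>> 0;
  return sum;
}

function makeEntry(seq: number, op: "put" | "delete", key: string, value?: string): WalEntry {
  return { seq, op, key, value, checksum: checksumOf(seq, op, key, value) };
}

// Replays entries in sequence order; stops at the first entry that fails validation.
function recover(entries: WalEntry[]): Map<string, string> {
  const state = new Map<string, string>();
  const ordered = entries.slice().sort((a, b) => a.seq - b.seq);
  for (const e of ordered) {
    if (checksumOf(e.seq, e.op, e.key, e.value) !== e.checksum) break;
    if (e.op === "put") state.set(e.key, e.value ?? "");
    else state.delete(e.key);
  }
  return state;
}

const walEntries: WalEntry[] = [
  makeEntry(2, "put", "b", "2"),
  makeEntry(1, "put", "a", "1"),
  makeEntry(3, "delete", "a"),
  { seq: 4, op: "put", key: "c", value: "3", checksum: 0 }, // corrupted entry
];
const recovered = recover(walEntries);
// recovered contains only "b" -> "2": "a" was deleted and the corrupt entry was skipped.
```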
7. Search Capabilities
7.1 Full-Text Search
- Inverted Index: Maps terms to documents
- Tokenization: Automatic text processing
- Stemming: Language-aware word normalization
- Fuzzy Matching: Typo tolerance (Levenshtein distance)
- Field Weighting: Prioritize specific fields
- Minimum Score: Filter by relevance threshold
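Typo tolerance based on Levenshtein distance can be sketched with the standard dynamic-programming algorithm. The `fuzzyMatch` helper and its `maxDistance` threshold are hypothetical; the source does not state what threshold ekoDB applies.

```typescript
// Levenshtein distance: minimum number of single-character insertions,
// deletions, or substitutions needed to turn `a` into `b`.
// Uses two rows of the DP table instead of the full matrix.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the candidate terms within `maxDistance` edits of `term`.
function fuzzyMatch(term: string, candidates: string[], maxDistance: number = 1): string[] {
  return candidates.filter((c) => levenshtein(term, c) <= maxDistance);
}

levenshtein("kitten", "sitting");                            // 3
fuzzyMatch("databse", ["database", "dataset", "table"], 1);  // ["database"]
```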
7.2 Vector Search
- Semantic Similarity: Find similar documents by meaning
- Embedding Support: 384, 768, 1536 dimensions (configurable)
- Distance Metrics: Cosine similarity, Euclidean, dot product
- Metadata Filtering: Combine vector search with filters
- Top-K Results: Efficient heap-based selection
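Heap-based top-K selection is commonly done with a min-heap bounded at K entries, so that each candidate costs O(log K) rather than sorting all N scores. The sketch below shows that general technique; the `Scored` type and `topK` function are illustrative names, and metadata filtering is assumed to happen by filtering the candidate list before selection.

```typescript
type Scored = { id: string; score: number };

// Keeps the k highest-scoring items using a size-bounded min-heap.
// The heap root is always the weakest of the current best k.
function topK(items: Scored[], k: number): Scored[] {
  if (k <= 0) return [];
  const heap: Scored[] = [];
  const swap = (i: number, j: number): void => {
    const tmp = heap[i];
    heap[i] = heap[j];
    heap[j] = tmp;
  };
  const siftUp = (start: number): void => {
    let i = start;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent].score <= heap[i].score) break;
      swap(i, parent);
      i = parent;
    }
  };
  const siftDown = (start: number): void => {
    let i = start;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < heap.length && heap[left].score < heap[smallest].score) smallest = left;
      if (right < heap.length && heap[right].score < heap[smallest].score) smallest = right;
      if (smallest === i) break;
      swap(i, smallest);
      i = smallest;
    }
  };
  for (const item of items) {
    if (heap.length < k) {
      heap.push(item);
      siftUp(heap.length - 1);
    } else if (item.score > heap[0].score) {
      // Better than the weakest kept item: replace the root and restore order.
      heap[0] = item;
      siftDown(0);
    }
  }
  return heap.sort((a, b) => b.score - a.score);
}

const candidates: Scored[] = [
  { id: "a", score: 0.1 },
  { id: "b", score: 0.9 },
  { id: "c", score: 0.5 },
  { id: "d", score: 0.7 },
];
topK(candidates, 2).map((s) => s.id); // ["b", "d"]
```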
7.3 Hybrid Search
Combine text and vector search:
- Text search for keyword matching
- Vector search for semantic similarity
- Unified scoring and ranking
- Combined text and semantic matching
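One common way to produce a unified score is a weighted sum of normalized text and vector scores. The sketch below shows that approach as an assumption; the source does not describe ekoDB's actual scoring formula. The `hybridRank` function, the `textWeight` parameter, and max-normalization are hypothetical choices, and scores are assumed to be non-negative.

```typescript
type Hit = { id: string; score: number };

// Scales scores into [0, 1] by dividing by the largest score in the list.
function normalizeHits(hits: Hit[]): Map<string, number> {
  const max = hits.reduce((m, h) => Math.max(m, h.score), 0);
  const out = new Map<string, number>();
  for (const h of hits) out.set(h.id, max > 0 ? h.score / max : 0);
  return out;
}

// Combines two result lists into one ranking.
// A document missing from one list contributes 0 for that component.
function hybridRank(textHits: Hit[], vectorHits: Hit[], textWeight: number = 0.5): Hit[] {
  const text = normalizeHits(textHits);
  const vector = normalizeHits(vectorHits);
  const ids = new Set<string>(Array.from(text.keys()).concat(Array.from(vector.keys())));
  return Array.from(ids)
    .map((id) => ({
      id,
      score: textWeight * (text.get(id) ?? 0) + (1 - textWeight) * (vector.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = hybridRank(
  [{ id: "d1", score: 4 }, { id: "d2", score: 2 }],      // keyword scores
  [{ id: "d2", score: 0.9 }, { id: "d3", score: 0.45 }], // similarity scores
);
// ranked order: d2 (matched by both), then d1, then d3
```

Normalizing first matters because keyword scores and similarity scores usually live on different scales.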
8. Distributed Architecture
8.1 Ripple System
ekoDB's distributed architecture uses the Ripple system for horizontal scaling:
┌─────────────────────────────────────────────────┐
│ Regional Cluster │
│ ┌─────────────────────────────────────────┐ │
│ │ Managed Instance Groups (MIGs) │ │
│ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Node 1 │ │ Node 2 │ ... │ │
│ │ │ (Primary)│ │(Secondary)│ │ │
│ │ └──────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
↕ Ripple
┌─────────────────────────────────────────────────┐
│ Multi-Tenant Single Nodes │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tenant A │ │ Tenant B │ │ Tenant C │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
Features:
- Cross-Database Propagation: Replicate data across ekoDB instances
- Horizontal Scaling: Add nodes for increased capacity
- Regional Distribution: Deploy across geographic regions
- Automatic Failover: Secondary nodes take over on failure
8.2 Replication
- Asynchronous Replication: Non-blocking writes
- Configurable Targets: Replicate to multiple destinations
- Selective Replication: Choose which collections to replicate
- Conflict Resolution: Last-write-wins strategy
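Last-write-wins resolution can be sketched as a merge keyed on a per-record timestamp. The `Versioned` shape, the `updatedAt` field, and the tie-breaking rule (the local copy wins on equal timestamps) are assumptions for this example rather than documented ekoDB behavior.

```typescript
// Hypothetical record wrapper carrying the time of its last write.
type Versioned = { value: string; updatedAt: number };

// Merges incoming replicated records into the local state.
// For each key, the record with the newer timestamp is kept.
function mergeLww(
  local: Map<string, Versioned>,
  incoming: Map<string, Versioned>,
): Map<string, Versioned> {
  const merged = new Map<string, Versioned>(local);
  incoming.forEach((candidate, key) => {
    const existing = merged.get(key);
    if (!existing || candidate.updatedAt > existing.updatedAt) {
      merged.set(key, candidate);
    }
  });
  return merged;
}

const localState = new Map<string, Versioned>([
  ["a", { value: "local-a", updatedAt: 10 }],
  ["b", { value: "local-b", updatedAt: 20 }],
]);
const incomingState = new Map<string, Versioned>([
  ["a", { value: "remote-a", updatedAt: 15 }], // newer: replaces local
  ["b", { value: "remote-b", updatedAt: 5 }],  // older: ignored
  ["c", { value: "remote-c", updatedAt: 1 }],  // new key: added
]);
const mergedState = mergeLww(localState, incomingState);
```

Last-write-wins is simple and needs no coordination, but concurrent updates to the same record silently discard the older write, and results depend on how well node clocks agree.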
9. Security
9.1 Network Security
HTTPS/WSS Only: Unlike traditional databases that expose a custom wire protocol over raw TCP, ekoDB accepts client connections exclusively over HTTPS and WSS (Secure WebSocket).
Benefits:
- No raw TCP database port exposed
- Built-in encryption (TLS/SSL)
- Standard web protocols
- Firewall friendly (port 443)
- Certificate-based security
- Protection against man-in-the-middle attacks
9.2 Encryption
At Rest:
- AES-GCM encryption for all disk-persisted data
- Encrypted WAL files
- Encrypted indexes
- Configurable encryption keys
In Transit:
- TLS/SSL for all network communication
- Certificate validation
- Perfect forward secrecy
9.3 Authentication
- JWT Tokens: Industry-standard authentication
- API Keys: Simple authentication for services
- Role-Based Access: Planned for future release
- Token Expiration: Configurable TTL
10. Use Cases
10.1 AI Agents & Workflows
- Vector Search: Store and query embeddings
- Chat History: Built-in chat session management
- Context Management: Efficient retrieval of relevant context
- Real-Time Updates: WebSocket for live agent interactions
10.2 Real-Time Analytics
- High Throughput: 1M-2M records/sec ingestion
- In-Memory Processing: Sub-millisecond queries
- Time-Series Data: Efficient storage and retrieval
- Aggregations: Fast analytical queries
10.3 IoT Data Processing
- Adaptive Scaling: Runs on resource-constrained devices
- Edge Computing: Deploy close to data sources
- Efficient Storage: Compression reduces disk usage
- Batch Operations: Handle bursts of sensor data
10.4 Session Management
- TTL Support: Automatic session expiration
- Fast Lookups: O(1) key-value operations
- High Concurrency: Handle thousands of sessions
- Persistence: Optional durability for sessions
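TTL-based session expiration can be sketched as a key-value store that records an expiry time per key and drops expired entries lazily on read. The `TtlStore` class is hypothetical and not the ekoDB API; the clock is injected so the example is deterministic, and a real system would likely also sweep expired keys in the background.

```typescript
// Toy TTL key-value store (hypothetical, for illustration only).
class TtlStore<V> {
  private items = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // `now` returns the current time in milliseconds; defaults to the system clock.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.items.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key: string): V | undefined {
    const item = this.items.get(key);
    if (!item) return undefined;
    if (this.now() >= item.expiresAt) {
      // Lazy expiry: remove the entry the first time it is read after its deadline.
      this.items.delete(key);
      return undefined;
    }
    return item.value;
  }
}

// A fake clock makes the expiry behavior easy to observe.
let fakeTime = 0;
const sessions = new TtlStore<string>(() => fakeTime);
sessions.set("session:user123", "abc123", 1000);
sessions.get("session:user123"); // "abc123" while fakeTime < 1000
```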
10.5 Content Delivery
- Distributed Caching: Ripple for multi-region deployment
- Fast Reads: In-memory performance
- Compression: Reduce bandwidth usage
- Real-Time Updates: WebSocket for live content
11. Roadmap
11.1 Near-Term (2025)
- Multi-Collection Transactions: ACID transactions across collections
- MVCC: Multi-Version Concurrency Control
- HNSW Vector Index: Improved vector search performance
- Stored Procedures: Multi-language support (JavaScript, Python, Rust)
11.2 Mid-Term (2026)
- GPU Acceleration: Hardware acceleration for vector operations
- Columnar Storage: Optional DSM (Decomposition Storage Model) layout for analytical workloads
- Analytics Functions: Aggregation operations
- Role-Based Access Control: Fine-grained permissions
11.3 Long-Term (2027+)
- Distributed Transactions: Cross-node ACID transactions
- SQL Interface: Optional SQL query support
- Time-Series Optimization: Specialized time-series features
- Machine Learning Integration: Built-in ML model serving
12. Comparison
12.1 vs. MongoDB
| Feature | ekoDB | MongoDB |
|---|---|---|
| Performance | High throughput | Moderate throughput |
| Memory | Low footprint | Higher footprint |
| Vector Search | Native | Atlas Search (separate) |
| Full-Text Search | Native | Atlas Search (separate) |
| Real-Time | WebSocket native | Change Streams |
| Encryption | Default | Enterprise only |
12.2 vs. Redis
| Feature | ekoDB | Redis |
|---|---|---|
| Data Model | Multi-model | Key-value |
| Persistence | WAL (configurable) | RDB/AOF |
| Search | Full-text + Vector | RediSearch module |
| Durability | Configurable | Trade-off with performance |
| Queries | Complex filters | Limited |
| Documents | Native JSON | RedisJSON module |
12.3 vs. Elasticsearch
| Feature | ekoDB | Elasticsearch |
|---|---|---|
| Primary Use | Multi-model database | Search engine |
| Write Performance | High throughput | Moderate throughput |
| Memory | Low footprint | High footprint |
| Vector Search | Native | Dense vector field |
| CRUD Operations | Optimized | Secondary feature |
| Real-Time | WebSocket | Polling |
13. Conclusion
ekoDB is a multi-model database that combines features from document stores, key-value databases, and search engines. Built with Rust, it provides in-memory performance with configurable durability.
Key Features:
- Multi-Model: Document, key-value, vector, and search operations
- In-Memory Architecture: Primary storage in memory with disk persistence
- Adaptive Scaling: Configurable resource allocation
- HTTPS/WSS Communication: Standard web protocols for client connections
- Client Libraries: Rust, Python, TypeScript, Go, Kotlin, and JavaScript support
- Encryption: AES-GCM at rest, TLS/SSL in transit
Target Audience:
- Startups building AI-powered applications
- Enterprises consolidating database infrastructure
- IoT deployments requiring edge computing
- Real-time analytics platforms
- Content delivery networks
- Session management systems
14. Getting Started
14.1 Quick Start
# Install client library
npm install @ekodb/ekodb-client

// Connect to ekoDB
import { EkoDBClient } from "@ekodb/ekodb-client";

const client = new EkoDBClient({
  baseURL: "https://your-subdomain.production.aws.ekodb.io",
  apiKey: "your-api-key"
});
await client.init();

// Insert a document
await client.insert("users", {
  name: "John Doe",
  email: "john@example.com",
  age: 30
});

// Query documents using ekoDB query builder (users older than 25)
const query = {
  filter: {
    type: "Condition",
    content: {
      field: "age",
      operator: "Gt",
      value: 25
    }
  }
};
const users = await client.find("users", query);
14.2 Resources
- Homepage: https://ekodb.io
- Documentation: https://docs.ekodb.io
- Management Console: https://app.ekodb.io
- Support: support@ekodb.io
15. About the Author
Sean M. Vazquez is the creator and lead developer of ekoDB. With over a decade of experience in software development, Sean founded ekoDB Inc. to solve the challenges of multi-database integration through a unified database system.
Contact: sean@ekodb.io
This white paper describes ekoDB v0.20.0 (October 2025). Features and specifications are subject to change as the system evolves.