ekoDB White Paper
A Multi-Model Database System
By Sean M. Vazquez, Creator of ekoDB
Executive Summary
ekoDB is an in-memory database with disk persistence for unstructured data. Built with Rust, it combines in-memory performance with durability guarantees through Write-Ahead Logging (WAL).
Key Capabilities:
- Multi-Model Architecture: Key-value, document, vector search, and real-time messaging in a single system
- Configurable Durability: Fast mode for high throughput or durable mode for guaranteed persistence
- Adaptive Scaling: Deployments from IoT devices to enterprise servers
- Secure by Default: HTTPS/WSS-only, AES-GCM encryption at rest, TLS/SSL in transit
- Adaptive Memory: 1%-80% of available RAM with automatic management
- Distributed: Ripple system for horizontal scaling and cross-database propagation
1. Introduction
1.1 The Problem
Applications often require multiple database systems to handle different workloads:
- MongoDB for document storage
- Redis for caching and real-time data
- Elasticsearch for full-text search
- Pinecone for vector search
- PostgreSQL for relational data
This multi-database approach introduces complexity, operational overhead, and integration challenges.
1.2 The Solution
ekoDB combines features from multiple database types into a single system. It provides document storage, key-value operations, full-text search, and vector search in one database.
1.3 History
ekoDB originated from a practical challenge: integrating multiple databases required complex layers of abstraction to achieve feature parity and consistent usage patterns.
Development Timeline:
- 2013: Initial development as SOLO (Single Object Language Operator), an API gateway for multi-database integration
- 2013-2022: SOLO operated as an API gateway connecting various database systems
- 2022: Decision to eliminate the abstraction layer and build a unified database
- 2022-2025: Complete from-scratch rewrite as ekoDB
- Current: Active development and production use
2. Architecture
2.1 Storage Architecture
ekoDB uses a hybrid in-memory architecture with disk persistence:
┌─────────────────────────────────────────┐
│  In-Memory Layer (Hot Data)             │
│  ┌────────────────────────────────────┐ │
│  │  Concurrent Hash Maps              │ │
│  │  - O(1) lookups                    │ │
│  │  - Lock-free reads                 │ │
│  │  - Reader-writer locks             │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
                     ↕
┌─────────────────────────────────────────┐
│  Multi-Tier LRU Cache (Warm Data)       │
│  ┌────────────────────────────────────┐ │
│  │  - Record Cache                    │ │
│  │  - Query Cache                     │ │
│  │  - Search Cache                    │ │
│  │  - KV Cache                        │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
                     ↕
┌─────────────────────────────────────────┐
│  Disk Persistence (Cold Data)           │
│  ┌────────────────────────────────────┐ │
│  │  Write-Ahead Log (WAL)             │ │
│  │  - Encrypted (AES-GCM)             │ │
│  │  - Compressed (ZSTD/LZ4)           │ │
│  │  - Dual-mode (Fast/Durable)        │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
Key Features:
- In-Memory Primary Storage: All active data stored in memory for fast access
- Automatic Eviction: LRU-based eviction when memory limits are reached
- Transparent Reload: Cold data automatically loaded from disk when accessed
- Larger-than-Memory Support: Datasets can exceed available RAM
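The eviction and reload behavior listed above can be illustrated with a small sketch. This is not ekoDB's implementation: `TieredStore` and `coldStore` are hypothetical names, and an in-memory Map stands in for disk persistence. It only shows how an LRU-bounded hot layer can evict entries and reload them on access, so the dataset can be larger than the hot capacity.

```typescript
// Hypothetical sketch: `TieredStore` and `coldStore` are illustrative names, not ekoDB APIs.
class TieredStore<V> {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private hot = new Map<string, V>();
  private capacity: number;
  private coldStore: Map<string, V>; // stands in for disk persistence

  constructor(capacity: number, coldStore: Map<string, V>) {
    this.capacity = capacity;
    this.coldStore = coldStore;
  }

  set(key: string, value: V): void {
    this.coldStore.set(key, value); // persist first (stand-in for a disk write)
    this.touch(key, value);
  }

  get(key: string): V | undefined {
    const cached = this.hot.get(key);
    if (cached !== undefined) {
      this.touch(key, cached); // refresh recency on a hot hit
      return cached;
    }
    const reloaded = this.coldStore.get(key); // transparent reload of cold data
    if (reloaded !== undefined) this.touch(key, reloaded);
    return reloaded;
  }

  isHot(key: string): boolean {
    return this.hot.has(key);
  }

  private touch(key: string, value: V): void {
    this.hot.delete(key);
    this.hot.set(key, value); // move to the most-recently-used position
    if (this.hot.size > this.capacity) {
      const oldest = this.hot.keys().next().value as string;
      this.hot.delete(oldest); // evict the least recently used entry
    }
  }
}
```

With a capacity of 2, writing three keys evicts the first one from the hot layer, and a later read of that key brings it back from the cold store.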
2.2 Data Model
ekoDB supports multiple data models in a unified system:
Document Model
{
"name": "John Doe",
"email": "john@example.com",
"age": 30,
"tags": ["developer", "rust"],
"address": {
"city": "San Francisco",
"country": "USA"
}
}
Key-Value Model
{
"session:user123": {
"token": "abc123",
"expires": "2025-10-15T00:00:00Z"
}
}
Vector Model
{
"content": "ekoDB is a high-performance database",
"embedding": [0.1, 0.2, 0.3, ...], // 384-dimensional vector
"metadata": {
"source": "documentation"
}
}
2.3 Type System
ekoDB provides a flexible type system with per-collection, per-field type enforcement.
Note:
- Types are enforced at write time, not at read time.
- Type enforcement is optional at the key-value level and required at the document level.
- All types are dynamically inferred at write time or can be explicitly specified via the /schemas endpoint.
Supported Types:
ekoDB supports 16 data types organized into five categories:
Basic Types
- String: UTF-8 encoded text data for names, descriptions, and general text content
- Integer: 64-bit signed integers (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
- Float: 64-bit IEEE 754 floating-point numbers for decimal values
- Boolean: Binary true/false values for logical operations
Advanced Numeric Types
- Number: Flexible numeric type that automatically handles both integers and floats, inferred at write time
- Decimal: Arbitrary-precision decimal numbers that avoid floating-point rounding errors. Essential for financial calculations (e.g., currency, accounting), scientific computations requiring exact decimal representation, and any scenario where 0.1 + 0.2 must equal exactly 0.3. Unlike Float, Decimal stores values as mantissa and scale, ensuring mathematical precision without binary floating-point approximation issues
Temporal Types
- DateTime: RFC 3339 formatted date-time values with timezone support (e.g., 2024-01-01T00:00:00Z)
- Duration: Time duration values for representing time spans (e.g., 30s, 5m, 2h)
Collection Types
- Array: Ordered lists of heterogeneous elements, preserving insertion order
- Set: Unordered collections of unique values with automatic deduplication
- Vector: Fixed-dimension numeric arrays optimized for embeddings and vector similarity search
- Object: Nested documents/maps with key-value pairs for complex structured data
Specialized Types
- UUID: Universally unique identifiers (RFC 4122) for globally unique record identification
- Binary: Base64-encoded binary data for images, files, and other binary content
- Bytes: Raw byte arrays (Vec<u8>) for unencoded binary data storage
- Null: Explicit null/empty values for optional fields
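To make the mantissa-and-scale idea behind the Decimal type concrete, here is a minimal sketch. It is an illustration only, not ekoDB's code: the `Decimal` interface and the helper names are hypothetical, and plain `number` mantissas (safe integers only) are used to keep the example short, whereas an arbitrary-precision implementation would use big integers.

```typescript
// Hypothetical sketch of a decimal stored as an integer mantissa plus a base-10 scale.
// Assumption: mantissas stay within the safe-integer range of `number`.
interface Decimal {
  mantissa: number; // e.g. mantissa 1 with scale 1 represents 0.1
  scale: number;    // number of digits after the decimal point
}

function parseDecimal(text: string): Decimal {
  const negative = text.charAt(0) === "-";
  const digits = negative ? text.slice(1) : text;
  const parts = digits.split(".");
  const whole = parts[0];
  const frac = parts.length > 1 ? parts[1] : "";
  const mantissa = parseInt(whole + frac, 10);
  return { mantissa: negative ? -mantissa : mantissa, scale: frac.length };
}

function addDecimal(a: Decimal, b: Decimal): Decimal {
  const scale = Math.max(a.scale, b.scale);
  // Rescale both mantissas to the common scale, then add as integers.
  const am = a.mantissa * Math.pow(10, scale - a.scale);
  const bm = b.mantissa * Math.pow(10, scale - b.scale);
  return { mantissa: am + bm, scale: scale };
}

function formatDecimal(d: Decimal): string {
  const sign = d.mantissa < 0 ? "-" : "";
  let digits = String(Math.abs(d.mantissa));
  while (digits.length <= d.scale) digits = "0" + digits; // keep a leading zero
  if (d.scale === 0) return sign + digits;
  const cut = digits.length - d.scale;
  return sign + digits.slice(0, cut) + "." + digits.slice(cut);
}
```

Because the addition happens on integers, `formatDecimal(addDecimal(parseDecimal("0.1"), parseDecimal("0.2")))` yields the string "0.3", while the binary floating-point expression `0.1 + 0.2 === 0.3` evaluates to false.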
Response Formats:
- Typed: Includes type metadata (e.g., {"type": "String", "value": "text"})
- Non-Typed: Traditional NoSQL format (e.g., "text")
2.4 Configuration Options
ekoDB provides flexible configuration options to tune behavior for different use cases:
Response Format Configuration
- Typed Responses: Include type metadata for strong typing
- Non-Typed Responses: Traditional NoSQL format for simplicity
- Default: Typed responses (includes type metadata)
- Configuration: Configure via configuration API or ekoDB App (https://app.ekodb.io)
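The difference between the two response formats can be sketched as a pair of conversions. This is a hedged illustration: the `{type, value}` wrapper shape comes from the example in the Type System section, but the inference rules and the function names `inferType`, `toTyped`, and `toNonTyped` are assumptions made for this sketch, and only top-level fields are wrapped.

```typescript
// Hypothetical sketch; ekoDB's actual inference rules and nesting behavior may differ.
type TypedValue = { type: string; value: unknown };

function inferType(value: unknown): string {
  if (value === null) return "Null";
  if (typeof value === "string") return "String";
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "number") {
    return Math.floor(value) === value ? "Integer" : "Float";
  }
  if (Array.isArray(value)) return "Array";
  return "Object";
}

// Wrap each top-level field with type metadata (typed format).
function toTyped(record: Record<string, unknown>): Record<string, TypedValue> {
  const out: Record<string, TypedValue> = {};
  for (const key of Object.keys(record)) {
    out[key] = { type: inferType(record[key]), value: record[key] };
  }
  return out;
}

// Strip the metadata again (non-typed format).
function toNonTyped(record: Record<string, TypedValue>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(record)) {
    out[key] = record[key].value;
  }
  return out;
}
```

Converting a record to the typed form and back returns the original fields unchanged, which is the property a client would rely on when switching formats.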
Durability Configuration
- Fast WAL Mode: Higher throughput with buffered writes and periodic fsync; writes made since the last fsync may be lost in a crash
- Durable WAL Mode: Guaranteed persistence with an immediate fsync on every write, at lower throughput
3. ACID Compliance
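ekoDB provides ACID guarantees for operations within a single collection, offering the data guarantees of traditional relational databases while maintaining NoSQL-level performance. Cross-collection behavior is described under Isolation Levels in the Concurrency & Isolation section.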
What is ACID?
ACID stands for Atomicity, Consistency, Isolation, and Durability - the four properties that guarantee reliable database transactions:
- Atomicity: Operations either complete fully or not at all. If any part of a transaction fails, the entire transaction is rolled back.
- Consistency: Data always moves from one valid state to another. Schema constraints and validation rules are enforced.
- Isolation: Concurrent operations don't interfere with each other. Multiple users can work simultaneously without conflicts.
- Durability: Once committed, data persists even if the system crashes. Write-Ahead Logging (WAL) ensures no data loss.
How ekoDB Implements ACID
Atomicity via WAL: Every operation is logged before execution. If a failure occurs, the WAL is replayed to ensure all-or-nothing semantics.
Consistency via Schema Constraints: ekoDB supports 16 field types and 7 constraint types (required, unique, min/max, enum, regex, default, null handling) to maintain data integrity.
Isolation via Concurrency Control: Multi-level locking ensures that concurrent operations don't create race conditions or data corruption.
Durability via Configurable WAL: Choose between Fast WAL (buffered writes) for performance or Durable WAL (immediate fsync) for guaranteed persistence.
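As an illustration of how schema constraints can protect consistency at write time, the sketch below checks a document against a handful of the constraint types named above (required, min/max, enum, regex, default). The `FieldRule` shape and the `validate` function are hypothetical and are not ekoDB's schema format.

```typescript
// Hypothetical rule shape; ekoDB's actual schema format may differ.
interface FieldRule {
  required?: boolean;
  min?: number;
  max?: number;
  allowed?: string[];     // enum constraint
  pattern?: string;       // regex constraint
  defaultValue?: unknown; // default constraint
}

interface ValidationResult {
  doc: Record<string, unknown>;
  errors: string[];
}

function validate(
  input: Record<string, unknown>,
  schema: Record<string, FieldRule>
): ValidationResult {
  const doc: Record<string, unknown> = {};
  for (const key of Object.keys(input)) doc[key] = input[key];
  const errors: string[] = [];

  for (const field of Object.keys(schema)) {
    const rule = schema[field];
    if (doc[field] === undefined && rule.defaultValue !== undefined) {
      doc[field] = rule.defaultValue; // apply the default before other checks
    }
    const value = doc[field];
    if (value === undefined) {
      if (rule.required) errors.push(field + " is required");
      continue;
    }
    if (typeof value === "number") {
      if (rule.min !== undefined && value < rule.min) errors.push(field + " is below min");
      if (rule.max !== undefined && value > rule.max) errors.push(field + " is above max");
    }
    if (typeof value === "string") {
      if (rule.allowed && rule.allowed.indexOf(value) === -1) {
        errors.push(field + " is not an allowed value");
      }
      if (rule.pattern && !new RegExp(rule.pattern).test(value)) {
        errors.push(field + " does not match pattern");
      }
    }
  }
  return { doc: doc, errors: errors };
}
```

A write would be rejected when `errors` is non-empty, so the stored data only ever moves from one valid state to another.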
Transactions
ekoDB supports multi-document transactions with:
- Multiple isolation levels (ReadUncommitted, ReadCommitted, RepeatableRead, Serializable)
- Savepoints for nested transactions
- Automatic rollback on errors
- Full WAL audit trail
For detailed transaction usage, see Transactions Documentation.
4. Performance
4.1 Write Performance
ekoDB offers two WAL modes optimized for different use cases:
| Mode | Throughput | Durability | Use Case |
|---|---|---|---|
| Fast WAL | High throughput | Buffered writes | High-throughput ingestion |
| Durable WAL | Moderate throughput | Immediate fsync | Critical data |
Dual-Node Strategy: Deploy a primary node with Fast WAL and a secondary node with Durable WAL (via Ripple replication) to achieve both high performance and durability.
4.2 Read Performance
- Indexed Lookups: O(1) hash index, O(log n) B-tree index
- Point Queries: Sub-millisecond latency
- Range Queries: Logarithmic time complexity
- Full-Text Search: O(k) where k is number of matching terms
- Vector Search: HNSW approximate nearest neighbor index, with a flat index fallback for exact search
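The difference between O(1) hash lookups and O(log n) ordered lookups can be sketched as follows. This is a simplified illustration, not ekoDB's index code: the `Row` type and function names are hypothetical, and a sorted array with binary search stands in for a B-tree.

```typescript
// Hypothetical sketch; a sorted array with binary search stands in for a B-tree.
interface Row {
  id: string;
  age: number;
}

// Hash index: maps each value to the ids that hold it (average O(1) equality lookup).
function buildHashIndex(rows: Row[]): Map<number, string[]> {
  const index = new Map<number, string[]>();
  for (const row of rows) {
    const ids = index.get(row.age) || [];
    ids.push(row.id);
    index.set(row.age, ids);
  }
  return index;
}

// Ordered index: a copy of the rows sorted by the indexed field.
function buildOrderedIndex(rows: Row[]): Row[] {
  return rows.slice().sort((a, b) => a.age - b.age);
}

// Binary search for the first entry with age > value (O(log n)),
// then return every id from that position onward.
function idsGreaterThan(ordered: Row[], value: number): string[] {
  let lo = 0;
  let hi = ordered.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (ordered[mid].age <= value) lo = mid + 1;
    else hi = mid;
  }
  return ordered.slice(lo).map((row) => row.id);
}
```

The hash index answers equality filters directly, while the ordered index locates the start of a range in logarithmic time and then scans only the matching entries.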
4.3 Memory Efficiency
- Base Memory: Low memory footprint optimized for efficiency
- Compression: Significant space savings with ZSTD (configurable)
- Adaptive Allocation: 1%-80% of available RAM
- LRU Eviction: Automatic memory management
4. Indexing
4.1 Index Types
ekoDB implements multiple index types optimized for different query patterns:
Hash Indexes (Default)
- Complexity: O(1) for equality queries
- Use Case: WHERE field = value
- Automatic: Created for frequently queried fields
B-Tree Indexes
- Complexity: O(log n) for range queries
- Use Case: WHERE field > value, sorting
- Features: Supports <, >, <=, >=, BETWEEN
Inverted Indexes
- Complexity: O(k) where k = matching terms
- Use Case: Full-text search
- Features: Stemming, fuzzy matching, tokenization
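A minimal inverted index can be sketched in a few lines. The `InvertedIndex` class and `tokenize` function below are hypothetical, and the sketch deliberately omits stemming and fuzzy matching; it only shows the term-to-documents mapping that makes full-text lookups proportional to the number of query terms rather than the number of documents.

```typescript
// Hypothetical sketch of an inverted index; stemming and fuzzy matching are omitted.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

class InvertedIndex {
  // term -> ids of documents containing that term
  private postings = new Map<string, Set<string>>();

  add(docId: string, text: string): void {
    for (const term of tokenize(text)) {
      let ids = this.postings.get(term);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(term, ids);
      }
      ids.add(docId);
    }
  }

  // Returns ids of documents containing every query term, sorted for stable output.
  search(query: string): string[] {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    const first = this.postings.get(terms[0]);
    let result: string[] = first ? Array.from(first) : [];
    for (const term of terms.slice(1)) {
      const ids = this.postings.get(term);
      result = result.filter((id) => ids !== undefined && ids.has(id));
    }
    return result.sort();
  }
}
```

Each query term is resolved with one map lookup, and the candidate set shrinks as further terms are intersected.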
Vector Indexes
- Algorithm: HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search
- Fallback: Flat index for exact search
- Use Case: Semantic similarity search, embeddings, AI/ML workloads
- Metrics: Cosine similarity, Euclidean distance, dot product
- Configurable: m (max connections per layer), ef_construction (candidate list size)
4.2 Index Management
- Automatic Creation: Indexes created based on query patterns
- Automatic Maintenance: Updated on insert/update/delete
- Concurrent Access: Thread-safe operations
- Memory Efficient: Weak references and LRU eviction
5. Concurrency & Isolation
5.1 Concurrency Control
ekoDB implements collection-level concurrency control:
- Collection-Level Granularity: Concurrent access managed per collection
- Concurrent Reads: Multiple readers can access data simultaneously
- Write Coordination: Ensures data consistency during modifications
- Cross-Collection Independence: Operations on different collections don't block each other
5.2 Isolation Levels
Within a Single Collection: Serializable
- Reader-writer locks ensure serializable execution
- No dirty reads, non-repeatable reads, or phantom reads
Across Multiple Collections: Read Uncommitted (effectively)
- No coordination between collections
- Applications must handle cross-collection consistency
- Multi-collection transactions planned for future release
6. Durability & Recovery
6.1 Write-Ahead Logging (WAL)
ekoDB implements a dual-mode WAL system:
Fast WAL Mode:
- Buffered writes with periodic fsync
- High throughput
- Suitable for high-throughput ingestion
Durable WAL Mode:
- Immediate fsync after every write
- Moderate throughput
- Guaranteed persistence
WAL Management:
- Automatic rotation based on size and activity
- Manual rotation available
- Log compaction removes redundant entries
- Automatic cleanup after replication
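The trade-off between the two WAL modes can be sketched with a toy writer. This is an assumption-laden illustration rather than ekoDB's WAL: `WalWriter` is a hypothetical class, an in-memory array stands in for the log file, a counter stands in for fsync calls, and a batch size stands in for the periodic fsync of Fast mode.

```typescript
// Hypothetical sketch: arrays stand in for the log file and a counter for fsync calls.
type WalMode = "fast" | "durable";

class WalWriter {
  private buffer: string[] = [];  // entries not yet synced
  private synced: string[] = [];  // entries that would survive a crash
  private mode: WalMode;
  private batchSize: number;      // assumption: Fast mode syncs every `batchSize` entries
  syncCount = 0;

  constructor(mode: WalMode, batchSize = 100) {
    this.mode = mode;
    this.batchSize = batchSize;
  }

  append(entry: string): void {
    this.buffer.push(entry);
    // Durable mode syncs after every write; Fast mode waits for a full batch.
    if (this.mode === "durable" || this.buffer.length >= this.batchSize) {
      this.flush();
    }
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    for (const entry of this.buffer) this.synced.push(entry);
    this.buffer = [];
    this.syncCount += 1; // one simulated fsync per flush
  }

  pending(): number {
    return this.buffer.length;
  }

  durableEntries(): number {
    return this.synced.length;
  }
}
```

In this model, Durable mode pays one sync per write, while Fast mode amortizes syncs across a batch at the cost of leaving `pending()` entries exposed until the next flush.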
6.2 Recovery Process
ekoDB performs automatic crash recovery through WAL replay. The system validates data integrity, reconstructs in-memory structures, and rebuilds indexes. Recovery time depends on WAL size and system resources.
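WAL replay can be pictured as folding the log, in order, into fresh in-memory structures. The sketch below is a simplified model under stated assumptions: the `WalEntry` shape and the `replay` function are hypothetical, and integrity validation and index rebuilding are left out.

```typescript
// Hypothetical sketch of WAL replay; entry shape and names are illustrative only.
type Doc = Record<string, unknown>;

type WalEntry =
  | { op: "put"; collection: string; id: string; doc: Doc }
  | { op: "delete"; collection: string; id: string };

// Rebuild collection -> (id -> document) by applying entries in log order.
function replay(entries: WalEntry[]): Map<string, Map<string, Doc>> {
  const state = new Map<string, Map<string, Doc>>();
  for (const entry of entries) {
    let collection = state.get(entry.collection);
    if (!collection) {
      collection = new Map<string, Doc>();
      state.set(entry.collection, collection);
    }
    if (entry.op === "put") {
      collection.set(entry.id, entry.doc); // later puts overwrite earlier ones
    } else {
      collection.delete(entry.id);
    }
  }
  return state;
}
```

Because entries are applied strictly in order, the rebuilt state matches the state at the last logged operation, and replay time grows with the number of entries in the log.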
7. Search Capabilities
7.1 Full-Text Search
- Inverted Index: Maps terms to documents
- Tokenization: Automatic text processing
- Stemming: Language-aware word normalization
- Fuzzy Matching: Typo tolerance (Levenshtein distance)
- Field Weighting: Prioritize specific fields
- Minimum Score: Filter by relevance threshold
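Typo tolerance based on Levenshtein distance can be sketched as below. The distance function is the standard dynamic-programming algorithm; the `fuzzyMatches` helper and its `maxDistance` parameter are hypothetical names chosen for this example and do not describe ekoDB's search API.

```typescript
// Standard dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: keep vocabulary terms within `maxDistance` edits of the query term.
function fuzzyMatches(term: string, vocabulary: string[], maxDistance: number): string[] {
  return vocabulary.filter((candidate) => levenshtein(term, candidate) <= maxDistance);
}
```

With a maximum distance of 1, the misspelled query "databse" matches the indexed term "database" but not "dataset", which is two edits away.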
7.2 Vector Search
- Semantic Similarity: Find similar documents by meaning
- Embedding Support: 384, 768, 1536 dimensions (configurable)
- Distance Metrics: Cosine similarity, Euclidean, dot product
- Metadata Filtering: Combine vector search with filters
- Top-K Results: Efficient heap-based selection
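An exact (flat) vector search with cosine similarity can be sketched as follows. This is an illustration, not ekoDB's search code: the `Embedded` and `Match` types and the `topK` function are hypothetical, and a full sort is used for brevity where a production system would more likely use a bounded heap.

```typescript
// Hypothetical sketch of exact top-K search; a full sort replaces heap-based selection.
interface Embedded {
  id: string;
  embedding: number[];
}

interface Match {
  id: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the k highest scores.
function topK(query: number[], items: Embedded[], k: number): Match[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A metadata filter could be applied to `items` before scoring, which is one simple way to combine filtering with similarity ranking.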
7.3 Hybrid Search
Combine text and vector search:
- Text search for keyword matching
- Vector search for semantic similarity
- Unified scoring and ranking
- Combined text and semantic matching
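One common way to unify text and vector rankings is a weighted sum of the two scores. The sketch below assumes both scores are already normalized to the range 0 to 1; the `hybridRank` function and its `alpha` weight are hypothetical, and this document does not state which scoring formula ekoDB actually uses.

```typescript
// Hypothetical sketch: weighted-sum fusion of text and vector scores.
// Assumption: both inputs are normalized to [0, 1]; ekoDB's real formula may differ.
interface Scored {
  id: string;
  score: number;
}

function hybridRank(text: Scored[], vector: Scored[], alpha = 0.5): Scored[] {
  const combined = new Map<string, number>();
  for (const t of text) {
    combined.set(t.id, alpha * t.score);
  }
  for (const v of vector) {
    combined.set(v.id, (combined.get(v.id) || 0) + (1 - alpha) * v.score);
  }
  const ranked: Scored[] = [];
  combined.forEach((score, id) => ranked.push({ id: id, score: score }));
  return ranked.sort((x, y) => y.score - x.score);
}
```

Raising `alpha` favors keyword matches and lowering it favors semantic similarity; a document that appears in both result lists accumulates both contributions.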
8. Distributed Architecture
8.1 Ripple System
ekoDB's distributed architecture uses the Ripple system for horizontal scaling:
┌─────────────────────────────────────────────────┐
│  Regional Cluster                               │
│  ┌─────────────────────────────────────────┐    │
│  │  Managed Instance Groups (MIGs)         │    │
│  │  ┌──────────┐  ┌───────────┐            │    │
│  │  │ Node 1   │  │ Node 2    │  ...       │    │
│  │  │ (Primary)│  │(Secondary)│            │    │
│  │  └──────────┘  └───────────┘            │    │
│  └─────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘
                    ↕ Ripple
┌─────────────────────────────────────────────────┐
│  Multi-Tenant Single Nodes                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ Tenant A │  │ Tenant B │  │ Tenant C │       │
│  └──────────┘  └──────────┘  └──────────┘       │
└─────────────────────────────────────────────────┘
Features:
- Cross-Database Propagation: Replicate data across ekoDB instances
- Horizontal Scaling: Add nodes for increased capacity
- Regional Distribution: Deploy across geographic regions
- Automatic Failover: Secondary nodes take over on failure
- Loop Prevention: Automatic deduplication prevents infinite operation loops in multi-node deployments
8.1.1 Loop Prevention System
In multi-node deployments (Managed Instance Groups), ekoDB implements a comprehensive loop prevention system to prevent infinite operation loops when nodes ripple to each other:
Operation Metadata:
- request_id: UUID for operation deduplication
- origin_node_id: Identifier of the node that originated the operation
- hop_count: Number of hops the operation has traversed (maximum 10)
Deduplication Cache:
- In-memory cache tracks processed request IDs
- 10-minute TTL per entry for automatic cleanup
- Prevents duplicate processing of the same operation
- Memory-safe with automatic eviction
HTTP Header Transport:
- X-Ripple-Request-ID: UUID transmitted in HTTP headers
- X-Ripple-Origin-Node: Origin node identifier
- X-Ripple-Hop-Count: Current hop counter
Flow Example (3-Node MIG):
- Client writes to Node 1
- Node 1 generates UUID, sets origin_node_id, hop_count=0
- Node 1 ripples to Node 2 and Node 3 with headers
- Nodes 2 and 3 check: origin ≠ self, UUID not seen, hop_count < 10
- Nodes 2 and 3 process operation and cache request ID
- No re-propagation occurs (only origin propagates)
This ensures operations are processed exactly once across all nodes without infinite loops.
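The checks in the flow above can be sketched as a small receiving-side filter. The header names, the 10-hop limit, and the 10-minute TTL come from this section; the `RippleNode` class, the `RippleMeta` shape, and the explicit `nowMs` parameter (used to keep the example deterministic) are hypothetical.

```typescript
// Hypothetical sketch of the receiving-side checks; class and field names are illustrative.
const MAX_HOPS = 10;
const DEDUP_TTL_MS = 10 * 60 * 1000; // 10-minute TTL per cached request ID

interface RippleMeta {
  requestId: string;
  originNodeId: string;
  hopCount: number;
}

// Map operation metadata onto the HTTP headers named in this section.
function toHeaders(meta: RippleMeta): Record<string, string> {
  return {
    "X-Ripple-Request-ID": meta.requestId,
    "X-Ripple-Origin-Node": meta.originNodeId,
    "X-Ripple-Hop-Count": String(meta.hopCount)
  };
}

class RippleNode {
  nodeId: string;
  private seen = new Map<string, number>(); // requestId -> expiry time in ms

  constructor(nodeId: string) {
    this.nodeId = nodeId;
  }

  // Returns true if the operation should be processed by this node.
  accept(meta: RippleMeta, nowMs: number): boolean {
    // Drop expired cache entries so the cache cannot grow without bound.
    this.seen.forEach((expiry, id) => {
      if (expiry <= nowMs) this.seen.delete(id);
    });
    if (meta.originNodeId === this.nodeId) return false; // our own operation came back
    if (meta.hopCount >= MAX_HOPS) return false;          // hop limit reached
    if (this.seen.has(meta.requestId)) return false;      // already processed
    this.seen.set(meta.requestId, nowMs + DEDUP_TTL_MS);
    return true;
  }
}
```

A node accepts a rippled operation the first time it sees the request ID and rejects any repeat within the TTL window, as well as anything that originated from itself or exceeded the hop limit.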
8.2 Replication
- Asynchronous Replication: Non-blocking writes
- Configurable Targets: Replicate to multiple destinations
- Selective Replication: Choose which collections to replicate
- Conflict Resolution: Last-write-wins strategy
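Last-write-wins resolution can be sketched as a comparison of write timestamps. The `Versioned` shape and `resolveLww` function are hypothetical, and the tie-break on node ID is an assumption added so the result is deterministic; this document does not specify how ekoDB breaks ties.

```typescript
// Hypothetical sketch of last-write-wins; the nodeId tie-break is an assumption.
interface Versioned<T> {
  value: T;
  updatedAt: number; // write timestamp in milliseconds
  nodeId: string;    // node that produced the write
}

function resolveLww<T>(local: Versioned<T> | undefined, incoming: Versioned<T>): Versioned<T> {
  if (local === undefined) return incoming;
  if (incoming.updatedAt !== local.updatedAt) {
    return incoming.updatedAt > local.updatedAt ? incoming : local;
  }
  // Equal timestamps: pick the higher node ID so every replica makes the same choice.
  return incoming.nodeId > local.nodeId ? incoming : local;
}
```

The trade-off of this strategy is simplicity over completeness: the older of two concurrent writes is discarded rather than merged, so it depends on reasonably synchronized clocks across nodes.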
9. Security
9.1 Network Security
HTTPS/WSS Only: Unlike traditional databases that expose a proprietary wire protocol over a raw TCP port, ekoDB exclusively uses HTTPS and WSS (Secure WebSocket) protocols.
Benefits:
- No raw TCP database port exposed; all traffic goes through HTTPS/WSS
- Built-in encryption (TLS/SSL)
- Standard web protocols
- Firewall friendly (port 443)
- Certificate-based security
- Protection against man-in-the-middle attacks
9.2 Encryption
At Rest:
- AES-GCM encryption for all disk-persisted data
- Encrypted WAL files
- Encrypted indexes
- Configurable encryption keys
In Transit:
- TLS/SSL for all network communication
- Certificate validation
- Perfect forward secrecy
9.3 Authentication
- JWT Tokens: Industry-standard authentication
- API Keys: Simple authentication for services
- Role-Based Access: Collection-level and field-level permissions with admin roles (listed under Already Implemented in the Roadmap)
- Token Expiration: Configurable TTL
10. Use Cases
10.1 AI Agents & Workflows
- Vector Search: Store and query embeddings
- Chat History: Built-in chat session management
- Context Management: Efficient retrieval of relevant context
- Real-Time Updates: WebSocket for live agent interactions
10.2 Real-Time Analytics
- High Throughput: 5M+ records/sec ingestion (5.48M peak)
- In-Memory Processing: Sub-millisecond queries
- Time-Series Data: Efficient storage and retrieval
- Aggregations: Fast analytical queries
10.3 IoT Data Processing
- Adaptive Scaling: Runs on resource-constrained devices
- Edge Computing: Deploy close to data sources
- Efficient Storage: Compression reduces disk usage
- Batch Operations: Handle bursts of sensor data
10.4 Session Management
- TTL Support: Automatic session expiration
- Fast Lookups: O(1) key-value operations
- High Concurrency: Handle thousands of sessions
- Persistence: Optional durability for sessions
10.5 Content Delivery
- Distributed Caching: Ripple for multi-region deployment
- Fast Reads: In-memory performance
- Compression: Reduce bandwidth usage
- Real-Time Updates: WebSocket for live content
11. Roadmap
Already Implemented ✅
The following advanced features are already available in ekoDB:
- Official Client Libraries: Type-safe SDKs for Rust, Python, TypeScript/JavaScript, Go, and Kotlin with automatic authentication, retry logic, and connection pooling
- Scripts & Functions: Stored procedures system with JSON-based function composition, parameters, and version control
- Multi-Collection Operations: Cross-collection queries via JOINs and Scripts that operate across multiple collections
- Transactions: Single-collection ACID transactions with multiple isolation levels (ReadUncommitted, ReadCommitted, RepeatableRead, Serializable)
- MVCC Foundation: Snapshot versioning infrastructure for Serializable isolation (partial implementation)
- HNSW Vector Search: Full implementation of Hierarchical Navigable Small World algorithm for approximate nearest neighbor search
- Ripple Replication: Cross-instance data propagation and synchronization with loop prevention
- Role-Based Access Control: Collection-level and field-level permissions with admin roles, field selection/exclusion, and writable field restrictions
11.1 Near-Term
- Full MVCC Implementation: Complete Multi-Version Concurrency Control for Serializable isolation (currently has snapshot versioning foundation)
- Deadlock Detection: Automatic detection and resolution for concurrent transactions
- Multi-Language Scripts: Extend Scripts & Functions to support JavaScript, Python, and Rust execution (currently JSON-based)
11.2 Mid-Term
- GPU Acceleration: Hardware acceleration for vector operations
- Columnar Storage: Optional DSM (Decomposition Storage Model) for analytical workloads
- Analytics Functions: Aggregation operations
11.3 Long-Term
- Distributed Transactions: Cross-node ACID transactions
- SQL Interface: Optional SQL query support
- Time-Series Optimization: Specialized time-series features
- Machine Learning Integration: Built-in ML model serving
12. Design Considerations
ekoDB combines features typically found across multiple specialized databases:
Multi-Model Support:
- Native document storage, key-value operations, full-text search, and vector search in a single system
- Eliminates need for separate databases for different data types
API-First Architecture:
- REST and WebSocket protocols instead of proprietary wire protocols
- Standard HTTPS/WSS for simplified deployment and security
Configurable Durability:
- Fast WAL mode for high-throughput scenarios
- Durable WAL mode for critical data
- Application-level choice based on requirements
Embedded Search:
- Full-text search with inverted indexes
- Vector similarity search for semantic queries
- No separate search infrastructure required
13. Getting Started
13.1 Quick Start
# Install client library
npm install @ekodb/ekodb-client
// Connect to ekoDB (TypeScript)
import { EkoDBClient } from "@ekodb/ekodb-client";
const client = new EkoDBClient({
baseURL: "https://your-subdomain.production.google.ekodb.net",
apiKey: "your-api-key"
});
await client.init();
// Insert a document
await client.insert("users", {
name: "John Doe",
email: "john@example.com"
});
// Query documents using ekoDB query builder
const query = {
filter: {
type: "Condition",
content: {
field: "age",
operator: "Gt",
value: 25
}
}
};
const users = await client.find("users", query);
13.2 Resources
- Homepage: https://ekodb.io
- Documentation: https://docs.ekodb.io
- Management Console: https://app.ekodb.io
- Support: support@ekodb.io
14. About the Author
Sean M. Vazquez is the creator and lead developer of ekoDB. He founded ekoDB Inc. to address the challenges of multi-database integration.
Contact: sean@ekodb.io
Features and specifications are subject to change as ekoDB continues to evolve.