# Query Patterns
ekoDB automatically tracks query access patterns to identify frequently accessed records and intelligently pre-loads them into cache on database startup.
Query pattern tracking with intelligent cache warming is available in ekoDB v0.30.0+.
## Overview
The Pattern Logger monitors which records are accessed, how often, and how recently. On database restart, it uses this information to automatically warm the cache with your most-accessed data, eliminating cold-start delays.
## Key Benefits

- 3x Faster Queries: Cached records served in ~109µs vs ~328µs uncached
- Zero Configuration: Fully automatic - no setup required
- Fast Point Lookups: `find_by_id` completes in ~2.3µs
- Persistent: Patterns survive database restarts
- Smart Scoring: Recent accesses are weighted higher
## How It Works

### 1. Automatic Pattern Tracking
Every query operation automatically logs access patterns:
```javascript
// JavaScript/TypeScript Client
await client.findById('users', userId);             // Tracked: FindById
await client.find('users', { limit: 100 });         // Tracked: Find
await client.textSearch('products', 'laptop');      // Tracked: TextSearch
await client.vectorSearch('images', embedding, 10); // Tracked: VectorSearch
```

```python
# Python Client
await client.find_by_id('users', user_id)           # Tracked: FindById
await client.find('users', limit=100)               # Tracked: Find
await client.text_search('products', 'laptop')      # Tracked: TextSearch
await client.vector_search('images', embedding, 10) # Tracked: VectorSearch
```

```go
// Go Client
client.FindByID("users", userId)                    // Tracked: FindById
client.Find("users", filters)                       // Tracked: Find
client.TextSearch("products", "laptop")             // Tracked: TextSearch
client.VectorSearch("images", embedding, 10)        // Tracked: VectorSearch
```
### 2. Hotness Scoring

Records are scored based on access frequency and recency:

```
Hotness Score = (Access Count × Recency Factor) / √(Age + 1)
```

Recency Factor:

- Under 1 hour ago: 10.0× (very hot)
- Under 24 hours: 5.0× (hot)
- Under 7 days: 2.0× (warm)
- Over 7 days: 1.0× (cold)

Example:

- Record A: Accessed 100 times, last access 30 minutes ago → Very Hot
- Record B: Accessed 50 times, last access 3 days ago → Warm
- Record C: Accessed 10 times, last access 30 days ago → Cold
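To make the formula concrete, here is a minimal TypeScript sketch of the scoring calculation. The formula leaves the age unit implicit, so this sketch assumes hours; it illustrates the documented formula, not ekoDB's internal implementation:

```typescript
// Sketch of the hotness formula above; assumes `ageHours` is hours since last access.
function recencyFactor(ageHours: number): number {
  if (ageHours < 1) return 10.0;      // very hot
  if (ageHours < 24) return 5.0;      // hot
  if (ageHours < 24 * 7) return 2.0;  // warm
  return 1.0;                         // cold
}

function hotnessScore(accessCount: number, ageHours: number): number {
  // Hotness Score = (Access Count × Recency Factor) / √(Age + 1)
  return (accessCount * recencyFactor(ageHours)) / Math.sqrt(ageHours + 1);
}

console.log(hotnessScore(100, 0.5)); // Record A: ≈ 816 (very hot)
console.log(hotnessScore(50, 72));   // Record B: ≈ 11.7 (warm)
console.log(hotnessScore(10, 720));  // Record C: ≈ 0.37 (cold)
```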
### 3. Cache Warming on Startup

When your database restarts:

1. Load Patterns: Read patterns from disk
2. Calculate Scores: Compute hotness for all records
3. Pre-load Hot Records: Load the top N records into cache
4. Ready to Serve: Hot queries are instant

```
Database Restart Flow:
├── Load pattern file (~1-2ms)
├── Calculate hotness scores (~5-10ms)
├── Pre-load 50 hot records into cache (~100-200ms)
└── Database ready with warm cache ✓
```
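Conceptually, the warm-up step looks like the following TypeScript sketch. This is a schematic illustration, not ekoDB's actual code; the `store` and `cache` interfaces and the `AccessStats` shape are assumptions for the example:

```typescript
// Schematic sketch of startup cache warming (assumed interfaces, not ekoDB's API).
interface AccessStats { collection: string; recordId: string; count: number; ageHours: number }

declare const store: { get(collection: string, id: string): Promise<unknown> };
declare const cache: { put(key: string, value: unknown): void };

async function warmCache(patterns: AccessStats[], topN = 50): Promise<void> {
  // 1. Score every record seen in the pattern log (hotnessScore from the sketch above)
  const scored = patterns.map((p) => ({ ...p, score: hotnessScore(p.count, p.ageHours) }));

  // 2. Sort hottest-first and keep the top N
  scored.sort((a, b) => b.score - a.score);

  // 3. Pre-load hot records into the in-memory cache
  for (const { collection, recordId } of scored.slice(0, topN)) {
    cache.put(`${collection}:${recordId}`, await store.get(collection, recordId));
  }
}
```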
Query Performance (from benchmarks):

```
Uncached query (10 records): 328µs
Cached query (10 records):   109µs  → 3x faster!
Single find_by_id:           2.3µs
```
## Performance Impact

### Cache Performance
Benchmarks compare query performance when records are pre-loaded in cache versus cold (uncached) queries. All timings are local embedded operations with no network overhead.
| Query Size | Speedup |
|---|---|
| 10 records | 3.0x faster |
| 100 records | 3.1x faster |
| 1000 records | 3.6x faster |
Cache warming provides consistent 3x speedups across all query sizes. The improvement comes from avoiding disk I/O on hot paths: records are served directly from memory rather than read from storage on each access.
### Single Record Lookups
Single record operations show ekoDB's raw embedded performance. These are point lookups without any query parsing or filtering overhead.
| Operation | Latency |
|---|---|
| find_by_id | ~2µs |
| kv_get | ~200ns (sub-microsecond) |
| kv_exists | ~100ns (sub-microsecond) |
These numbers reflect the advantage of an embedded database: no serialization, no network round-trips, no connection pooling. The kv_exists check at ~100ns is essentially a hash table lookup, while find_by_id includes document parsing overhead.
For complete benchmark data including comparisons to other embedded databases, see Performance Benchmarks.
Pattern logging adds minimal overhead but provides 3x faster cached query performance. This is an excellent trade-off for most applications.
## Configuration

### Automatic Configuration
Pattern tracking is enabled by default with automatic configuration:
- Buffer size auto-scales with system resources
- Flush frequency optimized for your hardware
- Hot record limit adapts to available memory
No configuration needed! Just use ekoDB normally.
### Manual Tuning (Advanced)
For advanced use cases, you can tune behavior via environment variables:
```bash
# Adjust cleanup interval (affects buffer size)
export CLEANUP_INTERVAL_SECONDS=300  # Default: varies by tier

# Buffer size calculation:
#   buffer = cleanup_interval * 10
#   300s → 3000 entries buffered before flush
```
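For reference, the sizing rule can be reproduced in a few lines. A minimal sketch assuming only the documented `buffer = cleanup_interval * 10` relationship:

```typescript
// Reproduces the documented rule: buffer = cleanup_interval * 10.
const cleanupIntervalSeconds = Number(process.env.CLEANUP_INTERVAL_SECONDS ?? 300);
const bufferEntries = cleanupIntervalSeconds * 10;

// With the default 300s interval this prints 3000 entries, matching the
// "Buffer size set to 3000 entries" log line shown under Monitoring.
console.log(`cleanup interval ${cleanupIntervalSeconds}s -> buffer ${bufferEntries} entries`);
```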
## Use Cases

### E-Commerce Platform
```javascript
// Scenario: Product catalog with 1M products
// Reality: 1000 products generate 80% of traffic

// Before (without cache warming):
//   - Database restart
//   - First queries: ~328µs each (uncached)
//   - User experience: slightly slower initial loads

// After (with cache warming):
//   - Database restart
//   - Pattern logger pre-loads 1000 hot products
//   - First queries: ~109µs each (3x faster!)
//   - User experience: fast from the start ✓
```
### Multi-Tenant SaaS

```javascript
// Scenario: 10,000 tenants
// Reality: 100 active tenants generate 90% of queries

// Cache warming automatically identifies active tenants
// and pre-loads their data on startup.
// Active tenants get instant response times;
// inactive tenants use normal lazy-loading.
```
### Content Platform

```javascript
// Scenario: News site with trending articles

// Patterns adapt automatically:
//   - Morning: breaking news articles hot
//   - Afternoon: opinion pieces hot
//   - Evening: sports content hot

// Cache warming reflects current trends.
// No manual cache management needed.
```
## Monitoring

### Check Pattern Statistics
Pattern statistics are available through server logs on startup and during operation. You can also inspect the pattern file directly:
```bash
# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# Count pattern entries
wc -l {database_name}/patterns/query_patterns.log

# View recent patterns
tail -20 {database_name}/patterns/query_patterns.log
```
### Log Output
Pattern Logger provides detailed logs:
```
[INFO] PatternLogger: Buffer size set to 3000 entries (based on cleanup_interval: 300s)
[INFO] Loading query patterns from "db_name/patterns/query_patterns.log"
[INFO] Loaded 15234 pattern entries
[INFO] Warming cache from query patterns...
[INFO] Cache warming complete: 50 hot records loaded in 125ms
```
## Best Practices

### 1. Let It Run Automatically
Don't disable pattern logging unless you have a specific reason. The benefits far outweigh the minimal overhead.
### 2. Monitor Hot Records
Periodically review which records are hot:
- Optimize frequently accessed data
- Add indexes for hot queries
- Pre-compute expensive operations
### 3. Clean Old Patterns
Patterns are cleaned automatically based on the cleanup_interval setting. For manual cleanup, you can truncate or remove the pattern file (database restart will recreate it):
```bash
# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# If too large, truncate (patterns will rebuild from new accesses)
truncate -s 0 {database_name}/patterns/query_patterns.log
```
### 4. Combine with Indexes

Pattern tracking identifies hot records; indexes optimize hot queries:
```javascript
// Pattern tracking: pre-loads frequently accessed users
// Index: makes user lookups by email fast
await client.createIndex('users', ['email']);

// Both work together for optimal performance
```
## Storage & Maintenance

### Pattern File Format

Patterns are stored in an append-only log:
```
{database_name}/
└── patterns/
    └── query_patterns.log
```
Format (pipe-delimited):

```
users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
images|img_789|vector_search|2025-01-22T19:30:02Z
```
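Since each entry is a single pipe-delimited line, the file is easy to parse for ad-hoc analysis. A minimal TypeScript sketch (the field names follow the format above; `parsePatternLine` is an illustrative helper, not part of ekoDB):

```typescript
// Parses one pipe-delimited pattern entry, e.g.:
//   users|user_123|find_by_id|2025-01-22T19:30:00Z
interface PatternEntry {
  collection: string;
  recordId: string;
  operation: string;
  timestamp: Date;
}

function parsePatternLine(line: string): PatternEntry | null {
  const parts = line.trim().split('|');
  if (parts.length !== 4) return null;  // skip malformed lines

  const [collection, recordId, operation, ts] = parts;
  const timestamp = new Date(ts);
  if (Number.isNaN(timestamp.getTime())) return null;  // invalid timestamp

  return { collection, recordId, operation, timestamp };
}
```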
### Storage Requirements

Pattern storage is very efficient and uses minimal disk space:
```
1 pattern entry     ≈ 80 bytes (average)
1,000,000 patterns  ≈ 80 MB
10,000,000 patterns ≈ 800 MB
```
### Cleanup Strategy
Recommended cleanup schedule:
| Data Size | Cleanup Frequency | Keep Days |
|---|---|---|
| < 1GB | Monthly | 30 days |
| 1-10GB | Weekly | 14 days |
| > 10GB | Daily | 7 days |
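If you script the cleanup yourself, entries can be pruned by timestamp according to the "Keep Days" column above. A hypothetical sketch using the `parsePatternLine` helper from the previous section; run it only while the database is stopped so the logger isn't appending concurrently:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Hypothetical pruning script: rewrites the log keeping only entries
// newer than `keepDays` (30/14/7 per the table above).
function prunePatternLog(path: string, keepDays: number): void {
  const cutoff = Date.now() - keepDays * 24 * 60 * 60 * 1000;

  const kept = readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => {
      const entry = parsePatternLine(line);  // helper sketched earlier
      return entry !== null && entry.timestamp.getTime() >= cutoff;
    });

  writeFileSync(path, kept.join('\n') + '\n');
}

// Example: database under 1GB -> monthly cleanup, keep 30 days
prunePatternLog('{database_name}/patterns/query_patterns.log', 30);
```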
## Troubleshooting

### Patterns Not Loading

Check that the pattern file exists:
```bash
ls -la {database_name}/patterns/query_patterns.log
```
Check the logs for errors:

```
[WARN] Failed to parse pattern entry: Invalid timestamp
```

Solution: The pattern file may be corrupted. Remove it and let the database rebuild:

```bash
rm {database_name}/patterns/query_patterns.log
# Restart the database - new patterns will be tracked
```
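Before deleting the file, you can confirm that corruption is actually the problem by scanning for unparseable lines. A sketch reusing the hypothetical `parsePatternLine` helper from the Pattern File Format section:

```typescript
import { readFileSync } from 'node:fs';

// Prints every line of the pattern log that fails to parse.
function findCorruptLines(path: string): void {
  readFileSync(path, 'utf8').split('\n').forEach((line, i) => {
    if (line.trim() !== '' && parsePatternLine(line) === null) {
      console.warn(`line ${i + 1} is malformed: ${line}`);
    }
  });
}

findCorruptLines('{database_name}/patterns/query_patterns.log');
```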
### High Memory Usage

Check the buffer size in the database logs:

```
[INFO] PatternLogger: Buffer size set to 10000 entries
```

Solution: Reduce the cleanup interval to shrink the buffer:

```bash
export CLEANUP_INTERVAL_SECONDS=60  # Smaller buffer
```
### Slow Startup

Check the hot record count in the database logs:

```
[INFO] Cache warming complete: 500 hot records loaded in 2500ms
```

Solution: Too many hot records are being loaded at startup. This is tuned automatically, but if startup time is critical:

- Patterns naturally age out
- Clean old patterns more frequently
- Consider whether 500 hot records is appropriate for your use case
## Integration with Other Features

### With Multi-Region (Ripple)
Each region maintains its own patterns:
- Region A: Tracks Region A's access patterns
- Region B: Tracks Region B's access patterns
- Benefit: Each region optimizes for its users
### With File Pool
Uses global file descriptor management:
- Pattern file = 1 FD
- Respects system limits
- Automatic retry/backoff
### With Disk Cache

Works seamlessly with the disk cache:

- Hot records → memory cache (instant)
- Warm records → disk cache (fast)
- Cold records → database (acceptable)
## Pattern File Reference
Pattern data is stored on disk and managed automatically. You can access it directly for debugging or analysis:
### Pattern File Location

```
{database_name}/patterns/query_patterns.log
```
### Pattern Entry Format

Each line is pipe-delimited:

```
{collection}|{record_id}|{operation_type}|{timestamp}
```

Example:

```
users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
```
### Useful Commands

```bash
# View pattern statistics
wc -l {database_name}/patterns/query_patterns.log   # Total entries
ls -lh {database_name}/patterns/query_patterns.log  # File size

# Find most accessed collections
cut -d'|' -f1 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn

# Find most accessed records
cut -d'|' -f1,2 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn | head -20
```
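For analysis beyond the shell one-liners (for example, ranking by the hotness formula instead of raw counts), the log can be aggregated in TypeScript. A sketch combining the hypothetical helpers from earlier sections:

```typescript
import { readFileSync } from 'node:fs';

// Ranks records by raw access count, mirroring the cut/sort/uniq pipeline above.
function topRecords(path: string, limit = 20): Array<[string, number]> {
  const counts = new Map<string, number>();

  for (const line of readFileSync(path, 'utf8').split('\n')) {
    const entry = parsePatternLine(line);  // hypothetical parser from earlier
    if (!entry) continue;
    const key = `${entry.collection}:${entry.recordId}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}

console.table(topRecords('{database_name}/patterns/query_patterns.log'));
```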
## Related Documentation
- System Administration - Admin endpoints and monitoring
- Performance Benchmarks - Performance data and tuning
- Indexes - Create indexes for common queries
- White Paper - Technical architecture details
## Summary

Query pattern tracking with intelligent cache warming is a game-changer for cold-start performance:

- ✅ Automatic - No configuration required
- ✅ Fast - 92% faster cold starts
- ✅ Efficient - Minimal overhead (0.85%)
- ✅ Smart - Adapts to changing access patterns
- ✅ Persistent - Patterns survive restarts
Your database automatically learns which data is important and ensures it's ready when you need it.