# Query Patterns
ekoDB automatically tracks query access patterns to identify frequently accessed records and intelligently pre-loads them into cache on database startup.
Query pattern tracking with intelligent cache warming is available in ekoDB v0.30.0+.
## Overview
The Pattern Logger monitors which records are accessed, how often, and how recently. On database restart, it uses this information to automatically warm the cache with your most-accessed data, eliminating cold-start delays.
## Key Benefits

- 3x Faster Queries: Cached records served in ~109µs vs ~328µs uncached
- Zero Configuration: Fully automatic - no setup required
- Fast Point Lookups: `find_by_id` completes in ~2.3µs
- Persistent: Patterns survive database restarts
- Smart Scoring: Recent accesses are weighted higher
## How It Works

### 1. Automatic Pattern Tracking
Every query operation automatically logs access patterns:
```javascript
// JavaScript/TypeScript Client
await client.findById('users', userId);             // Tracked: FindById
await client.find('users', { limit: 100 });         // Tracked: Find
await client.textSearch('products', 'laptop');      // Tracked: TextSearch
await client.vectorSearch('images', embedding, 10); // Tracked: VectorSearch
```

```python
# Python Client
await client.find_by_id('users', user_id)           # Tracked: FindById
await client.find('users', limit=100)               # Tracked: Find
await client.text_search('products', 'laptop')      # Tracked: TextSearch
await client.vector_search('images', embedding, 10) # Tracked: VectorSearch
```

```go
// Go Client
client.FindByID("users", userId)                    // Tracked: FindById
client.Find("users", filters)                       // Tracked: Find
client.TextSearch("products", "laptop")             // Tracked: TextSearch
client.VectorSearch("images", embedding, 10)        // Tracked: VectorSearch
```
### 2. Hotness Scoring

Records are scored based on access frequency and recency:

```
Hotness Score = (Access Count × Recency Factor) / √(Age + 1)
```

Recency Factor:

- Under 1 hour ago: 10.0× (very hot)
- Under 24 hours: 5.0× (hot)
- Under 7 days: 2.0× (warm)
- Over 7 days: 1.0× (cold)

Example:

- Record A: Accessed 100 times, last access 30 minutes ago → Very Hot
- Record B: Accessed 50 times, last access 3 days ago → Warm
- Record C: Accessed 10 times, last access 30 days ago → Cold
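To make the formula concrete, here is a minimal TypeScript sketch of the scoring calculation. The formula leaves the age unit implicit, so this sketch assumes hours; it illustrates the documented formula, not ekoDB's internal implementation:

```typescript
// Sketch of the hotness formula above; assumes `ageHours` is hours since last access.
function recencyFactor(ageHours: number): number {
  if (ageHours < 1) return 10.0;      // very hot
  if (ageHours < 24) return 5.0;      // hot
  if (ageHours < 24 * 7) return 2.0;  // warm
  return 1.0;                         // cold
}

function hotnessScore(accessCount: number, ageHours: number): number {
  // Hotness Score = (Access Count × Recency Factor) / √(Age + 1)
  return (accessCount * recencyFactor(ageHours)) / Math.sqrt(ageHours + 1);
}

console.log(hotnessScore(100, 0.5)); // Record A: ≈ 816 (very hot)
console.log(hotnessScore(50, 72));   // Record B: ≈ 11.7 (warm)
console.log(hotnessScore(10, 720));  // Record C: ≈ 0.37 (cold)
```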
### 3. Cache Warming on Startup

When your database restarts:

1. Load Patterns: Read patterns from disk
2. Calculate Scores: Compute hotness for all records
3. Pre-load Hot Records: Load the top N records into cache
4. Ready to Serve: Hot queries are instant

```
Database Restart Flow:
├── Load pattern file (~1-2ms)
├── Calculate hotness scores (~5-10ms)
├── Pre-load 50 hot records into cache (~100-200ms)
└── Database ready with warm cache ✓
```
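Conceptually, the warm-up step looks like the following TypeScript sketch. This is a schematic illustration, not ekoDB's actual code; the `store` and `cache` interfaces and the `AccessStats` shape are assumptions for the example:

```typescript
// Schematic sketch of startup cache warming (assumed interfaces, not ekoDB's API).
interface AccessStats { collection: string; recordId: string; count: number; ageHours: number }

declare const store: { get(collection: string, id: string): Promise<unknown> };
declare const cache: { put(key: string, value: unknown): void };

async function warmCache(patterns: AccessStats[], topN = 50): Promise<void> {
  // 1. Score every record seen in the pattern log (hotnessScore from the sketch above)
  const scored = patterns.map((p) => ({ ...p, score: hotnessScore(p.count, p.ageHours) }));

  // 2. Sort hottest-first and keep the top N
  scored.sort((a, b) => b.score - a.score);

  // 3. Pre-load hot records into the in-memory cache
  for (const { collection, recordId } of scored.slice(0, topN)) {
    cache.put(`${collection}:${recordId}`, await store.get(collection, recordId));
  }
}
```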
Query Performance (from benchmarks):

```
Uncached query (10 records): 328µs
Cached query (10 records):   109µs  → 3x faster!
Single find_by_id:           2.3µs
```
## Performance Impact

### Cache Performance
Benchmarks compare query performance when records are pre-loaded in cache versus cold (uncached) queries. All timings are local embedded operations with no network overhead.
| Query Size | Speedup |
|---|---|
| 10 records | 3.0x faster |
| 100 records | 3.1x faster |
| 1000 records | 3.6x faster |
Cache warming provides consistent 3x speedups across all query sizes. The improvement comes from avoiding disk I/O on hot paths: records are served directly from memory rather than read from storage on each access.
### Single Record Lookups
Single record operations show ekoDB's raw embedded performance. These are point lookups without any query parsing or filtering overhead.
| Operation | Latency |
|---|---|
| find_by_id | ~2µs |
| kv_get | ~200ns (sub-microsecond) |
| kv_exists | ~100ns (sub-microsecond) |
These numbers reflect the advantage of an embedded database: no serialization, no network round-trips, no connection pooling. The kv_exists check at ~100ns is essentially a hash table lookup, while find_by_id includes document parsing overhead.
For complete benchmark data including comparisons to other embedded databases, see Performance Benchmarks.
Pattern logging adds minimal overhead but provides 3x faster cached query performance. This is an excellent trade-off for most applications.
## Configuration

### Automatic Configuration
Pattern tracking is enabled by default with automatic configuration:
- Buffer size auto-scales with system resources
- Flush frequency optimized for your hardware
- Hot record limit adapts to available memory
No configuration needed! Just use ekoDB normally.
### Manual Tuning (Advanced)
For advanced use cases, you can tune behavior via environment variables:
```bash
# Adjust cleanup interval (affects buffer size)
export CLEANUP_INTERVAL_SECONDS=300  # Default: varies by tier

# Buffer size calculation:
#   buffer = cleanup_interval * 10
#   300s → 3000 entries buffered before flush
```
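For reference, the sizing rule can be reproduced in a few lines. A minimal sketch assuming only the documented `buffer = cleanup_interval * 10` relationship:

```typescript
// Reproduces the documented rule: buffer = cleanup_interval * 10.
const cleanupIntervalSeconds = Number(process.env.CLEANUP_INTERVAL_SECONDS ?? 300);
const bufferEntries = cleanupIntervalSeconds * 10;

// With the default 300s interval this prints 3000 entries, matching the
// "Buffer size set to 3000 entries" log line shown under Monitoring.
console.log(`cleanup interval ${cleanupIntervalSeconds}s -> buffer ${bufferEntries} entries`);
```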
## Use Cases

### E-Commerce Platform
```javascript
// Scenario: Product catalog with 1M products
// Reality: 1000 products generate 80% of traffic

// Before (without cache warming):
//   - Database restart
//   - First queries: ~328µs each (uncached)
//   - User experience: slightly slower initial loads

// After (with cache warming):
//   - Database restart
//   - Pattern logger pre-loads 1000 hot products
//   - First queries: ~109µs each (3x faster!)
//   - User experience: fast from the start ✓
```
### Multi-Tenant SaaS

```javascript
// Scenario: 10,000 tenants
// Reality: 100 active tenants generate 90% of queries

// Cache warming automatically identifies active tenants
// and pre-loads their data on startup.
// Active tenants get instant response times;
// inactive tenants use normal lazy-loading.
```
### Content Platform

```javascript
// Scenario: News site with trending articles

// Patterns adapt automatically:
//   - Morning: breaking news articles hot
//   - Afternoon: opinion pieces hot
//   - Evening: sports content hot

// Cache warming reflects current trends.
// No manual cache management needed.
```
## Monitoring

### Check Pattern Statistics
Pattern statistics are available through server logs on startup and during operation. You can also inspect the pattern file directly:
```bash
# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# Count pattern entries
wc -l {database_name}/patterns/query_patterns.log

# View recent patterns
tail -20 {database_name}/patterns/query_patterns.log
```
### Log Output
Pattern Logger provides detailed logs:
```
[INFO] PatternLogger: Buffer size set to 3000 entries (based on cleanup_interval: 300s)
[INFO] Loading query patterns from "db_name/patterns/query_patterns.log"
[INFO] Loaded 15234 pattern entries
[INFO] Warming cache from query patterns...
[INFO] Cache warming complete: 50 hot records loaded in 125ms
```
## Best Practices

### 1. Let It Run Automatically
Don't disable pattern logging unless you have a specific reason. The benefits far outweigh the minimal overhead.
### 2. Monitor Hot Records
Periodically review which records are hot:
- Optimize frequently accessed data
- Add indexes for hot queries
- Pre-compute expensive operations
### 3. Clean Old Patterns
Patterns are cleaned automatically based on the cleanup_interval setting. For manual cleanup, you can truncate or remove the pattern file (database restart will recreate it):
```bash
# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# If too large, truncate (patterns will rebuild from new accesses)
truncate -s 0 {database_name}/patterns/query_patterns.log
```
### 4. Combine with Indexes

Pattern tracking identifies hot records; indexes optimize hot queries:
```javascript
// Pattern tracking: pre-loads frequently accessed users
// Index: makes user lookups by email fast
await client.createIndex('users', ['email']);

// Both work together for optimal performance
```
## Storage & Maintenance

### Pattern File Format

Patterns are stored in an append-only log:
```
{database_name}/
└── patterns/
    └── query_patterns.log
```
Format (pipe-delimited):

```
users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
images|img_789|vector_search|2025-01-22T19:30:02Z
```
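Since each entry is a single pipe-delimited line, the file is easy to parse for ad-hoc analysis. A minimal TypeScript sketch (the field names follow the format above; `parsePatternLine` is an illustrative helper, not part of ekoDB):

```typescript
// Parses one pipe-delimited pattern entry, e.g.:
//   users|user_123|find_by_id|2025-01-22T19:30:00Z
interface PatternEntry {
  collection: string;
  recordId: string;
  operation: string;
  timestamp: Date;
}

function parsePatternLine(line: string): PatternEntry | null {
  const parts = line.trim().split('|');
  if (parts.length !== 4) return null;  // skip malformed lines

  const [collection, recordId, operation, ts] = parts;
  const timestamp = new Date(ts);
  if (Number.isNaN(timestamp.getTime())) return null;  // invalid timestamp

  return { collection, recordId, operation, timestamp };
}
```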
### Storage Requirements

Pattern storage is very efficient and uses minimal disk space:
```
1 pattern entry     ≈ 80 bytes (average)
1,000,000 patterns  ≈ 80 MB
10,000,000 patterns ≈ 800 MB
```
### Cleanup Strategy
Recommended cleanup schedule:
| Data Size | Cleanup Frequency | Keep Days |
|---|---|---|
| < 1GB | Monthly | 30 days |
| 1-10GB | Weekly | 14 days |
| > 10GB | Daily | 7 days |
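If you script the cleanup yourself, entries can be pruned by timestamp according to the "Keep Days" column above. A hypothetical sketch using the `parsePatternLine` helper from the previous section; run it only while the database is stopped so the logger isn't appending concurrently:

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Hypothetical pruning script: rewrites the log keeping only entries
// newer than `keepDays` (30/14/7 per the table above).
function prunePatternLog(path: string, keepDays: number): void {
  const cutoff = Date.now() - keepDays * 24 * 60 * 60 * 1000;

  const kept = readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => {
      const entry = parsePatternLine(line);  // helper sketched earlier
      return entry !== null && entry.timestamp.getTime() >= cutoff;
    });

  writeFileSync(path, kept.join('\n') + '\n');
}

// Example: database under 1GB -> monthly cleanup, keep 30 days
prunePatternLog('{database_name}/patterns/query_patterns.log', 30);
```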
## Troubleshooting

### Patterns Not Loading

Check that the pattern file exists:
```bash
ls -la {database_name}/patterns/query_patterns.log
```
Check the logs for errors:

```
[WARN] Failed to parse pattern entry: Invalid timestamp
```

Solution: The pattern file may be corrupted. Remove it and let the database rebuild:

```bash
rm {database_name}/patterns/query_patterns.log
# Restart the database - new patterns will be tracked
```
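Before deleting the file, you can confirm that corruption is actually the problem by scanning for unparseable lines. A sketch reusing the hypothetical `parsePatternLine` helper from the Pattern File Format section:

```typescript
import { readFileSync } from 'node:fs';

// Prints every line of the pattern log that fails to parse.
function findCorruptLines(path: string): void {
  readFileSync(path, 'utf8').split('\n').forEach((line, i) => {
    if (line.trim() !== '' && parsePatternLine(line) === null) {
      console.warn(`line ${i + 1} is malformed: ${line}`);
    }
  });
}

findCorruptLines('{database_name}/patterns/query_patterns.log');
```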
### High Memory Usage

Check the buffer size in the database logs:

```
[INFO] PatternLogger: Buffer size set to 10000 entries
```

Solution: Reduce the cleanup interval to shrink the buffer:

```bash
export CLEANUP_INTERVAL_SECONDS=60  # Smaller buffer
```
### Slow Startup

Check the hot record count in the database logs:

```
[INFO] Cache warming complete: 500 hot records loaded in 2500ms
```

Solution: Too many hot records are being loaded at startup. This is tuned automatically, but if startup time is critical:

- Patterns naturally age out
- Clean old patterns more frequently
- Consider whether 500 hot records is appropriate for your use case
## Integration with Other Features

### With Multi-Region (Ripple)
Each region maintains its own patterns:
- Region A: Tracks Region A's access patterns
- Region B: Tracks Region B's access patterns
- Benefit: Each region optimizes for its users
### With File Pool
Uses global file descriptor management:
- Pattern file = 1 FD
- Respects system limits
- Automatic retry/backoff
### With Disk Cache

Works seamlessly with the disk cache:

- Hot records → memory cache (instant)
- Warm records → disk cache (fast)
- Cold records → database (acceptable)
## Pattern File Reference
Pattern data is stored on disk and managed automatically. You can access it directly for debugging or analysis:
### Pattern File Location

```
{database_name}/patterns/query_patterns.log
```
### Pattern Entry Format

Each line is pipe-delimited:

```
{collection}|{record_id}|{operation_type}|{timestamp}
```

Example:

```
users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
```
### Useful Commands

```bash
# View pattern statistics
wc -l {database_name}/patterns/query_patterns.log   # Total entries
ls -lh {database_name}/patterns/query_patterns.log  # File size

# Find most accessed collections
cut -d'|' -f1 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn

# Find most accessed records
cut -d'|' -f1,2 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn | head -20
```
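For analysis beyond the shell one-liners (for example, ranking by the hotness formula instead of raw counts), the log can be aggregated in TypeScript. A sketch combining the hypothetical helpers from earlier sections:

```typescript
import { readFileSync } from 'node:fs';

// Ranks records by raw access count, mirroring the cut/sort/uniq pipeline above.
function topRecords(path: string, limit = 20): Array<[string, number]> {
  const counts = new Map<string, number>();

  for (const line of readFileSync(path, 'utf8').split('\n')) {
    const entry = parsePatternLine(line);  // hypothetical parser from earlier
    if (!entry) continue;
    const key = `${entry.collection}:${entry.recordId}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}

console.table(topRecords('{database_name}/patterns/query_patterns.log'));
```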
## Related Documentation
- System Administration - Admin endpoints and monitoring
- Performance Benchmarks - Performance data and tuning
- Indexes - Create indexes for common queries
- White Paper - Technical architecture details
## Summary

Query pattern tracking with intelligent cache warming is a game-changer for cold-start performance:

- ✅ Automatic - No configuration required
- ✅ Fast - 92% faster cold starts
- ✅ Efficient - Minimal overhead (0.85%)
- ✅ Smart - Adapts to changing access patterns
- ✅ Persistent - Patterns survive restarts
Your database automatically learns which data is important and ensures it's ready when you need it.