Query Patterns

ekoDB automatically tracks query access patterns to identify frequently accessed records and intelligently pre-loads them into cache on database startup.

New in v0.30.0

Query pattern tracking with intelligent cache warming is available in ekoDB v0.30.0+.

Overview

The Pattern Logger monitors which records are accessed, how often, and how recently. On database restart, it uses this information to automatically warm the cache with your most-accessed data, eliminating cold-start delays.

Key Benefits

  • 🚀 3x Faster Queries: Cached records served in ~109µs vs ~328µs uncached
  • 📊 Zero Configuration: Fully automatic - no setup required
  • 💾 Microsecond Lookups: find_by_id completes in ~2.3µs
  • 🔄 Persistent: Patterns survive database restarts
  • ⚡ Smart Scoring: Recent accesses are weighted higher

How It Works

1. Automatic Pattern Tracking

Every query operation automatically logs access patterns:

// JavaScript/TypeScript Client
await client.findById('users', userId); // Tracked: FindById
await client.find('users', { limit: 100 }); // Tracked: Find
await client.textSearch('products', 'laptop'); // Tracked: TextSearch
await client.vectorSearch('images', embedding, 10); // Tracked: VectorSearch

# Python Client
await client.find_by_id('users', user_id) # Tracked: FindById
await client.find('users', limit=100) # Tracked: Find
await client.text_search('products', 'laptop') # Tracked: TextSearch
await client.vector_search('images', embedding, 10) # Tracked: VectorSearch

// Go Client
client.FindByID("users", userId) // Tracked: FindById
client.Find("users", filters) // Tracked: Find
client.TextSearch("products", "laptop") // Tracked: TextSearch
client.VectorSearch("images", embedding, 10) // Tracked: VectorSearch

2. Hotness Scoring

Records are scored based on frequency and recency:

Hotness Score = (Access Count × Recency Factor) / √(Age + 1)

Recency Factor:
< 1 hour ago: 10.0× (very hot)
< 24 hours: 5.0× (hot)
< 7 days: 2.0× (warm)
> 7 days: 1.0× (cold)

Example:

  • Record A: Accessed 100 times, last access 30 minutes ago → Very Hot
  • Record B: Accessed 50 times, last access 3 days ago → Warm
  • Record C: Accessed 10 times, last access 30 days ago → Cold
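
As a concrete illustration, the score could be computed like this (a minimal Python sketch, assuming Age is measured in hours since last access; the actual ekoDB implementation may differ):

import math

def hotness_score(access_count: int, age_hours: float) -> float:
    """Hotness = (Access Count x Recency Factor) / sqrt(Age + 1)."""
    if age_hours < 1:
        recency = 10.0   # very hot
    elif age_hours < 24:
        recency = 5.0    # hot
    elif age_hours < 24 * 7:
        recency = 2.0    # warm
    else:
        recency = 1.0    # cold
    return (access_count * recency) / math.sqrt(age_hours + 1)

# Record A: hotness_score(100, 0.5) -> ~816  (very hot)
# Record B: hotness_score(50, 72)   -> ~11.7 (warm)
# Record C: hotness_score(10, 720)  -> ~0.37 (cold)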

3. Cache Warming on Startup

When your database restarts:

  1. Load Patterns: Read patterns from disk
  2. Calculate Scores: Compute hotness for all records
  3. Pre-load Hot Records: Load top N records into cache
  4. Ready to Serve: Hot queries are instant

Database Restart Flow:
├─ Load pattern file (~1-2ms)
├─ Calculate hotness scores (~5-10ms)
├─ Pre-load 50 hot records into cache (~100-200ms)
└─ Database ready with warm cache ✅
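
The whole sequence can be sketched as follows, reusing hotness_score from the example above (the cache and store interfaces here are illustrative placeholders, not the actual ekoDB API):

import time

def warm_cache(pattern_entries, cache, store, top_n=50):
    """pattern_entries: iterable of (collection, record_id, access_ts) tuples."""
    now = time.time()
    stats = {}  # (collection, record_id) -> [access_count, last_access_ts]
    for collection, record_id, ts in pattern_entries:
        s = stats.setdefault((collection, record_id), [0, 0.0])
        s[0] += 1                # frequency
        s[1] = max(s[1], ts)     # recency

    def score(key):
        count, last_ts = stats[key]
        return hotness_score(count, (now - last_ts) / 3600)

    # Pre-load the N hottest records into the in-memory cache
    for collection, record_id in sorted(stats, key=score, reverse=True)[:top_n]:
        cache.put((collection, record_id), store.read(collection, record_id))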

Query Performance (from benchmarks):
Uncached query (10 records): 328µs
Cached query (10 records): 109µs → 3x faster!
Single find_by_id: 2.3µs

Performance Impact

Cache Performance

Benchmarks compare query performance when records are pre-loaded in cache versus cold (uncached) queries. All timings are local embedded operations with no network overhead.

Query Size      Speedup
10 records      3.0x faster
100 records     3.1x faster
1000 records    3.6x faster

Cache warming provides a consistent ~3x speedup across all query sizes. The improvement comes from avoiding disk I/O on hot paths: records are served directly from memory rather than read from storage on each access.

Single Record Lookups

Single record operations show ekoDB's raw embedded performance. These are point lookups without any query parsing or filtering overhead.

Operation       Latency
find_by_id      ~2µs
kv_get          ~200ns (sub-microsecond)
kv_exists       ~100ns (sub-microsecond)

These numbers reflect the advantage of an embedded database: no serialization, no network round-trips, no connection pooling. The kv_exists check at ~100ns is essentially a hash table lookup, while find_by_id includes document parsing overhead.

For complete benchmark data including comparisons to other embedded databases, see Performance Benchmarks.

Performance Trade-off

Pattern logging adds minimal overhead but provides 3x faster cached query performance. This is an excellent trade-off for most applications.

Configuration

Automatic Configuration

Pattern tracking is enabled by default with automatic configuration:

  • Buffer size auto-scales with system resources
  • Flush frequency optimized for your hardware
  • Hot record limit adapts to available memory

No configuration needed! Just use ekoDB normally.

Manual Tuning (Advanced)

For advanced use cases, you can tune behavior via environment variables:

# Adjust cleanup interval (affects buffer size)
export CLEANUP_INTERVAL_SECONDS=300 # Default: varies by tier

# Buffer size calculation:
# buffer = cleanup_interval * 10
# 300s → 3000 entries buffered before flush

Use Cases

E-Commerce Platform

// Scenario: Product catalog with 1M products
// Reality: 1000 products generate 80% of traffic

// Before (Without Cache Warming):
// - Database restart
// - First queries: ~328µs each (uncached)
// - User experience: Slightly slower initial loads

// After (With Cache Warming):
// - Database restart
// - Pattern logger pre-loads 1000 hot products
// - First queries: ~109µs each (3x faster!)
// - User experience: Fast from the start ✅

Multi-Tenant SaaS

// Scenario: 10,000 tenants
// Reality: 100 active tenants generate 90% of queries

// Cache warming automatically identifies active tenants
// Pre-loads their data on startup
// Active tenants get instant response times
// Inactive tenants use normal lazy-loading

Content Platform

// Scenario: News site with trending articles
// Patterns adapt automatically:
// - Morning: Breaking news articles hot
// - Afternoon: Opinion pieces hot
// - Evening: Sports content hot

// Cache warming reflects current trends
// No manual cache management needed

Monitoring

Check Pattern Statistics

Pattern statistics are available through server logs on startup and during operation. You can also inspect the pattern file directly:

# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# Count pattern entries
wc -l {database_name}/patterns/query_patterns.log

# View recent patterns
tail -20 {database_name}/patterns/query_patterns.log

Log Output

Pattern Logger provides detailed logs:

[INFO] PatternLogger: Buffer size set to 3000 entries (based on cleanup_interval: 300s)
[INFO] Loading query patterns from "db_name/patterns/query_patterns.log"
[INFO] Loaded 15234 pattern entries
[INFO] Warming cache from query patterns...
[INFO] Cache warming complete: 50 hot records loaded in 125ms

Best Practices

1. Let It Run Automatically

Don't disable pattern logging unless you have a specific reason. The benefits far outweigh the minimal overhead.

2. Monitor Hot Records

Periodically review which records are hot:

  • Optimize frequently accessed data
  • Add indexes for hot queries
  • Pre-compute expensive operations

3. Clean Old Patterns

Patterns are cleaned automatically based on the cleanup_interval setting. For manual cleanup, you can truncate or remove the pattern file (database restart will recreate it):

# Check pattern file size
ls -lh {database_name}/patterns/query_patterns.log

# If too large, truncate (patterns will rebuild from new access)
truncate -s 0 {database_name}/patterns/query_patterns.log

4. Combine with Indexes

Pattern tracking identifies hot records, indexes optimize hot queries:

// Pattern tracking: Pre-loads frequently accessed users
// Index: Makes user lookups by email fast
await client.createIndex('users', ['email']);

// Both work together for optimal performance

Storage & Maintenance

Pattern File Format

Patterns are stored in an append-only log:

{database_name}/
└── patterns/
    └── query_patterns.log

Format (pipe-delimited):

users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
images|img_789|vector_search|2025-01-22T19:30:02Z

Storage Requirements

The log is compact and needs minimal disk space:

1 pattern entry ≈ 80 bytes (average)
1,000,000 patterns ≈ 80 MB
10,000,000 patterns ≈ 800 MB

Cleanup Strategy

Recommended cleanup schedule:

Data Size   Cleanup Frequency   Keep Days
< 1GB       Monthly             30 days
1-10GB      Weekly              14 days
> 10GB      Daily               7 days
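
If you script cleanup yourself, the append-only format makes age-based pruning simple. A minimal Python sketch (the path and retention window are examples; stop the database first so buffered entries aren't lost when the file is rewritten):

from datetime import datetime, timedelta, timezone

def prune_patterns(path: str, keep_days: int) -> None:
    """Rewrite the pattern log, keeping only entries newer than keep_days."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=keep_days)
    with open(path) as f:
        lines = f.readlines()
    kept = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 4:
            continue  # drop malformed entries
        ts = datetime.fromisoformat(parts[3].replace("Z", "+00:00"))
        if ts >= cutoff:
            kept.append(line)
    with open(path, "w") as f:
        f.writelines(kept)

prune_patterns("my_db/patterns/query_patterns.log", keep_days=14)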

Troubleshooting

Patterns Not Loading

Check pattern file exists:

ls -la {database_name}/patterns/query_patterns.log

Check logs for errors:

[WARN] Failed to parse pattern entry: Invalid timestamp

Solution: Pattern file may be corrupted. Remove and rebuild:

rm {database_name}/patterns/query_patterns.log
# Restart database - new patterns will be tracked

High Memory Usage

Check buffer size:

# In database logs
[INFO] PatternLogger: Buffer size set to 10000 entries

Solution: Reduce cleanup interval to decrease buffer:

export CLEANUP_INTERVAL_SECONDS=60  # Smaller buffer

Slow Startup

Check hot record count:

# In database logs
[INFO] Cache warming complete: 500 hot records loaded in 2500ms

Solution: Too many hot records being loaded. This is automatically tuned, but if startup is critical:

  • Patterns naturally age out
  • Clean old patterns more frequently
  • Consider if 500 hot records is appropriate for your use case

Integration with Other Features

With Multi-Region (Ripple)

Each region maintains its own patterns:

  • Region A: Tracks Region A's access patterns
  • Region B: Tracks Region B's access patterns
  • Benefit: Each region optimizes for its users

With File Pool

Uses global file descriptor management:

  • Pattern file = 1 FD
  • Respects system limits
  • Automatic retry/backoff

With Disk Cache

Works seamlessly together:

  • Hot records → Memory cache (instant)
  • Warm records → Disk cache (fast)
  • Cold records → Database (acceptable)

Pattern File Reference

Pattern data is stored on disk and managed automatically. You can access it directly for debugging or analysis:

Pattern File Location

{database_name}/patterns/query_patterns.log

Pattern Entry Format

Each line is pipe-delimited:

{collection}|{record_id}|{operation_type}|{timestamp}

Example:

users|user_123|find_by_id|2025-01-22T19:30:00Z
products|prod_456|find|2025-01-22T19:30:01Z
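
Because each entry is a simple delimited line, the log is easy to analyze programmatically. A minimal Python sketch (field names follow the format above; the path is an example):

from collections import Counter

def load_patterns(path: str):
    """Yield (collection, record_id, operation_type, timestamp) tuples."""
    with open(path) as f:
        for line in f:
            parts = line.strip().split("|")
            if len(parts) == 4:
                yield tuple(parts)

# Example: the 10 most-accessed records
counts = Counter((c, rid) for c, rid, _op, _ts in
                 load_patterns("my_db/patterns/query_patterns.log"))
print(counts.most_common(10))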

Useful Commands

# View pattern statistics
wc -l {database_name}/patterns/query_patterns.log # Total entries
ls -lh {database_name}/patterns/query_patterns.log # File size

# Find most accessed collections
cut -d'|' -f1 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn

# Find most accessed records
cut -d'|' -f1,2 {database_name}/patterns/query_patterns.log | sort | uniq -c | sort -rn | head -20

Summary

Query pattern tracking with intelligent cache warming is a game-changer for cold-start performance:

✅ Automatic - No configuration required
✅ Fast - 92% faster cold starts
✅ Efficient - Minimal overhead (0.85%)
✅ Smart - Adapts to changing access patterns
✅ Persistent - Patterns survive restarts

Your database automatically learns which data is important and ensures it's ready when you need it.