Ripples - Data Propagation
Ripples enable real-time data propagation across multiple ekoDB instances, providing horizontal scaling, geographic distribution, and high availability through automatic replication.
What are Ripples?
Ripples are ekoDB's built-in data propagation system that automatically replicates database operations (inserts, updates, deletes) across connected instances. Unlike traditional replication that operates at the WAL level, Ripples propagate operations in real-time at the application level.
Key Features:
- Real-Time Propagation: Operations replicate immediately as they occur
- Bidirectional Sync: Nodes can both send and receive ripples
- Selective Replication: Configure which nodes receive which operations
- Loop Prevention: Automatic deduplication prevents infinite loops
- Automatic Failover: Secondary nodes can take over if primary fails
How Ripples Work
When a write operation occurs on a node:
1. Operation Executes: Record is inserted/updated/deleted locally
2. Ripple Generated: Operation metadata is packaged with a unique ID
3. Propagation: Ripple is sent to configured peer nodes
4. Remote Execution: Peer nodes receive and execute the operation
5. Loop Prevention: Request ID prevents duplicate processing
┌─────────────┐
│   Node 1    │ ← Client writes here
│  (Primary)  │
└──────┬──────┘
       │ Ripple (INSERT)
       ├───────────────────┐
       ↓                   ↓
┌─────────────┐     ┌─────────────┐
│   Node 2    │     │   Node 3    │
│  (Replica)  │     │  (Replica)  │
└─────────────┘     └─────────────┘
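The five steps can be sketched in a few lines of Python. This is an illustrative model, not ekoDB's implementation; the metadata fields follow the ripple metadata described under Loop Prevention later in this page.

```python
import uuid

MAX_HOPS = 10  # hop limit from the Loop Prevention rules

def make_ripple(operation, origin_node_id):
    """Step 2: package the executed operation with a unique request ID."""
    return {
        "request_id": str(uuid.uuid4()),
        "origin_node_id": origin_node_id,
        "hop_count": 0,
        "operation": operation,
    }

def receive_ripple(ripple, node_id, seen_ids):
    """Steps 4-5: a peer executes the operation at most once."""
    if ripple["origin_node_id"] == node_id:
        return "skipped: own ripple"
    if ripple["request_id"] in seen_ids:
        return "skipped: duplicate"
    if ripple["hop_count"] >= MAX_HOPS:
        return "skipped: hop limit"
    seen_ids.add(ripple["request_id"])
    return "applied"  # the peer would execute the insert/update/delete here

# A write on Node 1 reaches Node 2 exactly once:
seen = set()
ripple = make_ripple({"op": "insert", "collection": "users"}, "node1")
print(receive_ripple(ripple, "node2", seen))  # applied
print(receive_ripple(ripple, "node2", seen))  # skipped: duplicate
```

The deduplication set stands in for the request-ID cache described below; a real node would also expire entries.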
Ripple Propagation Modes
Ripples support multiple propagation modes for different use cases. The mode field in the configuration determines how data is formatted and transmitted.
Quick Mode Selection Guide
| Mode | Primary Use Case | Destination |
|---|---|---|
| Operations (default) | Multi-node database replication | ekoDB instances |
| WAL | High-volume efficient replication | ekoDB instances |
| Snapshot | Periodic backups, data warehousing | ekoDB, storage systems |
| Search | Full-text search integration | Elasticsearch, Meilisearch |
| Analytics | Business intelligence, reporting | BigQuery, Snowflake |
| Stream | Event-driven architectures | Kafka, RabbitMQ |
| Webhook | Third-party integrations | HTTP endpoints |
| Embedding | Semantic search, AI/ML | Vector databases |
| Chat | AI chatbots, LLM context | OpenAI, custom LLMs |
| Custom | Specialized integrations | Custom systems |
For standard multi-node replication, use Operations mode (the default). Other modes are for specialized integrations with external systems.
Operations Mode (Default)
Purpose: Real-time CRUD operation replication between ekoDB instances
Use Case: Multi-node clusters, horizontal scaling, high availability
How it works: Each insert/update/delete is sent as an individual operation to peer nodes in real-time.
Example Configuration:
POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}
Best for: Standard database replication, active-active clusters, read replicas
WAL Mode
Purpose: Batch Write-Ahead Log shipping for efficient replication
Use Case: Large-scale replication with lower network overhead
How it works: Instead of sending individual operations, the node batches WAL entries and ships them periodically, which is more efficient for high-throughput scenarios.
Example Configuration:
POST /api/ripples/config
{
  "name": "wal_replica",
  "url": "https://replica.ekodb.net:8080",
  "api_key": "replica-admin-key",
  "mode": "WAL",
  "strategy": "batched",
  "options": {
    "batch_size": 5000,
    "interval_secs": 10
  },
  "enabled": true
}
Best for: High-volume replication, reducing network traffic, eventual consistency scenarios
Snapshot Mode
Purpose: Periodic full database snapshots
Use Case: Backup nodes, reporting databases, point-in-time recovery
How it works: Sends complete snapshots of collections on a schedule rather than individual operations.
Example Configuration:
POST /api/ripples/config
{
  "name": "backup_snapshot",
  "url": "https://backup.ekodb.net:8080",
  "api_key": "backup-admin-key",
  "mode": "Snapshot",
  "strategy": "scheduled",
  "options": {
    "cron": "0 2 * * *",
    "collections": ["users", "orders"]
  },
  "enabled": true
}
The cron expression "0 2 * * *" runs the snapshot daily at 2:00 AM.
Best for: Backup systems, data warehousing, periodic sync to reporting databases
Search Mode
Purpose: Send data formatted for search indexing
Use Case: Elasticsearch, Meilisearch, or other search engine integration
How it works: Transforms operations into search-optimized format and sends to search indices.
Example Configuration:
POST /api/ripples/config
{
  "name": "elasticsearch_index",
  "url": "https://elasticsearch.example.com:9200",
  "api_key": "es-api-key",
  "mode": "Search",
  "destination": "elasticsearch",
  "transform": {
    "include_fields": ["title", "content", "author", "tags"],
    "exclude_fields": ["internal_id"]
  },
  "enabled": true
}
Best for: Full-text search, faceted search, search-as-you-type features
Analytics Mode
Purpose: Send data to analytics platforms
Use Case: BigQuery, Snowflake, data warehouses, BI tools
How it works: Formats data for analytics schemas and sends to analytics platforms in batches.
Example Configuration:
POST /api/ripples/config
{
  "name": "bigquery_analytics",
  "url": "https://bigquery.googleapis.com/v2/projects/my-project",
  "api_key": "bq-service-account-key",
  "mode": "Analytics",
  "destination": "bigquery",
  "filter": {
    "collections": ["events", "metrics", "user_activity"]
  },
  "options": {
    "dataset": "production_analytics",
    "batch_size": 10000
  },
  "enabled": true
}
Best for: Business intelligence, data analytics, reporting dashboards
Stream Mode
Purpose: Send data to message streams
Use Case: Kafka, RabbitMQ, event-driven architectures
How it works: Publishes operations as messages to streaming platforms for event processing.
Example Configuration:
POST /api/ripples/config
{
  "name": "kafka_stream",
  "url": "https://kafka.example.com:9092",
  "api_key": "kafka-api-key",
  "mode": "Stream",
  "destination": "kafka",
  "options": {
    "topic": "ekodb-events",
    "partition_key": "collection_name"
  },
  "enabled": true
}
Best for: Event sourcing, microservices communication, real-time data pipelines
Webhook Mode
Purpose: Trigger HTTP webhooks on data changes
Use Case: Third-party integrations, automation, notifications
How it works: Sends HTTP POST requests to configured webhook URLs when operations occur.
Example Configuration:
POST /api/ripples/config
{
  "name": "slack_notifications",
  "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
  "mode": "Webhook",
  "filter": {
    "collections": ["orders"],
    "operations": ["insert"]
  },
  "transform": {
    "transform_fn": "format_slack_message"
  },
  "enabled": true
}
Best for: Slack/Discord notifications, Zapier integration, custom automation
Embedding Mode
Purpose: Generate and send vector embeddings
Use Case: Vector databases, semantic search, AI/ML pipelines
How it works: Processes text fields, generates embeddings, and sends to vector databases.
Example Configuration:
POST /api/ripples/config
{
  "name": "pinecone_vectors",
  "url": "https://your-index.pinecone.io",
  "api_key": "pinecone-api-key",
  "mode": "Embedding",
  "destination": "vectordb",
  "options": {
    "embedding_field": "content",
    "embedding_model": "text-embedding-ada-002",
    "dimension": 1536
  },
  "enabled": true
}
Best for: Semantic search, RAG (Retrieval Augmented Generation), similarity matching
Chat Mode
Purpose: Send data to chat/LLM systems
Use Case: AI assistants, chatbots, context-aware LLMs
How it works: Formats data for chat context and sends to LLM providers or chat databases.
Example Configuration:
POST /api/ripples/config
{
  "name": "openai_context",
  "url": "https://api.openai.com/v1/assistants/asst_xxx/messages",
  "api_key": "openai-api-key",
  "mode": "Chat",
  "filter": {
    "collections": ["support_tickets", "knowledge_base"]
  },
  "transform": {
    "include_fields": ["question", "answer", "category"]
  },
  "enabled": true
}
Best for: AI chatbots, customer support automation, context-aware assistants
Custom Mode
Purpose: User-defined custom propagation logic
Use Case: Specialized integrations, custom data pipelines
How it works: Allows you to define custom propagation behavior via configuration.
Example Configuration:
POST /api/ripples/config
{
  "name": "custom_pipeline",
  "url": "https://custom-endpoint.example.com/ingest",
  "api_key": "custom-api-key",
  "mode": {
    "Custom": "my_custom_handler"
  },
  "options": {
    "custom_param1": "value1",
    "custom_param2": "value2"
  },
  "enabled": true
}
Best for: Proprietary systems, specialized data transformations, unique integrations
Replication Patterns (Send/Receive/Both)
Beyond the propagation mode, nodes can be configured in different replication patterns:
Send Mode
Node sends ripples to peers but doesn't receive.
Use Case: Primary write nodes that propagate to replicas
Configure each peer separately:
# Add first replica peer
POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}
# Add second replica peer
POST /api/ripples/config
{
  "name": "replica2",
  "url": "https://replica2.ekodb.net:8080",
  "api_key": "replica2-admin-key",
  "mode": "Operations",
  "enabled": true
}
Receive Mode
Node receives ripples from peers but doesn't send.
Use Case: Read replicas that stay synchronized
Note: Receive-only nodes don't need to configure peer connections. They simply process incoming ripple requests from nodes that have them configured as peers.
Both Mode (Full Mesh)
Node both sends and receives ripples.
Use Case: Multi-master deployments, peer-to-peer sync
Configure each peer separately:
# On Node 1: Add Node 2 as peer
POST /api/ripples/config
{
  "name": "node2",
  "url": "https://node2.ekodb.net:8080",
  "api_key": "node2-admin-key",
  "mode": "Operations",
  "enabled": true
}
# On Node 2: Add Node 1 as peer
POST /api/ripples/config
{
  "name": "node1",
  "url": "https://node1.ekodb.net:8080",
  "api_key": "node1-admin-key",
  "mode": "Operations",
  "enabled": true
}
None Mode
Node operates independently with no ripples.
Use Case: Isolated analytics nodes, testing environments
No configuration needed: Simply don't add any ripple peers to the node.
Real-World Use Cases
1. Geographic Distribution (Multi-Region)
Scenario: E-commerce platform with users in US, EU, and Asia
Architecture:
- Primary node in each region (both mode for bidirectional sync)
- Cross-region ripples for data consistency
- Read replicas in each region (receive mode)
      US Region                EU Region                 Asia Region
     ┌──────────┐             ┌──────────┐             ┌──────────┐
     │ Primary  │◄───────────►│ Primary  │◄───────────►│ Primary  │
     │  (both)  │             │  (both)  │             │  (both)  │
     └────┬─────┘             └────┬─────┘             └────┬─────┘
          │                        │                        │
     ┌────┴─────┐             ┌────┴─────┐             ┌────┴─────┐
     ↓          ↓             ↓          ↓             ↓          ↓
┌─────────┐┌─────────┐   ┌─────────┐┌─────────┐   ┌─────────┐┌─────────┐
│Replica1 ││Replica2 │   │Replica1 ││Replica2 │   │Replica1 ││Replica2 │
│(receive)││(receive)│   │(receive)││(receive)│   │(receive)││(receive)│
└─────────┘└─────────┘   └─────────┘└─────────┘   └─────────┘└─────────┘
Benefits:
- Users read from nearest region (low latency)
- Writes propagate globally (eventual consistency)
- Failover to another region if one goes down
2. Read Scaling (Analytics Workload)
Scenario: SaaS application with heavy reporting queries
Architecture:
- Primary write node with durable WAL (send mode)
- Multiple read replicas with fast WAL (receive mode)
- Analytics nodes isolated from production (none mode)
Production Traffic                 Analytics Traffic
       │                                   │
       ↓                                   ↓
┌─────────────┐                     ┌─────────────┐
│   Primary   │────────────────────►│  Analytics  │
│   (send)    │     WAL Export      │   (none)    │
│ Durable WAL │                     │  Fast WAL   │
└──────┬──────┘                     └─────────────┘
       │ Ripples
       ├───────────┬───────────┐
       ↓           ↓           ↓
  ┌──────────┐┌──────────┐┌──────────┐
  │ Replica1 ││ Replica2 ││ Replica3 │
  │(receive) ││(receive) ││(receive) │
  │ Fast WAL ││ Fast WAL ││ Fast WAL │
  └──────────┘└──────────┘└──────────┘
Benefits:
- Production writes don't slow down (durable primary)
- Replicas handle read traffic (fast WAL, high throughput)
- Analytics isolated (no ripple overhead, batch data loads)
Configuration Example:
# Primary (write node) - Add each replica as a peer
curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica1",
    "url": "https://replica1.ekodb.net:8080",
    "api_key": "replica1-admin-key",
    "mode": "Operations",
    "enabled": true
  }'
curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica2",
    "url": "https://replica2.ekodb.net:8080",
    "api_key": "replica2-admin-key",
    "mode": "Operations",
    "enabled": true
  }'
curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica3",
    "url": "https://replica3.ekodb.net:8080",
    "api_key": "replica3-admin-key",
    "mode": "Operations",
    "enabled": true
  }'
# Read Replicas (no peer config needed - they only receive)
# Analytics node (isolated - no peer config needed)
3. High Availability (Active-Active)
Scenario: Financial application requiring 99.99% uptime
Architecture:
- Two active primary nodes (both mode)
- Automatic failover via load balancer
- Conflict resolution: last-write-wins
      ┌──────────────┐
      │Load Balancer │
      └──────┬───────┘
             │
          ┌──┴─────┐
          ↓        ↓
     ┌────────┐┌────────┐
     │Primary1││Primary2│
     │ (both) ││ (both) │
     └────┬───┘└───┬────┘
          └────┬───┘
               │ Bidirectional Ripples
Benefits:
- Both nodes accept writes (active-active)
- Automatic failover (no manual intervention)
- Zero downtime for maintenance
Example Workflow:
# Node 1 configuration - Add Node 2 as peer
curl -X POST https://node1.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "node2",
    "url": "https://node2.ekodb.net:8080",
    "api_key": "node2-admin-key",
    "mode": "Operations",
    "enabled": true
  }'
# Node 2 configuration - Add Node 1 as peer
curl -X POST https://node2.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "node1",
    "url": "https://node1.ekodb.net:8080",
    "api_key": "node1-admin-key",
    "mode": "Operations",
    "enabled": true
  }'
# Write to Node 1
curl -X POST https://node1.ekodb.net/api/insert/orders \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"data": {"product": "Widget", "amount": 100}}'
# Immediately available on Node 2 via ripple
curl -X POST https://node2.ekodb.net/api/query/orders \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filter": {"type": "Condition", "content": {"field": "product", "operator": "Equals", "value": "Widget"}}}'
4. Data Ingestion Pipeline
Scenario: IoT platform collecting sensor data from thousands of devices
Architecture:
- Ingestion nodes accept writes only (send mode)
- Storage nodes receive and persist (receive mode, durable WAL)
- Processing nodes compute analytics (receive mode)
          IoT Devices
    │      │      │      │
    ↓      ↓      ↓      ↓
 ┌────┐ ┌────┐ ┌────┐ ┌────┐
 │Ing1│ │Ing2│ │Ing3│ │Ing4│  ← Fast WAL (write-optimized)
 │send│ │send│ │send│ │send│
 └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘
    │      │      │      │  Ripples
    └──────┴──┬───┴──────┘
              │
    ┌─────────┼─────────┐
    ↓         ↓         ↓
┌────────┐┌────────┐┌────────┐
│Storage1││Storage2││Process │  ← Durable WAL (data safety)
│receive ││receive ││receive │
└────────┘└────────┘└────────┘
Benefits:
- Ingestion nodes optimized for write throughput
- Storage nodes ensure data durability
- Processing nodes compute real-time analytics
- Horizontal scaling at each layer
5. Development/Staging Environments
Scenario: Testing environment that mirrors production data
Architecture:
- Production primary (send mode)
- Staging replica (receive mode)
- Staging accepts test writes locally; with no peers configured on staging, those writes are not rippled back
 Production                   Staging
┌──────────┐              ┌──────────┐
│ Primary  │─────────────►│ Replica  │
│  (send)  │   Ripples    │(receive) │
└──────────┘              └──────────┘
                               ↑
                               │ Test writes
                               │ (not rippled back)
Benefits:
- Staging has real production data
- Test writes don't affect production
- Safe environment for testing features
Configuring Ripples
Basic Configuration
Configure ripples on a node via the ripples API. Each peer requires a separate configuration:
POST https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}
Content-Type: application/json
{
  "name": "peer1",
  "url": "https://peer1.ekodb.net:8080",
  "api_key": "peer1-admin-key",
  "mode": "Operations",
  "enabled": true
}
POST https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}
Content-Type: application/json
{
  "name": "peer2",
  "url": "https://peer2.ekodb.net:8080",
  "api_key": "peer2-admin-key",
  "mode": "Operations",
  "enabled": true
}
List Configured Ripples
GET https://{EKODB_API_URL}/api/ripples/list
Authorization: Bearer {ADMIN_TOKEN}
# Response
{
  "ripples": [
    {
      "url": "https://peer1.ekodb.net:8080",
      "enabled": true,
      "status": "healthy",
      "last_sync": "2024-01-15T15:45:30Z"
    },
    {
      "url": "https://peer2.ekodb.net:8080",
      "enabled": true,
      "status": "healthy",
      "last_sync": "2024-01-15T15:45:28Z"
    }
  ]
}
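A monitoring script can flag peers needing attention directly from this response. A small sketch (field names match the example response above; the "unreachable" status value is an assumption for illustration):

```python
def unhealthy_peers(list_response):
    """Return URLs of peers that are disabled or not reporting healthy."""
    return [
        peer["url"]
        for peer in list_response.get("ripples", [])
        if not peer["enabled"] or peer["status"] != "healthy"
    ]

response = {"ripples": [
    {"url": "https://peer1.ekodb.net:8080", "enabled": True, "status": "healthy"},
    {"url": "https://peer2.ekodb.net:8080", "enabled": True, "status": "unreachable"},
]}
print(unhealthy_peers(response))  # ['https://peer2.ekodb.net:8080']
```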
Remove Ripple Configuration
DELETE https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}
Bypassing Ripples
For bulk imports or maintenance operations, you can bypass ripple propagation:
Single Operations
POST https://{EKODB_API_URL}/api/insert/users?bypass_ripple=true
Authorization: Bearer {TOKEN}
Content-Type: application/json
{
  "data": {
    "name": "John Doe",
    "email": "john@example.com"
  }
}
Batch Operations
POST https://{EKODB_API_URL}/api/batch/insert/users?bypass_ripple=true
Authorization: Bearer {TOKEN}
Content-Type: application/json
{
  "inserts": [
    {"data": {"name": "User 1"}},
    {"data": {"name": "User 2"}},
    {"data": {"name": "User 3"}}
  ]
}
When to Bypass:
- Initial data seeding
- Large bulk imports
- Database migrations
- Maintenance operations
- Temporary disconnection scenarios
Loop Prevention
ekoDB implements comprehensive loop prevention to avoid infinite ripple cycles in multi-node deployments:
Request Tracking
Every operation gets a unique identifier:
Operation Metadata:
- request_id: UUID (e.g., "a3f5b8c9-d2e1-f4a7-1234-567890abcdef")
- origin_node_id: Node that originated the operation
- hop_count: Number of hops traversed (max 10)
HTTP Headers
Ripple metadata is transmitted via headers:
POST https://peer.ekodb.net/api/insert/users
Authorization: Bearer {TOKEN}
X-Ripple-Request-ID: a3f5b8c9-d2e1-f4a7-1234-567890abcdef
X-Ripple-Origin-Node: node1-primary
X-Ripple-Hop-Count: 0
Deduplication Cache
Each node maintains an in-memory cache:
- Tracks processed request IDs
- 10-minute TTL per entry
- Automatic cleanup
- Prevents duplicate processing
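The cache's behavior can be sketched as follows. This is an illustrative model of the semantics described above, not ekoDB's internal implementation; the explicit `now` parameter just makes the example deterministic.

```python
class DedupCache:
    """Request-ID cache with a per-entry TTL."""

    def __init__(self, ttl_secs=600):  # 10-minute TTL, as described above
        self.ttl = ttl_secs
        self.first_seen = {}  # request_id -> timestamp when first processed

    def check_and_record(self, request_id, now):
        """Return True if this request ID was already processed within the TTL."""
        # Automatic cleanup: drop entries older than the TTL
        self.first_seen = {rid: t for rid, t in self.first_seen.items()
                           if now - t < self.ttl}
        if request_id in self.first_seen:
            return True  # duplicate: the ripple must not be re-applied
        self.first_seen[request_id] = now
        return False

cache = DedupCache()
print(cache.check_and_record("req-1", now=0))    # False: first sighting
print(cache.check_and_record("req-1", now=30))   # True: duplicate within TTL
print(cache.check_and_record("req-1", now=700))  # False: entry expired
```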
Example Flow (3-Node Full Mesh)
1. Client writes to Node 1
2. Node 1: Generate UUID, set origin=node1, hop=0
3. Node 1: Process locally, cache UUID
4. Node 1: Ripple to Node 2 and Node 3
5. Node 2: Receive ripple
   - Check: origin ≠ self ✓
   - Check: UUID not in cache ✓
   - Check: hop_count < 10 ✓
   - Process operation, cache UUID
   - DO NOT re-propagate (only origin propagates)
6. Node 3: Same as Node 2
Result: Operation processed once per node, no loops
Performance Considerations
Ripple Overhead
Minimal for typical workloads:
- Request ID generation: ~1µs
- Header addition: ~500 bytes
- Cache lookup: ~2µs
- Network latency: Depends on topology
Batch operations scale well:
- Single ripple for entire batch
- Amortized overhead across records
- 13M+ records/sec with ripples enabled
Network Topology
Full Mesh (N nodes):
- Each write ripples to (N-1) peers
- Best for: Small clusters (2-5 nodes)
- Network traffic: O(N²)
Hub-and-Spoke:
- Primary sends to all replicas
- Replicas don't send to each other
- Best for: Read scaling (1 primary, N replicas)
- Network traffic: O(N)
Hybrid:
- Primaries in full mesh
- Replicas receive only
- Best for: Multi-region with local replicas
- Network traffic: O(P²) + O(R) where P=primaries, R=replicas
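The fan-out behind these traffic figures can be computed per write. A sketch under the topology descriptions above; the function and parameter names are illustrative, and for the hybrid case `local_replicas` means the receive-only replicas attached to the primary taking the write:

```python
def fanout_per_write(topology, primaries, local_replicas=0):
    """Ripple messages triggered by one client write."""
    if topology == "full_mesh":
        return primaries - 1  # every other node gets the ripple
    if topology == "hub_and_spoke":
        return local_replicas  # the primary fans out to its replicas only
    if topology == "hybrid":
        # the other primaries, plus this region's receive-only replicas
        return (primaries - 1) + local_replicas
    raise ValueError(f"unknown topology: {topology}")

print(fanout_per_write("full_mesh", primaries=4))                        # 3
print(fanout_per_write("hub_and_spoke", primaries=1, local_replicas=5))  # 5
print(fanout_per_write("hybrid", primaries=3, local_replicas=2))         # 4
```

Summed over writes landing on every node, full-mesh traffic grows quadratically with cluster size, which is why it is recommended only for small clusters.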
Monitoring Ripples
Check Ripple Status
curl -X GET https://{EKODB_API_URL}/api/ripples/list \
  -H "Authorization: Bearer {ADMIN_TOKEN}"
Monitor Replication Lag
# Check WAL health for replication status
curl -X GET https://{EKODB_API_URL}/api/wal/health \
  -H "Authorization: Bearer {ADMIN_TOKEN}" \
  | jq '.last_entry_timestamp'
Health Checks
# Verify all peers are healthy
for peer in peer1 peer2 peer3; do
  echo "Checking $peer..."
  curl -s https://$peer.ekodb.net/api/health | jq '.status'
done
Best Practices
1. Configure Primary Nodes to Send to Replicas
Why: Prevents write conflicts and simplifies conflict resolution
Add each replica as a peer on the primary node:
POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}
2. Bypass Ripples for Bulk Operations
Why: Reduces network overhead during large imports
# Initial data load
curl -X POST .../api/batch/insert/users?bypass_ripple=true \
  -d '{"inserts": [...]}'
# After import, sync manually via WAL
3. Monitor Ripple Health
Why: Detect failures early, prevent data divergence
# Cron entry: check every 5 minutes
*/5 * * * * /scripts/check_ripple_health.sh
4. Read Replicas Need No Configuration
Why: Prevents accidental writes from replicating back
Read replicas simply receive ripples from primary nodes that have them configured as peers. No peer configuration is needed on the replica itself.
5. Isolate Analytics Nodes
Why: Prevents ripple overhead from impacting analytics queries
Simply don't add any ripple peers to analytics nodes. They will operate independently with no replication overhead.
Troubleshooting
Ripples Not Propagating
Check ripple configuration:
curl -X GET https://{EKODB_API_URL}/api/ripples/list \
  -H "Authorization: Bearer {ADMIN_TOKEN}"
Verify network connectivity:
# Test from source node to peer
curl -k https://peer.ekodb.net:8080/api/health
Check authentication:
# Ensure peer has valid admin token
curl -X POST https://peer.ekodb.net/api/auth/token \
  -d '{"api_key": "your-admin-key"}'
Replication Lag
Check WAL entries:
# Compare timestamps between primary and replica
curl -s https://primary.ekodb.net/api/wal/health | jq '.last_entry_timestamp'
curl -s https://replica.ekodb.net/api/wal/health | jq '.last_entry_timestamp'
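The two timestamps can be turned into a lag figure. A sketch (a hypothetical helper; it assumes the `last_entry_timestamp` values are ISO 8601 UTC strings as in the list-response example earlier):

```python
from datetime import datetime

def replication_lag_secs(primary_ts, replica_ts):
    """Seconds the replica trails the primary, from last_entry_timestamp values."""
    def parse(ts):
        # datetime.fromisoformat doesn't accept a trailing "Z" before 3.11
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return (parse(primary_ts) - parse(replica_ts)).total_seconds()

print(replication_lag_secs("2024-01-15T15:45:30Z", "2024-01-15T15:45:28Z"))  # 2.0
```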
Monitor network latency:
ping peer.ekodb.net
Duplicate Operations
Verify loop prevention:
# Check that nodes have unique node IDs
curl -s https://node1.ekodb.net/api/health | jq '.node_id'
curl -s https://node2.ekodb.net/api/health | jq '.node_id'
Check request ID cache:
# Ensure cache is functioning
# Recent operations should have cached request IDs
Security
Authentication
All ripple endpoints require admin authentication:
Authorization: Bearer {ADMIN_TOKEN}
TLS/SSL Required
Ripples always use HTTPS:
✓ https://peer.ekodb.net:8080 (secure)
✗ http://peer.ekodb.net:8080 (rejected)
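A client-side pre-flight check can catch both problems before POSTing to /api/ripples/config. This is a sketch, not ekoDB's own validation; the required-field list is assumed from the Operations-mode examples in this document (some modes, such as Webhook, omit api_key):

```python
from urllib.parse import urlparse

# Fields used by the Operations-mode config examples (an assumption)
REQUIRED_FIELDS = ("name", "url", "api_key", "mode", "enabled")

def validate_peer_config(cfg):
    """Return (ok, reason) for a candidate ripple peer configuration."""
    missing = [f for f in REQUIRED_FIELDS if f not in cfg]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    if urlparse(cfg["url"]).scheme != "https":
        return False, "peer URL must use https; http peers are rejected"
    return True, "ok"

ok, reason = validate_peer_config({
    "name": "peer1",
    "url": "http://peer1.ekodb.net:8080",  # wrong scheme on purpose
    "api_key": "peer1-admin-key",
    "mode": "Operations",
    "enabled": True,
})
print(ok, reason)  # False peer URL must use https; http peers are rejected
```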
Network Isolation
Recommended: Use private networks (VPC) for ripple traffic
Production: 10.0.0.0/16 (VPC)
Peer URLs: https://10.0.0.2:8080, https://10.0.0.3:8080
Related Documentation
- System Administration - WAL management and replication
- Batch Operations - Bypass ripples for bulk operations
- Basic Operations - Single-record ripple behavior
- White Paper - Architecture deep dive
For questions or support with ripple configuration, contact support@ekodb.io