Ripples - Data Propagation

Ripples enable real-time data propagation across multiple ekoDB instances, providing horizontal scaling, geographic distribution, and high availability through automatic replication.

What are Ripples?

Ripples are ekoDB's built-in data propagation system that automatically replicates database operations (inserts, updates, deletes) across connected instances. Unlike traditional replication that operates at the WAL level, Ripples propagate operations in real-time at the application level.

Key Features:

  • Real-Time Propagation: Operations replicate immediately as they occur
  • Bidirectional Sync: Nodes can both send and receive ripples
  • Selective Replication: Configure which nodes receive which operations
  • Loop Prevention: Automatic deduplication prevents infinite loops
  • Automatic Failover: Secondary nodes can take over if primary fails

How Ripples Work

When a write operation occurs on a node:

  1. Operation Executes: Record is inserted/updated/deleted locally
  2. Ripple Generated: Operation metadata is packaged with unique ID
  3. Propagation: Ripple is sent to configured peer nodes
  4. Remote Execution: Peer nodes receive and execute the operation
  5. Loop Prevention: Request ID prevents duplicate processing
       ┌─────────────┐
       │   Node 1    │ ← Client writes here
       │  (Primary)  │
       └──────┬──────┘
              │ Ripple (INSERT)
      ┌───────┴────────────┐
      ↓                    ↓
┌─────────────┐     ┌─────────────┐
│   Node 2    │     │   Node 3    │
│  (Replica)  │     │  (Replica)  │
└─────────────┘     └─────────────┘
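
To make step 4 concrete, here is a sketch of what a ripple delivery might look like on the wire, assuming the operation is replayed against the peer's normal insert endpoint (the header names are documented under Loop Prevention below; the exact body shape is an assumption):

POST https://node2.ekodb.net/api/insert/orders
Authorization: Bearer {TOKEN}
X-Ripple-Request-ID: a3f5b8c9-d2e1-f4a7-1234-567890abcdef
X-Ripple-Origin-Node: node1-primary
X-Ripple-Hop-Count: 0
Content-Type: application/json

{
  "data": {"product": "Widget", "amount": 100}
}

Node 2 executes the insert locally and caches the request ID so that retries or mesh echoes of the same ripple are ignored.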

Ripple Propagation Modes

Ripples support multiple propagation modes for different use cases. The mode field in the configuration determines how data is formatted and transmitted.

Quick Mode Selection Guide

Mode                   Primary Use Case                     Destination
Operations (default)   Multi-node database replication      ekoDB instances
WAL                    High-volume, efficient replication   ekoDB instances
Snapshot               Periodic backups, data warehousing   ekoDB, storage systems
Search                 Full-text search integration         Elasticsearch, Meilisearch
Analytics              Business intelligence, reporting     BigQuery, Snowflake
Stream                 Event-driven architectures           Kafka, RabbitMQ
Webhook                Third-party integrations             HTTP endpoints
Embedding              Semantic search, AI/ML               Vector databases
Chat                   AI chatbots, LLM context             OpenAI, custom LLMs
Custom                 Specialized integrations             Custom systems
Tip: For standard multi-node replication, use Operations mode (the default). The other modes are for specialized integrations with external systems.

Operations Mode (Default)

Purpose: Real-time CRUD operation replication between ekoDB instances

Use Case: Multi-node clusters, horizontal scaling, high availability

How it works: Each insert/update/delete is sent as an individual operation to peer nodes in real-time.

Example Configuration:

POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}

Best for: Standard database replication, active-active clusters, read replicas


WAL Mode

Purpose: Batch Write-Ahead Log shipping for efficient replication

Use Case: Large-scale replication with lower network overhead

How it works: Instead of sending individual operations, the node batches WAL entries and ships them periodically, which is more efficient for high-throughput scenarios.

Example Configuration:

POST /api/ripples/config
{
  "name": "wal_replica",
  "url": "https://replica.ekodb.net:8080",
  "api_key": "replica-admin-key",
  "mode": "WAL",
  "strategy": "batched",
  "options": {
    "batch_size": 5000,
    "interval_secs": 10
  },
  "enabled": true
}

Best for: High-volume replication, reducing network traffic, eventual consistency scenarios


Snapshot Mode

Purpose: Periodic full database snapshots

Use Case: Backup nodes, reporting databases, point-in-time recovery

How it works: Sends complete snapshots of collections on a schedule rather than individual operations.

Example Configuration:

POST /api/ripples/config
{
  "name": "backup_snapshot",
  "url": "https://backup.ekodb.net:8080",
  "api_key": "backup-admin-key",
  "mode": "Snapshot",
  "strategy": "scheduled",
  "options": {
    "cron": "0 2 * * *",
    "collections": ["users", "orders"]
  },
  "enabled": true
}

The cron expression 0 2 * * * triggers a daily snapshot at 2:00 AM.

Best for: Backup systems, data warehousing, periodic sync to reporting databases


Search Mode

Purpose: Send data formatted for search indexing

Use Case: Elasticsearch, Meilisearch, or other search engine integration

How it works: Transforms operations into search-optimized format and sends to search indices.

Example Configuration:

POST /api/ripples/config
{
  "name": "elasticsearch_index",
  "url": "https://elasticsearch.example.com:9200",
  "api_key": "es-api-key",
  "mode": "Search",
  "destination": "elasticsearch",
  "transform": {
    "include_fields": ["title", "content", "author", "tags"],
    "exclude_fields": ["internal_id"]
  },
  "enabled": true
}

Best for: Full-text search, faceted search, search-as-you-type features


Analytics Mode

Purpose: Send data to analytics platforms

Use Case: BigQuery, Snowflake, data warehouses, BI tools

How it works: Formats data for analytics schemas and sends to analytics platforms in batches.

Example Configuration:

POST /api/ripples/config
{
  "name": "bigquery_analytics",
  "url": "https://bigquery.googleapis.com/v2/projects/my-project",
  "api_key": "bq-service-account-key",
  "mode": "Analytics",
  "destination": "bigquery",
  "filter": {
    "collections": ["events", "metrics", "user_activity"]
  },
  "options": {
    "dataset": "production_analytics",
    "batch_size": 10000
  },
  "enabled": true
}

Best for: Business intelligence, data analytics, reporting dashboards


Stream Mode

Purpose: Send data to message streams

Use Case: Kafka, RabbitMQ, event-driven architectures

How it works: Publishes operations as messages to streaming platforms for event processing.

Example Configuration:

POST /api/ripples/config
{
  "name": "kafka_stream",
  "url": "https://kafka.example.com:9092",
  "api_key": "kafka-api-key",
  "mode": "Stream",
  "destination": "kafka",
  "options": {
    "topic": "ekodb-events",
    "partition_key": "collection_name"
  },
  "enabled": true
}

Best for: Event sourcing, microservices communication, real-time data pipelines


Webhook Mode

Purpose: Trigger HTTP webhooks on data changes

Use Case: Third-party integrations, automation, notifications

How it works: Sends HTTP POST requests to configured webhook URLs when operations occur.

Example Configuration:

POST /api/ripples/config
{
  "name": "slack_notifications",
  "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
  "mode": "Webhook",
  "filter": {
    "collections": ["orders"],
    "operations": ["insert"]
  },
  "transform": {
    "transform_fn": "format_slack_message"
  },
  "enabled": true
}

Best for: Slack/Discord notifications, Zapier integration, custom automation


Embedding Mode

Purpose: Generate and send vector embeddings

Use Case: Vector databases, semantic search, AI/ML pipelines

How it works: Processes text fields, generates embeddings, and sends to vector databases.

Example Configuration:

POST /api/ripples/config
{
  "name": "pinecone_vectors",
  "url": "https://your-index.pinecone.io",
  "api_key": "pinecone-api-key",
  "mode": "Embedding",
  "destination": "vectordb",
  "options": {
    "embedding_field": "content",
    "embedding_model": "text-embedding-ada-002",
    "dimension": 1536
  },
  "enabled": true
}

Best for: Semantic search, RAG (Retrieval Augmented Generation), similarity matching


Chat Mode

Purpose: Send data to chat/LLM systems

Use Case: AI assistants, chatbots, context-aware LLMs

How it works: Formats data for chat context and sends to LLM providers or chat databases.

Example Configuration:

POST /api/ripples/config
{
  "name": "openai_context",
  "url": "https://api.openai.com/v1/assistants/asst_xxx/messages",
  "api_key": "openai-api-key",
  "mode": "Chat",
  "filter": {
    "collections": ["support_tickets", "knowledge_base"]
  },
  "transform": {
    "include_fields": ["question", "answer", "category"]
  },
  "enabled": true
}

Best for: AI chatbots, customer support automation, context-aware assistants


Custom Mode

Purpose: User-defined custom propagation logic

Use Case: Specialized integrations, custom data pipelines

How it works: Allows you to define custom propagation behavior via configuration.

Example Configuration:

POST /api/ripples/config
{
  "name": "custom_pipeline",
  "url": "https://custom-endpoint.example.com/ingest",
  "api_key": "custom-api-key",
  "mode": {
    "Custom": "my_custom_handler"
  },
  "options": {
    "custom_param1": "value1",
    "custom_param2": "value2"
  },
  "enabled": true
}

Best for: Proprietary systems, specialized data transformations, unique integrations


Replication Patterns (Send/Receive/Both)

Beyond the propagation mode, nodes can be configured in different replication patterns:

Send Mode

Node sends ripples to peers but doesn't receive.

Use Case: Primary write nodes that propagate to replicas

Configure each peer separately:

# Add first replica peer
POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}

# Add second replica peer
POST /api/ripples/config
{
  "name": "replica2",
  "url": "https://replica2.ekodb.net:8080",
  "api_key": "replica2-admin-key",
  "mode": "Operations",
  "enabled": true
}

Receive Mode

Node receives ripples from peers but doesn't send.

Use Case: Read replicas that stay synchronized

Note: Receive-only nodes don't need to configure peer connections. They simply process incoming ripple requests from nodes that have them configured as peers.

Both Mode (Full Mesh)

Node both sends and receives ripples.

Use Case: Multi-master deployments, peer-to-peer sync

Configure each peer separately:

# On Node 1: Add Node 2 as peer
POST /api/ripples/config
{
  "name": "node2",
  "url": "https://node2.ekodb.net:8080",
  "api_key": "node2-admin-key",
  "mode": "Operations",
  "enabled": true
}

# On Node 2: Add Node 1 as peer
POST /api/ripples/config
{
  "name": "node1",
  "url": "https://node1.ekodb.net:8080",
  "api_key": "node1-admin-key",
  "mode": "Operations",
  "enabled": true
}

None Mode

Node operates independently with no ripples.

Use Case: Isolated analytics nodes, testing environments

No configuration needed: Simply don't add any ripple peers to the node.

Real-World Use Cases

1. Geographic Distribution (Multi-Region)

Scenario: E-commerce platform with users in US, EU, and Asia

Architecture:

  • Primary node in each region (both mode for bidirectional sync)
  • Cross-region ripples for data consistency
  • Read replicas in each region (receive mode)
   US Region                 EU Region                 Asia Region
 ┌──────────┐              ┌──────────┐              ┌──────────┐
 │ Primary  │◄────────────►│ Primary  │◄────────────►│ Primary  │
 │  (both)  │              │  (both)  │              │  (both)  │
 └────┬─────┘              └────┬─────┘              └────┬─────┘
      │                         │                         │
  ┌───┴────┐                ┌───┴────┐                ┌───┴────┐
  ↓        ↓                ↓        ↓                ↓        ↓
┌─────────┐┌─────────┐  ┌─────────┐┌─────────┐  ┌─────────┐┌─────────┐
│Replica1 ││Replica2 │  │Replica1 ││Replica2 │  │Replica1 ││Replica2 │
│(receive)││(receive)│  │(receive)││(receive)│  │(receive)││(receive)│
└─────────┘└─────────┘  └─────────┘└─────────┘  └─────────┘└─────────┘

Benefits:

  • Users read from nearest region (low latency)
  • Writes propagate globally (eventual consistency)
  • Failover to another region if one goes down
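
As a sketch, the US primary's cross-region peer configuration might look like this, following the same peer-per-node pattern used elsewhere on this page (hostnames and key names are placeholders):

# On the US primary: add the EU primary as a peer
curl -X POST https://us.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "eu_primary",
    "url": "https://eu.ekodb.net:8080",
    "api_key": "eu-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

# Repeat for the Asia primary and for the local read replicas,
# then mirror the same setup on the EU and Asia primaries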

2. Read Scaling (Analytics Workload)

Scenario: SaaS application with heavy reporting queries

Architecture:

  • Primary write node with durable WAL (send mode)
  • Multiple read replicas with fast WAL (receive mode)
  • Analytics nodes isolated from production (none mode)
Production Traffic                 Analytics Traffic
        │                                  │
        ↓                                  ↓
┌─────────────┐                    ┌─────────────┐
│   Primary   │───────────────────►│  Analytics  │
│   (send)    │     WAL Export     │   (none)    │
│ Durable WAL │                    │  Fast WAL   │
└──────┬──────┘                    └─────────────┘
       │ Ripples
       ├──────────┬──────────┐
       ↓          ↓          ↓
 ┌──────────┐┌──────────┐┌──────────┐
 │ Replica1 ││ Replica2 ││ Replica3 │
 │(receive) ││(receive) ││(receive) │
 │ Fast WAL ││ Fast WAL ││ Fast WAL │
 └──────────┘└──────────┘└──────────┘

Benefits:

  • Production writes don't slow down (durable primary)
  • Replicas handle read traffic (fast WAL, high throughput)
  • Analytics isolated (no ripple overhead, batch data loads)

Configuration Example:

# Primary (write node) - Add each replica as a peer
curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica1",
    "url": "https://replica1.ekodb.net:8080",
    "api_key": "replica1-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica2",
    "url": "https://replica2.ekodb.net:8080",
    "api_key": "replica2-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

curl -X POST https://primary.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "replica3",
    "url": "https://replica3.ekodb.net:8080",
    "api_key": "replica3-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

# Read Replicas (no peer config needed - they only receive)
# Analytics node (isolated - no peer config needed)

3. High Availability (Active-Active)

Scenario: Financial application requiring 99.99% uptime

Architecture:

  • Two active primary nodes (both mode)
  • Automatic failover via load balancer
  • Conflict resolution: last-write-wins
      ┌──────────────┐
      │Load Balancer │
      └──────┬───────┘
             │
        ┌────┴────┐
        ↓         ↓
    ┌────────┐┌────────┐
    │Primary1││Primary2│
    │ (both) ││ (both) │
    └────┬───┘└───┬────┘
         └────┬───┘
              │ Bidirectional Ripples

Benefits:

  • Both nodes accept writes (active-active)
  • Automatic failover (no manual intervention)
  • Zero downtime for maintenance

Example Workflow:

# Node 1 configuration - Add Node 2 as peer
curl -X POST https://node1.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "node2",
    "url": "https://node2.ekodb.net:8080",
    "api_key": "node2-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

# Node 2 configuration - Add Node 1 as peer
curl -X POST https://node2.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "node1",
    "url": "https://node1.ekodb.net:8080",
    "api_key": "node1-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

# Write to Node 1
curl -X POST https://node1.ekodb.net/api/insert/orders \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"data": {"product": "Widget", "amount": 100}}'

# Immediately available on Node 2 via ripple
curl -X POST https://node2.ekodb.net/api/query/orders \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"filter": {"type": "Condition", "content": {"field": "product", "operator": "Equals", "value": "Widget"}}}'

4. Data Ingestion Pipeline

Scenario: IoT platform collecting sensor data from thousands of devices

Architecture:

  • Ingestion nodes accept writes only (send mode)
  • Storage nodes receive and persist (receive mode, durable WAL)
  • Processing nodes compute analytics (receive mode)
            IoT Devices
    ┌──────┬──────┬──────┐
    ↓      ↓      ↓      ↓
 ┌────┐ ┌────┐ ┌────┐ ┌────┐
 │Ing1│ │Ing2│ │Ing3│ │Ing4│   ← Fast WAL (write-optimized)
 │send│ │send│ │send│ │send│
 └──┬─┘ └──┬─┘ └──┬─┘ └──┬─┘
    │      │      │      │   Ripples
    └──────┴───┬──┴──────┘
               │
       ┌───────┼───────┐
       ↓       ↓       ↓
 ┌────────┐┌────────┐┌────────┐
 │Storage1││Storage2││Process │   ← Durable WAL (data safety)
 │receive ││receive ││receive │
 └────────┘└────────┘└────────┘

Benefits:

  • Ingestion nodes optimized for write throughput
  • Storage nodes ensure data durability
  • Processing nodes compute real-time analytics
  • Horizontal scaling at each layer
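
A configuration sketch for one ingestion node (names and hosts are placeholders); the storage and processing nodes only receive, so they need no peer configuration of their own:

# On Ing1: add the storage and processing nodes as peers
curl -X POST https://ing1.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "storage1",
    "url": "https://storage1.ekodb.net:8080",
    "api_key": "storage1-admin-key",
    "mode": "Operations",
    "enabled": true
  }'

# Repeat for storage2 and the processing node, and on each other ingestion node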

5. Development/Staging Environments

Scenario: Testing environment that mirrors production data

Architecture:

  • Production primary (send mode)
  • Staging replica (receive mode)
  • Staging accepts test writes locally (no send configuration, so nothing ripples back)
  Production                    Staging
 ┌──────────┐               ┌──────────┐
 │ Primary  │──────────────►│ Replica  │
 │  (send)  │    Ripples    │(receive) │
 └──────────┘               └────┬─────┘
                                 │ Test writes
                                 │ (not rippled back)

Benefits:

  • Staging has real production data
  • Test writes don't affect production
  • Safe environment for testing features
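
A sketch of the production-side configuration (hostnames are placeholders); the staging replica needs no peer configuration, which is exactly why its local test writes are never rippled back:

# On the production primary: add staging as a receive-only peer
curl -X POST https://prod.ekodb.net/api/ripples/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "staging",
    "url": "https://staging.ekodb.net:8080",
    "api_key": "staging-admin-key",
    "mode": "Operations",
    "enabled": true
  }'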

Configuring Ripples

Basic Configuration

Configure ripples on a node via the ripples API. Each peer requires a separate configuration:

POST https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}
Content-Type: application/json

{
  "name": "peer1",
  "url": "https://peer1.ekodb.net:8080",
  "api_key": "peer1-admin-key",
  "mode": "Operations",
  "enabled": true
}

POST https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}
Content-Type: application/json

{
  "name": "peer2",
  "url": "https://peer2.ekodb.net:8080",
  "api_key": "peer2-admin-key",
  "mode": "Operations",
  "enabled": true
}

List Configured Ripples

GET https://{EKODB_API_URL}/api/ripples/list
Authorization: Bearer {ADMIN_TOKEN}

# Response
{
  "ripples": [
    {
      "url": "https://peer1.ekodb.net:8080",
      "enabled": true,
      "status": "healthy",
      "last_sync": "2024-01-15T15:45:30Z"
    },
    {
      "url": "https://peer2.ekodb.net:8080",
      "enabled": true,
      "status": "healthy",
      "last_sync": "2024-01-15T15:45:28Z"
    }
  ]
}

Remove Ripple Configuration

DELETE https://{EKODB_API_URL}/api/ripples/config
Authorization: Bearer {ADMIN_TOKEN}

Bypassing Ripples

For bulk imports or maintenance operations, you can bypass ripple propagation:

Single Operations

POST https://{EKODB_API_URL}/api/insert/users?bypass_ripple=true
Authorization: Bearer {TOKEN}
Content-Type: application/json

{
  "data": {
    "name": "John Doe",
    "email": "john@example.com"
  }
}

Batch Operations

POST https://{EKODB_API_URL}/api/batch/insert/users?bypass_ripple=true
Authorization: Bearer {TOKEN}
Content-Type: application/json

{
  "inserts": [
    {"data": {"name": "User 1"}},
    {"data": {"name": "User 2"}},
    {"data": {"name": "User 3"}}
  ]
}

When to Bypass:

  • Initial data seeding
  • Large bulk imports
  • Database migrations
  • Maintenance operations
  • Temporary disconnection scenarios

Loop Prevention

ekoDB implements comprehensive loop prevention to avoid infinite ripple cycles in multi-node deployments:

Request Tracking

Every operation gets a unique identifier:

Operation Metadata:
- request_id: UUID (e.g., "a3f5b8c9-d2e1-f4a7-1234-567890abcdef")
- origin_node_id: Node that originated the operation
- hop_count: Number of hops traversed (max 10)

HTTP Headers

Ripple metadata is transmitted via headers:

POST https://peer.ekodb.net/api/insert/users
Authorization: Bearer {TOKEN}
X-Ripple-Request-ID: a3f5b8c9-d2e1-f4a7-1234-567890abcdef
X-Ripple-Origin-Node: node1-primary
X-Ripple-Hop-Count: 0

Deduplication Cache

Each node maintains an in-memory cache:

  • Tracks processed request IDs
  • 10-minute TTL per entry
  • Automatic cleanup
  • Prevents duplicate processing
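
You can observe deduplication from the outside by replaying a delivery with the same request ID. This is a sketch: the header names come from this section, but the assumption that a duplicate is acknowledged without being re-applied follows from the cache behavior described above:

# First delivery: applied and cached
curl -X POST https://peer.ekodb.net/api/insert/users \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Ripple-Request-ID: a3f5b8c9-d2e1-f4a7-1234-567890abcdef" \
  -H "X-Ripple-Origin-Node: node1-primary" \
  -H "X-Ripple-Hop-Count: 0" \
  -d '{"data": {"name": "John Doe"}}'

# Re-running the same command within the 10-minute TTL finds the
# request ID in the cache, so the operation is not applied a second time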

Example Flow (3-Node Full Mesh)

1. Client writes to Node 1
2. Node 1: Generates UUID, sets origin=node1, hop=0
3. Node 1: Processes locally, caches UUID
4. Node 1: Ripples to Node 2 and Node 3
5. Node 2: Receives ripple
   - Check: origin ≠ self ✓
   - Check: UUID not in cache ✓
   - Check: hop_count < 10 ✓
   - Processes operation, caches UUID
   - Does NOT re-propagate (only the origin node propagates)
6. Node 3: Same as Node 2

Result: The operation is processed once per node, with no loops.

Performance Considerations

Ripple Overhead

Minimal for typical workloads:

  • Request ID generation: ~1µs
  • Header addition: ~500 bytes
  • Cache lookup: ~2µs
  • Network latency: Depends on topology

Batch operations scale well:

  • Single ripple for entire batch
  • Amortized overhead across records
  • 13M+ records/sec with ripples enabled
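
As a rough worked example using the figures above: for a 1,000-record batch, the single ~500-byte ripple header and single ~2µs cache lookup amortize to about half a byte and a few nanoseconds per record.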

Network Topology

Full Mesh (N nodes):

  • Each write ripples to (N-1) peers
  • Best for: Small clusters (2-5 nodes)
  • Network traffic: O(N²)

Hub-and-Spoke:

  • Primary sends to all replicas
  • Replicas don't send to each other
  • Best for: Read scaling (1 primary, N replicas)
  • Network traffic: O(N)

Hybrid:

  • Primaries in full mesh
  • Replicas receive only
  • Best for: Multi-region with local replicas
  • Network traffic: O(P²) + O(R) where P=primaries, R=replicas

Monitoring Ripples

Check Ripple Status

curl -X GET https://{EKODB_API_URL}/api/ripples/list \
  -H "Authorization: Bearer {ADMIN_TOKEN}"

Monitor Replication Lag

# Check WAL health for replication status
curl -X GET https://{EKODB_API_URL}/api/wal/health \
  -H "Authorization: Bearer {ADMIN_TOKEN}" \
  | jq '.last_entry_timestamp'

Health Checks

# Verify all peers are healthy
for peer in peer1 peer2 peer3; do
  echo "Checking $peer..."
  curl -s https://$peer.ekodb.net/api/health | jq '.status'
done

Best Practices

1. Configure Primary Nodes to Send to Replicas

Why: Prevents write conflicts and simplifies conflict resolution

Add each replica as a peer on the primary node:

POST /api/ripples/config
{
  "name": "replica1",
  "url": "https://replica1.ekodb.net:8080",
  "api_key": "replica1-admin-key",
  "mode": "Operations",
  "enabled": true
}

2. Bypass Ripples for Bulk Operations

Why: Reduces network overhead during large imports

# Initial data load
curl -X POST .../api/batch/insert/users?bypass_ripple=true \
  -d '{"inserts": [...]}'

# After import, sync manually via WAL

3. Monitor Ripple Health

Why: Detect failures early, prevent data divergence

# Check ripple health every five minutes
*/5 * * * * /scripts/check_ripple_health.sh
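
A minimal sketch of what check_ripple_health.sh could contain, built on the /api/ripples/list response shown earlier (the host, token variable, and alerting behavior are placeholders):

#!/usr/bin/env bash
# Exit non-zero if any configured ripple peer is not reporting healthy
set -euo pipefail

UNHEALTHY=$(curl -s https://primary.ekodb.net/api/ripples/list \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  | jq -r '.ripples[] | select(.status != "healthy") | .url')

if [ -n "$UNHEALTHY" ]; then
  echo "Unhealthy ripple peers: $UNHEALTHY" >&2
  exit 1
fi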

4. Read Replicas Need No Configuration

Why: Prevents accidental writes from replicating back

Read replicas simply receive ripples from primary nodes that have them configured as peers. No peer configuration is needed on the replica itself.

5. Isolate Analytics Nodes

Why: Prevents ripple overhead from impacting analytics queries

Simply don't add any ripple peers to analytics nodes. They will operate independently with no replication overhead.

Troubleshooting

Ripples Not Propagating

Check ripple configuration:

curl -X GET https://{EKODB_API_URL}/api/ripples/list \
  -H "Authorization: Bearer {ADMIN_TOKEN}"

Verify network connectivity:

# Test from source node to peer
curl -k https://peer.ekodb.net:8080/api/health

Check authentication:

# Ensure peer has valid admin token
curl -X POST https://peer.ekodb.net/api/auth/token \
  -d '{"api_key": "your-admin-key"}'

Replication Lag

Check WAL entries:

# Compare timestamps between primary and replica
curl -s https://primary.ekodb.net/api/wal/health | jq '.last_entry_timestamp'
curl -s https://replica.ekodb.net/api/wal/health | jq '.last_entry_timestamp'

Monitor network latency:

ping peer.ekodb.net

Duplicate Operations

Verify loop prevention:

# Check that nodes have unique node IDs
curl -s https://node1.ekodb.net/api/health | jq '.node_id'
curl -s https://node2.ekodb.net/api/health | jq '.node_id'

Check request ID cache:

The deduplication cache is internal to each node: processed request IDs are kept for 10 minutes and then cleaned up automatically. If duplicates persist after confirming unique node IDs, check whether delayed retries are arriving after this TTL has expired.

Security

Authentication

All ripple endpoints require admin authentication:

Authorization: Bearer {ADMIN_TOKEN}

TLS/SSL Required

Ripples always use HTTPS:

✓ https://peer.ekodb.net:8080   (secure)
✗ http://peer.ekodb.net:8080    (rejected)

Network Isolation

Recommended: Use private networks (VPC) for ripple traffic

Production: 10.0.0.0/16 (VPC)
Peer URLs: https://10.0.0.2:8080, https://10.0.0.3:8080

For questions or support with ripple configuration, contact support@ekodb.io