Chat & RAG (Retrieval-Augmented Generation)

Build intelligent conversational applications that combine LLMs with your data for context-aware, accurate responses.

Integrated AI

ekoDB provides built-in chat session management and RAG capabilities - no separate infrastructure needed.

Quick Start

use ekodb_client::{Client, CreateChatSessionRequest, ChatMessageRequest, CollectionConfig};

let client = Client::builder()
    .base_url("https://my-first-db.development.google.ekodb.net")
    .api_key("your-api-key")
    .build()?;

// 1. Create a chat session
let session = client.create_chat_session(CreateChatSessionRequest {
    collections: vec![CollectionConfig {
        collection_name: "knowledge_base".to_string(),
        fields: vec![],
        search_options: None,
    }],
    llm_provider: "openai".to_string(),
    llm_model: Some("gpt-4".to_string()),
    system_prompt: Some("You are a helpful assistant.".to_string()),
    ..Default::default()
}).await?;

// 2. Send a message
let response = client.chat_message(
    &session.chat_id,
    ChatMessageRequest::new("How do I optimize database queries?")
).await?;

println!("AI: {:?}", response.responses);

Core Concepts

Centralized Architecture

All chat sessions and messages are stored in two database-wide collections:

  • chat_configurations_{database} - Session metadata and configuration
  • chat_messages_{database} - All messages from all sessions

Benefits:

  • ✅ Scalable to millions of sessions
  • ✅ No per-session collection management
  • ✅ Easy cross-session querying
  • ✅ Simplified data model

Chat Sessions

A chat session represents a conversation thread:

{
id: 'session_uuid',
llm_provider: 'openai', // or 'anthropic', 'perplexity'
llm_model: 'gpt-4',
collections: [...], // Data sources to search
system_prompt: '...',
max_context_messages: 10,
created_at: '2025-01-22T...',
updated_at: '2025-01-22T...',
parent_id: null, // For branching conversations
branch_point_idx: null,
title: null // User-set session title
}

Message Flow

User Message

Search Collections (Semantic + Text)

Retrieve Relevant Context

Build Prompt (System + Context + History + User Message)

LLM Generation

Store User Message + AI Response

Return Response with Context
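The prompt-building step of this flow can be made concrete with a small client-side sketch. It is an illustration only, not ekoDB's internal implementation: the `buildPrompt` function, the `ChatTurn` and `PromptMessage` types, and the choice to render retrieved context as a second system message are all assumptions made for the example.

```typescript
// Hypothetical types: illustrative names, not part of the ekoDB client.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble the prompt in the order the flow describes:
// system prompt, retrieved context, recent history, then the new user message.
function buildPrompt(
  systemPrompt: string,
  contextSnippets: string[],
  history: ChatTurn[],
  userMessage: string,
  maxContextMessages: number,
): PromptMessage[] {
  const messages: PromptMessage[] = [{ role: "system", content: systemPrompt }];

  if (contextSnippets.length > 0) {
    const numbered = contextSnippets.map((s, i) => `[${i + 1}] ${s}`).join("\n");
    messages.push({ role: "system", content: `Relevant context:\n${numbered}` });
  }

  // Keep only the most recent turns; slice(-0) would return everything, so guard zero.
  const recent = maxContextMessages > 0 ? history.slice(-maxContextMessages) : [];
  messages.push(...recent);

  messages.push({ role: "user", content: userMessage });
  return messages;
}

const examplePrompt = buildPrompt(
  "You are a helpful assistant.",
  ["Vector search compares embeddings."],
  [
    { role: "user", content: "Hi" },
    { role: "assistant", content: "Hello!" },
    { role: "user", content: "What is ekoDB?" },
    { role: "assistant", content: "A database." },
  ],
  "How do vector searches work?",
  2,
);
console.log(examplePrompt.map((m) => m.role).join(" > "));
// system > system > user > assistant > user
```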

Creating Chat Sessions

Basic Chat Session

const session = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt: "You are a helpful assistant.",
});

RAG Chat Session

const session = await client.createChatSession({
  collections: [
    {
      collection_name: "documentation",
      fields: [
        {
          field_name: "content",
          search_options: {
            weight: 1.0, // Search relevance weight
            language: "english",
          },
        },
        {
          field_name: "title",
          search_options: {
            weight: 0.5,
          },
        },
      ],
    },
    {
      collection_name: "faqs",
      fields: ["question", "answer"],
    },
  ],
  llm_provider: "openai",
  llm_model: "gpt-4",
  system_prompt:
    "Answer questions based on the provided documentation and FAQs.",
  max_context_messages: 10, // Include last 10 messages in context
});
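The example above writes `fields` in two forms: objects with `search_options`, and bare field-name strings. If application code needs to inspect or validate such a configuration, it can help to bring both forms into one shape first. The sketch below is a client-side convenience under assumed types; `FieldConfig`, `FieldEntry`, and `normalizeFields` are hypothetical names, and whether the server treats a bare string exactly like an object without options is an assumption.

```typescript
// Hypothetical types inferred from the configuration example; not official definitions.
interface SearchOptions {
  weight?: number;
  language?: string;
  type?: string;
}

interface FieldConfig {
  field_name: string;
  search_options?: SearchOptions;
}

type FieldEntry = string | FieldConfig;

// Convert bare field names into the object form so callers handle a single shape.
function normalizeFields(fields: FieldEntry[]): FieldConfig[] {
  return fields.map((f) => (typeof f === "string" ? { field_name: f } : f));
}

const faqFields = normalizeFields(["question", "answer"]);
console.log(faqFields.map((f) => f.field_name).join(", ")); // question, answer
```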

Multi-Provider Support

// OpenAI
const openaiChat = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-4-turbo",
});

// Anthropic
const claudeChat = await client.createChatSession({
llm_provider: "anthropic",
llm_model: "claude-3-opus-20240229",
});

// Perplexity
const perplexityChat = await client.createChatSession({
llm_provider: "perplexity",
llm_model: "pplx-70b-online",
});

Sending Messages

Simple Message

const response = await client.sendChatMessage(sessionId, {
message: "What is ekoDB?",
});

console.log(response.responses); // AI response (array of strings)

With Context

When collections are configured, ekoDB automatically:

  1. Searches collections for relevant context
  2. Ranks results by relevance
  3. Includes context in the LLM prompt
  4. Returns both response and context used

const response = await client.sendChatMessage(sessionId, {
  message: 'How do vector searches work?',
});

// Response includes:
{
  chat_id: 'session_uuid',
  message_id: 'msg_uuid',
  responses: ['Vector searches work by...'], // AI response (array of strings)
  context_snippets: [ // Retrieved documents
    {
      collection: 'documentation',
      record: { title: 'Vector Search Guide', content: '...' },
      score: 0.95,
      matched_fields: ['content']
    }
  ],
  execution_time_ms: 142,
  token_usage: { prompt_tokens: 512, completion_tokens: 88, total_tokens: 600 }
}
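Because the response carries the retrieved snippets, an application can show users which sources an answer drew on. The following sketch assumes the response shape shown above; the interface definitions are inferred from that example rather than taken from the client library, and `describeSources` and its `minScore` threshold are hypothetical.

```typescript
// Field types inferred from the response example; not official type definitions.
interface ContextSnippet {
  collection: string;
  record: Record<string, unknown>;
  score: number;
  matched_fields: string[];
}

interface ChatResponse {
  chat_id: string;
  message_id: string;
  responses: string[];
  context_snippets: ContextSnippet[];
  execution_time_ms: number;
  token_usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// Hypothetical helper: list retrieved sources at or above a score threshold, best first.
function describeSources(response: ChatResponse, minScore = 0.5): string[] {
  return response.context_snippets
    .filter((s) => s.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((s) => {
      const rawTitle = s.record["title"];
      const title = typeof rawTitle === "string" ? rawTitle : "(untitled)";
      return `${s.collection}: ${title} (score ${s.score.toFixed(2)})`;
    });
}

const sampleResponse: ChatResponse = {
  chat_id: "session_uuid",
  message_id: "msg_uuid",
  responses: ["Vector searches work by..."],
  context_snippets: [
    {
      collection: "documentation",
      record: { title: "Vector Search Guide", content: "..." },
      score: 0.95,
      matched_fields: ["content"],
    },
  ],
  execution_time_ms: 142,
  token_usage: { prompt_tokens: 512, completion_tokens: 88, total_tokens: 600 },
};

console.log(describeSources(sampleResponse));
// [ 'documentation: Vector Search Guide (score 0.95)' ]
```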

Message Management

List Messages

const messages = await client.getChatMessages(sessionId, {
limit: 50,
skip: 0,
sort: "asc", // chronological order
});

Update Message

await client.updateChatMessage(sessionId, messageId, {
content: "Updated message content",
});

Delete Message

await client.deleteChatMessage(sessionId, messageId);

Mark as Forgotten

Exclude specific messages from context window:

await client.toggleMessageForgotten(sessionId, messageId, true);
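How forgotten messages interact with `max_context_messages` is not spelled out here. One plausible reading, sketched below purely as an assumption, is that forgotten messages are dropped first and the window is then taken from what remains. The `StoredMessage` type, its `forgotten` flag, and `contextWindow` are hypothetical names used only for illustration.

```typescript
// Hypothetical message shape; the real stored fields may differ.
interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  forgotten: boolean;
}

// Assumed behavior: exclude forgotten messages, then keep the most recent N.
function contextWindow(messages: StoredMessage[], maxContextMessages: number): StoredMessage[] {
  const visible = messages.filter((m) => !m.forgotten);
  return maxContextMessages > 0 ? visible.slice(-maxContextMessages) : [];
}

const sampleMessages: StoredMessage[] = [
  { id: "m1", role: "user", content: "Hi", forgotten: false },
  { id: "m2", role: "assistant", content: "Hello!", forgotten: false },
  { id: "m3", role: "user", content: "Ignore this", forgotten: true },
  { id: "m4", role: "user", content: "What is ekoDB?", forgotten: false },
];

console.log(contextWindow(sampleMessages, 2).map((m) => m.id).join(","));
// m2,m4
```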

Regenerate Response

Generate a new AI response for the same user message:

const newResponse = await client.regenerateResponse(sessionId, messageId);

Advanced Features

Branching Conversations

Create alternative conversation paths from any point:

// Branch from message 5 in parent session
const branchSession = await client.createChatSession({
parent_id: parentSessionId,
branch_point_idx: 5, // Index of the parent message to branch from
llm_provider: "openai",
llm_model: "gpt-4",
});

// New session starts with messages 0-5 from parent
// Can explore different conversation paths
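To make the branch point concrete, the sketch below models which parent messages a branch would inherit. It assumes `branch_point_idx` is zero-based and inclusive, which is how the "messages 0-5" comment above reads; `branchHistory` is a hypothetical helper, not a client method.

```typescript
// Assumed semantics: the branch inherits parent messages 0..branchPointIdx inclusive.
function branchHistory<T>(parentMessages: T[], branchPointIdx: number): T[] {
  if (
    !Number.isInteger(branchPointIdx) ||
    branchPointIdx < 0 ||
    branchPointIdx >= parentMessages.length
  ) {
    throw new RangeError(`branch point ${branchPointIdx} is outside the parent session`);
  }
  return parentMessages.slice(0, branchPointIdx + 1);
}

const parentMessages = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"];
console.log(branchHistory(parentMessages, 5).join(","));
// m0,m1,m2,m3,m4,m5
```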

Merging Sessions

Combine multiple conversation threads:

const mergedSession = await client.mergeChatSessions({
session_ids: [sessionId1, sessionId2],
strategy: "chronological", // or 'interleaved'
llm_provider: "openai",
llm_model: "gpt-4",
});
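The two strategy names suggest different orderings of the combined history, though their exact semantics are not defined here. The sketch below shows one plausible interpretation of each: "chronological" sorts all messages by timestamp, while "interleaved" alternates between sessions position by position. The `TimedMessage` type and both functions are hypothetical and do not reflect the server's actual merge logic.

```typescript
// Hypothetical message shape with an ISO 8601 timestamp.
interface TimedMessage {
  session: string;
  content: string;
  created_at: string;
}

// Assumed "chronological": order every message by its timestamp.
function mergeChronological(...sessions: TimedMessage[][]): TimedMessage[] {
  const all = ([] as TimedMessage[]).concat(...sessions);
  return all.sort((a, b) => Date.parse(a.created_at) - Date.parse(b.created_at));
}

// Assumed "interleaved": take the i-th message of each session in turn.
function mergeInterleaved(...sessions: TimedMessage[][]): TimedMessage[] {
  const merged: TimedMessage[] = [];
  const longest = Math.max(0, ...sessions.map((s) => s.length));
  for (let i = 0; i < longest; i++) {
    for (const s of sessions) {
      const m = s[i];
      if (m !== undefined) merged.push(m);
    }
  }
  return merged;
}

const sessionA: TimedMessage[] = [
  { session: "a", content: "a1", created_at: "2025-01-22T10:00:00Z" },
  { session: "a", content: "a2", created_at: "2025-01-22T10:05:00Z" },
];
const sessionB: TimedMessage[] = [
  { session: "b", content: "b1", created_at: "2025-01-22T10:01:00Z" },
  { session: "b", content: "b2", created_at: "2025-01-22T10:02:00Z" },
];

console.log(mergeChronological(sessionA, sessionB).map((m) => m.content).join(","));
// a1,b1,b2,a2
console.log(mergeInterleaved(sessionA, sessionB).map((m) => m.content).join(","));
// a1,b1,a2,b2
```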

History Compaction

Fold a session's older messages into a single summary message to reclaim context-window budget. The most-recent messages are kept verbatim; everything older is summarized and the originals are marked forgotten so they stop being replayed on subsequent turns.

POST /api/chat/{chat_id}/compact
Content-Type: application/json

{
"keep_recent": 50 // Optional. Defaults to the session's
// max_context_messages (or 50). 0 compacts all.
}

Response:

{
folded: 120, // Older messages folded into the summary
kept_recent: 50, // Recent messages kept verbatim
summary_chars: 842, // Length of the inserted summary (0 if none)
summary_message_id: 'msg_uuid', // ID of the synthetic summary message
already_compact: false // true when nothing needed folding
}

Every client library exposes this directly. For example, in TypeScript:

const result = await client.compactChat(chatId, 50); // keepRecent optional
console.log(`Folded ${result.folded}, kept ${result.kept_recent} recent`);

The method is named compactChat in TypeScript and Kotlin, compact_chat in Rust and Python, and CompactChat in Go. Each returns the response shape above.
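The relationship between `keep_recent`, the session default, and the counts in the response can be modeled with simple arithmetic. The sketch below follows the defaults described above (fall back to `max_context_messages`, then to 50; 0 compacts everything). It is a simplified model, not the server's code: `planCompaction` is a hypothetical function, and details such as previously forgotten messages or the summary message itself are ignored.

```typescript
// Simplified, hypothetical model of the documented compaction counts.
interface CompactionPlan {
  folded: number;
  kept_recent: number;
  already_compact: boolean;
}

function planCompaction(
  totalMessages: number,
  keepRecent?: number,
  maxContextMessages?: number,
): CompactionPlan {
  // keep_recent falls back to the session's max_context_messages, then to 50.
  const requested = keepRecent ?? maxContextMessages ?? 50;
  const kept = Math.min(totalMessages, Math.max(0, requested));
  const folded = totalMessages - kept;
  return { folded, kept_recent: kept, already_compact: folded === 0 };
}

console.log(planCompaction(170, 50)); // { folded: 120, kept_recent: 50, already_compact: false }
console.log(planCompaction(30)); // nothing to fold, so already_compact is true
console.log(planCompaction(10, 0)); // keep_recent of 0 folds all 10 messages
```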

Real-World Examples

Customer Support Bot

// 1. Create knowledge base
await client.batchInsert("support_articles", articles);
await client.createIndex("support_articles", ["title", "content"]);

// 2. Create support chat session
const supportSession = await client.createChatSession({
  collections: [
    {
      collection_name: "support_articles",
      fields: ["title", "content", "category"],
    },
  ],
  llm_provider: "openai",
  llm_model: "gpt-4",
  system_prompt: `You are a customer support agent.
Answer questions based on our support documentation.
Be helpful, concise, and professional.`,
});

// 3. Handle customer query
const response = await client.sendChatMessage(supportSession.id, {
message: "How do I reset my password?",
});

// Response includes relevant support articles as context

Document Q&A

// RAG over internal documents
const docSession = await client.createChatSession({
  collections: [
    {
      collection_name: "company_docs",
      fields: [
        {
          field_name: "content",
          search_options: { weight: 1.0 },
        },
      ],
    },
  ],
  llm_provider: "anthropic",
  llm_model: "claude-3-sonnet-20240229",
  system_prompt:
    "Answer questions about company policies and procedures based on the provided documents.",
});

const answer = await client.sendChatMessage(docSession.id, {
message: "What is our vacation policy?",
});

Code Assistant

// Code documentation chatbot
const codeSession = await client.createChatSession({
  collections: [
    {
      collection_name: "code_docs",
      fields: ["description", "code", "examples"],
    },
    {
      collection_name: "api_reference",
      fields: ["method", "parameters", "returns"],
    },
  ],
  llm_provider: "openai",
  llm_model: "gpt-4",
  system_prompt: `You are a code assistant. Help developers by:
- Providing accurate code examples
- Explaining concepts clearly
- Referencing official documentation`,
  max_context_messages: 15,
});

Hybrid Search Integration

Combine text search with vector similarity:

// Store embeddings with documents
await client.insert("knowledge_base", {
title: "Vector Search Guide",
content: "Vector search enables...",
embedding: vectorEmbedding, // From OpenAI, Cohere, etc.
});

// Chat session uses hybrid search automatically
const session = await client.createChatSession({
  collections: [
    {
      collection_name: "knowledge_base",
      fields: [
        { field_name: "content", search_options: { weight: 0.6 } },
        {
          field_name: "embedding",
          search_options: { weight: 0.4, type: "vector" },
        },
      ],
    },
  ],
  llm_provider: "openai",
  llm_model: "gpt-4",
});

// Searches use both text relevance and semantic similarity
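The weights in this configuration (0.6 for text, 0.4 for the vector field) suggest that per-field scores are blended into one ranking score. How ekoDB actually combines them is not stated here; the sketch below shows a weighted average as one common approach, offered as an assumption. `FieldScore` and `combinedScore` are hypothetical names, and the input scores are made-up values.

```typescript
// Hypothetical per-field result: a relevance score in [0, 1] plus its configured weight.
interface FieldScore {
  field: string;
  score: number;
  weight: number;
}

// Weighted average of field scores; returns 0 when no weight is configured.
function combinedScore(fields: FieldScore[]): number {
  const totalWeight = fields.reduce((sum, f) => sum + f.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = fields.reduce((sum, f) => sum + f.score * f.weight, 0);
  return weighted / totalWeight;
}

const blended = combinedScore([
  { field: "content", score: 0.8, weight: 0.6 }, // text relevance (illustrative value)
  { field: "embedding", score: 0.9, weight: 0.4 }, // vector similarity (illustrative value)
]);
console.log(blended.toFixed(2)); // 0.84
```

With this blending, raising the `embedding` weight relative to `content` would favor semantically similar documents over exact keyword matches.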

Performance Optimization

1. Limit Context Messages

const session = await client.createChatSession({
max_context_messages: 5, // Only include last 5 messages
// ... other config
});

2. Use Targeted Collections

// Only search relevant collections
const session = await client.createChatSession({
  collections: [
    {
      collection_name: "recent_docs", // Smaller, focused collection
      fields: ["content"],
    },
  ],
  // ... other config
});

3. Index Your Data

// Create indexes for faster search
await client.createIndex("knowledge_base", ["title", "content"]);

4. Use Efficient Models

// Balance cost/performance
const session = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-3.5-turbo", // Faster, cheaper for simple queries
});

Best Practices

  1. System Prompts: Be specific about behavior and constraints
  2. Context Limits: Balance context quality vs token costs
  3. Collection Design: Structure data for efficient retrieval
  4. Error Handling: Handle LLM failures gracefully
  5. Rate Limiting: Respect provider rate limits
  6. Cost Monitoring: Track token usage and costs
  7. Caching: Cache common responses when appropriate
  8. Testing: Test with real user queries
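For cost monitoring, the `token_usage` object returned with each chat response can be accumulated per session or per user. The sketch below is a minimal example of that bookkeeping. The `addUsage` and `estimateCost` helpers and the `Rates` type are hypothetical, and the rates used are placeholder numbers, not real provider prices.

```typescript
// Matches the token_usage object shown in the chat response example.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical pricing input: cost per 1,000 tokens, supplied by the caller.
interface Rates {
  promptPer1k: number;
  completionPer1k: number;
}

// Return a new running total without mutating either argument.
function addUsage(running: TokenUsage, usage: TokenUsage): TokenUsage {
  return {
    prompt_tokens: running.prompt_tokens + usage.prompt_tokens,
    completion_tokens: running.completion_tokens + usage.completion_tokens,
    total_tokens: running.total_tokens + usage.total_tokens,
  };
}

function estimateCost(usage: TokenUsage, rates: Rates): number {
  return (
    (usage.prompt_tokens / 1000) * rates.promptPer1k +
    (usage.completion_tokens / 1000) * rates.completionPer1k
  );
}

const emptyUsage: TokenUsage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
const oneTurn: TokenUsage = { prompt_tokens: 512, completion_tokens: 88, total_tokens: 600 };

// Two identical turns, priced with placeholder rates.
const runningTotal = addUsage(addUsage(emptyUsage, oneTurn), oneTurn);
const placeholderRates: Rates = { promptPer1k: 0.01, completionPer1k: 0.03 };
console.log(runningTotal.total_tokens); // 1200
console.log(estimateCost(runningTotal, placeholderRates).toFixed(5)); // 0.01552
```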

API Reference

createChatSession()

client.createChatSession(options: {
  collections?: CollectionConfig[],
  llm_provider: 'openai' | 'anthropic' | 'perplexity',
  llm_model: string,
  system_prompt?: string,
  max_context_messages?: number,
  parent_id?: string,
  branch_point_idx?: number,
}): Promise<ChatSession>

sendChatMessage()

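client.sendChatMessage(
  sessionId: string,
  options: {
    message: string,
  }
): Promise<{
  chat_id: string,
  message_id: string,
  responses: string[],
  context_snippets: ContextDocument[],
  execution_time_ms: number,
  token_usage: { prompt_tokens: number, completion_tokens: number, total_tokens: number },
}>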

getChatMessages()

client.getChatMessages(
  sessionId: string,
  options?: {
    limit?: number,
    skip?: number,
    sort?: 'asc' | 'desc',
  }
): Promise<Message[]>

getChatModels()

Get all available chat models organized by provider:

client.getChatModels(): Promise<Record<string, string[]>>

Response Example:

{
"openai": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", "gpt-4o"],
"anthropic": ["claude-3-opus-20240229", "claude-3-sonnet-20240229"],
"perplexity": ["llama-3.1-sonar-small-128k-online"]
}
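A map of this shape makes it straightforward to validate a provider and model pair before creating a session. The sketch below works on a plain object with the structure shown above; `isModelAvailable` is a hypothetical helper, and the sample data simply mirrors the response example.

```typescript
type ModelsByProvider = Record<string, string[]>;

// True only when the provider is known and lists the requested model.
function isModelAvailable(models: ModelsByProvider, provider: string, model: string): boolean {
  const available = models[provider];
  return available !== undefined && available.indexOf(model) !== -1;
}

// Sample data copied from the response example above.
const sampleModels: ModelsByProvider = {
  openai: ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", "gpt-4o"],
  anthropic: ["claude-3-opus-20240229", "claude-3-sonnet-20240229"],
  perplexity: ["llama-3.1-sonar-small-128k-online"],
};

console.log(isModelAvailable(sampleModels, "openai", "gpt-4o")); // true
console.log(isModelAvailable(sampleModels, "anthropic", "gpt-4")); // false
console.log(isModelAvailable(sampleModels, "unknown-provider", "gpt-4")); // false
```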

getChatModel()

Get available models for a specific provider:

client.getChatModel(provider: string): Promise<string[]>

Example:

const openaiModels = await client.getChatModel("openai");
// ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", "gpt-4o", "gpt-4o-mini"]

REST API:

# List all models by provider
GET /api/chat_models

# Get models for a specific provider
GET /api/chat_models/openai

Troubleshooting

No Context Retrieved

Problem: AI responses don't use your data

Solutions:

  • Verify that collections are configured correctly on the session
  • Check that each collection contains data
  • Ensure the configured search fields exist in the records
  • Try different search weights

Token Limit Errors

Problem: Context too large for LLM

Solutions:

  • Reduce max_context_messages
  • Limit collection search results
  • Use shorter documents
  • Switch to a model with a larger context window
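One way to act on "use shorter documents" is to split long text into smaller records before inserting it into a collection, so that each retrieved snippet adds less to the prompt. The sketch below splits on blank lines and packs paragraphs into chunks up to a character limit. `chunkText` is a hypothetical helper, and a character limit is only a rough stand-in for a token limit.

```typescript
// Split text into chunks of at most maxChars, preferring paragraph boundaries.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";

  for (const paragraph of text.split(/\n\s*\n/)) {
    const piece = paragraph.trim();
    if (piece.length === 0) continue;

    // A single oversized paragraph is hard-split at the character limit.
    if (piece.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < piece.length; i += maxChars) {
        chunks.push(piece.slice(i, i + maxChars));
      }
      continue;
    }

    // Otherwise append to the current chunk while it still fits.
    const candidate = current ? `${current}\n\n${piece}` : piece;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = piece;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

const sampleChunks = chunkText("aaaa\n\nbbbb\n\ncccc", 10);
console.log(sampleChunks.length); // 2
```

Each chunk could then be stored as its own record, for example with a shared document identifier so related chunks can be traced back to their source.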

Slow Responses

Problem: Chat responses are slow

Solutions:

  • Create indexes on search fields
  • Reduce the number of collections searched
  • Use a faster LLM model
  • Limit context size

Chat Models API Examples:

  • Rust - client_chat_models.rs
  • Python - client_chat_models.py
  • TypeScript - client_chat_models.ts
  • Go - client_chat_models.go
  • Kotlin - ClientChatModels.kt

Summary

Chat & RAG in ekoDB enables:

  • ✅ Conversational AI - Natural language interactions
  • ✅ Context-aware responses - Answers based on your data
  • ✅ Multi-provider support - OpenAI, Anthropic, Perplexity
  • ✅ Branching conversations - Explore alternative paths
  • ✅ Hybrid search - Text + vector semantic matching
  • ✅ Integrated - No separate infrastructure needed
  • ✅ Production-ready - Scalable and reliable