Chat & RAG (Retrieval-Augmented Generation)
Build intelligent conversational applications that combine LLMs with your data for context-aware, accurate responses.
ekoDB provides built-in chat session management and RAG capabilities - no separate infrastructure needed.
Quick Start
- Client Libraries (Recommended)
- Direct API
- 🦀 Rust
- 🐍 Python
- 📘 TypeScript
- 📦 JavaScript
- 🟣 Kotlin
- 🔷 Go
use ekodb_client::{Client, CreateChatSessionRequest, ChatMessageRequest, CollectionConfig};
let client = Client::builder()
.base_url("https://my-first-db.development.google.ekodb.net")
.api_key("your-api-key")
.build()?;
// 1. Create a chat session
let session = client.create_chat_session(CreateChatSessionRequest {
collections: vec![CollectionConfig {
collection_name: "knowledge_base".to_string(),
fields: vec![],
search_options: None,
}],
llm_provider: "openai".to_string(),
llm_model: Some("gpt-4".to_string()),
system_prompt: Some("You are a helpful assistant.".to_string()),
..Default::default()
}).await?;
// 2. Send a message
let response = client.chat_message(
&session.chat_id,
ChatMessageRequest::new("How do I optimize database queries?")
).await?;
println!("AI: {:?}", response.responses);
from ekodb_client import Client
client = Client.new(
"https://my-first-db.development.google.ekodb.net",
"your-api-key"
)
# 1. Create a chat session
session = await client.create_chat_session(
collections=[{
'collection_name': 'knowledge_base',
'fields': ['content', 'title']
}],
llm_provider='openai',
llm_model='gpt-4',
system_prompt='You are a helpful assistant.'
)
# 2. Send a message - automatically retrieves relevant context
response = await client.chat_message(
session['chat_id'],
'How do I optimize database queries?'
)
print(response['responses']) # AI response
print(response['context_snippets']) # Retrieved documents
import { EkoDBClient } from "@ekodb/ekodb-client";
const client = new EkoDBClient({
baseURL: process.env.EKODB_URL,
apiKey: process.env.EKODB_API_KEY,
});
await client.init();
// 1. Create a chat session
const session = await client.createChatSession({
collections: [
{
collection_name: "knowledge_base",
fields: ["content", "title"],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt:
"You are a helpful assistant that answers questions based on the provided context.",
});
// 2. Send a message - automatically retrieves relevant context
const response = await client.chatMessage(session.chat_id, {
message: "How do I optimize database queries?",
});
console.log(response.responses); // AI response
console.log(response.context_snippets); // Retrieved documents used for context
const { EkoDBClient } = require("@ekodb/ekodb-client");
const client = new EkoDBClient({
baseURL: process.env.EKODB_URL,
apiKey: process.env.EKODB_API_KEY,
});
await client.init();
// 1. Create a chat session
const session = await client.createChatSession({
collections: [
{
collection_name: "knowledge_base",
fields: ["content", "title"],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt: "You are a helpful assistant.",
});
// 2. Send a message
const response = await client.chatMessage(session.chat_id, {
message: "How do I optimize database queries?",
});
console.log(response.responses);
console.log(response.context_snippets);
import io.ekodb.client.EkoDBClient
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put
val client = EkoDBClient.builder()
.baseUrl("https://my-first-db.development.google.ekodb.net")
.apiKey("your-api-key")
.build()
// 1. Create a chat session
val session = client.createChatSession(
collections = listOf(
CollectionConfig(
collectionName = "knowledge_base",
fields = listOf("content", "title")
)
),
llmProvider = "openai",
llmModel = "gpt-4",
systemPrompt = "You are a helpful assistant."
)
// 2. Send a message
val response = client.chatMessage(session.chatId, buildJsonObject {
put("message", "How do I optimize database queries?")
})
println(response["responses"])
import (
	"fmt"

	ekodb "github.com/ekoDB/ekodb-client-go"
)
client := ekodb.NewClient(
"https://my-first-db.development.google.ekodb.net",
"your-api-key",
)
// 1. Create a chat session
llmModel := "gpt-4"
systemPrompt := "You are a helpful assistant."
session, err := client.CreateChatSession(ekodb.CreateChatSessionRequest{
Collections: []ekodb.CollectionConfig{{
CollectionName: "knowledge_base",
Fields: []interface{}{"content", "title"},
}},
LLMProvider: "openai",
LLMModel: &llmModel,
SystemPrompt: &systemPrompt,
})
// 2. Send a message
response, err := client.ChatMessage(session.ChatID, ekodb.ChatMessageRequest{
Message: "How do I optimize database queries?",
})
fmt.Println(response.Responses)
# 1. Create a chat session
SESSION=$(curl -s -X POST https://{EKODB_API_URL}/api/chat \
-H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"collections": [{
"collection_name": "knowledge_base",
"fields": [{"field_name": "content"}, {"field_name": "title"}]
}],
"llm_provider": "openai",
"llm_model": "gpt-4",
"system_prompt": "You are a helpful assistant."
}' | jq -r '.id')
# 2. Send a message
curl -X POST https://{EKODB_API_URL}/api/chat/$SESSION/messages \
-H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"message": "How do I optimize database queries?"
}'
Core Concepts
Centralized Architecture
All chat sessions and messages are stored in two database-wide collections:
- chat_configurations_{database} - Session metadata and configuration
- chat_messages_{database} - All messages from all sessions
Benefits:
- ✅ Scalable to millions of sessions
- ✅ No per-session collection management
- ✅ Easy cross-session querying
- ✅ Simplified data model
Chat Sessions
A chat session represents a conversation thread:
{
id: 'session_uuid',
llm_provider: 'openai', // or 'anthropic', 'perplexity'
llm_model: 'gpt-4',
collections: [...], // Data sources to search
system_prompt: '...',
max_context_messages: 10,
created_at: '2025-01-22T...',
updated_at: '2025-01-22T...',
parent_id: null, // For branching conversations
branch_point_idx: null,
title: null // User-set session title
}
Message Flow
User Message
↓
Search Collections (Semantic + Text)
↓
Retrieve Relevant Context
↓
Build Prompt (System + Context + History + User Message)
↓
LLM Generation
↓
Store User Message + AI Response
↓
Return Response with Context
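The "Build Prompt" step above can be approximated client-side to see how the pieces fit together. This is a minimal sketch, not ekoDB's internal implementation: the PromptMessage and Snippet types, the buildPrompt function, and the numbered context layout are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; not the ekoDB wire format.
type Role = "system" | "user" | "assistant";

interface PromptMessage {
  role: Role;
  content: string;
}

interface Snippet {
  collection: string;
  text: string;
  score: number;
}

// Assemble System + Context + History + User Message, in that order.
function buildPrompt(
  systemPrompt: string,
  snippets: Snippet[],
  history: PromptMessage[],
  userMessage: string,
  maxContextMessages: number,
): PromptMessage[] {
  // Highest-scoring snippets first, without mutating the caller's array.
  const ranked = [...snippets].sort((a, b) => b.score - a.score);
  const contextBlock = ranked
    .map((s, i) => `[${i + 1}] (${s.collection}) ${s.text}`)
    .join("\n");

  const system =
    contextBlock.length > 0
      ? `${systemPrompt}\n\nContext:\n${contextBlock}`
      : systemPrompt;

  // Keep only the most recent history messages (slice(-0) would keep all).
  const recent =
    maxContextMessages > 0 ? history.slice(-maxContextMessages) : [];

  return [
    { role: "system", content: system },
    ...recent,
    { role: "user", content: userMessage },
  ];
}
```

Placing retrieved context inside the system message is one possible layout; a real pipeline might instead attach it to the user turn.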
Creating Chat Sessions
Basic Chat Session
const session = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt: "You are a helpful assistant.",
});
RAG Chat Session
const session = await client.createChatSession({
collections: [
{
collection_name: "documentation",
fields: [
{
field_name: "content",
search_options: {
weight: 1.0, // Search relevance weight
language: "english",
},
},
{
field_name: "title",
search_options: {
weight: 0.5,
},
},
],
},
{
collection_name: "faqs",
fields: ["question", "answer"],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt:
"Answer questions based on the provided documentation and FAQs.",
max_context_messages: 10, // Include last 10 messages in context
});
Multi-Provider Support
// OpenAI
const openaiChat = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-4-turbo",
});
// Anthropic
const claudeChat = await client.createChatSession({
llm_provider: "anthropic",
llm_model: "claude-3-opus-20240229",
});
// Perplexity
const perplexityChat = await client.createChatSession({
llm_provider: "perplexity",
llm_model: "pplx-70b-online",
});
Sending Messages
Simple Message
const response = await client.sendChatMessage(sessionId, {
message: "What is ekoDB?",
});
console.log(response.responses); // AI response (array of strings)
With Context
When collections are configured, ekoDB automatically:
- Searches collections for relevant context
- Ranks results by relevance
- Includes context in the LLM prompt
- Returns both response and context used
const response = await client.sendChatMessage(sessionId, {
message: 'How do vector searches work?',
});
// Response includes:
{
chat_id: 'session_uuid',
message_id: 'msg_uuid',
responses: ['Vector searches work by...'], // AI response (array of strings)
context_snippets: [ // Retrieved documents
{
collection: 'documentation',
record: { title: 'Vector Search Guide', content: '...' },
score: 0.95,
matched_fields: ['content']
}
],
execution_time_ms: 142,
token_usage: { prompt_tokens: 512, completion_tokens: 88, total_tokens: 600 }
}
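A common follow-up is to show users which documents backed an answer. The sketch below turns context_snippets into display strings; the interfaces mirror only the fields shown in the response above, and formatSources and its minScore threshold are hypothetical helpers, not part of the client library.

```typescript
// Mirrors only the response fields shown above.
interface ContextSnippet {
  collection: string;
  record: Record<string, unknown>;
  score: number;
  matched_fields: string[];
}

interface ChatResponse {
  responses: string[];
  context_snippets: ContextSnippet[];
}

// Hypothetical helper: list sources above a relevance threshold, best first.
function formatSources(response: ChatResponse, minScore = 0.5): string[] {
  return response.context_snippets
    .filter((s) => s.score >= minScore) // filter() copies, so sort() is safe
    .sort((a, b) => b.score - a.score)
    .map((s) => {
      const title =
        typeof s.record.title === "string" ? s.record.title : "(untitled)";
      return `${title} [${s.collection}] score=${s.score.toFixed(2)}`;
    });
}
```

The 0.5 default is arbitrary; tune it against real queries, since score ranges depend on the search configuration.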
Message Management
List Messages
const messages = await client.getChatMessages(sessionId, {
limit: 50,
skip: 0,
sort: "asc", // chronological order
});
Update Message
await client.updateChatMessage(sessionId, messageId, {
content: "Updated message content",
});
Delete Message
await client.deleteChatMessage(sessionId, messageId);
Mark as Forgotten
Exclude specific messages from the context window:
await client.toggleMessageForgotten(sessionId, messageId, true);
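To illustrate how the forgotten flag interacts with max_context_messages, the sketch below selects which messages would be replayed to the LLM. It assumes forgotten messages are filtered out before the window is applied; the StoredMessage shape and the selectContextWindow function are hypothetical and may not match the server's exact behavior.

```typescript
// Hypothetical message shape for illustration.
interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  forgotten: boolean;
}

// Drop forgotten messages, then keep the most recent N of what remains.
function selectContextWindow(
  messages: StoredMessage[],
  maxContextMessages: number,
): StoredMessage[] {
  const visible = messages.filter((m) => !m.forgotten);
  return maxContextMessages > 0 ? visible.slice(-maxContextMessages) : [];
}
```

Under this assumption, forgetting a message frees a slot in the window for an older message rather than shrinking the window.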
Regenerate Response
Generate a new AI response for the same user message:
const newResponse = await client.regenerateResponse(sessionId, messageId);
Advanced Features
Branching Conversations
Create alternative conversation paths from any point:
// Branch from message 5 in parent session
const branchSession = await client.createChatSession({
parent_id: parentSessionId,
branch_point_idx: 5, // Branch at message index 5 (inclusive)
llm_provider: "openai",
llm_model: "gpt-4",
});
// New session starts with messages 0-5 from parent
// Can explore different conversation paths
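The comment "messages 0-5" suggests the branch point is a zero-based, inclusive index. Under that assumption, the messages a branch inherits can be sketched as a simple slice; inheritedMessages is a hypothetical helper for illustration, not a client method.

```typescript
// Assumes branch_point_idx is zero-based and inclusive ("messages 0-5").
function inheritedMessages<T>(parentMessages: T[], branchPointIdx: number): T[] {
  if (!Number.isInteger(branchPointIdx) || branchPointIdx < 0) {
    throw new RangeError("branchPointIdx must be a non-negative integer");
  }
  // slice() end is exclusive, so add 1 to include the branch point itself.
  return parentMessages.slice(0, branchPointIdx + 1);
}
```

With an eight-message parent and a branch point of 5, the branch would start with six messages (indices 0 through 5).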
Merging Sessions
Combine multiple conversation threads:
const mergedSession = await client.mergeChatSessions({
session_ids: [sessionId1, sessionId2],
strategy: "chronological", // or 'interleaved'
llm_provider: "openai",
llm_model: "gpt-4",
});
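The two strategy names can be read as two orderings of the combined messages. The sketch below shows one plausible interpretation: "chronological" sorts by timestamp, and "interleaved" is assumed here to alternate between sessions round-robin. The source does not define "interleaved", so treat that half as a guess; TimedMessage and both functions are hypothetical.

```typescript
// Hypothetical message shape for illustration.
interface TimedMessage {
  sessionId: string;
  content: string;
  createdAt: string; // ISO 8601 timestamp
}

// "chronological": all messages ordered by creation time.
function mergeChronological(threads: TimedMessage[][]): TimedMessage[] {
  const all = ([] as TimedMessage[]).concat(...threads);
  return all.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}

// "interleaved" (assumed meaning): take one message from each session in turn.
function mergeInterleaved(threads: TimedMessage[][]): TimedMessage[] {
  const merged: TimedMessage[] = [];
  const longest = Math.max(0, ...threads.map((t) => t.length));
  for (let i = 0; i < longest; i++) {
    for (const thread of threads) {
      if (i < thread.length) merged.push(thread[i]);
    }
  }
  return merged;
}
```

The orderings differ whenever one session finished before the other began: chronological keeps each conversation contiguous, while round-robin mixes them turn by turn.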
History Compaction
Fold a session's older messages into a single summary message to reclaim
context-window budget. The most-recent messages are kept verbatim;
everything older is summarized and the originals are marked forgotten
so they stop being replayed on subsequent turns.
POST /api/chat/{chat_id}/compact
Content-Type: application/json
{
"keep_recent": 50 // Optional. Defaults to the session's
// max_context_messages (or 50). 0 compacts all.
}
Response:
{
folded: 120, // Older messages folded into the summary
kept_recent: 50, // Recent messages kept verbatim
summary_chars: 842, // Length of the inserted summary (0 if none)
summary_message_id: 'msg_uuid', // ID of the synthetic summary message
already_compact: false // true when nothing needed folding
}
The client libraries expose this endpoint directly. For example, in TypeScript:
const result = await client.compactChat(chatId, 50); // keepRecent optional
console.log(`Folded ${result.folded}, kept ${result.kept_recent} recent`);
The method is named compactChat in TypeScript and Kotlin, compact_chat in
Rust and Python, and CompactChat in Go. Each returns the response shape
above.
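To estimate what a compaction call will do before sending it, the keep_recent rules described above can be modeled in a few lines. This is a sketch of the documented defaults only (keep_recent, then max_context_messages, then 50); planCompaction is a hypothetical helper, and the server may count messages differently, for example when a previous summary already exists.

```typescript
interface CompactionPlan {
  folded: number;
  kept_recent: number;
  already_compact: boolean;
}

// Models the documented defaults: keep_recent -> max_context_messages -> 50.
function planCompaction(
  totalMessages: number,
  keepRecent?: number,
  maxContextMessages?: number,
): CompactionPlan {
  const keep = keepRecent ?? maxContextMessages ?? 50;
  // Cannot keep more messages than exist; negative values are treated as 0.
  const keptRecent = Math.min(totalMessages, Math.max(0, keep));
  const folded = totalMessages - keptRecent;
  return { folded, kept_recent: keptRecent, already_compact: folded === 0 };
}
```

For a 170-message session with keep_recent set to 50, this yields 120 folded and 50 kept, which matches the example response above.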
Real-World Examples
Customer Support Bot
// 1. Create knowledge base
await client.batchInsert("support_articles", articles);
await client.createIndex("support_articles", ["title", "content"]);
// 2. Create support chat session
const supportSession = await client.createChatSession({
collections: [
{
collection_name: "support_articles",
fields: ["title", "content", "category"],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt: `You are a customer support agent.
Answer questions based on our support documentation.
Be helpful, concise, and professional.`,
});
// 3. Handle customer query
const response = await client.sendChatMessage(supportSession.id, {
message: "How do I reset my password?",
});
// Response includes relevant support articles as context
Document Q&A
// RAG over internal documents
const docSession = await client.createChatSession({
collections: [
{
collection_name: "company_docs",
fields: [
{
field_name: "content",
search_options: { weight: 1.0 },
},
],
},
],
llm_provider: "anthropic",
llm_model: "claude-3-sonnet-20240229",
system_prompt:
"Answer questions about company policies and procedures based on the provided documents.",
});
const answer = await client.sendChatMessage(docSession.id, {
message: "What is our vacation policy?",
});
Code Assistant
// Code documentation chatbot
const codeSession = await client.createChatSession({
collections: [
{
collection_name: "code_docs",
fields: ["description", "code", "examples"],
},
{
collection_name: "api_reference",
fields: ["method", "parameters", "returns"],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
system_prompt: `You are a code assistant. Help developers by:
- Providing accurate code examples
- Explaining concepts clearly
- Referencing official documentation`,
max_context_messages: 15,
});
Hybrid Search Integration
Combine text search with vector similarity:
// Store embeddings with documents
await client.insert("knowledge_base", {
title: "Vector Search Guide",
content: "Vector search enables...",
embedding: vectorEmbedding, // From OpenAI, Cohere, etc.
});
// Chat session uses hybrid search automatically
const session = await client.createChatSession({
collections: [
{
collection_name: "knowledge_base",
fields: [
{ field_name: "content", search_options: { weight: 0.6 } },
{
field_name: "embedding",
search_options: { weight: 0.4, type: "vector" },
},
],
},
],
llm_provider: "openai",
llm_model: "gpt-4",
});
// Searches use both text relevance and semantic similarity
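The weights in the configuration above (0.6 for text, 0.4 for vector) imply some form of weighted score combination. The sketch below shows one common way to do that, a normalized weighted sum with cosine similarity as the vector score. It is an assumption about how such weights are typically applied, not a description of ekoDB's ranking; both functions are hypothetical.

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Weighted sum of text and vector scores, normalized by the total weight.
function hybridScore(
  textScore: number,
  vectorScore: number,
  textWeight = 0.6,
  vectorWeight = 0.4,
): number {
  const total = textWeight + vectorWeight;
  if (total <= 0) throw new Error("weights must sum to a positive number");
  return (textWeight * textScore + vectorWeight * vectorScore) / total;
}
```

Normalizing by the total weight keeps the combined score in the same range as its inputs even when the weights do not sum to 1.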
Performance Optimization
1. Limit Context Messages
const session = await client.createChatSession({
max_context_messages: 5, // Only include last 5 messages
// ... other config
});
2. Use Targeted Collections
// Only search relevant collections
const session = await client.createChatSession({
collections: [
{
collection_name: "recent_docs", // Smaller, focused collection
fields: ["content"],
},
],
// ... other config
});
3. Index Your Data
// Create indexes for faster search
await client.createIndex("knowledge_base", ["title", "content"]);
4. Use Efficient Models
// Balance cost/performance
const session = await client.createChatSession({
llm_provider: "openai",
llm_model: "gpt-3.5-turbo", // Faster, cheaper for simple queries
});
Best Practices
- System Prompts: Be specific about behavior and constraints
- Context Limits: Balance context quality vs token costs
- Collection Design: Structure data for efficient retrieval
- Error Handling: Handle LLM failures gracefully
- Rate Limiting: Respect provider rate limits
- Cost Monitoring: Track token usage and costs
- Caching: Cache common responses when appropriate
- Testing: Test with real user queries
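For the cost-monitoring point, the token_usage object returned with each response can be accumulated per session. The sketch below is one minimal way to do that: the TokenUsage fields follow the response shape shown earlier, while UsageTracker and the per-1,000-token prices passed to estimatedCostUsd are hypothetical and must be replaced with your provider's actual rates.

```typescript
// Fields follow the token_usage object in the chat response.
interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Hypothetical helper: running totals across many chat responses.
class UsageTracker {
  private totals: TokenUsage = {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  };

  record(usage: TokenUsage): void {
    this.totals.prompt_tokens += usage.prompt_tokens;
    this.totals.completion_tokens += usage.completion_tokens;
    this.totals.total_tokens += usage.total_tokens;
  }

  // Prices are per 1,000 tokens and supplied by the caller.
  estimatedCostUsd(promptPricePer1k: number, completionPricePer1k: number): number {
    return (
      (this.totals.prompt_tokens / 1000) * promptPricePer1k +
      (this.totals.completion_tokens / 1000) * completionPricePer1k
    );
  }

  snapshot(): TokenUsage {
    return { ...this.totals }; // copy so callers cannot mutate the totals
  }
}
```

Calling tracker.record(response.token_usage) after each message keeps a per-session total that can be logged or checked against a budget.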
API Reference
createChatSession()
client.createChatSession(options: {
collections?: CollectionConfig[],
llm_provider: 'openai' | 'anthropic' | 'perplexity',
llm_model: string,
system_prompt?: string,
max_context_messages?: number,
parent_id?: string,
branch_point_idx?: number,
}): Promise<ChatSession>
sendChatMessage()
client.sendChatMessage(
sessionId: string,
options: {
message: string,
}
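): Promise<{
chat_id: string,
message_id: string,
responses: string[], // AI response (array of strings)
context_snippets: ContextDocument[], // Retrieved documents
execution_time_ms: number,
token_usage: { prompt_tokens: number, completion_tokens: number, total_tokens: number },
}>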
getChatMessages()
client.getChatMessages(
sessionId: string,
options?: {
limit?: number,
skip?: number,
sort?: 'asc' | 'desc',
}
): Promise<Message[]>
getChatModels()
Get all available chat models organized by provider:
client.getChatModels(): Promise<Record<string, string[]>>
Response Example:
{
"openai": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", "gpt-4o"],
"anthropic": ["claude-3-opus-20240229", "claude-3-sonnet-20240229"],
"perplexity": ["llama-3.1-sonar-small-128k-online"]
}
getChatModel()
Get available models for a specific provider:
client.getChatModel(provider: string): Promise<string[]>
Example:
const openaiModels = await client.getChatModel("openai");
// ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo", "gpt-4o", "gpt-4o-mini"]
REST API:
# List all models by provider
GET /api/chat_models
# Get models for a specific provider
GET /api/chat_models/openai
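The model catalog is useful for validating a provider and model pair before creating a session. The sketch below checks a pair against the Record<string, string[]> shape returned by getChatModels(); isModelAvailable is a hypothetical helper, not part of the client library.

```typescript
// Same shape as the getChatModels() response.
type ModelCatalog = Record<string, string[]>;

// Hypothetical helper: true if the provider exists and lists the model.
function isModelAvailable(
  catalog: ModelCatalog,
  provider: string,
  model: string,
): boolean {
  // hasOwnProperty guards against inherited keys such as "toString".
  const models = Object.prototype.hasOwnProperty.call(catalog, provider)
    ? catalog[provider]
    : [];
  return models.indexOf(model) !== -1;
}
```

A caller could fetch the catalog once at startup and reject unsupported models early, instead of waiting for a session creation error.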
Troubleshooting
No Context Retrieved
Problem: AI responses don't use your data
Solutions:
- Verify collections are configured correctly
- Check that the collection has data
- Ensure search fields exist
- Try different search weights
Token Limit Errors
Problem: Context too large for LLM
Solutions:
- Reduce max_context_messages
- Limit collection search results
- Use shorter documents
- Switch to a model with a larger context window
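If you assemble or post-process context yourself, one way to stay under a model's limit is to cap the retrieved text by size. The sketch below keeps the highest-scoring snippets that fit a character budget. It uses characters as a rough stand-in for tokens, which is an approximation; RankedSnippet and fitSnippetsToBudget are hypothetical names for illustration.

```typescript
// Hypothetical snippet shape for illustration.
interface RankedSnippet {
  text: string;
  score: number;
}

// Greedily keep the best-scoring snippets that fit within maxChars.
function fitSnippetsToBudget(
  snippets: RankedSnippet[],
  maxChars: number,
): RankedSnippet[] {
  const ranked = [...snippets].sort((a, b) => b.score - a.score);
  const kept: RankedSnippet[] = [];
  let used = 0;
  for (const snippet of ranked) {
    // Skip snippets that would overflow; a smaller one may still fit.
    if (used + snippet.text.length > maxChars) continue;
    kept.push(snippet);
    used += snippet.text.length;
  }
  return kept;
}
```

The greedy approach favors relevance over packing efficiency, which is usually the right trade-off for RAG prompts.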
Slow Responses
Problem: Chat responses are slow
Solutions:
- Create indexes on search fields
- Reduce number of collections searched
- Use a faster LLM model
- Limit context size
Related Documentation
- Vector Search - Semantic search with embeddings
- Indexes - Optimize search performance
- Client Libraries - Full API examples
- System Administration - Monitor and manage
Chat Models API Examples:
- Rust - client_chat_models.rs
- Python - client_chat_models.py
- TypeScript - client_chat_models.ts
- Go - client_chat_models.go
- Kotlin - ClientChatModels.kt
Summary
Chat & RAG in ekoDB enables:
- ✅ Conversational AI - Natural language interactions
- ✅ Context-aware responses - Answers based on your data
- ✅ Multi-provider support - OpenAI, Anthropic, Perplexity
- ✅ Branching conversations - Explore alternative paths
- ✅ Hybrid search - Text + vector semantic matching
- ✅ Integrated - No separate infrastructure needed
- ✅ Production-ready - Scalable and reliable