Vector Search & Embeddings

Build intelligent search and recommendation systems with ekoDB's integrated vector search capabilities.

Production-Ready

Vector search is built directly into ekoDB, so no separate deployment is needed. Performance is competitive with specialized vector databases.

Quick Start

use ekodb_client::Client;

// 1. Store vectors with your data
let mut record = Record::new();
record.insert("name", "Ergonomic Chair");
record.insert("description", "Comfortable office chair...");
// Truncated for brevity; a real embedding has e.g. 1536 dimensions
record.insert("embedding", vec![0.12, 0.34, 0.56]);

client.insert("products", record).await?;

// 2. Search by similarity
let results = client.vector_search(
    "products",
    query_embedding,
    10,
    Some(VectorSearchOptions {
        metric: SimilarityMetric::Cosine,
        ..Default::default()
    }),
).await?;

Performance

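| Dataset Size  | Average Latency        | Throughput           |
|---------------|------------------------|----------------------|
| 1K vectors    | 51 ms                  | 379 RPS              |
| 10K vectors   | 322 ms                 | 56 RPS               |
| Hybrid search | Text + vector combined | Configurable weights |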

Competitive Performance:

  • 6x faster than PostgreSQL pgvector
  • Competitive with Milvus, Elasticsearch
  • Integrated - no separate deployment

Vector Types

ekoDB supports vector fields for storing embeddings from AI models:

// Standard vector field
const record = {
  title: 'Database Performance Tips',
  content: 'This article discusses...',
  embedding: {
    type: 'Vector',
    value: [0.12, 0.34, 0.56 /* ... */], // Array of numbers (truncated)
    metadata: {
      model: 'text-embedding-ada-002',
      dimensions: 1536
    }
  }
};

Schema Definition

Define vector fields in your schema with dimension validation:

const schema = {
  title: { field_type: 'String', required: true },
  content: { field_type: 'String', required: true },
  embedding: {
    field_type: 'Vector',
    dimensions: 1536, // Enforce dimensionality
    required: true,
    // Optional: Vector index configuration
    index: {
      algorithm: 'Flat', // Exact search
      metric: 'Cosine', // Similarity metric
    }
  }
};

await client.createSchema('articles', schema);
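
Dimension enforcement can also be mirrored on the client before a record is sent, which surfaces mismatches earlier. The following is a minimal sketch under that assumption, not ekoDB's own validation logic; `validateEmbedding` is a hypothetical helper introduced here for illustration.

```typescript
// Hypothetical client-side check; ekoDB's server-side validation may differ.
// Returns a list of problems; an empty list means the vector looks valid.
function validateEmbedding(vector: number[], expectedDimensions: number): string[] {
  const errors: string[] = [];

  // The vector length must match the dimensions declared in the schema.
  if (vector.length !== expectedDimensions) {
    errors.push(`expected ${expectedDimensions} dimensions, got ${vector.length}`);
  }

  // NaN or Infinity values would make any similarity score meaningless.
  const badIndex = vector.findIndex((x) => !Number.isFinite(x));
  if (badIndex !== -1) {
    errors.push(`non-finite value at index ${badIndex}`);
  }

  return errors;
}
```

Calling `validateEmbedding(embedding, 1536)` before an insert would catch, for example, a 768-dimension vector produced by a different model.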

Distance Metrics

Choose the right metric for your use case:

Cosine Similarity

Measures the angle between vectors. Range: [-1, 1]. Higher = more similar.

const results = await client.vectorSearch(
  'articles',
  queryVector,
  10,
  { metric: 'Cosine' }
);

Best for:

  • ✅ Text embeddings and semantic search
  • ✅ When vector magnitude should be ignored
  • ✅ Most AI/ML embeddings (OpenAI, Cohere, etc.)
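
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows the standard formula only; it is not ekoDB's internal implementation, and `cosineSimilarity` is a name chosen here for illustration.

```typescript
// Standard cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector has similarity 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the result is divided by both lengths, scaling either vector leaves the score unchanged, which is why magnitude is ignored.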

Euclidean Distance

Measures straight-line (L2) distance. Lower = more similar.

const results = await client.vectorSearch(
  'locations',
  queryVector,
  10,
  { metric: 'Euclidean' }
);

Best for:

  • ✅ Spatial data and coordinates
  • ✅ When both magnitude and direction matter
  • ✅ Physical measurements
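
As an illustration of the metric itself, Euclidean distance is the square root of the summed squared differences between components. This is a minimal sketch of the textbook formula, with `euclideanDistance` as a hypothetical helper name; it says nothing about how ekoDB computes it internally.

```typescript
// Textbook L2 distance: sqrt(sum((a[i] - b[i])^2)).
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sumOfSquares = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumOfSquares += diff * diff;
  }
  return Math.sqrt(sumOfSquares);
}
```

Unlike cosine similarity, doubling one vector changes the result, so magnitude affects the ranking.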

Dot Product

Calculates inner product. Higher = more similar.

const results = await client.vectorSearch(
  'recommendations',
  queryVector,
  10,
  { metric: 'DotProduct' }
);

Best for:

  • ✅ When vectors are pre-normalized
  • ✅ Certain recommendation systems
  • ✅ When magnitude contains meaningful information
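
The first and last points can look contradictory, so a small sketch may help: for unit-length vectors the dot product equals cosine similarity, while for unnormalized vectors it also rewards larger magnitudes. `dotProduct` and `normalize` below are illustrative helpers, not part of the ekoDB client.

```typescript
// Inner product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Scale a vector to unit length (L2 norm of 1). Zero vectors are returned unchanged.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(dotProduct(v, v));
  if (norm === 0) return v.slice();
  return v.map((x) => x / norm);
}

// Unnormalized: the longer vector wins even though both point the same way.
const shortVec = [1, 1];
const longVec = [10, 10];
const query = [1, 1];
const rawShort = dotProduct(query, shortVec); // 2
const rawLong = dotProduct(query, longVec);   // 20

// Normalized: both candidates point in the same direction, so their scores match.
const unitShort = dotProduct(normalize(query), normalize(shortVec));
const unitLong = dotProduct(normalize(query), normalize(longVec));
```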

Search Methods

ekoDB performs exact vector search, comparing the query vector against every vector in the collection. This guarantees that the true nearest neighbors are found.

const results = await client.vectorSearch(
  'products',
  queryVector,
  10,
  {
    metric: 'Cosine',
    threshold: 0.7, // Minimum similarity score
    filters: { category: 'electronics' }, // Metadata filtering
    select_fields: ['title', 'price'], // Field projection
  }
);

Performance:

  • Measured average latency: 51ms at 1K vectors, 322ms at 10K vectors
  • Exact results guaranteed
  • Optimized with vector indexes (when defined in schema)
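
Conceptually, exact search scores every stored vector against the query, drops results below the threshold, and keeps the top `limit`. The following is a simplified in-memory sketch of that idea, assuming cosine similarity; the types and the `exactSearch` function are hypothetical and do not reflect ekoDB's actual storage or indexing.

```typescript
interface StoredVector {
  id: string;
  embedding: number[];
}

interface ScoredResult {
  id: string;
  score: number;
}

// Cosine similarity for equal-length vectors (zero vectors score 0).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: O(n * d) work for n vectors of d dimensions.
function exactSearch(
  query: number[],
  items: StoredVector[],
  limit: number,
  threshold: number = -1 // -1 admits every cosine score
): ScoredResult[] {
  return items
    .map((item) => ({ id: item.id, score: cosine(query, item.embedding) }))
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit);
}
```

Because every vector is compared, the cost grows linearly with collection size, which is consistent with latency rising between the 1K and 10K measurements.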

Hybrid Search

Combine text search with vector similarity for powerful hybrid queries:

// Hybrid: Text + Vector search
const results = await client.hybridSearch(
  'articles',
  {
    text_query: 'database performance',
    vector: queryEmbedding,
    text_weight: 0.3, // 30% text relevance
    vector_weight: 0.7, // 70% semantic similarity
    limit: 10
  }
);

Use cases:

  • Semantic search with keyword filtering
  • Recommendations with category constraints
  • RAG systems with both keyword and semantic matching
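
To build intuition for the weights, one common way to combine two relevance signals is a weighted linear sum. The sketch below assumes that approach and assumes both scores are already scaled to [0, 1]; how ekoDB actually fuses text and vector scores is not specified here, and `rankHybrid` is a hypothetical function.

```typescript
interface HybridCandidate {
  id: string;
  textScore: number;   // assumed to be in [0, 1]
  vectorScore: number; // assumed to be in [0, 1]
}

// Weighted linear fusion: score = textWeight * text + vectorWeight * vector.
function rankHybrid(
  candidates: HybridCandidate[],
  textWeight: number,
  vectorWeight: number,
  limit: number
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({
      id: c.id,
      score: textWeight * c.textScore + vectorWeight * c.vectorScore,
    }))
    .sort((x, y) => y.score - x.score) // best combined score first
    .slice(0, limit);
}

// "keyword" matches the query terms well; "semantic" matches the meaning well.
const candidates: HybridCandidate[] = [
  { id: "keyword", textScore: 0.9, vectorScore: 0.2 },
  { id: "semantic", textScore: 0.1, vectorScore: 0.9 },
];

const vectorHeavy = rankHybrid(candidates, 0.3, 0.7, 10); // "semantic" ranks first
const textHeavy = rankHybrid(candidates, 0.7, 0.3, 10);   // "keyword" ranks first
```

Under this model, shifting weight toward `text_weight` favors literal keyword matches, while shifting it toward `vector_weight` favors documents that are close in meaning.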

Real-World Examples

Semantic Search

import OpenAI from 'openai';

// Generate embedding from OpenAI
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function semanticSearch(query: string) {
  // 1. Get query embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: query,
  });
  const queryVector = response.data[0].embedding;

  // 2. Search by similarity
  const results = await client.vectorSearch(
    'articles',
    queryVector,
    5,
    { metric: 'Cosine' }
  );

  return results;
}

// Finds articles about query optimization even if they use different wording
const articles = await semanticSearch('database optimization tips');

Product Recommendations

async function getSimilarProducts(productId: string) {
  // 1. Get product's embedding
  const product = await client.findById('products', productId);
  const embedding = product.embedding;

  // 2. Find similar products
  const similar = await client.vectorSearch(
    'products',
    embedding,
    11, // Request one extra: the product itself is usually the top match
    { metric: 'Cosine' }
  );

  // Filter out the original product and keep at most 10
  return similar.filter(p => p.id !== productId).slice(0, 10);
}

Image Similarity

// Store image embeddings from CLIP or similar model
async function findSimilarImages(imageEmbedding: number[], category?: string) {
  const results = await client.vectorSearch(
    'images',
    imageEmbedding,
    20,
    {
      metric: 'Cosine',
      threshold: 0.7,
      // Use metadata filters to reduce search space
      filters: category ? { category } : undefined
    }
  );

  return results.map(r => ({
    url: r.url,
    similarity: r.score,
    metadata: r.metadata
  }));
}

Performance Optimization

1. Use Vector Indexes

Define vector indexes in your schema for automatic indexing:

// Define index in schema for automatic indexing
const schema = {
  embedding: {
    field_type: 'Vector',
    dimensions: 768,
    index: {
      algorithm: 'Flat', // Exact search
      metric: 'Cosine',
    }
  }
};

2. Use Metadata Filters

Reduce search space with metadata filters:

const results = await client.vectorSearch(
  'products',
  queryVector,
  10,
  {
    metric: 'Cosine',
    filters: {
      category: 'electronics',
      in_stock: true
    }
  }
);

3. Set Similarity Threshold

Filter out low-relevance results:

const results = await client.vectorSearch(
  'articles',
  queryVector,
  10,
  {
    metric: 'Cosine',
    threshold: 0.75 // Only return results with a similarity score of at least 0.75
  }
);

4. Batch Insert Vectors

// More efficient than individual inserts
await client.batchInsert('articles', articlesWithEmbeddings);
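
For very large imports it can be useful to split the records into fixed-size batches rather than sending everything in one call. The helper below is a generic sketch; `chunk` is not part of the ekoDB client, and the batch size of 500 in the usage comment is an arbitrary assumption, since no batch size limit is stated here.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical usage with the batchInsert call shown above:
//
//   for (const batch of chunk(articlesWithEmbeddings, 500)) {
//     await client.batchInsert('articles', batch);
//   }
```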

5. Use Field Projection

Return only needed fields to reduce data transfer:

const results = await client.vectorSearch(
  'products',
  queryVector,
  10,
  {
    metric: 'Cosine',
    select_fields: ['title', 'price', 'image_url'] // Only these fields
  }
);

Embedding Models

ekoDB works with any embedding model. Popular choices:

OpenAI

import OpenAI from 'openai';

const openai = new OpenAI();
const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002', // 1536 dimensions
  input: 'Your text here',
});
const embedding = response.data[0].embedding;

Cohere

import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });
const response = await cohere.embed({
  texts: ['Your text here'],
  model: 'embed-english-v3.0', // 1024 dimensions
  inputType: 'search_document', // v3 embed models expect an input type
});
const embedding = response.embeddings[0];

Local Models (Transformers.js)

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const output = await extractor('Your text here', { pooling: 'mean', normalize: true });
const embedding = Array.from(output.data); // 384 dimensions

API Reference

vectorSearch()

client.vectorSearch(
  collection: string,
  vector: number[],
  limit: number,
  options?: {
    metric?: 'Cosine' | 'Euclidean' | 'DotProduct',
    threshold?: number, // Minimum similarity score (0.0-1.0)
    vector_field?: string, // Default: 'embedding'
    filters?: Record<string, any>, // Metadata filters
    normalize?: boolean, // Auto-normalize vectors (default: true)
    bypass_cache?: boolean,
    select_fields?: string[], // Field projection
    exclude_fields?: string[],
  }
): Promise<SearchResult[]>

hybridSearch()

client.hybridSearch(
  collection: string,
  options: {
    text_query: string,
    vector: number[],
    text_weight: number,
    vector_weight: number,
    limit: number,
    metric?: 'Cosine' | 'Euclidean' | 'DotProduct'
  }
): Promise<SearchResult[]>

Best Practices

  1. Match Dimensions: Ensure vector dimensions match your embedding model
  2. Use Schemas: Define dimensions in schema for validation
  3. Choose Right Metric: Cosine for most AI embeddings (OpenAI, Cohere, etc.)
  4. Batch Operations: Use batchInsert for multiple vectors
  5. Set Thresholds: Filter low-relevance results with threshold parameter
  6. Use Metadata Filters: Reduce search space when possible
  7. Field Projection: Only return fields you need
  8. Monitor Performance: Track query latency (< 100ms target for < 100K vectors)
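
For the monitoring point, a small in-process tracker is often enough to see whether queries stay within the latency target. The class below is a minimal sketch using the nearest-rank percentile method; `LatencyTracker` is a hypothetical helper, not an ekoDB feature, and a production setup would more likely export these numbers to a metrics system.

```typescript
// Collects latency samples (in milliseconds) and reports simple statistics.
class LatencyTracker {
  private samples: number[] = [];

  record(ms: number): void {
    this.samples.push(ms);
  }

  // Arithmetic mean of all samples, or 0 when nothing has been recorded.
  average(): number {
    if (this.samples.length === 0) return 0;
    let total = 0;
    for (const ms of this.samples) total += ms;
    return total / this.samples.length;
  }

  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}

// Hypothetical usage around a search call:
//
//   const started = Date.now();
//   await client.vectorSearch('articles', queryVector, 10, { metric: 'Cosine' });
//   tracker.record(Date.now() - started);
```

Tracking a high percentile alongside the average helps reveal occasional slow queries that the mean alone would hide.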

Summary

Vector search in ekoDB enables:

  • ✅ Semantic search - Find by meaning, not just keywords
  • ✅ Recommendations - Product, content, and user similarity
  • ✅ Image search - Visual similarity matching
  • ✅ RAG systems - Retrieval-augmented generation
  • ✅ Integrated - No separate vector database needed
  • ✅ Production-ready - Competitive performance and reliability