Vector Search & Embeddings

Build intelligent search and recommendation systems with ekoDB's integrated vector search capabilities.

Production-Ready

Vector search is built directly into ekoDB, so no separate deployment is needed. Performance is competitive with specialized vector databases.

Quick Start

use ekodb_client::{Client, Record, SearchQuery};

let client = Client::builder()
    .base_url("https://my-first-db.development.google.ekodb.net")
    .api_key("your-api-key")
    .build()?;

// 1. Store vectors with your data
let mut record = Record::new();
record.insert("name", "Ergonomic Chair");
record.insert("description", "Comfortable office chair...");
record.insert("embedding", vec![0.12, 0.34, 0.56]); // shortened for readability; a real embedding has e.g. 1536 dimensions

client.insert("products", record).await?;

// 2. Search by similarity
let query_embedding: Vec<f64> = vec![0.12, 0.34, 0.56]; // from your embedding model
let query = SearchQuery {
    query: String::new(),
    vector: Some(query_embedding),
    vector_field: Some("embedding".to_string()),
    vector_metric: Some("cosine".to_string()),
    vector_k: Some(10),
    ..Default::default()
};
let results = client.search("products", query).await?;

Performance

Dataset Size | Average Latency | Throughput
1K vectors | 51ms | 379 RPS
10K vectors | 322ms | 56 RPS
Hybrid search | Text + vector combined | Configurable weights

Competitive Performance:

  • 6x faster than PostgreSQL pgvector
  • Competitive with Milvus, Elasticsearch
  • Integrated - no separate deployment

Vector Types

ekoDB supports vector fields for storing embeddings from AI models:

// Standard vector field
const record = {
  title: 'Database Performance Tips',
  content: 'This article discusses...',
  embedding: {
    type: 'Vector',
    value: [0.12, 0.34, 0.56 /* ... */], // Array of numbers (shortened here)
    metadata: {
      model: 'text-embedding-ada-002',
      dimensions: 1536
    }
  }
};

Schema Definition

Define vector fields in your schema. ekoDB infers the vector dimension from the first inserted vector and rejects later vectors of a different length, so you do not declare the dimension yourself:

const schema = {
  fields: {
    title: { field_type: 'String', required: true },
    content: { field_type: 'String', required: true },
    embedding: {
      field_type: 'Vector',
      required: true,
      // Optional: Vector index configuration
      index: {
        type: 'vector',
        algorithm: 'flat', // exact search
        metric: 'cosine', // similarity metric
      }
    }
  }
};

await client.createCollection('articles', schema);
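
Because the dimension is locked by the first inserted vector, a mismatched vector only surfaces as a rejected insert. A client-side check before inserting can catch it earlier. The sketch below is one way to do that; assertUniformDimension is a hypothetical helper written for this page, not part of the ekoDB client.

```typescript
// Hypothetical client-side guard (not an ekoDB API).
// Throws if any vector's length differs from the expected dimension
// and returns the dimension that was checked.
function assertUniformDimension(vectors: number[][], expected?: number): number {
  if (vectors.length === 0) {
    throw new Error('no vectors to validate');
  }
  // With no explicit dimension, use the first vector's length,
  // mirroring how the server infers it from the first insert.
  const dim = expected ?? vectors[0].length;
  vectors.forEach((v, i) => {
    if (v.length !== dim) {
      throw new Error(`vector ${i} has ${v.length} dimensions, expected ${dim}`);
    }
  });
  return dim;
}
```

Passing the model's known output size as the second argument (for example 1536 for text-embedding-ada-002) also catches the case where the whole batch came from the wrong model.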

Distance Metrics

Choose the right metric for your use case:

Cosine Similarity

Measures the angle between vectors. Range: [-1, 1]. Higher = more similar.

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('cosine')
.vectorK(10)
.build();
const results = await client.search('articles', query);

Best for:

  • ✅ Text embeddings and semantic search
  • ✅ When vector magnitude should be ignored
  • ✅ Most AI/ML embeddings (OpenAI, Cohere, etc.)

Euclidean Distance

Measures straight-line (L2) distance. Lower = more similar.

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('euclidean')
.vectorK(10)
.build();
const results = await client.search('locations', query);

Best for:

  • ✅ Spatial data and coordinates
  • ✅ When both magnitude and direction matter
  • ✅ Physical measurements

Dot Product

Calculates inner product. Higher = more similar.

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('dotproduct')
.vectorK(10)
.build();
const results = await client.search('recommendations', query);

Best for:

  • ✅ When vectors are pre-normalized
  • ✅ Certain recommendation systems
  • ✅ When magnitude contains meaningful information
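
To make the difference between the three metrics concrete, here is a plain TypeScript sketch of the textbook formulas. It is independent of ekoDB and says nothing about how the server computes scores internally; the function names are illustrative.

```typescript
// Inner product: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Straight-line (L2) distance: lower means more similar.
function euclidean(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Cosine similarity: dot product divided by both magnitudes.
// Returns NaN if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = [3, 4];
const b = [6, 8]; // same direction as a, twice the length

console.log(cosine(a, b));    // 1  (direction only, magnitude ignored)
console.log(euclidean(a, b)); // 5  (the vectors are 5 units apart)
console.log(dot(a, b));       // 50 (grows with magnitude)
```

The two vectors point the same way, so cosine treats them as identical while Euclidean distance and dot product both react to the difference in length. For unit-length vectors, cosine and dot product give the same value.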

Search Methods

ekoDB's vector search finds the true nearest neighbors for your query, with performance optimized by vector indexes when defined in your schema.

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('cosine')
.vectorK(10)
.vectorThreshold(0.7) // minimum similarity score
.build();
const results = await client.search('products', query);

Performance:

  • Low-latency similarity search across collections
  • Accurate results guaranteed
  • Further optimized with vector indexes (when defined in schema)
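
For intuition, "true nearest neighbors" with a threshold can be pictured as the brute-force procedure below: score every record, drop those under the threshold, keep the k best. This is a conceptual sketch in plain TypeScript, not ekoDB's implementation; exactTopK and cosineSimilarity are illustrative names.

```typescript
interface Scored<T> {
  item: T;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact search: every item is scored, so no true neighbor can be missed.
function exactTopK<T extends { embedding: number[] }>(
  items: T[],
  queryVector: number[],
  k: number,
  threshold = -1, // -1 keeps everything, since cosine never goes lower
): Scored<T>[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(queryVector, item.embedding) }))
    .filter((s) => s.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The cost grows linearly with the collection size, which is why an index matters as collections get larger.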

Hybrid Search

Combine text search with vector similarity for powerful hybrid queries:

// Hybrid: Text + Vector search
const query = new SearchQueryBuilder('database performance')
.fields(['title', 'content'])
.vector(queryEmbedding)
.vectorField('embedding')
.textWeight(0.3) // 30% text relevance
.vectorWeight(0.7) // 70% semantic similarity
.limit(10)
.build();
const results = await client.search('articles', query);

Use cases:

  • Semantic search with keyword filtering
  • Recommendations with category constraints
  • RAG systems with both keyword and semantic matching
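
This page does not spell out how ekoDB combines the two scores. A weighted sum of normalized scores is one common approach, and the sketch below assumes it purely to show how the weights shift a ranking. hybridScore is a hypothetical function, and both inputs are assumed to be scaled to [0, 1] already.

```typescript
// Assumed scoring model (not confirmed for ekoDB): weighted sum of a text
// relevance score and a vector similarity score, both in [0, 1].
function hybridScore(
  textScore: number,
  vectorScore: number,
  textWeight = 0.3,
  vectorWeight = 0.7,
): number {
  return textWeight * textScore + vectorWeight * vectorScore;
}

// A strong keyword match with weak semantic similarity...
const keywordHeavy = hybridScore(0.9, 0.2); // about 0.41
// ...ranks below a weak keyword match with strong semantic similarity.
const semanticHeavy = hybridScore(0.2, 0.9); // about 0.69

console.log(keywordHeavy < semanticHeavy); // true
```

Under this model, swapping the weights to 0.7 text and 0.3 vector reverses the order of the two records.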

Filtering by Metadata

Restrict vector and hybrid search to a subset of records with a metadata pre-filter. Only records matching the filter are considered as candidates before similarity ranking, so a query like "find the nearest in-stock electronics over $100" never scores the rest of the collection.

import { SearchQueryBuilder, QueryBuilder } from '@ekodb/ekodb-client';

const filter = new QueryBuilder()
.eq('category', 'electronics')
.gte('price', 100)
.build().filter;

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorK(10)
.filters(filter) // only electronics priced >= 100 are ranked
.build();

const results = await client.search('products', query);

The filter uses the same Query Expression syntax as find, including Logical And / Or / Not combinations. The pre-filter is uniform across every search mode: full-text search, brute-force vector search, the indexed vector paths, and hybrid search. It is always applied to the candidate set before ranking, so results are never silently truncated by an index window. In hybrid search it governs the entire result, so a record that matches the text query but fails the filter is excluded, not surfaced on its text score alone.

Indexed vector search with a filter is exact. A Flat index ranks only matching records directly. An HNSW index runs a fast approximate filtered traversal, but if a selective filter starves that traversal of candidates, ekoDB automatically falls back to an exact scan — so you always get the true matches, never a silently truncated set. The only cost of a highly selective filter is a little extra latency on that fallback.
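
The difference between filtering before ranking and filtering after it is easiest to see on a toy dataset. The sketch below is a conceptual illustration in plain TypeScript with precomputed similarity scores; it does not reflect ekoDB internals, and all names are illustrative.

```typescript
interface Candidate {
  id: string;
  category: string;
  score: number; // precomputed similarity to the query
}

const byScoreDesc = (x: Candidate, y: Candidate): number => y.score - x.score;

// Pre-filter: restrict the candidate set first, then rank what is left.
function preFilterTopK(cands: Candidate[], k: number, keep: (c: Candidate) => boolean): Candidate[] {
  return cands.filter(keep).sort(byScoreDesc).slice(0, k);
}

// Post-filter: rank everything, cut to k, then filter.
// Matching records outside the top k are lost.
function postFilterTopK(cands: Candidate[], k: number, keep: (c: Candidate) => boolean): Candidate[] {
  return [...cands].sort(byScoreDesc).slice(0, k).filter(keep);
}

const candidates: Candidate[] = [
  { id: 'a', category: 'toys', score: 0.9 },
  { id: 'b', category: 'toys', score: 0.8 },
  { id: 'c', category: 'electronics', score: 0.7 },
  { id: 'd', category: 'electronics', score: 0.6 },
];
const isElectronics = (c: Candidate): boolean => c.category === 'electronics';

console.log(preFilterTopK(candidates, 2, isElectronics).map((c) => c.id));  // [ 'c', 'd' ]
console.log(postFilterTopK(candidates, 2, isElectronics).map((c) => c.id)); // []
```

Post-filtering returns nothing here because both top-2 slots went to records the filter rejects. Applying the filter first avoids that kind of silent truncation.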

The same .filters(...) works on a pure text search too:

// Full-text search restricted to one category
const query = new SearchQueryBuilder("introduction")
.fields(["title", "content"])
.filters(new QueryBuilder().eq("category", "ml").build().filter)
.build();

const results = await client.search("documents", query);

// Hybrid search constrained to a single tenant
const query = new SearchQueryBuilder('annual report')
.fields(['title', 'body'])
.vector(queryVector)
.textWeight(0.3)
.vectorWeight(0.7)
.filters(new QueryBuilder().eq('tenant_id', tenantId).build().filter)
.build();

const results = await client.search('documents', query);

Real-World Examples

Semantic Search

import OpenAI from 'openai';

// Generate embedding from OpenAI
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function semanticSearch(text: string) {
  // 1. Get query embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: text,
  });
  const queryVector = response.data[0].embedding;

  // 2. Search by similarity
  const query = new SearchQueryBuilder('')
    .vector(queryVector)
    .vectorMetric('cosine')
    .vectorK(5)
    .build();
  const results = await client.search('articles', query);

  return results;
}

// Also finds articles phrased differently, e.g. "How to optimize database queries"
const articles = await semanticSearch('database optimization tips');

Product Recommendations

async function getSimilarProducts(productId: string) {
  // 1. Get product's embedding
  const product = await client.findById('products', productId);
  const embedding = product.embedding;

  // 2. Find similar products
  // Ask for one extra result: the product itself is usually the closest match
  const query = new SearchQueryBuilder('')
    .vector(embedding)
    .vectorMetric('cosine')
    .vectorK(11)
    .build();
  const similar = await client.search('products', query);

  // Filter out the original product and keep at most 10
  return similar.filter(p => p.id !== productId).slice(0, 10);
}

Image Similarity

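// Store image embeddings from CLIP or similar model
async function findSimilarImages(imageEmbedding: number[], category?: string) {
  let builder = new SearchQueryBuilder('')
    .vector(imageEmbedding)
    .vectorMetric('cosine')
    .vectorK(20)
    .vectorThreshold(0.7);

  // Optionally restrict candidates to one category (see Filtering by Metadata).
  // Assumes image records carry a category field.
  if (category) {
    builder = builder.filters(new QueryBuilder().eq('category', category).build().filter);
  }
  const results = await client.search('images', builder.build());

  return results.map(r => ({
    url: r.url,
    similarity: r.score,
    metadata: r.metadata
  }));
}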

Deletion & Index Maintenance

When you delete a record that contains a vector field, ekoDB immediately removes it from vector search results. There's no need to manually update the index — deleted records won't appear in searches.

// Insert a record with a vector
await client.insert('products', {
  name: 'Discontinued Widget',
  embedding: [0.12, 0.34, 0.56 /* ... */]
});

// Delete it by ID (recordId is the ID of the record inserted above).
// It is immediately excluded from vector search results.
await client.delete('products', recordId);

Reindexing

Over time, frequent deletions can degrade search performance, because a deleted vector is marked rather than removed from the graph. After heavy delete churn, reindex to rebuild the search graph and restore optimal performance. Reindexing is manual (call the reindex endpoint or your client's reindex method); ekoDB does not auto-trigger it.

# Reindex a collection's vector index
curl -X POST https://{EKODB_API_URL}/api/indexes/search/{collection}/reindex \
-H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
-d '{}'

# Optionally specify which vector field to reindex
curl -X POST https://{EKODB_API_URL}/api/indexes/search/{collection}/reindex \
-H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
-d '{"field": "embedding"}'

Response:

{
  "status": "ok",
  "collection": "products",
  "field": "embedding",
  "vectors_reindexed": 4850,
  "duration_ms": 127.5
}

When to Reindex

Reindexing is only needed after heavy deletion workloads. For most applications with occasional deletes, the index stays performant without manual intervention.
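
For applications that trigger a reindex from code rather than from a shell, the curl calls above translate directly to an HTTP request. The sketch below only builds the request and leaves sending it to the caller. buildReindexRequest is a hypothetical helper, and the host and token are placeholders just as in the curl examples.

```typescript
interface ReindexRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper mirroring the curl examples: POST to the reindex
// endpoint, optionally naming the vector field to rebuild.
function buildReindexRequest(
  apiUrl: string, // host only, e.g. the value of EKODB_API_URL
  token: string,
  collection: string,
  field?: string,
): ReindexRequest {
  return {
    url: `https://${apiUrl}/api/indexes/search/${encodeURIComponent(collection)}/reindex`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(field ? { field } : {}),
    },
  };
}

// Usage, in a runtime with a global fetch (such as Node 18+):
//   const { url, init } = buildReindexRequest(host, token, 'products', 'embedding');
//   const response = await fetch(url, init);
```

Keeping request construction separate from the network call makes the helper easy to unit test and to reuse with a different HTTP client.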

Search Tuning

ef_search (Beam Width)

The ef_search parameter controls the search beam width — higher values explore more of the graph, improving accuracy at the cost of latency. ekoDB resolves ef_search with a 3-tier fallback:

  1. Per-query override: pass ef_search in the search request
  2. Collection-level config: set in the vector index configuration
  3. Heuristic default: max(k * 2, 64)

Both vector_k and ef_search are clamped to server caps (max_vector_k, default 1000; max_ef_search, default 4000) so a single query cannot exhaust memory or CPU. Raise them via PUT /api/config. See Configuration — Search.
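
The fallback and clamping rules can be summarized in a few lines. The sketch below is a client-side model of the behavior described above, not server code. resolveEfSearch is an illustrative name, and computing the heuristic from the already clamped k is an assumption.

```typescript
interface EfSearchInputs {
  k: number;
  perQuery?: number;          // ef_search passed in the search request
  collectionDefault?: number; // ef_search from the vector index configuration
  maxVectorK?: number;        // server cap, default 1000
  maxEfSearch?: number;       // server cap, default 4000
}

function resolveEfSearch({
  k,
  perQuery,
  collectionDefault,
  maxVectorK = 1000,
  maxEfSearch = 4000,
}: EfSearchInputs): { k: number; efSearch: number } {
  const clampedK = Math.min(k, maxVectorK);
  // Tier 1, then tier 2, then the heuristic default max(k * 2, 64).
  // Assumption: the heuristic uses the clamped k.
  const requested = perQuery ?? collectionDefault ?? Math.max(clampedK * 2, 64);
  return { k: clampedK, efSearch: Math.min(requested, maxEfSearch) };
}

console.log(resolveEfSearch({ k: 10 }).efSearch);                          // 64
console.log(resolveEfSearch({ k: 100 }).efSearch);                         // 200
console.log(resolveEfSearch({ k: 10, collectionDefault: 128 }).efSearch);  // 128
console.log(resolveEfSearch({ k: 10, perQuery: 10000 }).efSearch);         // 4000 (clamped)
```

With the default heuristic, ef_search only starts to scale with k once k exceeds 32. Below that, the floor of 64 applies.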

// The builder covers the common parameters, so this query uses the
// collection-level or default ef_search.
// For a per-query ef_search override, use the Direct API (below).
const query = new SearchQueryBuilder('')
  .vector(queryVector)
  .vectorMetric('cosine')
  .vectorK(10)
  .build();
const results = await client.search('articles', query);

# Direct API with ef_search override
curl -X POST https://{EKODB_API_URL}/api/search/{collection} \
-H "Authorization: Bearer {TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"query": "",
"vector": [0.12, 0.34, 0.56, ...],
"vector_metric": "cosine",
"vector_k": 10,
"ef_search": 200
}'

ef_search | Accuracy | Latency | Use Case
32-64 | Good | Low | Real-time recommendations
64-128 | High | Medium | Semantic search (default range)
200+ | Very High | Higher | Precision-critical applications

Performance Optimization

1. Use Vector Indexes

Define vector indexes in your schema for automatic indexing:

// Define index in schema for automatic indexing
const schema = {
  fields: {
    embedding: {
      field_type: 'Vector',
      index: {
        type: 'vector',
        algorithm: 'flat', // exact search
        metric: 'cosine',
      }
    }
  }
};

2. Set Similarity Threshold

Filter out low-relevance results:

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('cosine')
.vectorK(10)
.vectorThreshold(0.75) // only return results with a similarity score of at least 0.75
.build();
const results = await client.search('articles', query);

3. Batch Insert Vectors

// More efficient than individual inserts
await client.batchInsert('articles', articlesWithEmbeddings);
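
For very large imports, a single batch may be impractical, so splitting the records into fixed-size chunks is a common pattern. The sketch below shows a generic chunking helper. The helper name and the batch size in the usage comment are assumptions, not ekoDB recommendations.

```typescript
// Splits an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]

// Usage with the call shown above (500 is an arbitrary batch size):
//   for (const batch of chunk(articlesWithEmbeddings, 500)) {
//     await client.batchInsert('articles', batch);
//   }
```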

4. Use Field Projection

Return only needed fields to reduce data transfer:

const query = new SearchQueryBuilder('')
.vector(queryVector)
.vectorMetric('cosine')
.vectorK(10)
.selectFields(['title', 'price', 'image_url']) // only these fields
.build();
const results = await client.search('products', query);

Embedding Models

ekoDB works with any embedding model. Popular choices:

OpenAI

import OpenAI from 'openai';

const openai = new OpenAI();
const response = await openai.embeddings.create({
model: 'text-embedding-ada-002', // 1536 dimensions
input: 'Your text here',
});
const embedding = response.data[0].embedding;

Cohere

import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });
const response = await cohere.embed({
  texts: ['Your text here'],
  model: 'embed-english-v3.0', // 1024 dimensions
  inputType: 'search_document', // required for v3 models; use 'search_query' for queries
});
const embedding = response.embeddings[0];

Local Models (Transformers.js)

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const output = await extractor('Your text here', { pooling: 'mean', normalize: true });
const embedding = Array.from(output.data); // 384 dimensions

API Reference

Vector, text, and hybrid search all go through client.search(collection, query), where query is built with SearchQueryBuilder. Which fields you set decides the search type.

import { SearchQueryBuilder } from '@ekodb/ekodb-client';

const query = new SearchQueryBuilder('') // text query ('' for vector-only)
.vector(queryVector) // query vector (presence triggers vector search)
.vectorField('embedding') // field holding vectors (default: 'embedding')
.vectorMetric('cosine') // 'cosine' | 'euclidean' | 'dotproduct'
.vectorK(10) // number of nearest neighbors
.vectorThreshold(0.7) // minimum similarity score (0.0-1.0)
.filters(metadataFilter) // optional metadata pre-filter (QueryExpression)
.selectFields(['title', 'price']) // field projection (or .excludeFields([...]))
.build();

const results = await client.search('products', query);

Hybrid search

Set a text query plus a vector and weight them:

const query = new SearchQueryBuilder('database performance')
.fields(['title', 'content'])
.vector(queryVector)
.textWeight(0.3)
.vectorWeight(0.7)
.limit(10)
.build();

const results = await client.search('articles', query);

ef_search and other less-common parameters can be sent directly in the body of POST /api/search/{collection} (see the Direct API examples above).
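
As a rough TypeScript counterpart to the curl example, the sketch below assembles a raw search body using the field names from that example (query, vector, vector_metric, vector_k, ef_search). buildSearchBody and the RawSearchBody type are hypothetical. Only those five fields are modeled, and the host and token in the usage comment are placeholders.

```typescript
type VectorMetric = 'cosine' | 'euclidean' | 'dotproduct';

interface RawSearchBody {
  query: string;
  vector: number[];
  vector_metric: VectorMetric;
  vector_k: number;
  ef_search?: number;
}

// Hypothetical helper: builds the JSON body for POST /api/search/{collection}.
// ef_search is included only when the caller sets it, so the server-side
// fallback still applies otherwise.
function buildSearchBody(
  vector: number[],
  k: number,
  efSearch?: number,
  metric: VectorMetric = 'cosine',
): RawSearchBody {
  const body: RawSearchBody = { query: '', vector, vector_metric: metric, vector_k: k };
  if (efSearch !== undefined) {
    body.ef_search = efSearch;
  }
  return body;
}

// Usage, in a runtime with a global fetch (such as Node 18+):
//   await fetch(`https://${host}/api/search/articles`, {
//     method: 'POST',
//     headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildSearchBody(queryVector, 10, 200)),
//   });
```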

Best Practices

  1. Match Dimensions: Keep every vector in a field the same length as your embedding model's output — ekoDB locks the dimension to the first inserted vector and rejects mismatched lengths
  2. Use Schemas: Declare the field as Vector and attach a vector index for automatic similarity indexing
  3. Choose Right Metric: Cosine for most AI embeddings (OpenAI, Cohere, etc.)
  4. Batch Operations: Use batchInsert for multiple vectors
  5. Set Thresholds: Filter low-relevance results with threshold parameter
  6. Tune ef_search: Raise it for precision-critical queries, lower it for latency-sensitive ones
  7. Field Projection: Only return fields you need
  8. Monitor Performance: Track query latency and optimize with vector indexes as needed

Summary

Vector search in ekoDB enables:

  • ✅ Semantic search - Find by meaning, not just keywords
  • ✅ Recommendations - Product, content, and user similarity
  • ✅ Image search - Visual similarity matching
  • ✅ RAG systems - Retrieval-augmented generation
  • ✅ Integrated - No separate vector database needed
  • ✅ Production-ready - Competitive performance and reliability