What Is a Vector Database?
A vector database stores high-dimensional numerical representations (embeddings) of unstructured data – text, images, sensor readings – and retrieves the most semantically similar items via approximate nearest neighbor (ANN) search.
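To make "most semantically similar" concrete, here is a minimal sketch of the exact (brute-force) search that ANN indexes approximate. The three-dimensional vectors and the names `cosine_similarity`, `nearest`, and `EMBEDDINGS` are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items, k=2):
    """Exact top-k search: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in items.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]

# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

print(nearest([0.85, 0.15, 0.05], EMBEDDINGS))  # ['cat', 'kitten']
```

This scan touches every stored vector, so its cost grows linearly with the collection. ANN indexes trade a little recall for skipping most of that work.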
When You Actually Need One
You need a vector database when:
- Building semantic search (not keyword search)
- Implementing RAG (Retrieval-Augmented Generation) with LLMs
- Detecting anomalies in sensor data using embedding similarity
- Recommending similar content at scale
pgvector: The Pragmatic Choice
For most teams, adding the pgvector extension to an existing PostgreSQL database is the right first step:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    content text,
    embedding vector(1536) -- OpenAI ada-002 dimension
);
-- IVFFlat groups vectors into `lists` clusters; build the index after loading data
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
-- Semantic search query: $1 is the query embedding, <=> is cosine distance
SELECT content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
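The `lists = 100` setting above controls how many clusters the IVFFlat index splits the vectors into. At query time only a few of those clusters are scanned (pgvector exposes this as the `ivfflat.probes` setting). The following is a rough sketch of that idea, not pgvector's implementation: `build_ivf` and `search_ivf` are hypothetical names, centroids are sampled at random instead of trained with k-means, and Euclidean distance stands in for cosine distance to keep the code short.

```python
import math
import random

def build_ivf(vectors, lists, seed=0):
    """Assign each vector to its nearest centroid (one inverted list per centroid)."""
    rng = random.Random(seed)
    # Simplification: real indexes train centroids with k-means.
    centroids = rng.sample(vectors, lists)
    buckets = [[] for _ in range(lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(lists), key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def search_ivf(query, vectors, centroids, buckets, probes=1, k=3):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in order[:probes] for idx in buckets[c]]
    candidates.sort(key=lambda idx: math.dist(query, vectors[idx]))
    return candidates[:k]

rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, buckets = build_ivf(vectors, lists=10)
query = [rng.random() for _ in range(8)]

approx = search_ivf(query, vectors, centroids, buckets, probes=2)
exact = search_ivf(query, vectors, centroids, buckets, probes=10)  # all lists = full scan
print("approx:", approx)
print("exact: ", exact)
```

Probing more lists raises recall and cost together. Probing every list degenerates into the exact scan, which is why the two results can differ when `probes` is small.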
Scaling Considerations
As a rough guide, pgvector handles up to ~1M vectors well. Beyond that, consider a dedicated vector database such as Qdrant or Weaviate. But start simple.
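A quick back-of-the-envelope estimate helps show why size matters. This sketch assumes single-precision floats (4 bytes per dimension) plus a small per-vector header, which matches the storage figure given in pgvector's documentation; it ignores index size, row overhead, and the `content` column. `vector_storage_bytes` is a hypothetical helper.

```python
def vector_storage_bytes(rows, dimensions, bytes_per_float=4, header_bytes=8):
    """Approximate raw storage for `rows` single-precision vectors."""
    return rows * (dimensions * bytes_per_float + header_bytes)

for rows in (100_000, 1_000_000, 10_000_000):
    gib = vector_storage_bytes(rows, 1536) / 2**30
    print(f"{rows:>10,} vectors x 1536 dims ~ {gib:.1f} GiB")
```

At 1M rows and 1536 dimensions the raw vectors alone come to roughly 5.7 GiB, before any index is built.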
Conclusion
Vector search is not magic – it is distance math. Understand the embedding model, index type, and dimensionality tradeoffs before committing to a specific solution.
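As one worked example of that distance math: for embeddings normalized to unit length, squared Euclidean distance and cosine distance differ only by a constant factor, so both produce the same ranking. Here $\theta$ is the angle between vectors $a$ and $b$, and $d_{\cos}$ is the cosine distance.

```latex
\begin{aligned}
\lVert a - b \rVert^2 &= \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b \\
&= 2 - 2\cos\theta \qquad (\lVert a \rVert = \lVert b \rVert = 1) \\
&= 2\, d_{\cos}(a, b), \qquad d_{\cos}(a, b) = 1 - \cos\theta
\end{aligned}
```

For unnormalized embeddings the two metrics can rank results differently, which is one reason to check what the embedding model outputs before picking an operator class.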