Add Family vector search function call / support for document vault (#961)

* Add SearchFamilyImportedFiles assistant function with vector store support

Implement per-Family document search using OpenAI vector stores, allowing the AI assistant to search through uploaded financial documents (tax returns, statements, contracts, etc.). The architecture is modular with a provider-agnostic VectorStoreConcept interface so other RAG backends can be added.

Key components:
- Assistant::Function::SearchFamilyImportedFiles - tool callable from any LLM
- Provider::VectorStoreConcept - abstract vector store interface
- Provider::Openai vector store methods (create, upload, search, delete)
- Family::VectorSearchable concern with document management
- FamilyDocument model for tracking uploaded files
- Migration adding vector_store_id to families and family_documents table

https://claude.ai/code/session_01TSkKc7a9Yu2ugm1RvSf4dh

* Extract VectorStore adapter layer for swappable backends

Replace the Provider::VectorStoreConcept mixin with a standalone adapter architecture under VectorStore::. This cleanly separates vector store concerns from the LLM provider and makes it trivial to swap backends.

Components:
- VectorStore::Base - abstract interface (create/delete/upload/remove/search)
- VectorStore::Openai - uses ruby-openai gem's native vector_stores.search
- VectorStore::Pgvector - skeleton for local pgvector + embedding model
- VectorStore::Qdrant - skeleton for Qdrant vector DB
- VectorStore::Registry - resolves adapter from VECTOR_STORE_PROVIDER env
- VectorStore::Response - success/failure wrapper (like Provider::Response)

Consumers updated to go through VectorStore.adapter:
- Family::VectorSearchable
- Assistant::Function::SearchFamilyImportedFiles
- FamilyDocument

Removed: Provider::VectorStoreConcept, vector store methods from Provider::Openai

https://claude.ai/code/session_01TSkKc7a9Yu2ugm1RvSf4dh

* Add Vector Store configuration docs to ai.md

Documents how to configure the document search feature, covering all three supported backends (OpenAI, pgvector, Qdrant), environment variables, Docker Compose examples, supported file types, and privacy considerations.

https://claude.ai/code/session_01TSkKc7a9Yu2ugm1RvSf4dh

* No need to specify `imported` in code

* Missed a couple more places

* Tiny reordering for the human OCD

* Update app/models/assistant/function/search_family_files.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Juan José Mata <jjmata@jjmata.com>

* PR comments

* More PR comments

---------

Signed-off-by: Juan José Mata <jjmata@jjmata.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-04-19 12:04:08 +00:00 · 2026-02-11 15:22:56 +01:00
parent 1ebbd5bbc5
commit 9e57954a99
20 changed files with 1212 additions and 6 deletions
--- a/docs/hosting/ai.md
+++ b/docs/hosting/ai.md
@@ -633,6 +633,100 @@ The assistant uses OpenAI's function calling (tool use) to access user data:

 These are defined in `app/models/assistant/function/`.

+### Vector Store (Document Search)
+
+Sure's AI assistant can search documents uploaded to a family's vault. Under the hood, each document is indexed in a **vector store**, so the assistant can retrieve the passages most relevant to a question and use them when answering. This approach is known as Retrieval-Augmented Generation (RAG).
+
+#### How It Works
+
+1. When a user uploads a document to their family vault, it is automatically pushed to the configured vector store.
+2. When the assistant needs financial context from uploaded files, it calls the `search_family_files` function.
+3. The vector store returns the most relevant passages, which the assistant uses to answer the question.
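
The three steps above can be sketched end to end with a toy, in-memory store. Everything in it is hypothetical: `ToyVectorStore`, its `upload`/`search` methods, and the word-count "embedding" stand in for a real backend and embedding model. `search_family_files` here only mimics the general shape of the assistant function and is not its actual code.

```ruby
# Toy illustration of the upload -> search -> answer flow (all names hypothetical).
class ToyVectorStore
  Passage = Struct.new(:file_name, :text, :vector)

  def initialize
    @passages = []
  end

  # Step 1 (indexing): split an uploaded document into paragraph chunks and index each one.
  def upload(file_name, content)
    content.split(/\n{2,}/).map(&:strip).reject(&:empty?).each do |chunk|
      @passages << Passage.new(file_name, chunk, embed(chunk))
    end
  end

  # Step 3 (retrieval): return up to `limit` passages ranked by cosine similarity to the query.
  def search(query, limit: 3)
    query_vector = embed(query)
    @passages
      .map { |passage| [cosine(query_vector, passage.vector), passage] }
      .select { |score, _| score.positive? }
      .sort_by { |score, _| -score }
      .first(limit)
      .map { |score, passage| { file: passage.file_name, text: passage.text, score: score.round(3) } }
  end

  private

  # Stand-in "embedding": lowercase word counts. Real backends use a learned embedding model.
  def embed(text)
    text.downcase.scan(/[a-z0-9]+/).tally
  end

  def cosine(a, b)
    dot = a.sum { |word, count| count * b.fetch(word, 0) }
    norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
    denominator = norm.call(a) * norm.call(b)
    denominator.zero? ? 0.0 : dot / denominator
  end
end

# Step 2 (tool call): roughly what a search_family_files call reduces to (hypothetical shape).
def search_family_files(store, query)
  { query: query, results: store.search(query) }
end

store = ToyVectorStore.new
store.upload("2024_tax_return.txt", "Federal income tax owed: 4,200.\n\nProperty tax paid: 3,100.")
store.upload("lease.txt", "Monthly rent is 1,800, due on the first.")
pp search_family_files(store, "How much property tax did we pay?")
```

A real backend swaps word counts for learned embeddings, so a query can also match passages that use different wording. The chunk, score, and rank structure stays the same.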
+
+#### Supported Backends
+
+| Backend | Best For | Requirements |
+|---------|----------|--------------|
+| **OpenAI** (default) | Cloud deployments, zero setup | `OPENAI_ACCESS_TOKEN` |
+| **Pgvector** | Self-hosted, full data privacy | PostgreSQL with `pgvector` extension |
+| **Qdrant** | Self-hosted, dedicated vector DB | Running Qdrant instance |
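
The table implies a simple selection rule: `VECTOR_STORE_PROVIDER` picks the backend, and OpenAI is the default. The sketch below encodes that rule under stated assumptions. `resolve_vector_store` and `SUPPORTED_VECTOR_STORES` are hypothetical names, and the real `VectorStore::Registry` may behave differently, for example in how it treats an unrecognized provider name.

```ruby
# Hypothetical backend resolution from environment variables; not the actual VectorStore::Registry.
SUPPORTED_VECTOR_STORES = %w[openai pgvector qdrant].freeze

# Returns :openai, :pgvector or :qdrant, or nil when the default OpenAI backend lacks a token.
def resolve_vector_store(env = ENV)
  name = env["VECTOR_STORE_PROVIDER"].to_s.strip.downcase
  name = "openai" if name.empty? # OpenAI is the default backend

  unless SUPPORTED_VECTOR_STORES.include?(name)
    # Fail loudly rather than silently falling back to a cloud backend.
    raise ArgumentError, "unknown VECTOR_STORE_PROVIDER: #{name.inspect}"
  end

  # The OpenAI backend reuses the assistant's token, so it is only usable when that token is set.
  return nil if name == "openai" && env["OPENAI_ACCESS_TOKEN"].to_s.empty?

  name.to_sym
end

resolve_vector_store({ "OPENAI_ACCESS_TOKEN" => "sk-proj-..." }) # => :openai
resolve_vector_store({ "VECTOR_STORE_PROVIDER" => "qdrant" })    # => :qdrant
resolve_vector_store({})                                         # => nil
```

Raising on a misspelled provider, rather than quietly defaulting to OpenAI, is a deliberate choice in this sketch. Because the backends differ in where document content goes, a typo should not silently route documents to a cloud API.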
+
+#### Configuration
+
+##### OpenAI (Default)
+
+No extra configuration is needed. If you already have `OPENAI_ACCESS_TOKEN` set for the AI assistant, document search works automatically. OpenAI manages chunking, embedding, and retrieval.
+
+```bash
+# Already set for AI chat — document search uses the same token
+OPENAI_ACCESS_TOKEN=sk-proj-...
+```
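
With the OpenAI backend, retrieval goes through OpenAI's vector store search endpoint, `POST /v1/vector_stores/{vector_store_id}/search`. Sure itself reaches that endpoint through the ruby-openai gem. The sketch below builds an equivalent raw HTTP request with Ruby's standard library, purely to show what gets sent. The helper name, vector store ID, and query are placeholders, and the request is built but not sent.

```ruby
require "json"
require "net/http"
require "uri"

# Builds, but does not send, a request to OpenAI's vector store search endpoint.
# The helper name, vector store ID and query are placeholders.
def openai_vector_search_request(vector_store_id, query, token:, max_results: 5)
  uri = URI("https://api.openai.com/v1/vector_stores/#{vector_store_id}/search")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  # max_num_results caps how many passages come back (OpenAI allows 1 to 50).
  request.body = JSON.generate({ query: query, max_num_results: max_results })
  request
end

request = openai_vector_search_request(
  "vs_abc123", "property tax paid in 2024",
  token: ENV.fetch("OPENAI_ACCESS_TOKEN", "sk-proj-...")
)
# To send it for real:
# Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) { |http| http.request(request) }
```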
+
+##### Pgvector (Self-Hosted)
+
+Use PostgreSQL's pgvector extension for fully local document search:
+
+```bash
+VECTOR_STORE_PROVIDER=pgvector
+```
+
+> **Note:** The pgvector adapter is currently only a skeleton. Full support, including embedding model configuration, is planned for a future release.
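
Because the pgvector adapter is still a skeleton, there is no schema to document yet. For orientation only, a pgvector similarity search typically reduces to one SQL query ordered by a distance operator. The sketch below builds such a query as a string plus bind values. The `family_document_chunks` table, its columns, and both helper names are hypothetical. The sketch relies on two real pgvector features: the `'[x1,x2,...]'` vector literal and the `<=>` cosine-distance operator.

```ruby
# Hypothetical sketch: builds a parameterized pgvector similarity query.
# Table and column names are illustrative; the real pgvector adapter has no schema yet.

# pgvector accepts vector literals written as '[x1,x2,...]'.
def pgvector_literal(embedding)
  "[#{embedding.map { |x| Float(x) }.join(',')}]"
end

# Returns [sql, binds] for a driver that supports $n placeholders.
def pgvector_search_sql(family_id, query_embedding, limit: 5)
  sql = <<~SQL
    SELECT file_name, content, embedding <=> $1::vector AS distance
    FROM family_document_chunks
    WHERE family_id = $2
    ORDER BY embedding <=> $1::vector
    LIMIT $3
  SQL
  [sql, [pgvector_literal(query_embedding), family_id, Integer(limit)]]
end

sql, binds = pgvector_search_sql(42, [0.12, -0.03, 0.88], limit: 3)
```

`<=>` returns cosine distance (1 minus cosine similarity), so ascending order puts the closest passages first. Ordering by that same expression with a `LIMIT` is the pattern that pgvector's HNSW and IVFFlat indexes (built with `vector_cosine_ops`) can accelerate. The bind values could be passed to, for example, the `pg` gem's `exec_params`.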
+
+##### Qdrant (Self-Hosted)
+
+Use a dedicated Qdrant vector database:
+
+```bash
+VECTOR_STORE_PROVIDER=qdrant
+QDRANT_URL=http://localhost:6333   # Default if not set
+QDRANT_API_KEY=your-api-key        # Optional, for authenticated instances
+```
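
For orientation, a Qdrant client would typically use these two variables as follows. `QDRANT_URL` is the base URL of the REST API. `QDRANT_API_KEY`, when present, goes in the `api-key` request header that Qdrant checks on authenticated instances. The sketch below assumes the skeleton adapter will eventually issue a plain `POST /collections/{collection}/points/search` request. The helper name, collection name, and query vector are hypothetical, and the request is built but not sent.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch: builds, but does not send, a Qdrant point search request.
# The collection name and query vector are placeholders.
def qdrant_search_request(collection, vector, limit: 5, env: ENV)
  base_url = env["QDRANT_URL"].to_s
  base_url = "http://localhost:6333" if base_url.empty? # documented default
  uri = URI("#{base_url.chomp('/')}/collections/#{collection}/points/search")

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  api_key = env["QDRANT_API_KEY"].to_s
  request["api-key"] = api_key unless api_key.empty? # only needed for authenticated instances
  request.body = JSON.generate({ vector: vector, limit: limit, with_payload: true })
  request
end

request = qdrant_search_request("family_documents", [0.12, -0.03, 0.88])
```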
+
+Docker Compose example (only the settings relevant to Qdrant are shown; merge them into your existing Compose file):
+
+```yaml
+services:
+  sure:
+    environment:
+      - VECTOR_STORE_PROVIDER=qdrant
+      - QDRANT_URL=http://qdrant:6333
+    depends_on:
+      - qdrant
+
+  qdrant:
+    image: qdrant/qdrant:latest
+    ports:
+      - "6333:6333"
+    volumes:
+      - qdrant_data:/qdrant/storage
+
+volumes:
+  qdrant_data:
+```
+
+> **Note:** The Qdrant adapter is currently only a skeleton. Full support, including collection management and embedding configuration, is planned for a future release.
+
+#### Verifying the Configuration
+
+You can check whether a vector store is properly configured from the Rails console:
+
+```ruby
+VectorStore.configured?          # => true / false
+VectorStore.adapter              # => #<VectorStore::Openai:...>
+VectorStore.adapter.class.name   # => "VectorStore::Openai"
+```
+
+#### Supported File Types
+
+The following file extensions are supported for document upload and search:
+
+`.pdf`, `.txt`, `.md`, `.csv`, `.json`, `.xml`, `.html`, `.css`, `.js`, `.rb`, `.py`, `.docx`, `.pptx`, `.xlsx`, `.yaml`, `.yml`, `.log`, `.sh`
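
The list above amounts to an extension allow-list. Below is a minimal sketch of such a check, for example to pre-screen files before uploading them. The constant and method names are hypothetical. Matching extensions case-insensitively is an assumption, not documented behavior.

```ruby
# Hypothetical helper mirroring the documented list; not Sure's actual validation code.
SUPPORTED_DOCUMENT_EXTENSIONS = %w[
  .pdf .txt .md .csv .json .xml .html .css .js .rb .py
  .docx .pptx .xlsx .yaml .yml .log .sh
].freeze

# Assumption: matching is case-insensitive, so "Statement.PDF" counts as a PDF.
def supported_document?(filename)
  SUPPORTED_DOCUMENT_EXTENSIONS.include?(File.extname(filename.to_s).downcase)
end

supported_document?("2024_tax_return.PDF") # => true
supported_document?("receipt.jpg")         # => false
supported_document?("notes")               # => false (no extension)
```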
+
+#### Privacy Notes
+
+- **OpenAI backend:** Document content is sent to OpenAI's API for indexing and search. The same privacy considerations as the AI chat apply.
+- **Pgvector / Qdrant backends:** All data stays on your infrastructure, and no external API calls are made for document search, as long as the embedding model is self-hosted too.
+
 ### Multi-Model Setup

 Currently not supported out of the box, but you could: