LLM Configuration Guide
This document explains how Sure uses Large Language Models (LLMs) for AI features and how to configure them for your deployment.
Overview
Sure includes an AI assistant that can help users understand their financial data by answering questions about accounts, transactions, income, expenses, net worth, and more. The assistant uses LLMs to process natural language queries and provide insights based on the user's financial data.
> [!CAUTION]
> Only `gpt-4.1` was ever supported prior to `v0.6.5-alpha*` builds!
>
> 👉 Help us by taking a structured approach to your issue reporting. 🙏
Quickstart: OpenAI Token
The easiest way to get started with AI features in Sure is to use OpenAI:
- Get an API key from OpenAI
- Set the environment variable:

  ```shell
  OPENAI_ACCESS_TOKEN=sk-proj-...your-key-here...
  ```

- (Re-)Start Sure (both `web` and `worker` services!) and the AI assistant will be available to use after you agree/allow via the UI option
That's it! Sure will use OpenAI with its default model (currently `gpt-4.1`) for all AI operations.
Local vs. Cloud Inference
Cloud Inference (Recommended for Most Users)
What it means: The LLM runs on remote servers (like OpenAI's infrastructure), and your app sends requests over the internet.
| Pros | Cons |
|---|---|
| Zero setup - works immediately | Requires internet connection |
| Always uses the latest models | Data leaves your infrastructure (though transmitted securely) |
| No hardware requirements | Per-request costs |
| Scales automatically | Dependent on provider availability |
| Regular updates and improvements | |
When to use:
- You're new to LLMs
- You want the best performance without setup
- You don't have powerful hardware (GPU with large VRAM)
- You're okay with cloud-based processing
- You're running a managed instance
Local Inference (Self-Hosted)
What it means: The LLM runs on your own hardware using tools like Ollama, LM Studio, or LocalAI.
| Pros | Cons |
|---|---|
| Complete data privacy - nothing leaves your network | Requires significant hardware (see below) |
| No per-request costs after initial setup | Setup and maintenance overhead |
| Works offline | Models may be less capable than latest cloud offerings |
| Full control over models and updates | You manage updates and improvements |
| Can be more cost-effective at scale | Performance depends on your hardware |
Hardware Requirements:
The amount of VRAM (GPU memory) you need depends on the model size:
- Minimum (8GB VRAM): Can run 7B parameter models like `llama3.2:7b` or `gemma2:7b`
  - Works for basic chat functionality
  - May struggle with complex financial analysis
- Recommended (16GB+ VRAM): Can run 13B-14B parameter models like `llama3.1:13b` or `qwen2.5:14b`
  - Good balance of performance and hardware requirements
  - Handles most financial queries well
- Ideal (24GB+ VRAM): Can run 30B+ parameter models or run smaller models with higher precision
  - Best quality responses
  - Complex reasoning about financial data
CPU-only inference: Possible but extremely slow (10-100x slower). Not recommended for production use.
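As a rule of thumb, VRAM need is roughly weight memory (parameter count × bits per weight) plus overhead for activations and KV cache. The sketch below is a ballpark estimate under that assumption, not a vendor spec:

```ruby
# Rough VRAM estimate for a locally hosted model: weight memory plus a
# fixed overhead factor for activations and KV cache. Ballpark only.
def estimated_vram_gb(params_billions:, bits_per_weight:, overhead: 1.2)
  weights_gb = params_billions * bits_per_weight / 8.0
  (weights_gb * overhead).round(1)
end

puts estimated_vram_gb(params_billions: 7, bits_per_weight: 4)   # quantized 7B
puts estimated_vram_gb(params_billions: 13, bits_per_weight: 8)  # 8-bit 13B
```

This is why a 4-bit quantized 7B model fits comfortably in 8GB while a 13B model at higher precision pushes you into the 16GB tier.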
When to use:
- Privacy is critical (regulated industries, sensitive financial data)
- You have the required hardware
- You're comfortable with technical setup
- You want to minimize ongoing costs
- You need offline functionality
Cloud Providers
Sure supports any OpenAI-compatible API endpoint. Here are tested providers:
OpenAI (Primary Support)
```shell
OPENAI_ACCESS_TOKEN=sk-proj-...
# No other configuration needed

# Optional: Request timeout in seconds (default: 60)
# OPENAI_REQUEST_TIMEOUT=60
```
Recommended models:
- `gpt-4.1` - Default, best balance of speed and quality
- `gpt-5` - Latest model, highest quality (more expensive)
- `gpt-4o-mini` - Cheaper, good quality
Pricing: See OpenAI Pricing
Google Gemini (via OpenRouter)
OpenRouter provides access to many models including Gemini:
```shell
OPENAI_ACCESS_TOKEN=your-openrouter-api-key
OPENAI_URI_BASE=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.0-flash-exp
```
Why OpenRouter?
- Single API for multiple providers
- Competitive pricing
- Automatic fallbacks
- Usage tracking
Recommended Gemini models via OpenRouter:
- `google/gemini-2.5-flash` - Fast and capable
- `google/gemini-2.5-pro` - High quality, good for complex queries
Anthropic Claude (via OpenRouter)
```shell
OPENAI_ACCESS_TOKEN=your-openrouter-api-key
OPENAI_URI_BASE=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet
```
Recommended Claude models:
- `anthropic/claude-sonnet-4.5` - Excellent reasoning, good with financial data
- `anthropic/claude-haiku-4.5` - Fast and cost-effective
Other Providers
Any service offering an OpenAI-compatible API should work:
- Groq - Fast inference, free tier available
- Together AI - Various open models
- Anyscale - Llama models
- Replicate - Various models
Local LLM Setup (Ollama)
Ollama is the recommended tool for running LLMs locally.
Installation
1. Install Ollama:

   ```shell
   # macOS
   brew install ollama

   # Linux
   curl -fsSL https://ollama.com/install.sh | sh

   # Windows
   # Download from https://ollama.com/download
   ```

2. Start Ollama:

   ```shell
   ollama serve
   ```

3. Pull a model:

   ```shell
   # Smaller, faster (requires 8GB VRAM)
   ollama pull gemma2:7b

   # Balanced (requires 16GB VRAM)
   ollama pull llama3.1:13b

   # Larger, more capable (requires 24GB+ VRAM)
   ollama pull qwen2.5:32b
   ```
Configuration
Configure Sure to use Ollama:
```shell
# Dummy token (Ollama doesn't need authentication)
OPENAI_ACCESS_TOKEN=ollama-local

# Ollama API endpoint
OPENAI_URI_BASE=http://localhost:11434/v1

# Model you pulled
OPENAI_MODEL=llama3.1:13b

# Optional: enable debug logging in the AI chat
AI_DEBUG_MODE=true
```
Important: When using Ollama or any custom provider:
- You must set `OPENAI_MODEL` - the system cannot default to `gpt-4.1` as that model won't exist in Ollama
- The `OPENAI_ACCESS_TOKEN` can be any non-empty value (Ollama ignores it)
- If you don't set a model, chats will fail with a validation error
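The rule above can be sketched as a small resolution function. This is an illustration of the behavior described, not Sure's actual code:

```ruby
# Sketch of the model-selection rule described above (not Sure's actual
# code): a custom endpoint must name its model explicitly, while the
# stock OpenAI endpoint can fall back to the default.
DEFAULT_MODEL = "gpt-4.1"

def resolve_model(env)
  model = env["OPENAI_MODEL"].to_s
  return model unless model.empty?

  if env["OPENAI_URI_BASE"].to_s.empty?
    DEFAULT_MODEL # plain OpenAI: the default model exists there
  else
    raise ArgumentError, "OPENAI_MODEL is required for custom endpoints"
  end
end

puts resolve_model("OPENAI_URI_BASE" => "http://localhost:11434/v1",
                   "OPENAI_MODEL" => "llama3.1:13b")
```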
Docker Compose Example
```yaml
services:
  sure:
    environment:
      - OPENAI_ACCESS_TOKEN=ollama-local
      - OPENAI_URI_BASE=http://ollama:11434/v1
      - OPENAI_MODEL=llama3.1:13b
      - AI_DEBUG_MODE=true # Optional: enable debug logging in the AI chat
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Uncomment if you have an NVIDIA GPU
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

volumes:
  ollama_data:
```
Model Recommendations
> [!CAUTION]
> REMINDER: Only `gpt-4.1` was ever supported prior to `v0.6.5-alpha*` builds!
>
> 👉 Help us by taking a structured approach to your testing of the models mentioned below. 🙏
For Chat Assistant
The AI assistant needs to understand financial context and perform function/tool calling:
Cloud:
- Best: `gpt-4.1` or `gpt-5` - Most reliable, best function calling
- Good: `anthropic/claude-4.5-sonnet` - Excellent reasoning
- Budget: `google/gemini-2.5-flash` - Fast and affordable
Local:
- Best: `qwen3-30b` - Strong function calling and reasoning (24GB+ VRAM, 14GB at 3-bit quantization)
- Good: `openai/gpt-oss-20b` - Solid performance (12GB VRAM)
- Budget: `qwen3-8b`, `llama3.1-8b` - Minimal hardware (8GB VRAM), still supports tool calling
For Auto-Categorization
Transaction categorization doesn't require function calling:
Cloud:
- Best: Same as chat - `gpt-4.1` or `gpt-5`
- Budget: `gpt-4o-mini` - Much cheaper, still very accurate
Local:
- Any model that works for chat will work for categorization
- This is less demanding than chat, so smaller models may suffice
- Some models don't support structured outputs; validate this before relying on them
For Merchant Detection
Similar requirements to categorization:
Cloud:
- Same recommendations as auto-categorization
Local:
- Same recommendations as auto-categorization
Configuration via Settings UI
For self-hosted deployments, you can configure AI settings through the web interface:
- Go to Settings → Self-Hosting
- Scroll to the AI Provider section
- Configure:
- OpenAI Access Token - Your API key
- OpenAI URI Base - Custom endpoint (leave blank for OpenAI)
- OpenAI Model - Model name (required for custom endpoints)
Note: Settings in the UI override environment variables. If you change settings in the UI, those values take precedence.
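The precedence rule can be illustrated with a tiny helper. The method and argument names here are hypothetical, not Sure's internals:

```ruby
# Illustrative precedence check: a value saved through the Settings UI
# wins over the environment variable. Names are hypothetical.
def effective_setting(ui_value, env_value)
  ui = ui_value.to_s.strip
  ui.empty? ? env_value : ui
end

puts effective_setting("gpt-4o-mini", "gpt-4.1") # UI value wins
puts effective_setting(nil, "gpt-4.1")           # falls back to ENV
```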
AI Cache Management
Sure caches AI-generated results (like auto-categorization and merchant detection) to avoid redundant API calls and costs. However, there are situations where you may want to clear this cache.
What is the AI Cache?
When AI rules process transactions, Sure stores:
- Enrichment records: Which attributes were set by AI (category, merchant, etc.)
- Attribute locks: Prevents rules from re-processing already-handled transactions
This caching means:
- Transactions won't be sent to the LLM repeatedly
- Your API costs are minimized
- Processing is faster on subsequent rule runs
When to Reset the AI Cache
You might want to reset the cache when:
- Switching LLM models: Different models may produce better categorizations
- Improving prompts: After system updates with better prompts
- Fixing miscategorizations: When AI made systematic errors
- Testing: During development or evaluation of AI features
Caution
Resetting the AI cache will cause all transactions to be re-processed by AI rules on the next run. This will incur API costs if using a cloud provider.
How to Reset the AI Cache
Via UI (Recommended):
- Go to Settings → Rules
- Click the menu button (three dots)
- Select Reset AI cache
- Confirm the action
The cache is cleared asynchronously in the background. You'll see a confirmation message when the process starts.
Automatic Reset: The AI cache is automatically cleared for all users when the OpenAI model setting is changed. This ensures that the new model processes transactions fresh.
What Happens When Cache is Reset
- AI-locked attributes are unlocked: Transactions can be re-enriched
- AI enrichment records are deleted: The history of AI changes is cleared
- User edits are preserved: If you manually changed a category after AI set it, your change is kept
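The three points above amount to filtering on the source of each change. This toy sketch uses hypothetical data shapes, not Sure's schema:

```ruby
# Toy illustration of the reset semantics (data shapes are hypothetical,
# not Sure's schema): AI-sourced enrichments are dropped and their locks
# released, while user edits survive untouched.
def reset_ai_cache(enrichments)
  enrichments.reject { |e| e[:source] == :ai }
end

records = [
  { attribute: :category, source: :ai,   value: "Dining" },
  { attribute: :merchant, source: :user, value: "Corner Cafe" }
]
puts reset_ai_cache(records).inspect # only the user edit remains
```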
Cost Implications
Before resetting the cache, consider:
| Scenario | Approximate Cost |
|---|---|
| 100 transactions | $0.05-0.20 |
| 1,000 transactions | $0.50-2.00 |
| 10,000 transactions | $5.00-20.00 |
Costs vary by model. Use `gpt-4o-mini` for lower costs.
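You can reproduce the table's figures with back-of-envelope arithmetic, using the usage figure quoted later in this guide (1,000-3,000 tokens per batch of 25 transactions). The per-token price below is illustrative, not a current list price:

```ruby
# Back-of-envelope re-processing cost. Token counts per batch come from
# the typical-usage figures in this guide; the price is illustrative.
def reenrichment_cost(transactions, tokens_per_batch: 3000, batch_size: 25,
                      price_per_million_tokens: 5.0)
  batches = (transactions / batch_size.to_f).ceil
  batches * tokens_per_batch / 1_000_000.0 * price_per_million_tokens
end

puts format("$%.2f", reenrichment_cost(1_000)) # lands inside the table's band
```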
Tips to minimize costs:
- Use narrow rule filters before running AI actions
- Reset cache only when necessary
- Consider using local LLMs for bulk re-processing
Observability with Langfuse
Sure includes built-in support for Langfuse, an open-source LLM observability platform.
What is Langfuse?
Langfuse helps you:
- Track all LLM requests and responses
- Monitor costs per request
- Measure response latency
- Debug failed requests
- Analyze usage patterns
- Optimize prompts based on real data
Setup
1. Create a free account at Langfuse Cloud or self-host Langfuse

2. Get your API keys from the Langfuse dashboard

3. Configure Sure:

   ```shell
   LANGFUSE_PUBLIC_KEY=pk-lf-...
   LANGFUSE_SECRET_KEY=sk-lf-...
   LANGFUSE_HOST=https://cloud.langfuse.com  # or your self-hosted URL
   ```

4. Restart Sure
All LLM operations will now be logged to Langfuse, including:
- Chat messages and responses
- Auto-categorization requests
- Merchant detection
- Token usage and costs
- Response times
Langfuse Features in Sure
- Automatic tracing: Every LLM call is automatically traced
- Session tracking: Chat sessions are tracked with a unique session ID
- User anonymization: User IDs are hashed before sending to Langfuse
- Cost tracking: Token usage is logged for cost analysis
- Error tracking: Failed requests are logged with error details
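The anonymization point is worth a concrete sketch: a one-way hash is stable per user (so sessions can still be grouped) but not reversible. Sure's exact hashing scheme may differ from this illustration:

```ruby
require "digest"

# How an ID can be anonymized before it leaves your infrastructure: a
# one-way hash is stable per user but not reversible. Sketch only;
# Sure's exact hashing scheme may differ.
def anonymized_user_id(user_id)
  Digest::SHA256.hexdigest(user_id.to_s)
end

puts anonymized_user_id(42) # 64 hex chars, identical for the same id
```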
Viewing Traces
- Go to your Langfuse dashboard
- Navigate to Traces
- You'll see traces for:
  - `openai.chat_response` - Chat assistant interactions
  - `openai.auto_categorize` - Transaction categorization
  - `openai.auto_detect_merchants` - Merchant detection
Privacy Considerations
What's sent to Langfuse:
- Prompts and responses
- Model names
- Token counts
- Timestamps
- Session IDs
- Hashed user IDs (not actual user data)
What's NOT sent:
- User email addresses
- User names
- Unhashed user IDs
- Account credentials
For maximum privacy: Self-host Langfuse on your own infrastructure.
Testing and Evaluation
Manual Testing
Test your AI configuration:
1. Go to the Chat interface in Sure

2. Try these test prompts:
   - "Show me my total spending this month"
   - "What are my top 5 spending categories?"
   - "How much do I have in savings?"

3. Verify:
   - Responses are relevant
   - Function calls work (you should see "Analyzing your data..." briefly)
   - Numbers match your actual data
Automated Evaluation
Sure doesn't currently include automated evals, but you can build them using Langfuse:
- Collect baseline responses: Run test prompts and save responses
- Create evaluation dataset: Use Langfuse datasets feature
- Run evaluations: Test new models/prompts against the dataset
- Compare results: Use Langfuse's comparison tools
Benchmarking Models
To compare models for your use case:
1. Speed Test:
   - Send the same prompt to different models
   - Measure time to first token (TTFT)
   - Measure overall response time

2. Quality Test:
   - Create a set of 10-20 realistic financial questions
   - Get responses from each model
   - Manually rate accuracy and helpfulness

3. Cost Test:
   - Calculate cost per interaction based on token usage
   - Factor in your expected usage volume
   - Consider speed vs. cost tradeoffs
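A minimal timing harness for the speed test might look like this. The model call is stubbed with a lambda so the snippet runs as-is; swap in a real API call to benchmark an actual provider:

```ruby
# Minimal timing harness for a response-time benchmark. The model call
# is stubbed with a lambda so this runs without any API access.
def time_response(&request)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  response = request.call
  { response: response,
    seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
end

stub_model = -> { sleep 0.01; "stubbed answer" }
result = time_response(&stub_model)
puts format("%.3fs -> %s", result[:seconds], result[:response])
```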
Example Evaluation Queries
Good test queries that exercise different capabilities:
- Simple retrieval: "What's my checking account balance?"
- Aggregation: "Total spending on restaurants last month?"
- Comparison: "Am I spending more or less than last year?"
- Analysis: "What are my biggest expenses this quarter?"
- Forecasting: "Based on my spending, when will I reach $10k savings?"
Cost Considerations
Cloud Costs
Typical costs for OpenAI (as of early 2025):
- gpt-4.1: ~$5-15 per 1M input tokens, ~$15-60 per 1M output tokens
- gpt-5: ~2-3x more expensive than gpt-4.1
- gpt-4o-mini: ~$0.15 per 1M input tokens (very cheap)
Typical usage:
- Chat message: 500-2000 tokens (input) + 100-500 tokens (output)
- Auto-categorization: 1000-3000 tokens per 25 transactions
- Cost per chat message: $0.01-0.05 for gpt-4.1
Optimization tips:
- Use `gpt-4o-mini` for categorization
- Use Langfuse to identify expensive prompts
- Cache results when possible
- Consider local LLMs for high-volume operations
Local Costs
One-time costs:
- GPU hardware: $500-2000+ depending on VRAM needs
- Setup time: 2-8 hours
Ongoing costs:
- Electricity: ~$0.10-0.50 per hour of GPU usage
- Maintenance: Occasional updates and monitoring
Break-even analysis:
If you process 10,000 messages/month:
- Cloud (gpt-4.1): ~$200-500/month
- Local (amortized): ~$50-100/month after hardware cost
- Break-even: 6-12 months depending on hardware cost
Recommendation: Start with cloud, switch to local if costs exceed $100-200/month.
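The break-even analysis above reduces to one formula: hardware cost divided by monthly savings, rounded up. The figures here are the illustrative ones from this section:

```ruby
# Months until local hardware pays for itself, given monthly cloud spend
# and local running costs. Figures are illustrative, from this section.
def break_even_months(hardware_cost:, cloud_monthly:, local_monthly:)
  savings = cloud_monthly - local_monthly
  return Float::INFINITY unless savings.positive?
  (hardware_cost / savings.to_f).ceil
end

puts break_even_months(hardware_cost: 1500, cloud_monthly: 300,
                       local_monthly: 75) # $225/month saved -> 7 months
```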
Hybrid Approach
You can mix providers:
```shell
# Example: Use local for categorization, cloud for chat

# Categorization (high volume, lower complexity)
CATEGORIZATION_PROVIDER=ollama
CATEGORIZATION_MODEL=gemma2:7b

# Chat (lower volume, higher complexity)
CHAT_PROVIDER=openai
CHAT_MODEL=gpt-4.1
```
Note: Sure currently uses a single provider for all operations, but this could be customized.
Troubleshooting
"Messages is invalid" Error
Symptom: Cannot start a chat, see validation error
Cause: Using a custom provider (like Ollama) without setting `OPENAI_MODEL`
Fix:
```shell
# Make sure all three are set for custom providers
OPENAI_ACCESS_TOKEN=ollama-local          # Any non-empty value
OPENAI_URI_BASE=http://localhost:11434/v1
OPENAI_MODEL=your-model-name              # REQUIRED!
```
Model Not Found
Symptom: Error about model not being available
Cloud: Check that you're using a valid model name for your provider
Local: Make sure you've pulled the model:
```shell
ollama list            # See what's installed
ollama pull model-name # Install a model
```
Slow Responses
Symptom: Long wait times for AI responses
Cloud:
- Switch to a faster model (e.g., `gpt-4o-mini` or `gemini-2.0-flash-exp`)
- Check your internet connection
- Verify provider status page
Local:
- Check GPU utilization (should be near 100% during inference)
- Try a smaller model
- Ensure you're using GPU, not CPU
- Check for thermal throttling
No Provider Available
Symptom: "Provider not found" or similar error
Fix:
- Check that `OPENAI_ACCESS_TOKEN` is set
- For custom providers, verify `OPENAI_URI_BASE` and `OPENAI_MODEL`
- Check logs for specific error messages
High Costs
Symptom: Unexpected bills from cloud provider
Analysis:
- Check Langfuse for usage patterns
- Look for unusually long conversations
- Check if you're using an expensive model
Optimization:
- Switch to cheaper model for categorization
- Consider local LLM for high-volume tasks
- Implement rate limiting if needed
- Review and optimize system prompts
Advanced Topics
Custom System Prompts
Sure's AI assistant uses a system prompt that defines its behavior. The prompt is defined in `app/models/assistant/configurable.rb`.
To customize:
- Fork the repository
- Edit the `default_instructions` method
- Rebuild and deploy
What you can customize:
- Tone and personality
- Response format
- Rules and constraints
- Domain expertise
Function Calling
The assistant uses OpenAI's function calling (tool use) to access user data:
Available functions:
- `get_transactions` - Retrieve transaction history
- `get_accounts` - Get account information
- `get_balance_sheet` - Current financial position
- `get_income_statement` - Income and expenses
These are defined in `app/models/assistant/function/`.
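For orientation, here is the general shape of one tool definition in the OpenAI function-calling format. The `get_transactions` parameters shown are illustrative, not Sure's actual schema:

```ruby
# Hypothetical shape of one tool definition in the OpenAI
# function-calling format. Parameter names are illustrative only.
GET_TRANSACTIONS_TOOL = {
  type: "function",
  function: {
    name: "get_transactions",
    description: "Retrieve transaction history for the current user",
    parameters: {
      type: "object",
      properties: {
        start_date: { type: "string", description: "ISO 8601 date" },
        end_date:   { type: "string", description: "ISO 8601 date" }
      },
      required: []
    }
  }
}

puts GET_TRANSACTIONS_TOOL[:function][:name]
```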
Multi-Model Setup
Currently not supported out of the box, but you could:
- Create multiple provider instances
- Add routing logic to select provider based on task
- Update controllers to specify which provider to use
Rate Limiting
To prevent abuse or runaway costs:
- Use Rack::Attack (already included)
- Configure in `config/initializers/rack_attack.rb`
- Limit requests per user or globally
Example:
```ruby
# Limit chat creation to 10 per minute per user
throttle('chats/create', limit: 10, period: 1.minute) do |req|
  req.session[:user_id] if req.path == '/chats' && req.post?
end
```
Resources
- OpenAI Documentation
- Ollama Documentation
- OpenRouter Documentation
- Langfuse Documentation
- Sure GitHub Repository
Support
For issues with AI features:
- Check this documentation first
- Search existing GitHub issues
- Open a new issue with:
- Your configuration (redact API keys!)
- Error messages
- Steps to reproduce
- Expected vs. actual behavior
Last Updated: October 2025