Commit Graph

9 Commits

Author SHA1 Message Date
sentry[bot]
d09765a14c Add mailer subject tests and refine i18n keys (#910)
* Add mailer subject tests and refine i18n keys

* Linter

* Fix test

* More fixes

* More fixes

---------

Co-authored-by: sentry[bot] <39604003+sentry[bot]@users.noreply.github.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
2026-02-05 19:47:01 +01:00
MkDev11
6f8858b1a6 feat/Add AI-Powered Bank Statement Import (step 1, PDF import & analysis) (#808)
* feat: Add PDF import with AI-powered document analysis

This enhances the import functionality to support PDF files with AI-powered
document analysis. When a PDF is uploaded, it is processed by AI to:
- Identify the document type (bank statement, credit card statement, etc.)
- Generate a summary of the document contents
- Extract key metadata (institution, dates, balances, transaction count)

After processing, an email is sent to the user asking for next steps.

Key changes:
- Add PdfImport model for handling PDF document imports
- Add Provider::Openai::PdfProcessor for AI document analysis
- Add ProcessPdfJob for async PDF processing
- Add PdfImportMailer for user notification emails
- Update imports controller to detect and handle PDF uploads
- Add PDF import option to the new import page
- Add i18n translations for all new strings
- Add comprehensive tests for the new functionality

* Add bank statement import with AI extraction

- Create ImportBankStatement assistant function for MCP
- Add BankStatementExtractor with chunked processing for small context windows
- Register function in assistant configurable
- Make PdfImport#pdf_file_content public for extractor access
- Increase OpenAI request timeout to 600s for slow local models
- Increase DB connection pool to 20 for concurrent operations

Tested with M-Pesa bank statement via remote Ollama (qwen3:8b):
- Successfully extracted 18 transactions
- Generated CSV and created TransactionImport
- Works with 3000 char chunks for small context windows

* Add pdf-reader gem dependency

The BankStatementExtractor uses PDF::Reader to parse bank statement
PDFs, but the gem was not properly declared in the Gemfile. This would
cause NameError in production when processing bank statements.

Added pdf-reader ~> 2.12 to Gemfile dependencies.

* Fix transaction deduplication to preserve legitimate duplicates

The previous deduplication logic removed ALL duplicate transactions based
on [date, amount, name], which would drop legitimate same-day duplicates
like multiple ATM withdrawals or card authorizations.

Changed to only deduplicate transactions that appear in consecutive chunks
(chunking artifacts) while preserving all legitimate duplicates within the
same chunk or non-adjacent chunks.

* Refactor bank statement extraction to use public provider method

Address code review feedback:
- Add public extract_bank_statement method to Provider::Openai
- Remove direct access to private client via send(:client)
- Update ImportBankStatement to use new public method
- Add require 'set' to BankStatementExtractor
- Remove PII-sensitive content from error logs
- Add defensive check for nil response.error
- Handle oversized PDF pages in chunking logic
- Remove unused process_native and process_generic methods
- Update email copy to reflect feature availability
- Add guard for nil document_type in email template
- Document pdf-reader gem rationale in Gemfile

Tested with both OpenAI (gpt-4o) and Ollama (qwen3:8b):
- OpenAI: 49 transactions extracted in 30s
- Ollama: 40 transactions extracted in 368s
- All encapsulation and error handling working correctly

* Update schema.rb with ai_summary and document_type columns

* Address PR #808 review comments

- Rename :csv_file to :import_file across controllers/views/tests
- Add PDF test fixture (sample_bank_statement.pdf)
- Add supports_pdf_processing? method for graceful degradation
- Revert unrelated database.yml pool change (600->3)
- Remove month_start_day schema bleed from other PR
- Fix PdfProcessor: use .strip instead of .strip_heredoc
- Add server-side PDF magic byte validation
- Conditionally show PDF import option when AI provider available
- Fix ProcessPdfJob: sanitize errors, handle update failure
- Move pdf_file attachment from Import to PdfImport
- Document deduplication logic limitations
- Fix ImportBankStatement: catch specific exceptions only
- Remove unnecessary require 'set'
- Remove dead json_schema method from PdfProcessor
- Reduce default OpenAI timeout from 600s to 60s
- Fix nil guard in text mailer template
- Add require 'csv' to ImportBankStatement
- Remove Gemfile pdf-reader comment

* Fix RuboCop indentation in ProcessPdfJob

* Refactor PDF import check to use model predicate method

Replace is_a?(PdfImport) type check with requires_csv_workflow? predicate
that leverages STI inheritance for cleaner controller logic.

* Fix missing 'unknown' locale key and schema version mismatch

- Add 'unknown: Unknown Document' to document_types locale
- Fix schema version to match latest migration (2026_01_24_180211)

* Document OPENAI_REQUEST_TIMEOUT env variable

Added to .env.local.example and docs/hosting/ai.md

* Rename ALLOWED_MIME_TYPES to ALLOWED_CSV_MIME_TYPES for clarity

* Add comment explaining requires_csv_workflow? predicate

* Remove redundant required_column_keys from PdfImport

Base class already returns [] by default

* Add ENV toggle to disable PDF processing for non-vision endpoints

OPENAI_SUPPORTS_PDF_PROCESSING=false can be used for OpenAI-compatible
endpoints (e.g., Ollama) that don't support vision/PDF processing.

* Wire up transaction extraction for PDF bank statements

- Add extracted_data JSONB column to imports
- Add extract_transactions method to PdfImport
- Call extraction in ProcessPdfJob for bank statements
- Store transactions in extracted_data for later review

* Fix ProcessPdfJob retry logic, sanitize and localize errors

- Allow retries after partial success (classification ok, extraction failed)
- Log sanitized error message instead of raw message to avoid data leakage
- Use i18n for user-facing error messages

* Add vision-capable model validation for PDF processing

* Fix drag-and-drop test to use correct field name csv_file

* Schema bleedover from another branch

* Fix drag-drop import form field name to match controller

* Add vision capability guard to process_pdf method

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>
Co-authored-by: Juan José Mata <jjmata@jjmata.com>
2026-01-30 20:44:25 +01:00
Juan José Mata
7c5ddd674d Make branding configurable (#173)
* Remove orphan function

* Add centralized branding helpers and update locales

* Remove _plus and add (proper) brand

* No longer Sure, configurable

* Consistency with compose file naming

* Missed `product_name` mapping

* Fix brand/product name in mailers

* Product name in email reset flow

* Fix i18n errors/tests

* Fix password mailer brand/product name (again)

* Missed hardcoded `Sure` in onboarding goals

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Juan José Mata <jjmata@jjmata.com>

* PR nitpick on documentation

* Missing interpolation key for invited UI

* Orphan assets

* New logos

---------

Signed-off-by: Juan José Mata <jjmata@jjmata.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-22 19:14:03 +02:00
Juan José Mata
5706280dd7 More rebranding changes (#159)
* Replace Maybe for Sure in select code areas

* Make sure passwords are consistent

* Remove (admin|member) from demo data first name

* Database and schema names finally to `sure`

* Fix broken test

* Another (benchmarking) database name to `sure_*`

* More rebranding to Sure

* Missed this Maybe mention in the same page

* Random nitpicks and more Maybes

* Demo data accounts and more Maybes

* Test data account updates

* Impersonation test accounts

* Consistency with `compose.example.yml`
2025-09-24 00:19:51 +02:00
Josh Pigford
41873de11d Allow users to update their email address (#1745)
* Change email address

* Email confirmation

* Email change test

* Lint

* Schema reset

* Set test email sender

* Select specific user fixture

* Refactor/cleanup

* Remove unused email_confirmation_token

* Current user would never be true

* Fix translation test failures
2025-01-31 11:29:49 -06:00
Zach Gollwitzer
7fabca4679 Simplify self host settings controller (#1230) 2024-10-02 12:07:56 -04:00
Zach Gollwitzer
5dfbba403a Test environment stability improvements (#703)
* Add climate_control gem and test helper

* Replace ENV mods in upgrades test

* Replace ENV mods in registrations test

* Remove ENV references in hostings controller

* Update ENV refs in mailer test

* ActiveStorage cleanup

* Consolidate queue config so appropriate adapter runs in test environment

* Make test environment more explicit

* Centralize self hosting config

* Remove flaky system test
2024-05-02 13:18:18 -04:00
Thibaut Gorioux
6fdb8e8d69 Allow a self-hosted user to configure their SMTP settings directly from within the UI (#682)
* Add setting fields to model

* Allow to configure SMTP settings

* Normalize locales

* Cleanup locales

* Remove 'coming soon'

* fix test

* Reset credentials

* Reset development config

* Check smtp spelling

* Use post instead of get method

* TLS ENV variable is more descriptive

* Rework application mailer

* Follow rails convention for mailer action params

* Reset schema.rb to main

* Test WIP

* Add test for controller and mailer

* Move tests from controller to model

* Custom error message if settings are not all present

* Comment smtp config in development env

* Add default tls enabled value

* Rubocop

* Fix controller test

* Reset credentials

* Normalize locales

* Test

* fix test

* Fix application mailer test that fails randomly

* Error flash message instead of notice

* Rework application mailer tests
2024-04-29 16:44:24 -04:00
Josh Pigford
99de24ac70 Initial commit 2024-02-02 09:05:04 -06:00