Commit Graph

14 Commits

Author SHA1 Message Date
MkDev11
6f8858b1a6 feat/Add AI-Powered Bank Statement Import (step 1, PDF import & analysis) (#808)
* feat: Add PDF import with AI-powered document analysis

This enhances the import functionality to support PDF files with AI-powered
document analysis. When a PDF is uploaded, it is processed by AI to:
- Identify the document type (bank statement, credit card statement, etc.)
- Generate a summary of the document contents
- Extract key metadata (institution, dates, balances, transaction count)

After processing, an email is sent to the user asking for next steps.

Key changes:
- Add PdfImport model for handling PDF document imports
- Add Provider::Openai::PdfProcessor for AI document analysis
- Add ProcessPdfJob for async PDF processing
- Add PdfImportMailer for user notification emails
- Update imports controller to detect and handle PDF uploads
- Add PDF import option to the new import page
- Add i18n translations for all new strings
- Add comprehensive tests for the new functionality

* Add bank statement import with AI extraction

- Create ImportBankStatement assistant function for MCP
- Add BankStatementExtractor with chunked processing for small context windows
- Register function in assistant configurable
- Make PdfImport#pdf_file_content public for extractor access
- Increase OpenAI request timeout to 600s for slow local models
- Increase DB connection pool to 20 for concurrent operations

Tested with M-Pesa bank statement via remote Ollama (qwen3:8b):
- Successfully extracted 18 transactions
- Generated CSV and created TransactionImport
- Works with 3000 char chunks for small context windows

* Add pdf-reader gem dependency

The BankStatementExtractor uses PDF::Reader to parse bank statement
PDFs, but the gem was not properly declared in the Gemfile. This would
cause NameError in production when processing bank statements.

Added pdf-reader ~> 2.12 to Gemfile dependencies.

* Fix transaction deduplication to preserve legitimate duplicates

The previous deduplication logic removed ALL duplicate transactions based
on [date, amount, name], which would drop legitimate same-day duplicates
like multiple ATM withdrawals or card authorizations.

Changed to only deduplicate transactions that appear in consecutive chunks
(chunking artifacts) while preserving all legitimate duplicates within the
same chunk or non-adjacent chunks.

* Refactor bank statement extraction to use public provider method

Address code review feedback:
- Add public extract_bank_statement method to Provider::Openai
- Remove direct access to private client via send(:client)
- Update ImportBankStatement to use new public method
- Add require 'set' to BankStatementExtractor
- Remove PII-sensitive content from error logs
- Add defensive check for nil response.error
- Handle oversized PDF pages in chunking logic
- Remove unused process_native and process_generic methods
- Update email copy to reflect feature availability
- Add guard for nil document_type in email template
- Document pdf-reader gem rationale in Gemfile

Tested with both OpenAI (gpt-4o) and Ollama (qwen3:8b):
- OpenAI: 49 transactions extracted in 30s
- Ollama: 40 transactions extracted in 368s
- All encapsulation and error handling working correctly

* Update schema.rb with ai_summary and document_type columns

* Address PR #808 review comments

- Rename :csv_file to :import_file across controllers/views/tests
- Add PDF test fixture (sample_bank_statement.pdf)
- Add supports_pdf_processing? method for graceful degradation
- Revert unrelated database.yml pool change (600->3)
- Remove month_start_day schema bleed from other PR
- Fix PdfProcessor: use .strip instead of .strip_heredoc
- Add server-side PDF magic byte validation
- Conditionally show PDF import option when AI provider available
- Fix ProcessPdfJob: sanitize errors, handle update failure
- Move pdf_file attachment from Import to PdfImport
- Document deduplication logic limitations
- Fix ImportBankStatement: catch specific exceptions only
- Remove unnecessary require 'set'
- Remove dead json_schema method from PdfProcessor
- Reduce default OpenAI timeout from 600s to 60s
- Fix nil guard in text mailer template
- Add require 'csv' to ImportBankStatement
- Remove Gemfile pdf-reader comment

* Fix RuboCop indentation in ProcessPdfJob

* Refactor PDF import check to use model predicate method

Replace is_a?(PdfImport) type check with requires_csv_workflow? predicate
that leverages STI inheritance for cleaner controller logic.

* Fix missing 'unknown' locale key and schema version mismatch

- Add 'unknown: Unknown Document' to document_types locale
- Fix schema version to match latest migration (2026_01_24_180211)

* Document OPENAI_REQUEST_TIMEOUT env variable

Added to .env.local.example and docs/hosting/ai.md

* Rename ALLOWED_MIME_TYPES to ALLOWED_CSV_MIME_TYPES for clarity

* Add comment explaining requires_csv_workflow? predicate

* Remove redundant required_column_keys from PdfImport

Base class already returns [] by default

* Add ENV toggle to disable PDF processing for non-vision endpoints

OPENAI_SUPPORTS_PDF_PROCESSING=false can be used for OpenAI-compatible
endpoints (e.g., Ollama) that don't support vision/PDF processing.

* Wire up transaction extraction for PDF bank statements

- Add extracted_data JSONB column to imports
- Add extract_transactions method to PdfImport
- Call extraction in ProcessPdfJob for bank statements
- Store transactions in extracted_data for later review

* Fix ProcessPdfJob retry logic, sanitize and localize errors

- Allow retries after partial success (classification ok, extraction failed)
- Log sanitized error message instead of raw message to avoid data leakage
- Use i18n for user-facing error messages

* Add vision-capable model validation for PDF processing

* Fix drag-and-drop test to use correct field name csv_file

* Schema bleedover from another branch

* Fix drag-drop import form field name to match controller

* Add vision capability guard to process_pdf method

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>
Co-authored-by: Juan José Mata <jjmata@jjmata.com>
2026-01-30 20:44:25 +01:00
Copilot
5b736bf691 Fix CSV import for non-UTF-8 encodings (Windows-1250, ISO-8859-2, etc.) (#617)
* Initial plan

* Add encoding detection for CSV imports to handle Windows-1250 and other non-UTF-8 encodings

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Improve encoding detection: prioritize Windows-1250 and increase confidence threshold

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Update Gemfile.lock with rchardet dependency

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Refactor: Extract common encodings to constant and deduplicate code

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix Rubocop style violations

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix linter violations and encoding detection logic

- Remove trailing whitespace from test file (9 lines)
- Fix ensure_utf8_encoding to handle binary strings properly by checking bytesize instead of blank?
- Add error handling for ArgumentError and Encoding::CompatibilityError
- Add invalid/undef replacement options to encode calls for robustness

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix encoding error in ensure_utf8_encoding method

- Use will_save_change_to_raw_file_str? instead of raw_file_str_changed? to avoid encoding errors when checking if attribute changed
- Wrap UTF-8 validation check in begin/rescue to handle ArgumentError from invalid encodings
- This fixes the test failure: "ArgumentError: invalid byte sequence in UTF-8"

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: add missing column labels and reload import before checking rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: ensure import is reloaded before checking rows_count and accessing rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>
2026-01-12 10:17:55 +01:00
Zach Gollwitzer
4d0df9b950 Escape quotations in CSV imports properly (#1929)
* Parse quotes in imports

* Update invalid CSV for test
2025-02-28 12:21:07 -05:00
Zach Gollwitzer
6baffe7539 Beta Testing Round 3 Bug Fixes (#1357)
* Clean up env example files

* Fix duplicate category creations

* Fix duplicate tag and merchant creation

* Add initial valuation to imported accounts

* Add upgrade modal prompt

* Don't hide content on billing page

* Add temporary session for new customers

* Lint fixes

* Fix unused translations

* Fix system tests
2024-10-24 11:02:27 -04:00
Zach Gollwitzer
398b246965 CSV Imports Overhaul (Transactions, Trades, Accounts, and Mint import support) (#1209)
* Remove stale 1.0 import logic and model

* Fresh start

* Checkpoint before removing nav

* First working prototype

* Add trade, account, and mint import flows

* Basic working version with tests

* System tests for each import type

* Clean up mappings flow

* Clean up PR, refactor stale code, tests

* Add back row validations

* Row validations

* Fix import job test

* Fix import navigation

* Fix mint import configuration form

* Currency preset for new accounts
2024-10-01 10:47:59 -04:00
Zach Gollwitzer
707c5ca0ca Account Issue Model and Resolution Flow + Troubleshooting guides (#1090)
* Rough draft of issue system

* Simplify design

* Remove stale files from merge conflicts

* STI for issues

* Cleanup

* Improve Synth api key flow

* Stub api key for test
2024-08-16 12:13:48 -04:00
Zach Gollwitzer
fa08f027c7 Sync notifications and troubleshooting guides (#998)
* Add help articles

* Broadcast sync messages as notifications

* Lint fixes

* more lint fixes

* Remove redundant code
2024-07-18 14:39:38 -04:00
Tony Vincent
cdbca5aff3 Allow CSV file upload in import flow (#986)
* Add .tool-versions to gitignore

* Add dropzone js for drag and drop file uploads

* UI for csv file uploads for import

* dropzone controller and use lucide_icon instead of svg

* Preview for file chosen

* File upload

* Remove dropzone

* Normalize I18n keys and fix lint issues

* Add system tests

* Cleanup

* Remove unwanted
2024-07-16 09:23:45 -04:00
Zach Gollwitzer
c6bdf49f10 Account::Sync model and test fixture simplifications (#968)
* Add sync model

* Fresh fixtures for sync tests

* Sync tests overhaul

* Fix entry tests

* Complete remaining model test updates

* Update system tests

* Update demo data task

* Add system tests back to PR checks

* More simplifications, add empty family to fixtures for easier testing
2024-07-10 11:22:59 -04:00
Zach Gollwitzer
c3314e62d1 Account::Entry Delegated Type (namespace updates part 7) (#923)
* Initial entryable models

* Update transfer and tests

* Update transaction controllers and tests

* Update sync process to use new entries model

* Get dashboard working again

* Update transfers, imports, and accounts to use Account::Entry

* Update system tests

* Consolidate transaction management into entries controller

* Add permitted partial key helper

* Move account transactions list to entries controller

* Delegate transaction entries search

* Move transfer relation to entry

* Update bulk transaction management flows to use entries

* Remove test code

* Test fix attempt

* Update demo data script

* Consolidate remaining transaction partials to entries

* Consolidate valuations controller to entries controller

* Lint fix

* Remove unused files, additional cleanup

* Add back valuation creation

* Make migrations fully reversible

* Stale routes cleanup

* Migrations reversible fix

* Move types to entryable concern

* Fix search when no entries found

* Remove more unused code
2024-07-01 10:49:43 -04:00
Zach Gollwitzer
ca39b26070 Transaction transfers, payments, and matching (#883)
* Add transfer model and clean up family snapshot fixtures

* Ignore transfers in income and expense snapshots

* Add transfer validations

* Implement basic transfer matching UI

* Fix merge conflicts

* Add missing translations

* Tweak selection states for transfer types

* Add missing i18n translation
2024-06-19 06:52:08 -04:00
Zach Gollwitzer
9956a9540e Add institution management and account editing controls (#868)
* Add institution management

* Allow user to select institution on create or edit

* Improve redirect behavior

* Final cleanup

* i18n normalization
2024-06-13 14:37:27 -04:00
Christian
dc024d63b0 Feature/profile image uploads (#687)
* Introduce ActiveStorage

* Add active storage related service gems

* Update storage.yml

* Install image processing gem
- sudo apt-get install libvips (required dependency)

* Set default active storage service

* Add profile image to user model

* Amend form to allow profile images to be saved, introduce stimulus controller.

* Purge image when form is blank

* Update markup/stimulus controller

* Add test for profile image uplaods

* Add profile image validation

* Use rails guide gem versions

* Use correct ERB syntax and make all storage options configurable

* Ensure form submits when user clears profile image

* Add profile image thumbnail method

* Extract profile image to a partial

* Updates env.example and storage.yml

* Fix bug with double form save

* Add profile image to the sidenav

* Update production config

* Fix ERB formatting

* normalize en.yml

* Handle non-square images

* Use pre-processing on thumbnail variant

* Resovle gemfile.lock issues

* Rubocop style changes

---------

Signed-off-by: Christian <47796704+crobbo@users.noreply.github.com>
Co-authored-by: Christian Robinson <christian@robbo.dev>
2024-04-30 13:38:33 -04:00
Josh Pigford
99de24ac70 Initial commit 2024-02-02 09:05:04 -06:00