Commit Graph

46 Commits

Author SHA1 Message Date
MkDev11
6f8858b1a6 feat/Add AI-Powered Bank Statement Import (step 1, PDF import & analysis) (#808)
* feat: Add PDF import with AI-powered document analysis

This enhances the import functionality to support PDF files with AI-powered
document analysis. When a PDF is uploaded, it is processed by AI to:
- Identify the document type (bank statement, credit card statement, etc.)
- Generate a summary of the document contents
- Extract key metadata (institution, dates, balances, transaction count)

After processing, an email is sent to the user asking for next steps.

Key changes:
- Add PdfImport model for handling PDF document imports
- Add Provider::Openai::PdfProcessor for AI document analysis
- Add ProcessPdfJob for async PDF processing
- Add PdfImportMailer for user notification emails
- Update imports controller to detect and handle PDF uploads
- Add PDF import option to the new import page
- Add i18n translations for all new strings
- Add comprehensive tests for the new functionality

* Add bank statement import with AI extraction

- Create ImportBankStatement assistant function for MCP
- Add BankStatementExtractor with chunked processing for small context windows
- Register function in assistant configurable
- Make PdfImport#pdf_file_content public for extractor access
- Increase OpenAI request timeout to 600s for slow local models
- Increase DB connection pool to 20 for concurrent operations

Tested with M-Pesa bank statement via remote Ollama (qwen3:8b):
- Successfully extracted 18 transactions
- Generated CSV and created TransactionImport
- Works with 3000 char chunks for small context windows

* Add pdf-reader gem dependency

The BankStatementExtractor uses PDF::Reader to parse bank statement
PDFs, but the gem was not properly declared in the Gemfile. This would
cause NameError in production when processing bank statements.

Added pdf-reader ~> 2.12 to Gemfile dependencies.

* Fix transaction deduplication to preserve legitimate duplicates

The previous deduplication logic removed ALL duplicate transactions based
on [date, amount, name], which would drop legitimate same-day duplicates
like multiple ATM withdrawals or card authorizations.

Changed to only deduplicate transactions that appear in consecutive chunks
(chunking artifacts) while preserving all legitimate duplicates within the
same chunk or non-adjacent chunks.

* Refactor bank statement extraction to use public provider method

Address code review feedback:
- Add public extract_bank_statement method to Provider::Openai
- Remove direct access to private client via send(:client)
- Update ImportBankStatement to use new public method
- Add require 'set' to BankStatementExtractor
- Remove PII-sensitive content from error logs
- Add defensive check for nil response.error
- Handle oversized PDF pages in chunking logic
- Remove unused process_native and process_generic methods
- Update email copy to reflect feature availability
- Add guard for nil document_type in email template
- Document pdf-reader gem rationale in Gemfile

Tested with both OpenAI (gpt-4o) and Ollama (qwen3:8b):
- OpenAI: 49 transactions extracted in 30s
- Ollama: 40 transactions extracted in 368s
- All encapsulation and error handling working correctly

* Update schema.rb with ai_summary and document_type columns

* Address PR #808 review comments

- Rename :csv_file to :import_file across controllers/views/tests
- Add PDF test fixture (sample_bank_statement.pdf)
- Add supports_pdf_processing? method for graceful degradation
- Revert unrelated database.yml pool change (600->3)
- Remove month_start_day schema bleed from other PR
- Fix PdfProcessor: use .strip instead of .strip_heredoc
- Add server-side PDF magic byte validation
- Conditionally show PDF import option when AI provider available
- Fix ProcessPdfJob: sanitize errors, handle update failure
- Move pdf_file attachment from Import to PdfImport
- Document deduplication logic limitations
- Fix ImportBankStatement: catch specific exceptions only
- Remove unnecessary require 'set'
- Remove dead json_schema method from PdfProcessor
- Reduce default OpenAI timeout from 600s to 60s
- Fix nil guard in text mailer template
- Add require 'csv' to ImportBankStatement
- Remove Gemfile pdf-reader comment

* Fix RuboCop indentation in ProcessPdfJob

* Refactor PDF import check to use model predicate method

Replace is_a?(PdfImport) type check with requires_csv_workflow? predicate
that leverages STI inheritance for cleaner controller logic.

* Fix missing 'unknown' locale key and schema version mismatch

- Add 'unknown: Unknown Document' to document_types locale
- Fix schema version to match latest migration (2026_01_24_180211)

* Document OPENAI_REQUEST_TIMEOUT env variable

Added to .env.local.example and docs/hosting/ai.md

* Rename ALLOWED_MIME_TYPES to ALLOWED_CSV_MIME_TYPES for clarity

* Add comment explaining requires_csv_workflow? predicate

* Remove redundant required_column_keys from PdfImport

Base class already returns [] by default

* Add ENV toggle to disable PDF processing for non-vision endpoints

OPENAI_SUPPORTS_PDF_PROCESSING=false can be used for OpenAI-compatible
endpoints (e.g., Ollama) that don't support vision/PDF processing.

* Wire up transaction extraction for PDF bank statements

- Add extracted_data JSONB column to imports
- Add extract_transactions method to PdfImport
- Call extraction in ProcessPdfJob for bank statements
- Store transactions in extracted_data for later review

* Fix ProcessPdfJob retry logic, sanitize and localize errors

- Allow retries after partial success (classification ok, extraction failed)
- Log sanitized error message instead of raw message to avoid data leakage
- Use i18n for user-facing error messages

* Add vision-capable model validation for PDF processing

* Fix drag-and-drop test to use correct field name csv_file

* Schema bleedover from another branch

* Fix drag-drop import form field name to match controller

* Add vision capability guard to process_pdf method

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>
Co-authored-by: Juan José Mata <jjmata@jjmata.com>
2026-01-30 20:44:25 +01:00
Mark Hendriksen
9b84c5bdbc Enhanced Import Amount Type Selection (#506)
* Enhanced Import Amount Type Selection

updated version of https://github.com/we-promise/sure/pull/179

* copilot sugestions

* ai sugestions

* Update import.rb

* Update schema.rb

* Update schema.rb

* Update schema.rb

---------

Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
Co-authored-by: Juan José Mata <juanjo.mata@gmail.com>
2026-01-23 22:12:02 +01:00
sokie
11a011db15 Update import.rb 2026-01-13 13:51:10 +01:00
sokie
c9b334ac4f Add rows validation 2026-01-13 13:50:12 +01:00
sokie
7297554a55 Fix issues
Issue 1 Fixed - Template now carries rows_to_skip.
  Issue 2 Fixed - Column headers refresh when rows_to_skip changes.
2026-01-13 13:35:38 +01:00
Juan José Mata
accdbb799b Merge branch 'main' into add-config-import-csv-skip-first-x-rows
Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
2026-01-13 12:56:22 +01:00
Copilot
5b736bf691 Fix CSV import for non-UTF-8 encodings (Windows-1250, ISO-8859-2, etc.) (#617)
* Initial plan

* Add encoding detection for CSV imports to handle Windows-1250 and other non-UTF-8 encodings

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Improve encoding detection: prioritize Windows-1250 and increase confidence threshold

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Update Gemfile.lock with rchardet dependency

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Refactor: Extract common encodings to constant and deduplicate code

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix Rubocop style violations

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix linter violations and encoding detection logic

- Remove trailing whitespace from test file (9 lines)
- Fix ensure_utf8_encoding to handle binary strings properly by checking bytesize instead of blank?
- Add error handling for ArgumentError and Encoding::CompatibilityError
- Add invalid/undef replacement options to encode calls for robustness

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix encoding error in ensure_utf8_encoding method

- Use will_save_change_to_raw_file_str? instead of raw_file_str_changed? to avoid encoding errors when checking if attribute changed
- Wrap UTF-8 validation check in begin/rescue to handle ArgumentError from invalid encodings
- This fixes the test failure: "ArgumentError: invalid byte sequence in UTF-8"

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: add missing column labels and reload import before checking rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: ensure import is reloaded before checking rows_count and accessing rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>
2026-01-12 10:17:55 +01:00
Juan José Mata
664a00678e Merge branch 'main' into add-config-import-csv-skip-first-x-rows
Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
2026-01-10 17:47:04 +01:00
Carlos Adames
b56dbdb9eb Feat: /import endpoint & drag-n-drop imports (#501)
* Implement API v1 Imports controller

- Add Api::V1::ImportsController with index, show, and create actions
- Add Jbuilder views for index and show
- Add integration tests
- Implement row generation logic in create action
- Update routes

* Validate import account belongs to family

- Add validation to Import model to ensure account belongs to the same family
- Add regression test case in Api::V1::ImportsControllerTest

* updating docs to be more detailed

* Rescue StandardError instead of bare rescue in ImportsController

* Optimize Imports API and fix documentation

- Implement rows_count counter cache for Imports
- Preload rows in Api::V1::ImportsController#show
- Update documentation to show correct OAuth scopes

* Fix formatting in ImportsControllerTest

* Permit all import parameters and fix unknown attribute error

* Restore API routes for auth, chats, and messages

* removing pr summary

* Fix trailing whitespace and configured? test failure

- Update Import#configured? to use rows_count for performance and consistency
- Mock rows_count in TransactionImportTest
- Fix trailing whitespace in migration

* Harden security and fix mass assignment in ImportsController

- Handle type and account_id explicitly in create action
- Rename import_params to import_config_params for clarity
- Validate type against Import::TYPES

* Fix MintImport rows_count update and migration whitespace

- Update MintImport#generate_rows_from_csv to update rows_count counter cache
- Fix trailing whitespace and final newline in AddRowsCountToImports migration

* Implement full-screen Drag and Drop CSV import on Transactions page

- Add DragAndDropImport Stimulus controller listening on document
- Add full-screen overlay with icon and text to Transactions index
- Update ImportsController to handle direct file uploads via create action
- Add system test for drag and drop functionality

* Implement Drag and Drop CSV upload on Import Upload page

- Add drag-and-drop-import controller to import/uploads/show
- Add full-screen overlay to import/uploads/show
- Annotate upload form and input with drag-and-drop targets
- Add PR_SUMMARY.md

* removing pr summary

* Add file validation to ImportsController

- Validate file size (max 10MB) and MIME type in create action
- Prevent memory exhaustion and invalid file processing
- Defined MAX_CSV_SIZE and ALLOWED_MIME_TYPES in Import model

* Refactor dragLeave logic with counter pattern to prevent flickering

* Extract shared drag-and-drop overlay partial

- Create app/views/imports/_drag_drop_overlay.html.erb
- Update transactions/index and import/uploads/show to use the partial
- Reduce code duplication in views

* Update Brakeman and harden ImportsController security

- Update brakeman to 7.1.2
- Explicitly handle type assignment in ImportsController#create to avoid mass assignment
- Remove :type from permitted import parameters

* Fix trailing whitespace in DragAndDropImportTest

* Don't commit LLM comments as file

* FIX add api validation

---------

Co-authored-by: Carlos Adames <cj@Carloss-MacBook-Air.local>
Co-authored-by: Juan José Mata <jjmata@jjmata.com>
Co-authored-by: sokie <sokysrm@gmail.com>
2026-01-10 16:39:18 +01:00
kasra
e8216984a7 Add rows_to_skip config for transaction import 2025-12-27 02:21:28 +03:30
soky srm
64c25725c9 Fix CSV import with no currency (#462)
* FIX use the accounts we are importing currency as default, not family default

* FIX add family fallback for multi account import
2025-12-17 18:37:35 +01:00
Juan José Mata
e5ed946959 Add rules import/export support (#424)
* Add full import/export support for rules with versioned JSON schema

This commit implements comprehensive import/export functionality for rules,
allowing users to back up and restore their rule definitions.

Key features:
- Export rules to both CSV and NDJSON formats with versioned schema (v1)
- Import rules from CSV with full support for nested conditions and actions
- UUID to name mapping for categories and merchants for portability
- Support for compound conditions with sub-conditions
- Comprehensive test coverage for export and import functionality
- UI integration for rules import in the imports interface

Technical details:
- Extended Family::DataExporter to generate rules.csv and include rules in all.ndjson
- Created RuleImport model following the existing Import STI pattern
- Added migration for rule-specific columns in import_rows table
- Implemented serialization helpers to map UUIDs to human-readable names
- Added i18n support for the new import option
- Included versioning in NDJSON export to support future schema evolution

The implementation ensures rules can be safely exported from one family
and imported into another, even when category/merchant IDs differ,
by mapping between names and IDs during export/import.

* Fix AR migration version

* Mention support for rules export

* Rabbit suggestion

* Fix tests

* Missed schema.rb

* Fix sample CSV download for rule import

* Fix parsing in Rules import

* Fix tests

* Rule import message i18n

* Export tag names, not UUIDs

* Make sure tags are created if needed at import

* Avoid test errors when running in parallel

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-07 13:20:54 +01:00
Juan José Mata
65bd9e813a Support category .csv import (#326)
* Allow category imports to set icons

* Linter

* Update category import description in English locale

Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>

* Rename icon in CSV header to `lucide-icon`

* Make sure we export the icon while we're at this

---------

Signed-off-by: Juan José Mata <juanjo.mata@gmail.com>
2025-11-15 19:00:52 +01:00
soky srm
f882484bba Add transaction dedup support for CSV imports (#304)
* Support dedup for transaction also for CSV

* Fix to exclude CSV importing duplicates

* Guard nil account
2025-11-10 12:09:22 +01:00
Zach Gollwitzer
bacab94a1b Fix import reverts 2025-07-24 11:41:42 -04:00
Zach Gollwitzer
e26e5c5aec Auto sync preference, max limit on account CSV imports (#2259)
* Auto sync preference, max limit on account CSV imports

* MaxRowCountExceededError
2025-05-18 15:02:51 -04:00
Zach Gollwitzer
10dd9e061a Improve account sync performance, handle concurrent market data syncing (#2236)
* PlaidConnectable concern

* Remove bad abstraction

* Put sync implementations in own concerns

* Sync strategies

* Move sync orchestration to Sync class

* Clean up sync class, add state machine

* Basic market data sync cron

* Fix price sync

* Improve sync window column names, add timestamps

* 30 day syncs by default

* Clean up market data methods

* Report high duplicate sync counts to Sentry

* Add sync states throughout app

* account tab session

* Persistent account tab selections

* Remove manual sleep

* Add migration to clear stale syncs on self hosted apps

* Tweak sync states

* Sync completion event broadcasts

* Fix timezones in tests

* Cleanup

* More cleanup

* Plaid item UI broadcasts for sync

* Fix account ID namespace conflict

* Sync broadcasters

* Smoother account sync refreshes

* Remove test sync delay
2025-05-15 10:19:56 -04:00
Zach Gollwitzer
c88fe2e3b2 Feature: Add "amount type" configuration column for CSV imports (#1947)
* Rough draft

* Schema conflict update

* Implement signage

* Update system tests

* Lint fixes
2025-04-18 10:48:10 -04:00
Zach Gollwitzer
e657c40d19 Account:: namespace simplifications and cleanup (#2110)
* Flatten Holding model

* Flatten balance model

* Entries domain renames

* Fix valuations reference

* Fix trades stream

* Fix brakeman warnings

* Fix tests

* Replace existing entryable type references in DB
2025-04-14 11:40:34 -04:00
Zach Gollwitzer
0544089710 Account-level import configuration templates (#1946)
* Account-level import configuration templates

* Default import to family's preferred date format
2025-03-04 13:10:01 -05:00
Zach Gollwitzer
c5da8ea550 Allow CSV imports to be configured with single or multi-account mode (#1943)
* Allow CSV imports to be configured to a single account or multiple accounts

* Initialize import directly from accounts page

* Fix brakeman warnings

* Fix schema

* Fix Synth check
2025-03-03 12:47:30 -05:00
Zach Gollwitzer
4d0df9b950 Escape quotations in CSV imports properly (#1929)
* Parse quotes in imports

* Update invalid CSV for test
2025-02-28 12:21:07 -05:00
David Anyatonwu
32ef6ca154 Add exchange and currency fields to trade imports (#1822)
* Add exchange and currency fields to trade imports

* Add exchange_operating_mic support for trade imports - Added required columns and updated models

* refactor: remove exchange and currency columns

* fix: consolidate import schema and remove redundant columns

* feat: Enhance trade import with exchange_operating_mic support

* Revert changes to existing migration

* Simplify migration to use change method

* Restore previously deleted migration

* Remove unused import_col_labels method

* Update schema.rb after running migrations

* Update trade_import.rb and fix schema.rb with db:migrate:reset

* fix: improve trade import security creation

---------

Signed-off-by: David Anyatonwu <51977119+onyedikachi-david@users.noreply.github.com>
2025-02-24 10:00:24 -05:00
Zach Gollwitzer
8539ac7dec Fix import configuration failures (#1876) 2025-02-21 08:58:06 -05:00
Daniel Esteves
077694bbde feat(import): add currency and number format support for CSV imports (#1819)
* feat(import): add currency and number format support for CSV imports

* feat(import): add currency and format for mint and trade

* fix(imports): remove currency field in favor of currency csv col

* fix(imports): remove validate column from import model

* test(import): add some tests for imports

* test(import): add some tests for generate_rows_from_csv

* fix: change method name for import model

* fix: change before validation

---------

Co-authored-by: danestves <danestves@users.noreply.github.com>
2025-02-10 15:31:28 -05:00
Zach Gollwitzer
536c82f2aa Feature: Add the ability to "revert" a CSV import (#1814)
* Allow reverting imports

* Fix tests

* Add currency column to all imports

* Don't auto-enrich demo account
2025-02-07 15:36:05 -05:00
Zach Gollwitzer
79ca7e2039 Preserve negative sign on raw CSV values 2024-10-10 18:57:00 -04:00
Zach Gollwitzer
a20809eee3 When unassigned accounts in CSV import, always allow new account creation 2024-10-10 15:51:36 -04:00
Zach Gollwitzer
cd9f20747c Allow inline account creation when importing CSV (#1291)
* Allow inline account creation when importing CSV

* Sanitize numeric inputs for CSV

* CSV import date validation

* Lint fix
2024-10-10 15:14:38 -04:00
Zach Gollwitzer
398b246965 CSV Imports Overhaul (Transactions, Trades, Accounts, and Mint import support) (#1209)
* Remove stale 1.0 import logic and model

* Fresh start

* Checkpoint before removing nav

* First working prototype

* Add trade, account, and mint import flows

* Basic working version with tests

* System tests for each import type

* Clean up mappings flow

* Clean up PR, refactor stale code, tests

* Add back row validations

* Row validations

* Fix import job test

* Fix import navigation

* Fix mint import configuration form

* Currency preset for new accounts
2024-10-01 10:47:59 -04:00
Zach Gollwitzer
949d3d80fa Support multi-currency transfers (#1175) 2024-09-13 11:45:19 -04:00
Pedro Carmona
0c1ff00c1e Refactor: Allow other import files (#1099)
* Rename stimulus controller

* feature: rename raw_csv_str to raw_file_str
2024-08-19 09:25:07 -04:00
Alexander Schrot
4527482aa2 Add support for different column separator in csv import logic (#1096)
* add col_sep to import model

* add validation for col_sep column

* add col_sep option to csv import model

* make use of col_sep option in import model

* add column separator field to new/edit action of an import

* add col_sep parameter to create/update action

* fix spacing between fields

Co-authored-by: Zach Gollwitzer <zach.gollwitzer@gmail.com>
Signed-off-by: Alexander Schrot <alexander@axs-labs.com>

---------

Signed-off-by: Alexander Schrot <alexander@axs-labs.com>
Co-authored-by: Zach Gollwitzer <zach.gollwitzer@gmail.com>
2024-08-16 14:00:16 -04:00
Pedro Carmona
14c4b9e93c Refactor: Use native error i18n lookup (#1076) 2024-08-12 20:38:58 -04:00
Zach Gollwitzer
c3314e62d1 Account::Entry Delegated Type (namespace updates part 7) (#923)
* Initial entryable models

* Update transfer and tests

* Update transaction controllers and tests

* Update sync process to use new entries model

* Get dashboard working again

* Update transfers, imports, and accounts to use Account::Entry

* Update system tests

* Consolidate transaction management into entries controller

* Add permitted partial key helper

* Move account transactions list to entries controller

* Delegate transaction entries search

* Move transfer relation to entry

* Update bulk transaction management flows to use entries

* Remove test code

* Test fix attempt

* Update demo data script

* Consolidate remaining transaction partials to entries

* Consolidate valuations controller to entries controller

* Lint fix

* Remove unused files, additional cleanup

* Add back valuation creation

* Make migrations fully reversible

* Stale routes cleanup

* Migrations reversible fix

* Move types to entryable concern

* Fix search when no entries found

* Remove more unused code
2024-07-01 10:49:43 -04:00
Igor Carvalho
094128fef1 Fix issue #861: Correct header selection logic in get_selected_header_for_field method (#918)
The get_selected_header_for_field method was incorrectly using the entire field object instead of the field key to dig into the column_mappings hash. This caused an error when trying to retrieve the selected header for a field.
2024-06-24 10:31:21 -04:00
Zach Gollwitzer
2681dd96b1 Move categories to top-level namespace (#894) 2024-06-20 08:15:09 -04:00
Zach Gollwitzer
8372e26864 Allow optional import fields (#865) 2024-06-11 18:46:44 -04:00
Zach Gollwitzer
de53a50e45 Sync account after transaction import (#820) 2024-05-30 22:06:32 -04:00
Jakub Kottnauer
6e59fdb369 Add tag preview when importing (#800) 2024-05-24 10:39:24 -04:00
Zach Gollwitzer
457247da8e Create tagging system (#792)
* Repro

* Fix

* Update signage

* Create tagging system

* Add tags to transaction imports

* Build tagging UI

* Cleanup

* More cleanup
2024-05-23 08:09:33 -04:00
Zach Gollwitzer
41c991384a Fix duplicate category creation on import (#791)
* Repro

* Fix

* Update signage
2024-05-22 10:02:03 -04:00
Jakub Kottnauer
77f166a5f8 Ignore empty categories while importing (#789)
* Ignore empty categories while importing

* Review fixes
2024-05-22 08:12:56 -04:00
Jakub Kottnauer
32748b0632 Fix import crash with empty transaction name (#783) 2024-05-20 17:21:40 -04:00
Jakub Kottnauer
34811d8fd8 Fix currency when importing to foreign accounts (#762) 2024-05-20 09:55:45 -04:00
Zach Gollwitzer
45ae4a9737 CSV Transaction Imports (#708)
Introduces a basic CSV import module for bulk-importing account transactions.

Changes include:

- User can load a CSV
- User can configure the column mappings for a CSV
- Imported CSV shows invalid cells
- User can clean up their data directly in the UI
- User can see a preview of the import rows and confirm import
- Layout refactor + Import nav stepper
- System test stability improvements
2024-05-17 09:09:32 -04:00