QT/sure - sure

QT/sure

Fork 0

mirror of https://github.com/we-promise/sure.git synced 2026-04-08 14:54:49 +00:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Juan José Mata	0fb9d60ee6	Use `dependent: :purge_later` for ActiveRecord attachments (#882 ) * Use dependent: :purge_later for user profile_image cleanup This is a simpler alternative to PR #787's callback-based approach. Instead of adding a custom callback and method, we use Rails' built-in `dependent: :purge_later` option which is already used by FamilyExport and other models in the codebase. This single-line change ensures orphaned ActiveStorage attachments are automatically purged when a user is destroyed, without the overhead of querying all attachments manually. https://claude.ai/code/session_01Np3deHEAJqCBfz3aY7c3Tk * Add dependent: :purge_later to all ActiveStorage attachments Extends the attachment cleanup from PR #787 to cover ALL models with ActiveStorage attachments, not just User.profile_image. Models updated: - PdfImport.pdf_file - prevents orphaned PDF files from imports - Account.logo - prevents orphaned account logos - PlaidItem.logo, SimplefinItem.logo, SnaptradeItem.logo, CoinstatsItem.logo, CoinbaseItem.logo, LunchflowItem.logo, MercuryItem.logo, EnableBankingItem.logo - prevents orphaned provider logos This ensures that when a family is deleted (cascade from last user purge), all associated storage files are properly cleaned up via Rails' built-in dependent: :purge_later mechanism. https://claude.ai/code/session_01Np3deHEAJqCBfz3aY7c3Tk * Make sure `Provider` generator adds it * Fix tests --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-02-03 15:45:25 +01:00
MkDev11	0afdb1d0fd	Feature/pdf import transaction rows (#846 ) * Add import row generation from PDF extracted data - Add generate_rows_from_extracted_data method to PdfImport - Add import! method to create transactions from PDF rows - Update ProcessPdfJob to generate rows after extraction - Update configured?, cleaned?, publishable? for PDF workflow - Add column_keys, required_column_keys, mapping_steps - Set bank statements to pending status for user review - Add tests for new functionality Closes #844 * Add tests for BankStatementExtractor - Test transaction extraction from PDF content - Test deduplication across chunk boundaries - Test amount normalization for various formats - Test graceful handling of malformed JSON responses - Test error handling for empty/nil PDF content * Fix supports_pdf_processing? to validate effective model The validation was always checking @default_model, but process_pdf allows overriding the model via parameter. This could cause a vision-capable override model to be rejected, or a non-vision-capable override to pass validation only to fail during processing. Changes: - supports_pdf_processing? now accepts optional model parameter - process_pdf passes effective model to validation - Raise Provider::Openai::Error inside with_provider_response for consistent error handling Addresses review feedback from PR#808 * Fix insert_all! bug: explicitly set import_id Rails insert_all! on associations does NOT auto-set the foreign key. Added import_id explicitly and use Import::Row.insert_all! directly. Also reload rows before counting to ensure accurate count. * Fix pending status showing as processing for bank statements with rows When bank statement PDF imports have extracted rows, show a 'Ready for Review' screen with a link to the confirm path instead of the 'Processing' spinner. This addresses the PR feedback that users couldn't reach the review flow even though rows were created. * Gate publishable? on account.present? to prevent import failure PDF imports are created without an account, and import! raises if account is missing. This prevents users from hitting publish and having the job fail. * Wrap generate_rows_from_extracted_data in transaction for atomicity - Clear rows and reset count even when no transactions extracted - Use transaction block to prevent partial updates on failure - Use mapped_rows.size instead of reload for count * Localize transactions count string with i18n helper * Add AccountMapping step for PDF imports when account is nil PDF imports need account selection before publishing. This adds Import::AccountMapping to mapping_steps when account is nil, matching the behavior of TransactionImport and TradeImport. Addresses PR#846 feedback about account selection for PDF imports. * Only include CategoryMapping when rows have non-empty categories PDF extraction doesn't extract categories from bank statements, so the CategoryMapping step would show empty. Now we only include CategoryMapping if rows actually have non-empty category values. This prevents showing an empty mapping step for PDF imports. * Fix PDF import UI flow and account selection - Add direct account selection in PDF import UI instead of AccountMapping - AccountMapping designed for CSV imports with multiple account values - PDF imports need single account for all transactions - Add update action and route for imports controller - Fix controller to handle pdf_import param format from form_with - Show Publish button when import is publishable (account set) - Fix stepper nav: Upload/Configure/Clean non-clickable for PDF imports - Redirect PDF imports from configuration step (auto-configured) - Improve AI prompt to recognize M-PESA/mobile money as bank statements - Fix migration ordering for import_rows table columns * Add guard for invalid account_id in imports#update Prevents silently clearing account when invalid ID is passed. Returns error message instead of confusing 'Account saved' notice. * Localize step names in import nav and add account guard - Use t() helper for all step names (Upload, Configure, Clean, Map, Confirm) - Add guard for invalid account_id in imports#update - Prevents silently clearing account when invalid ID is passed * Make category column migrations idempotent Check if columns exist before adding to prevent duplicate column errors when migrations are re-run with new timestamps. * Add match_path for PDF import step highlighting Fixes step detection when path is nil by using separate match_path for current step highlighting while keeping links disabled. * Rename category migrations and update to Rails 7.2 - Rename class to EnsureCategoryFieldsOnImportRows to avoid conflicts - Rename class to EnsureCategoryIconOnImportRows - Update migration version from 7.1 to 7.2 per guidelines - Rename files to match class names - Add match_path for PDF import step highlighting * Use primary (black) style for Create Account and Save buttons * Remove match_path from auto-completed PDF steps Only step 4 (Confirm) needs match_path for active-step detection. Steps 1-3 are purely informational and always complete. * Add fallback for document type translation Handles nil or unexpected document_type values gracefully. Also removes match_path from auto-completed PDF steps. * Use index-based step number for mobile indicator Fixes 'Step 5 of 4' issue when Map step is dynamically removed. * Fix hostings_controller_test: use blank? instead of nil Setting returns empty string not nil for unset values. * Localize step progress label and use design token * Fix button styling: use design system Tailwind classes btn--primary and btn--secondary CSS classes don't exist. Use actual design system classes from DS::Buttonish. * Fix CRLF line endings in tags_controller_test.rb --------- Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>	2026-02-02 16:27:02 +01:00
MkDev11	6f8858b1a6	feat/Add AI-Powered Bank Statement Import (step 1, PDF import & analysis) (#808 ) * feat: Add PDF import with AI-powered document analysis This enhances the import functionality to support PDF files with AI-powered document analysis. When a PDF is uploaded, it is processed by AI to: - Identify the document type (bank statement, credit card statement, etc.) - Generate a summary of the document contents - Extract key metadata (institution, dates, balances, transaction count) After processing, an email is sent to the user asking for next steps. Key changes: - Add PdfImport model for handling PDF document imports - Add Provider::Openai::PdfProcessor for AI document analysis - Add ProcessPdfJob for async PDF processing - Add PdfImportMailer for user notification emails - Update imports controller to detect and handle PDF uploads - Add PDF import option to the new import page - Add i18n translations for all new strings - Add comprehensive tests for the new functionality * Add bank statement import with AI extraction - Create ImportBankStatement assistant function for MCP - Add BankStatementExtractor with chunked processing for small context windows - Register function in assistant configurable - Make PdfImport#pdf_file_content public for extractor access - Increase OpenAI request timeout to 600s for slow local models - Increase DB connection pool to 20 for concurrent operations Tested with M-Pesa bank statement via remote Ollama (qwen3:8b): - Successfully extracted 18 transactions - Generated CSV and created TransactionImport - Works with 3000 char chunks for small context windows * Add pdf-reader gem dependency The BankStatementExtractor uses PDF::Reader to parse bank statement PDFs, but the gem was not properly declared in the Gemfile. This would cause NameError in production when processing bank statements. Added pdf-reader ~> 2.12 to Gemfile dependencies. * Fix transaction deduplication to preserve legitimate duplicates The previous deduplication logic removed ALL duplicate transactions based on [date, amount, name], which would drop legitimate same-day duplicates like multiple ATM withdrawals or card authorizations. Changed to only deduplicate transactions that appear in consecutive chunks (chunking artifacts) while preserving all legitimate duplicates within the same chunk or non-adjacent chunks. * Refactor bank statement extraction to use public provider method Address code review feedback: - Add public extract_bank_statement method to Provider::Openai - Remove direct access to private client via send(:client) - Update ImportBankStatement to use new public method - Add require 'set' to BankStatementExtractor - Remove PII-sensitive content from error logs - Add defensive check for nil response.error - Handle oversized PDF pages in chunking logic - Remove unused process_native and process_generic methods - Update email copy to reflect feature availability - Add guard for nil document_type in email template - Document pdf-reader gem rationale in Gemfile Tested with both OpenAI (gpt-4o) and Ollama (qwen3:8b): - OpenAI: 49 transactions extracted in 30s - Ollama: 40 transactions extracted in 368s - All encapsulation and error handling working correctly * Update schema.rb with ai_summary and document_type columns * Address PR #808 review comments - Rename :csv_file to :import_file across controllers/views/tests - Add PDF test fixture (sample_bank_statement.pdf) - Add supports_pdf_processing? method for graceful degradation - Revert unrelated database.yml pool change (600->3) - Remove month_start_day schema bleed from other PR - Fix PdfProcessor: use .strip instead of .strip_heredoc - Add server-side PDF magic byte validation - Conditionally show PDF import option when AI provider available - Fix ProcessPdfJob: sanitize errors, handle update failure - Move pdf_file attachment from Import to PdfImport - Document deduplication logic limitations - Fix ImportBankStatement: catch specific exceptions only - Remove unnecessary require 'set' - Remove dead json_schema method from PdfProcessor - Reduce default OpenAI timeout from 600s to 60s - Fix nil guard in text mailer template - Add require 'csv' to ImportBankStatement - Remove Gemfile pdf-reader comment * Fix RuboCop indentation in ProcessPdfJob * Refactor PDF import check to use model predicate method Replace is_a?(PdfImport) type check with requires_csv_workflow? predicate that leverages STI inheritance for cleaner controller logic. * Fix missing 'unknown' locale key and schema version mismatch - Add 'unknown: Unknown Document' to document_types locale - Fix schema version to match latest migration (2026_01_24_180211) * Document OPENAI_REQUEST_TIMEOUT env variable Added to .env.local.example and docs/hosting/ai.md * Rename ALLOWED_MIME_TYPES to ALLOWED_CSV_MIME_TYPES for clarity * Add comment explaining requires_csv_workflow? predicate * Remove redundant required_column_keys from PdfImport Base class already returns [] by default * Add ENV toggle to disable PDF processing for non-vision endpoints OPENAI_SUPPORTS_PDF_PROCESSING=false can be used for OpenAI-compatible endpoints (e.g., Ollama) that don't support vision/PDF processing. * Wire up transaction extraction for PDF bank statements - Add extracted_data JSONB column to imports - Add extract_transactions method to PdfImport - Call extraction in ProcessPdfJob for bank statements - Store transactions in extracted_data for later review * Fix ProcessPdfJob retry logic, sanitize and localize errors - Allow retries after partial success (classification ok, extraction failed) - Log sanitized error message instead of raw message to avoid data leakage - Use i18n for user-facing error messages * Add vision-capable model validation for PDF processing * Fix drag-and-drop test to use correct field name csv_file * Schema bleedover from another branch * Fix drag-drop import form field name to match controller * Add vision capability guard to process_pdf method --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com> Co-authored-by: Juan José Mata <jjmata@jjmata.com>	2026-01-30 20:44:25 +01:00

Juan José Mata

0fb9d60ee6

Use dependent: :purge_later for ActiveRecord attachments (#882 )

* Use dependent: :purge_later for user profile_image cleanup

This is a simpler alternative to PR #787's callback-based approach.
Instead of adding a custom callback and method, we use Rails' built-in
`dependent: :purge_later` option which is already used by FamilyExport
and other models in the codebase.

This single-line change ensures orphaned ActiveStorage attachments are
automatically purged when a user is destroyed, without the overhead of
querying all attachments manually.

https://claude.ai/code/session_01Np3deHEAJqCBfz3aY7c3Tk

* Add dependent: :purge_later to all ActiveStorage attachments

Extends the attachment cleanup from PR #787 to cover ALL models with
ActiveStorage attachments, not just User.profile_image.

Models updated:
- PdfImport.pdf_file - prevents orphaned PDF files from imports
- Account.logo - prevents orphaned account logos
- PlaidItem.logo, SimplefinItem.logo, SnaptradeItem.logo,
  CoinstatsItem.logo, CoinbaseItem.logo, LunchflowItem.logo,
  MercuryItem.logo, EnableBankingItem.logo - prevents orphaned
  provider logos

This ensures that when a family is deleted (cascade from last user
purge), all associated storage files are properly cleaned up via
Rails' built-in dependent: :purge_later mechanism.

https://claude.ai/code/session_01Np3deHEAJqCBfz3aY7c3Tk

* Make sure `Provider` generator adds it

* Fix tests

---------

Co-authored-by: Claude <noreply@anthropic.com>

2026-02-03 15:45:25 +01:00

MkDev11

0afdb1d0fd

Feature/pdf import transaction rows (#846 )

* Add import row generation from PDF extracted data

- Add generate_rows_from_extracted_data method to PdfImport
- Add import! method to create transactions from PDF rows
- Update ProcessPdfJob to generate rows after extraction
- Update configured?, cleaned?, publishable? for PDF workflow
- Add column_keys, required_column_keys, mapping_steps
- Set bank statements to pending status for user review
- Add tests for new functionality

Closes #844

* Add tests for BankStatementExtractor

- Test transaction extraction from PDF content
- Test deduplication across chunk boundaries
- Test amount normalization for various formats
- Test graceful handling of malformed JSON responses
- Test error handling for empty/nil PDF content

* Fix supports_pdf_processing? to validate effective model

The validation was always checking @default_model, but process_pdf
allows overriding the model via parameter. This could cause a
vision-capable override model to be rejected, or a non-vision-capable
override to pass validation only to fail during processing.

Changes:
- supports_pdf_processing? now accepts optional model parameter
- process_pdf passes effective model to validation
- Raise Provider::Openai::Error inside with_provider_response for
  consistent error handling

Addresses review feedback from PR#808

* Fix insert_all! bug: explicitly set import_id

Rails insert_all! on associations does NOT auto-set the foreign key.
Added import_id explicitly and use Import::Row.insert_all! directly.
Also reload rows before counting to ensure accurate count.

* Fix pending status showing as processing for bank statements with rows

When bank statement PDF imports have extracted rows, show a 'Ready for Review'
screen with a link to the confirm path instead of the 'Processing' spinner.

This addresses the PR feedback that users couldn't reach the review flow even
though rows were created.

* Gate publishable? on account.present? to prevent import failure

PDF imports are created without an account, and import! raises if account
is missing. This prevents users from hitting publish and having the job fail.

* Wrap generate_rows_from_extracted_data in transaction for atomicity

- Clear rows and reset count even when no transactions extracted
- Use transaction block to prevent partial updates on failure
- Use mapped_rows.size instead of reload for count

* Localize transactions count string with i18n helper

* Add AccountMapping step for PDF imports when account is nil

PDF imports need account selection before publishing. This adds
Import::AccountMapping to mapping_steps when account is nil,
matching the behavior of TransactionImport and TradeImport.

Addresses PR#846 feedback about account selection for PDF imports.

* Only include CategoryMapping when rows have non-empty categories

PDF extraction doesn't extract categories from bank statements,
so the CategoryMapping step would show empty. Now we only include
CategoryMapping if rows actually have non-empty category values.

This prevents showing an empty mapping step for PDF imports.

* Fix PDF import UI flow and account selection

- Add direct account selection in PDF import UI instead of AccountMapping
- AccountMapping designed for CSV imports with multiple account values
- PDF imports need single account for all transactions
- Add update action and route for imports controller
- Fix controller to handle pdf_import param format from form_with
- Show Publish button when import is publishable (account set)
- Fix stepper nav: Upload/Configure/Clean non-clickable for PDF imports
- Redirect PDF imports from configuration step (auto-configured)
- Improve AI prompt to recognize M-PESA/mobile money as bank statements
- Fix migration ordering for import_rows table columns

* Add guard for invalid account_id in imports#update

Prevents silently clearing account when invalid ID is passed.
Returns error message instead of confusing 'Account saved' notice.

* Localize step names in import nav and add account guard

- Use t() helper for all step names (Upload, Configure, Clean, Map, Confirm)
- Add guard for invalid account_id in imports#update
- Prevents silently clearing account when invalid ID is passed

* Make category column migrations idempotent

Check if columns exist before adding to prevent duplicate column
errors when migrations are re-run with new timestamps.

* Add match_path for PDF import step highlighting

Fixes step detection when path is nil by using separate match_path
for current step highlighting while keeping links disabled.

* Rename category migrations and update to Rails 7.2

- Rename class to EnsureCategoryFieldsOnImportRows to avoid conflicts
- Rename class to EnsureCategoryIconOnImportRows
- Update migration version from 7.1 to 7.2 per guidelines
- Rename files to match class names
- Add match_path for PDF import step highlighting

* Use primary (black) style for Create Account and Save buttons

* Remove match_path from auto-completed PDF steps

Only step 4 (Confirm) needs match_path for active-step detection.
Steps 1-3 are purely informational and always complete.

* Add fallback for document type translation

Handles nil or unexpected document_type values gracefully.
Also removes match_path from auto-completed PDF steps.

* Use index-based step number for mobile indicator

Fixes 'Step 5 of 4' issue when Map step is dynamically removed.

* Fix hostings_controller_test: use blank? instead of nil

Setting returns empty string not nil for unset values.

* Localize step progress label and use design token

* Fix button styling: use design system Tailwind classes

btn--primary and btn--secondary CSS classes don't exist.
Use actual design system classes from DS::Buttonish.

* Fix CRLF line endings in tags_controller_test.rb

---------

Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>

2026-02-02 16:27:02 +01:00

MkDev11

6f8858b1a6

feat/Add AI-Powered Bank Statement Import (step 1, PDF import & analysis) (#808 )

* feat: Add PDF import with AI-powered document analysis

This enhances the import functionality to support PDF files with AI-powered
document analysis. When a PDF is uploaded, it is processed by AI to:
- Identify the document type (bank statement, credit card statement, etc.)
- Generate a summary of the document contents
- Extract key metadata (institution, dates, balances, transaction count)

After processing, an email is sent to the user asking for next steps.

Key changes:
- Add PdfImport model for handling PDF document imports
- Add Provider::Openai::PdfProcessor for AI document analysis
- Add ProcessPdfJob for async PDF processing
- Add PdfImportMailer for user notification emails
- Update imports controller to detect and handle PDF uploads
- Add PDF import option to the new import page
- Add i18n translations for all new strings
- Add comprehensive tests for the new functionality

* Add bank statement import with AI extraction

- Create ImportBankStatement assistant function for MCP
- Add BankStatementExtractor with chunked processing for small context windows
- Register function in assistant configurable
- Make PdfImport#pdf_file_content public for extractor access
- Increase OpenAI request timeout to 600s for slow local models
- Increase DB connection pool to 20 for concurrent operations

Tested with M-Pesa bank statement via remote Ollama (qwen3:8b):
- Successfully extracted 18 transactions
- Generated CSV and created TransactionImport
- Works with 3000 char chunks for small context windows

* Add pdf-reader gem dependency

The BankStatementExtractor uses PDF::Reader to parse bank statement
PDFs, but the gem was not properly declared in the Gemfile. This would
cause NameError in production when processing bank statements.

Added pdf-reader ~> 2.12 to Gemfile dependencies.

* Fix transaction deduplication to preserve legitimate duplicates

The previous deduplication logic removed ALL duplicate transactions based
on [date, amount, name], which would drop legitimate same-day duplicates
like multiple ATM withdrawals or card authorizations.

Changed to only deduplicate transactions that appear in consecutive chunks
(chunking artifacts) while preserving all legitimate duplicates within the
same chunk or non-adjacent chunks.

* Refactor bank statement extraction to use public provider method

Address code review feedback:
- Add public extract_bank_statement method to Provider::Openai
- Remove direct access to private client via send(:client)
- Update ImportBankStatement to use new public method
- Add require 'set' to BankStatementExtractor
- Remove PII-sensitive content from error logs
- Add defensive check for nil response.error
- Handle oversized PDF pages in chunking logic
- Remove unused process_native and process_generic methods
- Update email copy to reflect feature availability
- Add guard for nil document_type in email template
- Document pdf-reader gem rationale in Gemfile

Tested with both OpenAI (gpt-4o) and Ollama (qwen3:8b):
- OpenAI: 49 transactions extracted in 30s
- Ollama: 40 transactions extracted in 368s
- All encapsulation and error handling working correctly

* Update schema.rb with ai_summary and document_type columns

* Address PR #808 review comments

- Rename :csv_file to :import_file across controllers/views/tests
- Add PDF test fixture (sample_bank_statement.pdf)
- Add supports_pdf_processing? method for graceful degradation
- Revert unrelated database.yml pool change (600->3)
- Remove month_start_day schema bleed from other PR
- Fix PdfProcessor: use .strip instead of .strip_heredoc
- Add server-side PDF magic byte validation
- Conditionally show PDF import option when AI provider available
- Fix ProcessPdfJob: sanitize errors, handle update failure
- Move pdf_file attachment from Import to PdfImport
- Document deduplication logic limitations
- Fix ImportBankStatement: catch specific exceptions only
- Remove unnecessary require 'set'
- Remove dead json_schema method from PdfProcessor
- Reduce default OpenAI timeout from 600s to 60s
- Fix nil guard in text mailer template
- Add require 'csv' to ImportBankStatement
- Remove Gemfile pdf-reader comment

* Fix RuboCop indentation in ProcessPdfJob

* Refactor PDF import check to use model predicate method

Replace is_a?(PdfImport) type check with requires_csv_workflow? predicate
that leverages STI inheritance for cleaner controller logic.

* Fix missing 'unknown' locale key and schema version mismatch

- Add 'unknown: Unknown Document' to document_types locale
- Fix schema version to match latest migration (2026_01_24_180211)

* Document OPENAI_REQUEST_TIMEOUT env variable

Added to .env.local.example and docs/hosting/ai.md

* Rename ALLOWED_MIME_TYPES to ALLOWED_CSV_MIME_TYPES for clarity

* Add comment explaining requires_csv_workflow? predicate

* Remove redundant required_column_keys from PdfImport

Base class already returns [] by default

* Add ENV toggle to disable PDF processing for non-vision endpoints

OPENAI_SUPPORTS_PDF_PROCESSING=false can be used for OpenAI-compatible
endpoints (e.g., Ollama) that don't support vision/PDF processing.

* Wire up transaction extraction for PDF bank statements

- Add extracted_data JSONB column to imports
- Add extract_transactions method to PdfImport
- Call extraction in ProcessPdfJob for bank statements
- Store transactions in extracted_data for later review

* Fix ProcessPdfJob retry logic, sanitize and localize errors

- Allow retries after partial success (classification ok, extraction failed)
- Log sanitized error message instead of raw message to avoid data leakage
- Use i18n for user-facing error messages

* Add vision-capable model validation for PDF processing

* Fix drag-and-drop test to use correct field name csv_file

* Schema bleedover from another branch

* Fix drag-drop import form field name to match controller

* Add vision capability guard to process_pdf method

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mkdev11 <jaysmth689+github@users.noreply.github.com>
Co-authored-by: Juan José Mata <jjmata@jjmata.com>

2026-01-30 20:44:25 +01:00

3 Commits